WWW.JNEUROSCI.ORG
-
The Journal of Neuroscience Discover www.zeiss.de/sensitivity
 QUICK SEARCH:   [advanced]


     
-


HOME
  |  
SEARCH  |   ARCHIVE  |   SUBSCRIBE  |   CONTACT  |   HELP

This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Submit an eLetter
Right arrow Alert me when this article is cited
Right arrow Alert me when eLetters are posted
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via ISI Web of Science (13)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Wiener, M. C.
Right arrow Articles by Richmond, B. J.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Wiener, M. C.
Right arrow Articles by Richmond, B. J.

 Previous Article  |  Next Article 

The Journal of Neuroscience, March 15, 2003, 23(6):2394

Decoding Spike Trains Instant by Instant Using Order Statistics and the Mixture-of-Poissons Model

Matthew C. Wiener and Barry J. Richmond

Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20892-4415


    ABSTRACT

TOP
ABSTRACT
Introduction
Materials and Methods
Results
Discussion
REFERENCES

In the brain, spike trains are generated in time and presumably also interpreted as they unfold in time. Recent work (Oram et al., 1999; Baker and Lemon, 2000) suggests that in several areas of the monkey brain, individual spike times carry information because they reflect an underlying rate variation. Constructing a model based on this stochastic structure allows us to apply order statistics to decode spike trains instant by instant as spikes arrive or do not. Order statistics are time-consuming to compute in the general case. We demonstrate that data from neurons in primary visual cortex are well fit by a mixture of Poisson processes; in this special case, our computations are substantially faster. In these data, spike timing contributed information beyond that available from the spike count throughout the trial. At the end of the trial, a decoder based on the mixture-of-Poissons model correctly decoded about three times as many trials as expected by chance, compared with approximately twice as many as expected by chance using the spike count only. If our model perfectly described the spike trains, and enough data were available to estimate model parameters, then our Bayesian decoder would be optimal. For four-fifths of the sets of stimulus-elicited responses, the observed spike trains were consistent with the mixture-of-Poissons model. Most of the error in estimating stimulus probabilities is attributable to not having enough data to specify the parameters of the model rather than to misspecification of the model itself.

Key words: visual cortex; information; coding; modeling; statistics; timing


    Introduction

TOP
ABSTRACT
Introduction
Materials and Methods
Results
Discussion
REFERENCES

Because responses are presumably interpreted as they unfold in time, our goal is to decode spike trains instant by instant depending on whether a spike arrived. If spike trains are thought of as words in a language, then decoding---figuring out what stimulus elicited a particular observed spike train---can be thought of as looking up words in a neural dictionary. Ideally this dictionary should allow us, at any point in time, to translate a spike train into the best possible guess of which stimulus elicited it.

One approach to constructing a neural dictionary is to make as few assumptions as possible. Another approach, which we take here, is to model spike trains using (and checking) certain assumptions about their structure. Recent work shows that in spike trains from three areas of the monkey brain (the lateral geniculate nucleus, primary visual cortex (V1), and primary motor cortex) spike times appear to have been thrown down at random, with probabilities determined by the firing rate profile over time, the peristimulus time histogram (PSTH) (Oram et al., 1999, 2001; Baker and Lemon, 2000). Oram et al. (1999) and Baker and Lemon (2000) simulated stochastic (but non-Poisson) spike trains with relations among spikes that were indistinguishable from those observed in experimental data. Using these models, it is in principle possible to determine how likely each stimulus is to have elicited every possible spike train. However, compiling a dictionary in this way would require simulating very large data sets, which, although less difficult than gathering enough experimental data, would still be prohibitively time-consuming.

Here we show that for spike trains with the stochastic structure described by Oram et al. (1999), we can use order statistics (Arnold et al., 1992), instead of simulation, to calculate directly how likely each stimulus is to have elicited each spike train. Order statistics give the probabilities of individual spike times in trains with this stochastic structure based on the spike count and the firing rate profile---exactly the parameters used by Oram et al. (1999). Using order statistics, we can update the stimulus probabilities at each instant depending on whether a spike arrives. Once we have these probabilities, we can guess, for example, that the stimulus with the highest probability elicited the spike train. This is something like looking up a word letter by letter and discarding words that do not begin with the letters seen so far.

Although much faster than simulation, decoding using order statistics is, in the general case, still more time-consuming than we would like. However, calculating order statistics is much simpler and faster in the special case in which the spike trains are considered as arising from a mixture of a small number of Poisson processes. We show that our data are in fact well fit by such a model. The recognition of this structure in the data substantially simplifies and speeds decoding of single-neuron spike trains instant by instant.


    Materials and Methods

TOP
ABSTRACT
Introduction
Materials and Methods
Results
Discussion
REFERENCES

Decoding. Decoding, that is, guessing which stimulus elicited a particular spike train, has two steps. We first estimate the probability that each stimulus elicited the spike train and then choose a stimulus based on those estimated probabilities. Here we minimize expected error by choosing the most probable stimulus. If some mistakes are more important than others, a decision rule minimizing an appropriate cost function could be used.

With the decision rule fixed, decoding a spike train as it unfolds in time requires estimating the probability ps(tj|Rj): the probability, estimated at time tj, that stimulus s elicited the response Rj = (r1, . . . rj), where each ri is 1 if a spike occurred at time i and 0 if no spike occurred in time bin i. The presence or absence of a spike is measured in 1 msec bins; time ti is the time in the ith bin. Any bin width can be used as long as there can be at most one spike in any bin. All of our calculations could be recast in continuous time in a straightforward manner.

Using Bayes' rule, estimating ps(tj|Rj) can be transformed into estimating the probability that a spike occurs in the next bin (which may depend on the history of the spike train):
p<SUB>s</SUB>(t<SUB>j</SUB>‖R<SUB>j</SUB>)=<FR><NU>p<SUB>s</SUB>(t<SUB>j − 1</SUB>‖R<SUB>j−1</SUB>)P(r<SUB>j</SUB>=1‖s, R<SUB>j − 1</SUB>)</NU><DE>Z</DE></FR>, (1)
if a spike is fired in bin j, or
p<SUB>s</SUB>(t<SUB>j</SUB>‖R<SUB>j</SUB>)=<FR><NU>p<SUB>s</SUB>(t<SUB>j − 1</SUB>‖R<SUB>j − 1</SUB>)(1−P(r<SUB>j</SUB>=1‖s, R<SUB>j − 1</SUB>))</NU><DE>Z</DE></FR> (2)
if no spike is fired in bin j. P(rj = 1|s, R- 1) is the probability that the a spike appears in the time bin j (rj = 1) if stimulus s has been presented and the response through time t- 1 is R- 1. We use P for the probability of response given stimulus and p for probability of stimulus given response to create a typographical distinction. Z, here and throughout, is a normalizing term; that is, it is the sum, across the appropriate variable, usually the stimulus s, of terms in the numerator of whatever fraction it appears in.

Using Equations 1 and 2, we proceed bin by bin, updating the stimulus probabilities depending on whether a spike occurs in each bin. For efficiency, responses can be decoded in small jumps rather than instant by instant. The probability P(rj . . . rj + k|s) for any response can be calculated using the distribution of first spike times, P(tau 1|s):
P(r<SUB>j + 1</SUB>…r<SUB>j + k−1</SUB>=0, r<SUB>j + k</SUB>=1‖s)=P (&tgr;<SUB>1</SUB>=t<SUB>j + k</SUB>‖s), (3)

P(r<SUB>j + 1</SUB>…r<SUB>j + k</SUB>=0‖s)=1−<LIM><OP>∑</OP><LL>k′ = 1</LL><UL>k</UL></LIM>P (&tgr;<SUB>1</SUB>=t<SUB>j + k′</SUB>‖s), (4)
where the response through time bin j is assumed known. Below we will show how to calculate the distribution of first spike times (Arnold et al., 1992).

We avoid overfitting using three-way cross-validation: the trials were divided into three subsets, with two-thirds used to fit the model and the remaining third used to test decoding. Each third of the data was used twice to fit and once to test; we averaged over the three splits.

Order statistics for spike trains. By considering each spike as the "next first spike," the distribution of first spike times P(tau 1 = tj|s) was calculated for each stimulus using Equations 3 and 4.

For a spike train with n spikes, the (n, k) order statistic describes the distribution of the kth of n draws, after those draws are sorted; in our case, this will be P(tau k|s, n), the time of the kth of n spikes in a train elicited by stimulus s:
P(&tgr;<SUB>k</SUB>=t<SUB>j</SUB>‖s, n)=<FR><NU>n!</NU><DE>(k−1)!(n−k)!</DE></FR> F<SUP>k − 1</SUP><SUB>s</SUB>  (t<SUB>j</SUB>)f<SUB>s</SUB> (t<SUB>j</SUB>) [1−F<SUB>s</SUB> (t<SUB>j</SUB>)]<SUP>n − k</SUP>. (5)
When we focus on the first of n spikes, i.e., k = 1, Equation 5 simplifies to describe the (n, 1) or first order statistic P(tau 1|s, n):
P(&tgr;<SUB>1</SUB>=t<SUB>j</SUB>‖s, n)=n f<SUB>s</SUB> (t<SUB>j</SUB>) [1−F<SUB>s</SUB> (t<SUB>j</SUB>)]<SUP>n − 1</SUP>. (6)
In Equations 5 and 6, fs(tj) is the normalized spike density function or firing rate profile over time; Fs(tj) is the corresponding cumulative firing rate profile; and the factor [1 - Fs(tj)]- k is the probability of n - k spikes arriving after time tj. As noted above, we discretize time into 1 msec bins (the same precision with which spike times were recorded).

Figure 1 illustrates estimating the quantities in Equation 6 from data for spike trains elicited by two stimuli (coded black, gray throughout). One stimulus elicits spikes beginning ~25 msec earlier than the other. For each stimulus s, the firing rate profile is fs(t) (Fig. 1B). Fs(t) (Fig. 1C) is the corresponding cumulative probability. From fs(t) and Fs(t), we can calculate P(tau 1|s, n) for any spike count n (Eq. 6; Fig. 1D). As n increases, the weight of the first order statistic shifts left; i.e., the first spike is likely to occur earlier.



View larger version (47K):
[in this window]
[in a new window]
 
Figure 1.   Construction of count-weighted order statistics. Labels correspond to equations in Materials and Methods. x-Axis (except E), Time from stimulus onset (milliseconds), truncated 150 msec after stimulus onset for visibility (although the measured responses extend to 300 msec after stimulus onset); vertical lines show 75, 90, and 95 msec after stimulus onset. x-Axis (E), Spike count. y-Axes: A, trials; B-F, probability. A, Rasters of responses to two stimuli (black, gray throughout). Each row of dots represents a trial; each dot shows a spike time. B, Firing rate profiles, fs(t), estimated using local regression (Loader, 1999) with a data window including 10% of the data. C, Cumulative firing rate profiles, Fs(t) (they do not reach 1 because the x-axis is truncated). D, Order statistics, P(tau 1|s, n), for n = 2, 5, and 10 (Eq. 1). E, Count histograms, &ptilde;s(n), with mixture-of-Poissons approximations used to mitigate the effects of using a small data set. F, Count-weighted order statistics, P(tau 1|s) (Eq. 7).

To avoid requiring the decoder to know in advance how many spikes n will arrive in a particular trial, we average the first order statistics P(tau 1|s, n), weighted with the expected distribution of spike counts &ptilde;s(n) for stimulus s (Fig. 1E; we use &ptilde;s(n) to distinguish from ps(t), estimated probability of stimulus s at time t). This count-weighted first order statistic, P(tau 1|s), gives the expected probability of the first spike occurring at time t for spike density fs(t) and the expected distribution of spike counts for stimulus s:
P(&tgr;<SUB>1</SUB>=t<SUB>j</SUB>‖s)=<LIM><OP>∑</OP><LL>n = 1</LL><UL>N<SUB>max</SUB></UL></LIM><A><AC>p</AC><AC>˜</AC></A><SUB>s</SUB> (n)P (&tgr;<SUB>1</SUB>=t<SUB>j</SUB>‖s, n). (7)
In other words, the count-weighted first order statistic gives the distribution of the next interspike interval. Note that P(tau 1|s) sums to 1 - &ptilde;s(0); the term corresponding to no more spikes (n = 0) is left out.

The second order statistic conditioned on the first spike time is calculated as the first order statistic of a restricted response &ftilde;s, created by left-truncating fs(t) at the time of the previous spike and renormalized to sum to 1. The (n, k + 1) order statistic can be calculated as the (n - k, 1) order statistic of the remaining response (Arnold et al., 1992).

Order statistics show a Markov property such that future spike times depend on past spike times only through the number of spikes already observed and the most recent spike time. This dependence on the number of spikes already observed means that the process is not a renewal process. Figure 2 shows first order statistics calculated at the beginning of a trial and after the first and fifth spikes in the response.



View larger version (30K):
[in this window]
[in a new window]
 
Figure 2.   Probabilities of first spike times calculated at different times during a trial. The four rastergrams in the first row show responses of four stimuli (taken, for this example, from one of the neurons in the experiment using 16 stimuli). Each row represents a trial; each dot in a row represents the time of a spike in that trial. The colors of the rasters correspond to the colors used in the graphs below. The bottom three panels show the first spike time probabilities calculated at stimulus onset, 102 msec after stimulus onset, and 164 msec after stimulus onset; under each plot, vertical lines show the times of spikes in the train being decoded (which comes from the stimulus whose responses are shown in black). The distribution of next spike times is initially similar for the red, green, and blue stimuli, but later in the trial, when the blue stimulus fires much less than the red or green stimulus, the probabilities diverge.

Figure 3 shows the decoding of several spike trains for the two-stimulus example of Figure 1.



View larger version (27K):
[in this window]
[in a new window]
 
Figure 3.   Decoding responses for the example of Figure 1. Labels correspond to equations in Materials and Methods. Black and gray are as in Figure 1. x-Axis, Time (milliseconds); vertical lines, 75, 90, and 95 msec after stimulus onset. y-Axes, Probability. A, Stimulus probabilities, ps(t), from decoding a spike train with a single spike at 75 msec, when only one stimulus ever elicits spikes. B, Decoding a spike train with a single spike at 90 msec. The probability of a black stimulus rises between 70 and 90 msec after onset, because early spikes are more likely from the gray stimulus, and none appear. C, D, Interpretation of a spike 95 msec after stimulus onset depends on whether there was an earlier spike. The decoding algorithm does not look into the future, so B-D are identical until 90 msec after stimulus onset, and B and D are identical until 95 msec after stimulus onset.

Simplifying order statistics using a mixture of Poisson distributions. Equation 7 can be used to calculate the count-weighted first order statistic for any distribution of spike counts. If the spike count distribution is a mixture of a finite number of Poisson distributions, our calculations become much simpler.

For a Poisson process with time-varying mean rate lambda s fs(t), the probability that the first spike occurs at time j simplifies to:
P(&tgr;<SUB>1</SUB>=t<SUB>j</SUB>‖s, &lgr;<SUB>s</SUB>)=&lgr;<SUB>s</SUB> f<SUB>s</SUB>(t<SUB>j</SUB>)e<SUP>−&Sgr;<SUP>j − 1</SUP><SUB>k = 0</SUB> &lgr;<SUB>s</SUB> f<SUB>s</SUB>(t<SUB>k</SUB>)</SUP>, (8)
where fs(j) is the spike density function (normalized PSTH) for stimulus s. lambda s is a rate multiplier. Because we use discrete time bins, the first term in Equation 8, which represents the probability of a spike at time t, can be >1 for some combinations of fs and lambda s. To avoid this problem, we use the approximation:
P(&tgr;<SUB>1</SUB>=t<SUB>j</SUB>‖s, &lgr;<SUB>s</SUB>)=[1−e<SUP>−&lgr;<SUB>s</SUB>f<SUB>s</SUB>(t<SUB>j</SUB>)</SUP>]e<SUP>−&Sgr;<SUP>j − 1</SUP><SUB>k = 0</SUB> &lgr;<SUB>s</SUB>f<SUB>s</SUB>(t<SUB>k</SUB>).</SUP>, (9)
because e-lambda sfs(tj) is the probability of no spike occurring in time bin j and is guaranteed to be between 0 and 1.

Equation 8 calculates the sum of count-specific first order statistics P(tau 1|s, n) for each n, weighted with the appropriate Poisson probabilities. This simplifies Equation 7 in the case of a simple Poisson distribution of spike counts. However, the distributions of stimulus-elicited spike counts from several brain areas are not adequately modeled by a Poisson distribution (Baddeley et al., 1997; Gershon et al., 1998; Wiener et al., 2001).

A mixture of a finite number of Poisson distributions is more flexible than a single Poisson distribution for modeling stimulus-elicited spike count distributions and almost as simple. To model the distribution of spike counts elicited by stimulus s as a mixture of Poisson distributions with mean rates lambda s,i with weights summing to 1, we add the first order statistics for each mean rate lambda s,i weighted by the probability p(lambda s,i) of observing that rate:
P(&tgr;<SUB>1</SUB>=t<SUB>j</SUB>‖s)=<LIM><OP>∑</OP><LL>i</LL></LIM>p (&lgr;<SUB>s, i</SUB>) P(&tgr;<SUB>1</SUB>=t<SUB>j</SUB>‖s, &lgr;<SUB>s, i</SUB>). (10)
In each mixture, the constituent Poisson processes share a single firing rate profile fs(t) and differ only in the rate by which the profile is multiplied. Thus the model is separable: estimation of the firing rate profile does not interact with estimation of the distribution of mean counts. This model provides a good description of our data (see Results).

The calculations in Equations 8 and 10 are conceptually identical to those in Equations 6 and 7. However, because the sum in Equation 6 can be taken analytically in the case of a Poisson distribution, the calculations using a mixture of Poisson distributions are approximately an order of magnitude faster for our data sets. The speedup comes because we calculate order statistics for only a few Poisson means rather than for many spike counts. The two methods differ slightly in how they deal with the discretization of time into bins (the approximation in Eq. 9 is unnecessary when using Eqs. 6, 7). Below we will show that the effect of this difference on the estimated stimulus probabilities is negligible.

Model parameters on subintervals. The decoding procedures outlined above iteratively treat each new spike as the first spike in a shorter response. This requires estimating model parameters for the shorter response from the parameters for the full model. For the multiple Poisson formulation, this is simple and quick (whereas in the general case, it is computationally intensive). The rate function of a Poisson process on a subinterval is the original rate function restricted to that interval. The weights p(lambda s,i|s) are continuously updated using Bayes' rule, because each weight is the probability with which we believe the spike train being decoded comes from the particular process. The Poisson processes differ only by a multiplier of the rate, so our evidence of which process the train comes from is the number of spikes we have seen:
p(&lgr;<SUB>s, i</SUB>) (j)=<FR><NU>P (n(j)‖s, &lgr;<SUB>s, i</SUB>)p (&lgr;<SUB>s, i</SUB>) (0)</NU><DE><LIM><OP>∑</OP><LL>i</LL></LIM>P (n(j)‖s,&lgr;<SUB>s, i</SUB>)p (&lgr;<SUB>s, i</SUB>) (0)</DE></FR>, (11)
where p(lambda s,i)(0) denotes the weights at the beginning of the trial.

Generating surrogate data. To compare the performance of our decoder on recorded data with its performance on data with the structure for which the decoder was designed, we simulate spike trains using a procedure similar to that used for decoding. We calculate the distribution P(tau |s, 1) of next spike times (above) and randomly select a spike time. The simulation also proceeds iteratively: after each spike time is chosen, the distribution for the next is calculated. The probabilities in Equations 7 and 10 do not sum to 1, because they omit the probability of no additional spikes. For implementation, the probability that no more spikes occur in the interval is assigned to an additional bin. When this bin is selected, we consider the spike train under construction to be complete. By construction, such trains have the structure specified by the model.

Surrogate spike trains matched to observed data are generated using the observed spike density and spike count distribution for each stimulus. The same number of trials observed is generated for each stimulus.

Incorporating a refractory period. Real spike train data often depart from the simple mixture-of-Poissons model by becoming either less or more likely to fire for a short time after a spike (exhibiting a refractory or rebound period, respectively). To adjust, we depart slightly from classical order statistics and replace the spike density function f(t) with f(t)r(t - tau ), where tau  is the time of the previous spike.

We estimate r by comparing the observed interspike interval distribution with the interspike interval distribution of a set of surrogate trains. Because interval distribution is related to spike count (more spikes mean shorter intervals), we actually generate more surrogate trains than needed and subselect to match the observed spike count distribution. If the interspike interval in the generated data is significantly different from the observed interspike interval (chi 2 test, p < 0.05), then r(1) is set to gobs(1)/ggen(1), where gobs and ggen are the interspike interval distributions in the observed and generated data. A new set of trials is simulated with this new refractory period (in which the frequency of interspike intervals of length 1 now matches the frequency in the observed data), and the process is repeated for subsequent intervals until the interspike interval distribution of the generated data is indistinguishable from the interspike interval distribution of the experimental data.

Estimating the spike density function from data. The spike density function measures rate variation over time. To estimate the spike density function for a particular stimulus, we create a histogram at 1 msec precision (the experimental sampling rate) across trials for that stimulus and smooth using local regression (Loader, 1999) with a window using 10% of the data. The spike density functions obtained using this window and the optimal smoothing window determined by generalized cross-validation (Craven and Wahba, 1979; Loader, 1999) are similar: median correlation, 0.98; interquartile range (iqr), 0.95-0.99. The smoothed histogram is normalized so that its sum across bins is 1.

Fitting a mixture of Poisson distributions to observed spike counts. The probability of drawing a particular count n from a mixture of Poisson distributions is the weighted sum of the probabilities of drawing the count n from each of the individual Poisson distributions in the mixture:
<A><AC>P</AC><AC>˜</AC></A><SUB>s</SUB> (n)=<LIM><OP>∑</OP><LL>i = 1</LL><UL>k</UL></LIM>p (&lgr;<SUB>s, i</SUB>)<FR><NU>e<SUP>−&lgr;<SUB>s, i</SUB></SUP>&lgr;<SUP>n</SUP><SUB>s, i</SUB></NU><DE>n!</DE></FR>, (12)
where k is the number of distributions in the mixture; lambda s,i >=  0 is the mean of the ith distribution in the mixture for stimulus s; and the weights p(lambda s,i) are positive and sum to 1.

For each stimulus s, the parameters lambda s,i and p(lambda s,i) are estimated by maximizing the log likelihood of the observed spike counts given the corresponding model. We seek the most parsimonious description of each distribution. Thus, for each stimulus, we first find the best single Poisson distribution (k = 1). If the observed distribution could reasonably have come from the fit Poisson (chi 2 test, p > 0.05), we use this model; otherwise, we fit a mixture of two Poisson distributions and again check for consistency. In general, if the data are inconsistent with a mixture of k Poisson distributions, we fit a mixture with k + 1 Poisson distributions. In this work, we did not require mixtures of more than five Poisson distributions (see Results).

Information calculations. Transmitted information (Shannon and Weaver, 1949; Cover and Thomas, 1991) is defined as I(R, S) Sigma Sr,s p(r, s)log(p(s|r)/p(s)), where the stimulus probabilities p(s) are taken from the experiment, and p(s|r) is calculated using Equation 1 or 2. Using response models avoids some estimation problems associated with small data sets (Panzeri and Treves, 1996; Golomb et al., 1997) and yields estimates of information comparable with those using other validated methods (Gershon et al., 1998; Wiener and Richmond, 1998).

Data sets. We decoded responses recorded from monkey primary visual cortex in two previously reported experiments (Kjaer et al., 1997; Wiener et al., 2001). In each experiment, responses were recorded using standard single-electrode techniques from complex cells in primary visual cortex of awake rhesus monkeys. At the beginning of each trial, a fixation point appeared on a screen. One hundred milliseconds after the monkey fixated, a stimulus was flashed on the receptive field of the neuron for 300 msec and then replaced with the background. The monkey was not required to react to the stimulus. If the monkey fixated within ~2° of the fixation point during the entire period from the appearance of the fixation point until the stimulus disappeared, it was rewarded with a drop of liquid when the stimulus disappeared. If the monkey shifted its gaze further than 2° from the fixation point, the trial was aborted, and the monkey received no reward. There was a delay of 300 msec between trials, during which the monkey was not required to fixate.

In one set of experiments (Wiener et al., 2001), 128 stimuli were shown: 32 oriented bars (Fig. 4A), 32 sine-wave gratings (Fig. 4B), 32 Walsh patterns (Fig. 4C), and 32 photographic images (Fig. 4D). In another experiment (Kjaer et al., 1997), 16 Walsh patterns were used (Fig. 4E).



View larger version (75K):
[in this window]
[in a new window]
 
Figure 4.   Stimuli used in the experiments. In one set of experiments (Wiener et al., 2001), the stimulus set consisted of 32 oriented bars (A), 32 sine-wave gratings (B), 32 Walsh patterns (C), and 32 photographic images (D). In the other set of experiments, 16 Walsh patterns were used (the 8 shown in E and their contrast-reversed counterparts).

Computation. The calculations presented in this paper were performed in the R statistical computing environment (Ihaka and Gentleman, 1996).


    Results

TOP
ABSTRACT
Introduction
Materials and Methods
Results
Discussion
REFERENCES

Our original analyses were performed on data from the 29 neurons from the experiment with 16 stimuli (Kjaer et al., 1997). In a single experiment (recording from a single neuron), each stimulus was presented approximately the same number of times; the number of presentations per stimulus varied from neuron to neuron, ranging from 19 to 230 (median, 42). The mixture-of-Poissons model made decoding sufficiently fast that we were able to decode data from the 17 neurons from the experiment with 128 stimuli (Wiener et al., 2001). In these experiments, the median number of presentations per stimulus ranged from 8 to 52 (median 14) in different neurons; each stimulus was presented approximately the same number of times. Below we will show that decoding results using the two methods on the data from the experiment with 16 stimuli were very similar. Except where stated otherwise, the results in this paper were obtained using the mixture-of-Poissons model.

Mixtures of Poisson distributions for spike count

We examined the mixture-of-Poissons model for 2636 spike count distributions in two different sets of V1 data (Kjaer et al., 1997; Wiener et al., 2001); 50.8% of the spike count distributions were adequately fit with a single Poisson distribution, 39.4% with a mixture of two Poisson distributions, 7.4% with three, 2.0% with four, and 0.4% with five. No distribution required a mixture of more than five distributions. Examples of mixture models with one, two, and three components are shown in Figure 5.



View larger version (36K):
[in this window]
[in a new window]
 
Figure 5.   Spike count distributions fit by mixtures of Poisson distributions. The top, middle, and bottom panels show spike count distributions fit by a single Poisson distribution, a mixture of two Poisson distributions, and a mixture of three Poisson distributions, respectively. In each panel, the histogram (gray bars) shows the observed distribution of spike counts. The dots connected by lines show the fitted values, and the solid and dashed lines show the component Poisson distributions. A, Single Poisson distribution with mean of 12.4. B, Mixture of two Poisson distributions with means of 0.4 (weight, 0.56) and 3.1 (weight, 0.44). C, Mixture of three Poisson distributions with means of 0.2 (weight, 0.11), 5.8 (weight, 0.31), and 14.4 (weight, 0.58). Note the different scales on the x- and y-axes of the three panels.

The modeled distributions are, by design, smoother than the observed distributions. Various spike counts are more or less likely in the model than in the data. Spike counts that are very frequent in the data may be assigned less probability in the model, and spike counts not actually observed in the data may be assigned nonzero probability (as is the case for a count of four in Fig. 5A,C). Fitting a mixture of Poisson distributions to the observed spike count is, among other things, a form of smoothing (meaning we believe that if we gathered enough data we would encounter a response with four spikes elicited by the stimulus corresponding to Fig. 5C). As expected when smoothing, the means of the modeled distributions are very close to the observed means (median difference across all neurons, 0% of the measured mean; iqr -3 to +4%), but the variances are lower (median difference, -5% of the measured variance; iqr, -24 to + 9%).

In the Poisson decoding, we model the distribution of mean spike counts in any subinterval of the trial (from any particular time to the end of the trial) on the basis of the distribution of mean spike counts in the entire trial [p(lambda s,i) at the start of the trial] and the firing rate modulation over time (i.e., how many of the spikes are expected to have occurred by now; see Eq. 11). If we use the mixture-of-Poissons model to predict, on five successively shrinking subintervals (beginning 50, 100, 150, 200, and 250 msec after stimulus onset), the variance of spike counts elicited by each stimulus and take a linear regression of log(predicted variance) versus log(observed variance), we find a negative intercept (median across 46 neurons, -0.13; iqr, -0.25 to -0.03) and a slope near 1 (median, 0.98; iqr, 0.95-1.02). The r2 values of the regressions are high (median, 0.94; iqr, 0.91-0.96). Below we will examine the effect on decoding accuracy of using the estimated subinterval spike count distributions compared with using spike count distributions directly measured on the subintervals.

Decoding

Figure 6 shows an example of the development of stimulus probabilities over time (i.e., the probability that each stimulus elicited the observed spike train) when decoding using the mixture-of-Poissons method. Figure 6A shows an example from an experiment in which 16 Walsh patterns were shown (Kjaer et al., 1997), and Figure 6B shows an example from an experiment in which 128 stimuli were shown (Wiener et al., 2001). The spike trains decoded are shown underneath each panel. The stimulus probabilities at each time depend only on the spike train observed up to that time; the decoding algorithm does not look into the future. Although the probabilities change more abruptly when spikes arrive, they change even when no spikes are fired. Thus the absence of spikes, as well as their presence, is informative (cf. Sherlock Holmes's dog that did not bark).



View larger version (28K):
[in this window]
[in a new window]
 
Figure 6.   Decoding responses of neurons in monkey V1 in an experiment using 16 stimuli (Kjaer et al., 1997) (top) or 128 stimuli (Wiener et al., 2001) (bottom). x-Axis, Time from stimulus onset. y-Axis, Stimulus probabilities. Stimulus probabilities from decoding single spike trains (black vertical lines below each panel). Each line shows the probability assigned by the decoding algorithm to a single stimulus. A, Decoding a response in an experiment using 16 stimuli (Kjaer et al., 1997). The top line shows the probability of the stimulus that elicited the spike train (Fig. 4E, second Walsh pattern from the left). B, Decoding a response in an experiment using 128 stimuli (Wiener et al., 2001). The top line shows the probability of the stimulus that elicited the spike train (Fig. 4C, first row, second Walsh pattern from the left).

Figure 6, A and B, shows correctly decoded trials in which one stimulus is far more probable than all others. In these cases, the probability of the guessed stimulus is high. Table 1 summarizes the distribution of the maximum estimated probability, that is, the probability of the guessed stimulus, at the end of our decoding window (300 msec after stimulus onset). It also shows the difference between the probabilities of the most probable and second most probable stimulus (the "margin of victory" over the "runner-up"). The winning probabilities are higher in the experiments with 16 stimuli than in the experiments with 128 stimuli, because the total probability (p = 1) is divided among fewer stimuli.


                              
View this table:
[in this window]
[in a new window]
 
Table 1.   Probability of most probable (guessed) stimulus when spike trains are decoded using the mixture-of-Poissons model

Intuitively, it should become easier to distinguish among stimuli as the probability of one stimulus rises and the probabilities of others fall. We evaluate the decoding in two ways: by measuring the amount of information obtained from the spike train and also by seeing how well the decoding lets us guess which stimulus elicited the observed response. As stated in Materials and Methods, in all our calculations we avoided overfitting using three-way cross-validation: the available trials were divided into three subsets, with two-thirds used to fit the model and the remaining third used to test decoding. Each third of the data was used twice to fit and once to test; the results we present are averaged over the three sets.

Information theory (Shannon and Weaver, 1949) shows that the amount of information in a set of spike trains depends on how they are interpreted, that is, on the nature of the neural code. Figure 7, A and C, shows that for both sets of experiments, decoding using only the number of spikes in a response (dark boxes) yields substantially less information than considering both spike count and spike timing using the mixture-of-Poissons model (light boxes).



View larger version (46K):
[in this window]
[in a new window]
 
Figure 7.   Decoding using timing (through the mixture-of-Poissons model) is more effective than decoding using spike count alone. A, C, Distribution across 29 (A) or 17 (C) neurons of information transmitted by the neuronal responses (spike trains) about which stimulus was shown, using expanding windows starting at stimulus onset. Dark boxes, Spike count only; light boxes, spike count and timing together. B, D, Distribution across 29 (B) or 17 (D) neurons of the percent of trials correctly decoded by guessing the stimulus with highest probability of having elicited the observed response. For comparability, figures are presented as multiples of the percent of trials that would be correctly decoded by chance (B, 1/16 = 6.25%; D, 1/128 = 0.78%). The dotted line shows the percent of trials correctly decoded by chance. We avoided overfitting using three-way cross-validation: the available trials were divided into three subsets, with two-thirds used to fit the model and the remaining third used to test decoding. Each third of the data was used twice to fit and once to test; the results shown here are averaged over the three sets. Boxes, Median; line and indentation at center, interquartile range: bottom and top. If notches do not overlap, corresponding medians are different (p < 0.05). Whiskers have been eliminated, and the dark boxes have been made wider, for visibility.

Does this extra information provided by paying attention to timing actually help us decode more accurately? It is natural to guess that the stimulus with the highest probability elicited the observed spike train (as it did in both examples in Fig. 6). Figure 7, B and D, shows the distribution (across 29 and 17 neurons, respectively) of the percent of trials correctly decoded, that is, for which the highest-probability stimulus did elicit the response. For comparability between the two experiments, the figures are shown as multiples of the percent of trials that could be correctly decoded simply by guessing (1/16 = 6.25%, and 1/128 = 0.78%). In both experiments, more trials were correctly decoded using both spike count and timing via the mixture-of-Poissons method than using spike count alone.

When timing is taken into account, the amount of information transmitted by the neuronal responses is larger in the experiment with 128 stimuli than in the experiment with 16 stimuli (Fig. 7A,C). This difference could be because one experiment uses more stimuli than the other, or it could be because the larger experiment used stimuli of several different kinds. To check whether the number of stimuli was the crucial factor, we randomly selected subsets of 64, 32, and 16 stimuli, allowing stimuli of all four kinds to enter each set. Figure 8A shows that this has very little effect on the amount of information transmitted by the responses. Figure 8B compares the amount of information transmitted by responses to the randomly selected groups of 32 stimuli with the amount of information transmitted by the four subsets of 32 stimuli consisting of the bars, gratings, Walsh patterns, and photographic images, respectively (Fig. 4A-D). Responses to the Walsh patterns and photographs used in these experiments transmit less information than do the responses to bars and gratings.



View larger version (26K):
[in this window]
[in a new window]
 
Figure 8.   Differences in stimulus set, rather than in the number of stimuli, account for differences in the results of the two sets of experiments. x-Axis, Time from stimulus onset (milliseconds). y-Axis, Transmitted information (bits). A, Information transmitted by responses of a single neuron to different numbers of stimuli: 128 (solid circles), 64 (plus symbols), 32 (open triangles), and 16 (open circles). For all except the full set of 128 stimuli, the lines represent the median value of transmitted information over 100 random samples (from the full set of 128) of the appropriate number of stimuli. Transmitted information drops only slightly with the number of stimuli. B, Information transmitted by responses of the neuron to random subsamples of 32 stimuli (filled circles; same as triangles in A) and to subsets of 32 stimuli consisting of bars (Fig. 4A, open circles), gratings (Fig. 4B, open triangles), Walsh patterns (Fig. 4C; plus symbols), and photographic stimuli (Fig. 4D; times symbols). Much less information is transmitted by responses to Walsh patterns or photographic stimuli than by responses to bars, gratings, and random subsamples.

Wiener et al. (2001), in an analysis using spike count alone, have previously shown that the difference in information transmission in neuronal responses to stimuli of the four different kinds is primarily attributable to the fact that responses to Walsh patterns and photographs have a smaller dynamic range than responses to bars and gratings. This new result shows that the difference persists in an analysis that takes into account spike timing as well. The persistent difference in results obtained using different kinds of stimuli reinforces a point made by Wiener et al. (2001): conclusions based on experiments using a narrow set of stimuli may not generalize well to experiments using a larger set. The need for a large number of stimuli unfortunately conflicts with the need for many trials per stimulus in an experiment with a limited number of trials.

Relation between decoding success and transmitted information

Intuitively, we expect transmitted information and decoding success to be related: if more information is transmitted by the responses, more responses should be correctly decoded. Above (Fig. 7) we showed that in both sets of experiments (with 16 and 128 stimuli), transmitted information and percent of trials correctly decoded both increase as the trials progress. Figure 9 shows that, for individual neurons from the two sets of experiments, more information transmitted generally corresponds to a greater portion of trials correctly decoded.



View larger version (26K):
[in this window]
[in a new window]
 
Figure 9.   Relation across neurons between transmitted information and percent of trials correctly decoded for individual neurons. x-Axis, Transmitted information (bits). y-Axis, Percent of trials correctly decoded (as a multiple of the number of trials that would be correctly decoded by chance). Each symbol shows transmitted information and the percent of trials correctly decoded 300 msec after stimulus presentation for a single neuron. The percent of trials correctly decoded increases with increasing information transmitted, except when decoding using the mixture-of-Poissons method in the experiment with 128 stimuli. Circles, Decoding using the spike count in the experiment with 16 stimuli; triangles, decoding using the spike count only in the experiment with 128 stimuli; plus symbols, decoding using the mixture-of-Poissons method in the experiment with 16 stimuli; times symbols, decoding using the mixture-of-Poissons method in the experiment with 128 stimuli.

Effect of refractory period

Because we estimate the spike count distribution from data, we incorporate the effect of refractory period on spike count. We found that the effect of any additional refractory or rebound period on the number of trials correctly decoded is very small: 0.2% fewer trials are correctly decoded when the refractory period is included (median across neurons; iqr, -2.7 to 2.2%). The effect on individual stimulus probabilities was also small: in the experiments using 16 stimuli, the median correlation between the estimated stimulus probabilities obtained when taking the additional refractory effects into account and those obtained when not taking the additional refractory effects into account was 0.997 (median across all trials in 29 neurons), and the fifth percentile of correlations was 0.96. The median correlation in the experiments using 128 stimuli was 0.999 (median across all trials in 17 neurons), and the fifth percentile was 0.99. Except where otherwise noted, the results throughout this study were calculated using the refractory period.

Decoding signal versus decoding noise

When there are no differences in timing among responses elicited by different stimuli, we would expect the decoder to reduce to a spike count decoder. To check this, we simulated two sets of responses with identical spike density functions (normalized PSTH) but different distributions of spike count (Poisson distributions with means of 4 and 10). The only nonrandom differences in timing arise from the differences in spike count. A trial with seven spikes should, at the end of the trial, be assigned with probability 0.4 to the Poisson distribution with mean 4 and with probability 0.6 to the Poisson distribution with mean 10. In fact, depending on the data used to create the Poisson models, different spike trains of length 7 will be assigned different probabilities, but we expect the assigned probabilities to converge to p = 0.4 and 0.6. Figure 10A shows that when spike timing is known to be random, decoding on the basis of timing (using the mixture-of-Poissons model; white boxes) converges to the correct answer more slowly than the optimal decoding using spike count alone (black boxes).



View larger version (30K):
[in this window]
[in a new window]
 
Figure 10.   Decoding using the mixture-of-Poissons method is inferior to spike count decoding when there are no timing differences among responses to different stimuli. A, Mixture-of-Poissons decoding converges to the correct answer less quickly than does spike count decoding in a two-stimulus example with identical PSTH for both stimuli. x-Axis, Number of trials per stimulus. y-Axis, Distribution of estimated probabilities that a train with seven spikes arose from a homogeneous Poisson process with a mean of 4 spikes rather than from one with a mean of 10 spikes. The left (open) box in each set shows results using the mixture-of-Poissons decoding method. The right (filled) box shows results when decoding on the basis of spike count alone. The simulated data were generated by a homogeneous Poisson process with a mean of 4 for one stimulus and a homogeneous Poisson process with mean of 10 for the other stimulus. The mean number of spikes for each process was held fixed while the number of trials changed. Similar results were obtained using inhomogeneous Poisson processes as long as the rates varied in time in the same way (i.e., the normalized PSTHs were the same for the two processes). The left box in each set represents 2500 values, the results of decoding 500 randomly generated test trains using models derived from five different sets of simulated data. The right box represents 250 values, the result of decoding the response "seven spikes" according to the models derived from 250 sets of simulated data (using additional sets of simulated data does not change the results). Interpretation of box plots: The line at the center of each box shows the median estimated probability, and the notch shows a 5% confidence interval for the median. The bottom and top of the box show the 25th and 75th percentiles, and the bottom and top whiskers show the 5th and 95th percentiles. B, x-Axis, Time from stimulus onset (milliseconds). y-axis, Number of trials correctly decoded (as a multiple of chance) using the spike count only (dark boxes) and the mixture-of-Poissons method (light boxes). Each box plot represents results from 25 artificial data sets with spike distributions modeled on those from one of the neurons from the experiment with 16 stimuli and no differences in the timing of stimuli as above. Interpretation of box plots is as above (whiskers removed for visibility).

A similar effect can be observed in surrogate data generated to match the spike count distribution seen in one of the neurons from the experiment with 16 stimuli but with spikes equally likely at any time (i.e., with a flat PSTH) for all stimuli. Here again, spike timing is purely random and carries no information about which stimulus generated the response. Figure 10B shows that more trials are correctly decoded using the spike count alone than using the full mixture-of-Poissons model. Again, attempting to decode using random timing degrades performance.

In the cases just presented, the spike density function is (by construction) of no help in decoding but still must be estimated. When a small amount of data is available, random spike time variations cause differences in the estimates of the spike density functions for different stimuli, and these spurious differences interfere with decoding. The random differences in the spike density functions become smaller when more data are available to estimate them (although true differences in spike density functions can be detectable even with modest amounts of data). The need for data to eliminate spurious correlations would only increase if more complicated timing features were to be taken into account. Thus it is important to avoid trying to decode on the basis of complicated timing features that can be accounted for as stochastic consequences of the spike count and simpler timing features.

Comparing different decoding variants

Trains from the neurons from the experiment with 16 stimuli were decoded using both the order statistic method (with the spike count distributions represented using mixtures of Poisson distributions) and the simplified mixture-of-Poissons method. The correlation between the two sets of probabilities at the end of the trial (that is, 300 msec after stimulus onset) was 0.997 (median across neurons; iqr, 0.992-0.999). This verifies that the two methods give nearly identical results (up to certain issues related to time discretization near the end of a trial). As a result, the same stimulus was guessed by the two methods in 95% of the cases (median across neurons; iqr, 94-98%). It may seem odd that different stimuli could be guessed when the correlation between probabilities is so high. Table 1 shows that the difference between the probability of the winning stimulus and the next most probable stimulus can be quite small; thus small discrepancies can change the guessed stimulus in a small number of trials. As would be expected, the differences in the largest and second-largest probabilities in trials when the two methods made the same guess were ~5 times as large as the differences in trials when the two methods made different guesses. We did not decode the experiments with 128 stimuli using the order statistic method because of the computational burden.

As previously presented, when using the mixture-of-Poissons model, we have modeled the distribution of mean spike counts in any subinterval of the trial (from any particular time to the end of the trial) on the basis of the distribution of mean spike counts in the entire trial [lambda s,i and p(lambda ) at the start of the trial] and the firing rate modulation over time (Eq. 11). It is possible, at substantial expense in both computation time and additional model complexity, to avoid this approximation by measuring the distribution of spike counts in each subinterval and finding a new mixture of Poisson distributions to model each distribution. Despite the additional model complexity and computation, decoding using measured subinterval distributions gives results similar, but not identical, to those obtained when using estimated subinterval distributions: the correlation between the two sets of probabilities at the end of trials was 0.88 (iqr, 0.83-0.94), and the same stimulus was guessed in 75% of the trials (median across neurons; iqr, 65-82%). Approximately the same number of trials were correctly decoded using the two methods (Fig. 11A), although using the measured distributions yielded slightly more transmitted information (Fig. 11B).



View larger version (54K):
[in this window]
[in a new window]
 
Figure 11.   Decoding results using the mixture-of-Poissons model to estimate spike count distributions on subintervals (dark boxes) are similar to those obtained when directly measuring spike count distributions on subintervals (light boxes). A, The portion of trials correctly decoded is similar for the two methods. x-Axis, Time from stimulus onset (milliseconds). y-axis, Percent of trials correctly decoded (as a multiple of chance). The dotted line shows the approximate percent of trials correctly decoded by chance. B, Using measured distributions results in higher transmitted information than using estimated distributions. x-Axis, Time from stimulus onset (milliseconds). y-Axis, Transmitted information. Light boxes are narrower for visibility only. This graph uses the data from the experiment with 16 stimuli (Kjaer et al., 1997). Interpretation of box plots is as in Figure 7.

Decoding using the order statistic method or the mixture-of-Poissons method with directly measured spike count distributions on subintervals takes substantially longer (by approximately an order of magnitude) than using the mixture-of-Poissons model when estimating the count distributions on subintervals. The results of this section show that the extra time produces little change in decoding.

How good is the mixture-of-Poissons model?

Time-rescaling goodness-of-fit test

Barbieri et al. (2001) and Brown et al. (2002) presented a method for checking the goodness of fit of spike-train models. Their method is based on the time-rescaling theorem, which shows that any point process can be transformed, by a suitable rescaling of time, into a homogeneous Poisson process. To apply their method to our data, we randomly chose two-thirds of the trains to estimate the model, withholding one-third of the trains to use as a test set. Each model was tested using the Kolmogorov-Smirnov test presented by Brown et al. (2002), with p = 0.95. In the neurons from the experiment with 128 stimuli, the test spike trains were consistent with the model for 109 stimuli (median across neurons; iqr, 104-112). In the neurons from the experiment with 16 stimuli, the test spike trains were consistent with the model for 11 stimuli (median across neurons; iqr, 10-13). When the refractory-rebound effect was ignored, the results were very similar: the test spike trains were still consistent with the model for 108 of 128 stimuli (median across neurons; iqr, 104-111) and in 11 of 16 stimuli (median across neurons; iqr, 9-13). Overall, 82% of stimuli were consistent with the model when the refractory period was included in the model, and 78% were still consistent when the refractory period was ignored. Thus the mixture of Poisson distributions with different mean rates, and therefore different temporal structures, seems to account for much of the observed temporal structure of the spike trains. This may explain why our decoding results with or without the refractory period are so similar.

Number of trials correctly decoded

Another way to examine how well our model describes the data is to compare the performance of the decoder on real data with its performance on artificially generated surrogate data that match the real data in many ways and are known to have the structure specified by the model. If our model describes the data well, we would expect to decode real spike trains nearly as well as we decode artificially generated spike trains. If our model describes the data poorly, we would expect to decode artificial spike trains much more accurately than real spike trains.

We generated cell-matched surrogate data as described in Materials and Methods. Figure 12A compares the percent of trials correctly decoded in real and surrogate data, both for the spike count code and using the mixture-of-Poissons model. When decoding using the spike count only, we correctly decode 15% (median across neurons; iqr, 7-20%) more trials from surrogate data than from real data in the experiments with 16 stimuli (circles) and 18% (median; iqr, 10-23%) more in the experiments with 128 stimuli (triangles). The decoder does nearly as well when decoding data from the experiments with 16 stimuli using the mixture-of-Poissons method (plus symbols): we correctly decode 17% (median; iqr, 7-23%) more trials from surrogate data than from real data. However, when the mixture-of-Poissons method is used with data from the experiments with 128 stimuli (times symbols), many more trials are correctly decoded from surrogate data than from real data (median, 100% more; iqr, 75-150%). Figure 12B shows that the performance of the mixture-of-Poissons decoder is related to the amount of data available: the more data available to estimate the model parameters in the first place, the more closely the number of trials correctly decoded in real data approaches the number correctly decoded in surrogate data.



View larger version (16K):
[in this window]
[in a new window]
 
Figure 12.   Performance of the decoder on real data and matched surrogate data. A, Percent of trials correctly decoded (in multiples of chance) for the real data (x-axis) and for matched surrogate data (y-axis). Results are shown both when decoding using the spike count only (circles, 16 stimuli; triangles, 128 stimuli) and when decoding using the mixture-of-Poissons method (plus symbols, 16 stimuli; times symbols, 128 stimuli). Each symbol shows results for a single neuron, and each neuron is represented twice, once for decoding using the spike count and once for decoding using the mixture-of-Poissons method. The figures for surrogate data are medians across 10 artificial data sets. Real data are decoded nearly as well as surrogate data when using the spike count code and even when using the mixture-of-Poissons method for the experiment with 16 stimuli. However, for data from the experiment with 128 stimuli, the decoder performs much better on surrogate data than real data. B, The performance of the decoder depends on the number of trials per stimulus available. x-Axis, Median number of trials per stimulus. y-Axis, Ratio of the number of trials correctly decoded for surrogate data to the number of trials correctly decoded in real data. Only data from the experiment with 128 stimuli decoded using the mixture-of-Poissons method are shown (times symbols are used for consistency with A).

To check whether the number of stimuli involved (and therefore the complexity of the problem) contributes to the difference between the two sets of results, we created 16-stimulus neurons from 128-stimulus neurons by taking the responses from 16 randomly chosen stimuli and discarding the rest. Twenty-five subneurons were created from each of the two neurons with the most trials per stimulus (50 and 52). In these artificial neurons, 16% more trials are correctly decoded from surrogate data than from the real data (median; iqr, 11-20%), not significantly different from that seen for the neurons from the experiment that actually had only 16 stimuli (Kruskal-Wallis test, p = 0.7). There is still no significant difference if we try to minimize the influence of the number of trials per stimulus by comparing the results for the subsampled data only with results from the seven neurons from the experiment with 16 stimuli with between 40 and 60 trials per stimulus (Kruskal-Wallis test, p = 0.2). Thus correctly choosing among 128 stimuli requires more data to parameterize the model than correctly choosing among 16 stimuli.

Accuracy of individual stimulus probabilities: model error and estimation error

The most stringent possible test of the performance of the decoder assesses the accuracy of all the estimated stimulus probabilities. For example, if we look at all the stimuli assigned probabilities near p = 0.1 in various trials, we would hope that those stimuli were actually the stimuli that elicited those trials 10% of the time. This is essentially the same test we might use to evaluate the quality of a weather-forecasting service: it should rain on half the days with a 50% chance of rain, one-fourth of the days with a 25% chance of rain, and so on. To test this, we bin the estimated probabilities (we have one estimated probability for each stimulus for each trial) and then ask how often the stimuli represented in each bin really did elicit the corresponding trial.

Departure from accurate prediction can come from two sources. The first possible source of error is model misspecification: the mixture-of-Poissons model may not capture the true structure of the spike trains. The second possible source of error is difficulty fitting the model on the basis of the amount of data available. We will call these model error and estimation error. To separate these errors, we must examine how accurately the model estimates stimulus probabilities in three different kinds of data: the real data; artificial data, generated according to the model, with the same number of trials per stimulus as the real data; and artificial data, generated according to the model, with many more trials per stimulus than available in the real data. Model error can be examined by comparing results from the real data and the smaller set of artificial data: because the two sets have equal numbers of trials per stimulus, any difference in our ability to decode must derive from the fact that one set of data is known to be consistent with the model, whereas the other is not. Estimation error can be examined by comparing results from the larger and smaller artificial data sets: because both artificial data sets are known to be consistent with the model, any difference in our ability to decode must derive from difference in the amount of data available.

Figure 13A shows that, for the data from the experiment with 16 stimuli, the predicted stimulus probabilities are close to the true stimulus probabilities. Small predicted stimulus probabilities tend to be slightly too large, and large predicted stimulus probabilities tend to be slightly too small. This is true whether we decode using the spike count only (filled squares) or using the full mixture-of-Poissons model (filled circles), although the difference is smaller for spike count only. The same effect is seen in artificial data with the same number of trials per stimulus as seen in the real data, although it is substantially less pronounced (open squares, open circles). This indicates that some of our error is attributable to the model not quite fitting the data. However, in artificial data with many more trials per stimulus than in our actual data (500 trials per stimulus), the predicted probabilities are almost exactly equal to the observed probabilities both when the spike trains are decoded using the spike count only (plus symbols) and when they are decoded using the full mode