Abstract
The spiking output of an individual neuron can represent information about the stimulus via mean rate, absolute spike time, and the time intervals between spikes. Here we discuss a distinct form of information representation, the local distribution of spike intervals, and show that the timevarying distribution of interspike intervals (ISIs) can represent parameters of the statistical context of stimuli. For many sensory neural systems the mapping between the stimulus input and spiking output is not fixed but, rather, depends on the statistical properties of the stimulus, potentially leading to ambiguity. We have shown previously that for the adaptive neural code of the fly H1, a motionsensitive neuron in the fly visual system, information about the overall variance of the signal is obtainable from the ISI distribution. We now demonstrate the decoding of information about variance and show that a distributional code of ISIs can resolve ambiguities introduced by slow spike frequency adaptation. We examine the precision of this distributional code for the representation of stimulus variance in the H1 neuron as well as in the Hodgkin–Huxley model neuron. We find that the accuracy of the decoding depends on the shapes of the ISI distributions and the speed with which they adapt to new stimulus variances.
Introduction
What constitutes “the neural code”? Spike trains from single neurons may transmit information via the number of spikes in some time window, the absolute timing of spikes, and spike patterns. Spike rate and spike timing can be thought of as a continuum, with information represented by spike count at different time scales in different organisms and brain areas (Bialek et al., 1991; Mainen and Sejnowski, 1995; Reinagel and Reid, 2002). Information encoding via spike patterns and interspike intervals (ISIs), however, relies on relationships between spikes. In the identified neuron H1 of the fly visual system the meaning of particular sequences of spike intervals has been determined by using reverse correlation (de Ruyter van Steveninck and Bialek, 1988; Brenner et al., 2000). These multiple spike symbols permit stimulus reconstructions that increase in fidelity with the increasing number of spikes per symbol (Rieke et al., 1997). The meaning of spike patterns such as bursts has been examined in the retina (Meister and Berry, 1999), primate visual cortex (Reich et al., 2000), the lateral geniculate nucleus (Lesica and Stanley, 2004), and conductancebased models (Kepecs and Lisman, 2003). In the early auditory system (Cariani, 1999) single spikes are phaselocked to oscillations of the incoming sound wave. Although specific cycles of the wave may fail to evoke a spike, the distribution of intervals across the cell population shows clear peaks at the fundamental wave length and its harmonics and thus represents pitch. In this paper we take a different approach: we consider the distribution of intervals generated in time by a single neuron.
Stimulus decoding has been analyzed extensively in the neural code, usually when the stimulus distribution does not change with time (Bialek et al., 1991; Haag and Borst, 1997; Warland et al., 1997; Strong et al., 1998; Egelhaaf and Warzecha, 1999; Stanley et al., 1999; Brenner et al., 2000; Keat et al., 2001). Typically, however, statistics of the sensory environment change in time, and for many sensory systems the coding strategy of the system also changes. In the retina (Shapley and Victor, 1979; Kim and Rieke, 2001; Baccus and Meister, 2002), visual cortex (Carandini and Ferster, 1997), the auditory system (Kvale and Schreiner, 2004; Dean et al., 2005), and somatosensory cortex (M. Maravall, R. S. Petersen, A. L. Fairhall, E. Arabzadeh, and M. E. Diamond, unpublished observations) a changing stimulus variance results in contrast adaptation. During contrast adaptation the relationship between the input stimulus and the output firing probability may change to match the new dynamic range of the stimulus. In the case of H1 this takes a particularly elegant form: the input/output functions have the same shape, but the stimulus is scaled by its SD (Brenner et al., 2000; Fairhall et al., 2001). This rescaling potentially renders the code ambiguous with respect to stimulus variance. Although this ambiguity may be resolved by the spike rate, which depends on the variance envelope, spike rate also adapts, with spike frequency adaptation occurring over many seconds.
We claim that it is possible to decode stimulus statistics rapidly by monitoring the local statistics of spike intervals (Fairhall et al., 2001). Our goal here is to examine the speed, accuracy, and precision with which this decoding strategy can represent a timevarying variance. We consider distributional coding of visual stimulus motion in the flyidentified neuron H1 and, to demonstrate its generality, in a Hodgkin–Huxley (HH) model neuron.
Materials and Methods
Fly data.
Spike times were recorded from the H1 neuron (Franceschini et al., 1989) in the Calliphora vicina fly (Fairhall et al., 2001). Action potentials were recorded extracellularly as the fly viewed a horizontally moving vertical bar pattern. Bars moved across an oscilloscope screen at a standardized distance from the animal with random velocity, the velocities of which were chosen from a uniform distribution with unit variance (σ^{2} = σ = 1) multiplied by some constant. Stimulus frames were drawn every 2 ms, with a new velocity chosen for each frame. In the threestate switching experiment the stimuli with three different SDs, σ_{H} = 3σ_{M} = 9σ_{L}, were presented in the sequence as follows: σ_{H} → σ_{M} → σ_{L} → σ_{M} (see Fig. 1a). Each SD was presented for the same duration per cycle; the sequences with SDs σ_{H} and σ_{L} were each presented in 8 s blocks, whereas the two presentations of σ_{M} were each 4 s. Data were recorded for 2 h or 300 consecutively presented 24 s trials. In steadystate experiments the constant variance stimuli, with SDs spanning a 60fold range, were presented in random order to a single fly as described above for 15 min each over the course of a day.
Simulated data.
A standard HH spaceclamped model neuron was used to simulate changes in the neuronal membrane potential with respect to time (Hodgkin and Huxley, 1952). The equations were solved numerically by using fourthorder Runge–Kutta integration with a fixed time step of 0.05 ms. Injected current was simulated by a series of zeromean normally distributed random numbers that were smoothed by a Gaussian filter (fullwidth halfmaximum, 0.6 ms). Spike times were identified as the upward crossing of the voltage trace at 0 mV, with a resting potential of −65 mV. For steadystate data 10 1 h data runs were simulated, with normalized SDs spanning an eightfold range.
Estimating probability distributions.
Normalized base 10 logarithmic distributions of ISIs were created for spike train data. Because spike intervals span a wide range, using a logarithmic instead of a linear scale allows for differences in short intervals to be noticeable; bin widths were 0.05 log_{10} s, with range [−2.7 to 0.2] for the fly and [−2.1 to 1.7] for the model neuron. One was added systematically to all bins before the normalization of distributions that would be in the denominator of a ratio to avoid division by zero; when sample number n ≫ 1, this is analogous to assuming a uniform previous distribution and then finding the mean of the posterior probability distribution by using Bayes’ rule (cf. Gelman, 1995; Johnson et al., 2001). According to Bayes’ rule, the previous distribution is multiplied by a likelihood and divided by the probability of all data to give the posterior distribution. If the previous distribution is a uniform distribution, the posterior distribution is then the twoparameter Beta distribution Beta(α,β), which in this case is Beta(1 + x, 1 + n − x), where x is observed successes and n is the number of samples. The mean of any Beta distribution is α/(α + β), thus yielding a mean of (1 + x)/n when n ≫ 1. Avoiding zero bins allows all log ratios of these distributions to be computed; in practice, the treatment of these values noticeably affects results when either the distributions are undersampled or the supports of the probability distributions are markedly different, such as is the case between the extreme curves of Figure 4b.
Discriminating between two distributions.
Because our goal is to evaluate the information contained in the spike interval distributions, we use the Kullback–Leibler divergence, a statistical measure that is related to mutual information, to quantify the difference between two probability distributions as follows: This measure is zero if and only if P = Q and is otherwise > 0. Unlike a true distance measure, the Kullback–Leibler divergence is not symmetric, D_{KL}(P,Q) ≠ D_{KL}(Q,P), and does not obey the triangle inequality. For our purposes this is advantageous because it allows us to quantify the potential asymmetry in observing changes in different directions, i.e., comparing lowtohigh variance changes with hightolow variance changes.
D_{KL} quantifies the intrinsic classification difficulty (Johnson et al., 2001). Although D_{KL} does not give the absolute probability of error for a specific decoder, it does limit the ultimate performance of any classifier; a onebit increase in D_{KL} corresponds to a twofold decrease in error probability. Given a decoding model with some set of parameters, for small parameter changes the smallest achievable meansquared estimation error is inversely proportional to D_{KL}; this is the Cramer–Rao bound (Johnson et al., 2001).
Discrimination by using ISIs: binary choice.
Given a sequence of ISIs generated in response to a statistically steadystate stimulus, we would like to quantify our ability to decode responses by classifying them according to whether they were caused by a stimulus with SD σ_{A} or σ_{B}.
We wish to compute the Kullback–Leibler divergence between the distributions of ninterval sequences Δ_{1}, …,Δ_{n} generated by different stimulus SDs, D_{KL}^{(n)}=D_{KL}(P(Δ_{1},…, Δ_{n}σ_{B})‖P(Δ_{1},…, Δ_{n}σ_{A})). Because in practice the multidimensional distributions P(Δ_{1}, …,Δ_{n}σ) are very difficult to sample for n > 2, we will assume that successive intervals are independent. In this case, D_{KL}^{(n)} is given by the cumulative D_{KL} as follows: We estimated the distributions P(Δ_{i}σ_{A}) and P(Δ_{i}σ_{B}) from 10,000 unique random samples of interval sequences from the steadystate fly and model neuron data. To debias and estimate the variance of D_{cKL}, we applied the Bootstrap method (Press, 1992; Johnson et al., 2001) by drawing 10,000 sequences of n ISIs randomly with replacement from the initial set of 10,000 sequences. After repeating this procedure 200–300 times, we found the mean μ and the SD σ. Using the mean, we debiased D_{cKL}^{(n)} by subtracting μ − D_{cKL}^{(n)}, and we took the SD as the error. Note that, for steadystate data, the following applies: D_{cKL}^{(n)}=nD_{KL}(P(Δσ_{B})‖P(Δσ_{A})). We note that, for well sampled steadystate data, calculating D_{cKL}^{(n)} is equivalent to calculating the expectation of a decision variable D_{n} as follows: The D_{KL} values of Figure 4 are related to both the D_{cKL}^{(n)} values, which assess the ability to decode from an ninterval sequence, as well as to a possible loglikelihood decision variable (Fairhall et al., 2001), which could be used in decoding. In Fairhall et al. (2001), the signaltonoise ratio (SNR) of D_{n} was used to determine the reliability of decoding, where SNR is defined as the square of the mean over the variance of D_{n}.
Multiple choices: calculating the information that one interval gives about the stimulus variance.
When there are more than two choices, we compute the mutual information between the set of intervals Δ and the stimulus SDs σ_{i}. This quantifies the average reduction in uncertainty about the stimulus source that is gained by observing a single interval. The mutual information is defined as the difference between the total entropy H[P(Δ)] and the noise entropy H_{s}. In this application the noise entropy is the entropy of P(Δσ), the distribution of Δ conditioned on the SD σ, averaged over the set of possible values of σ as follows: H_{s}=−_{Δ} P(Δσ)log_{2}P(Δσ) (Cover and Thomas, 1991; Rieke et al., 1997; Dayan and Abbott, 2001). The difference between the total entropy and the noise entropy can be rewritten as the following: This information can be calculated either for the steadystate distributions or for subsets of intervals with a particular temporal relationship to the stimulus envelope, such as for the nth interval after a change in stimulus variance as in Figure 3. SDs for each data point are found by using a Bootstrapping method as described above. All distributions are well sampled because n ≫ m (n = [300 600 300], m = 59), where n is the sample number and m is the number of bins. Additionally, bias is negligible because V/B^{2} ≈ [n(log m)^{2}]/m^{2}, where V and B are the variance and bias of our estimator, respectively (Paninski, 2003). Therefore, we evaluate Equation 4 directly.
Testing the assumption of independence.
To test the assumption of interval independence, we calculate the mutual information between an interval Δ_{1} and Δ_{n}, the nth interval after it in a data sequence, as follows: Note that Equation 5 is symmetric with respect to the two intervals. Because of the limited sampling of the joint distributions, our estimates of these probability distributions will tend to lead to an overestimate of information (Treves and Panzeri, 1995); however, because n ≫ m (n: 30,000, m = 59^{2} = 3481), the biascorrected estimators are effective (Paninski, 2003). We correct for calculating the information for all of our data, I_{full}, and for two halves of our data, I_{half 1} and I_{half 2}; the corrected information is then I_{corr} = 2I_{full} − 0.5(I_{half 1} + I_{half 2}). This method derives from the fact that the first correction term in undersampled biased information is C_{1} = (2nln2)^{−1}(m_{s} − 1)(m_{r} − 1), where n is the sample size and m_{s} and m_{r} are the sizes, or number of bins, of the stimulus and response sets, respectively (Treves and Panzeri, 1995; Panzeri and Treves, 1996), and allows us to calculate I_{corr} without explicitly finding C_{1}. We verified that C_{1} is indeed linear with regard to smaller subsets of data as well; thus the firstorder term is a good approximation of the error.
Estimating a timevarying stimulus variance.
We demonstrate an explicit decoding of this interval code to estimate the timevarying SD of the stimulus by using the HH model. Probability distributions P(Δσ) are estimated in response to a range of steadystate stimulus SDs (see Fig. 4b). A timevarying stimulus variance then can be estimated from n successive output intervals of the HH model, using a maximum likelihood estimator as follows: where the estimate is based on an implicit assumption of a locally unchanging variance. Alternatively, stimulus variance can be estimated from spike rate via a firing rate–current variance curve analogous to a firing rate–current mean (f–I) curve. Because the HH model does not display spike frequency adaptation, the responses of the model to statistical changes in the stimulus are nearly instantaneous, and estimates from intervals and spike rate should be similar.
Results
In the case of H1 it has been shown previously that the spike rate varies adaptively with the stimulus variance envelope (Fairhall et al., 2001). Here we present a case in which spike frequency adaptation leads to explicit ambiguity in the firing rate response to the local value of stimulus variance. In this case we show that the distribution of ISIs presents an alternative decoding variable to spike rate. We examine the speed, accuracy, and precision with which this distributional code conveys information about the stimulus variance.
Resolving ambiguity
We construct a threestate experiment in which rate responses show adaptation in different directions during presentation of the same variance stimulus (Fig. 1). In this experiment we switch the SD of the white noise stimulus among three values, σ_{H} = 3σ_{M} = 9σ_{L}, presented in the sequence σ_{H} → σ_{M} → σ_{L} → σ_{M} (Fig. 1a). Thus the system experiences a stimulus SD of σ_{M} while adapting both to an upward transition σ_{L} → σ_{M} and to a downward transition σ_{H} → σ_{M}. Spike rates corresponding to stimulus SD σ_{M} from the two transitions differ noticeably (Fig. 1b). To estimate the distribution of possible firing rates, we collect the firing rates produced in response to the random white noise stimuli at a given time with respect to the variance envelope. Figure 1c displays the distributions of spike counts in 10 ms bins obtained from the last second of each variance epoch. The rate distributions sharing the stimulus SD σ_{M} overlap equally with those resulting from σ_{H} and σ_{L} (Fig. 1c). However, the steadystate spike interval distributions for σ_{M} are nearly identical and clearly differ from those resulting from σ_{H} and σ_{L} (Fig. 1d).
Therefore, if one were to use the spike rate to estimate the stimulus variance, one might conclude erroneously that four different stimulus variances were used rather than three, although the spike interval distributions would identify correctly the three different stimuli variances. Because we are computing rate by counting spikes in a given time window, small changes in the tails of log–ISI distributions can lead to large changes in the spike rate such that similar ISI distributions result in different rate distributions, as demonstrated in Figure 1, c and d. For example, adding a few long intervals has a negligible effect on the interval distribution but a dramatic effect on the estimated rate, because these few intervals span a long period of time.
To quantify how the locally sampled distributions of spike rate (Fig. 2a) and spike intervals (Fig. 2b) evolve with time, we use the Kullback–Leibler divergence D_{KL}, defined in Equation 1. We sample rate P_{t}(r) and interval P_{t}(Δ) distributions (300 repeats) in a 1 s bin at each time point t. We then compare these with their respective reference distributions P(rσ_{i}) and P(Δσ_{i}), which we determine from the rates and intervals sampled in the last second of each variance epoch. In Figure 2a we see that the rate distributions continue to evolve with time throughout the epoch until they are, by construction, identical to the reference distribution in the final second. The evolution of the rate is evident through the continuous change in the D_{KL} measure. However, in Figure 2b, D_{KL} for the interval distributions approaches a steady state of close to zero within approximately the first second. Furthermore, when the stimulus has SD σ_{M}, the interval D_{KL} values from the two σ_{M} steadystate distributions are approximately equally close to zero.
Figures 1d and 2b suggest that information about stimulus variance can be obtained quickly from ISIs after a transition. Therefore, we calculate the timedependent mutual information provided by each successive interval after a variance switch according to Equation 4 in Materials and Methods. For the nth interval after a switch, Δ_{n}, we calculate I(Δ_{n};σ) and find that the amount of information provided by each interval increases rapidly to a steady state (Fig. 3). Given a threestate discrimination, there is a maximum of log_{2}(3) = 1.585 bits of information to obtain. If the assumption of independent intervals is valid, perfect discrimination then can be achieved with ∼8–12 intervals.
Evaluating the accuracy and precision of the distributional code
How accurately can such an interval code convey stimulus information, and what are the speed/accuracy tradeoffs in estimating the variance from a sequence of intervals? One expects that the distributions are most discriminable in the steady state and that adaptation will slow the acquisition of information about a variance change. Therefore, we examined steadystate data from the C. vicina fly viewing uniformly distributed white noise motion stimuli with a constant variance and data from a HH model neuron (Hodgkin and Huxley, 1952) driven by a Gaussiandistributed white noise current input with constant variance.
ISI distributions of the fly and simulated data demonstrate a shift of the peak toward shorter intervals, with increasing stimulus variance that is more marked in the fly ISI distributions than in the simulated data (Fig. 4a,b). Although both distributions display tails at long time scales, the tails of the fly ISI distributions have a consistent slope, whereas the relative area under the HH tails increases dramatically with decreasing variance. The presence of these long time scale tails allows for significant changes in spike rate without a large change in the overall ISI distribution. Contours of the Kullback–Leibler divergences between distributions generated from stimuli with SD σ_{A} and σ_{B} are plotted in Figure 4, c and d, and the asymmetry in the D_{KL} is readily apparent because the greatest distance for the fly data is between responses resulting from low to highvariance switches in contrast to the model neuron in which the opposite is true.
Asymmetry in the discrimination of variance transitions has been noted previously in a fly experiment of timevarying variance, in which steadystate information transmission was reached more quickly after an increase in stimulus variance than after a decrease in variance (Fairhall et al., 2001). In the detection of a variance change in a Gaussian distribution, this asymmetry follows directly from signal detection theory: it is easier to spot an unexpected largevariance outlier than to realize that too many values are falling in the center of the distribution (DeWeese and Zador, 1998; Fairhall et al., 2001; Snippe et al., 2004). Our results, which examine steadystate data, suggest that in decoding the change the opposite also can be true; the interval distributions are highly nonGaussian, and the direction of the asymmetry in D_{KL} will depend on the particular shape of the two distributions. For decoding during adaptation at least two asymmetries may compete: different rates of adaptation either can reinforce or can counteract the asymmetry in reading out the nonGaussian signal from the interval data. Even in steady state there may be two causes of asymmetry: again, the possible asymmetry of the D_{KL} divergence between distributions (Fig. 4c,d) and an asymmetry in the variance of a discriminator variable, such as the SNR discriminator based on Equation 3. Even when the D_{KL} is higher for a lowtohigh variance transition, a very large variance can make this transition less reliably detected from the resulting intervals than a hightolow transition (data not shown).
To quantify how well information about stimulus variance can be decoded from sequences of n intervals, we assume successive intervals are independent and use the cumulative D_{cKL}^{(n)}, Equation 2, to compare responses (see Materials and Methods). Figure 5 shows how the estimated D_{cKL}^{(n)} varies as a function of stimulus SD ratio for several values of n. It is evident that discrimination is relatively easier for the model neuron than for the fly H1 neuron. The asymmetries of Figure 4, c and d, also are reflected here such that discrimination is easier for lowtohigh variance changes in the fly data, and hightolow variance changes are discriminated more easily from model neuron data.
As a measure of discrimination we use a difference of one bit, which corresponds to a twofold reduction in the probability of classification error by a decoder (Johnson et al., 2001). Discriminating stimuli with SDs that differ by a factor of 10 in the fly (Fig. 6a,c) requires approximately three intervals and ∼100 ms. For HH output the discrimination is relatively easier: discrimination of variances differing by a factor of 3 requires approximately three spikes and <100 ms (Fig. 6b,d). Although the time to discrimination typically increases with the number of spikes required, this is not always true and depends on the shape of the ISI distributions. For example, when we compare distributions that differ significantly in the long time scale tails, few spikes may be needed to reach a given level of discrimination, but the average required time may be long if the discriminating spike intervals lie in the long time scale tails.
Because this measure of decoding ability assumes independence of successive ISIs, we test this assumption in two ways. First, we calculate D_{KL} values for a twointerval discrimination, using both the joint distribution of the two ISIs, which takes into account interval correlations, and the independent approximation, Equation 3. For the fly data we find that, as expected, discrimination with the joint distribution is slightly better but on the whole very similar (Fig. 7a). Second, we calculate the mutual information I(Δ_{1};Δ_{n}) between the first and the nth interval in a sequence, according to Equation 5, for each of the distributions in Figure 4a and plot their mean values (Fig. 7b). This gives us a measure of the correlations between intervals and is equivalent to the D_{KL} between the joint and independent distributions of Δ_{1} and Δ_{n}. After a separation of more than one interval, the intervals, although slightly correlated, have very small mutual information and can be considered as approximately independent (Fig. 7b). For the HH neuron there is approximately zero correlation after a single interval.
Realtime decoding
Until now, we have assessed the ability with which one, in principle, could decode information about stimulus variance. However, we also would like to decode stimulus variance explicitly. Because we have unlimited data from the HH model, we use the simulation to test the plausibility of onetrial realtime decoding of a continuously varying variance envelope. Therefore, we now estimate stimulus variance using one example of a 60 s spike train from the HH neuron (Fig. 8). We assume that we have access to a library of reference distributions P(Δσ) in which each distribution is gathered from a steadystate stimulus variance experiment (Fig. 4b). From this we can use a maximum likelihood estimator to estimate the value of the timevarying stimulus variance: the estimate is the SD that yields the highest probability of n successive spike intervals, as in Equation 6. In principle, given a suitable parametric family of distributions as a function of variance, one might compute these results analytically. However, here we use sampled data with 10 values of the variance spaced such that successive variances are ∼1.6 times the previous (given Fig. 5b, this is a reasonable discretization scale for decoding with the use of more than one interval). If we consider two successive intervals, an estimate requires on average 22 ms of spike train data (Fig. 8a), whereas an estimate based on 15 intervals requires a mean time of 314 ms (Fig. 8b). Using these mean times as the window width, we also estimate stimulus SD from the number of spikes in this window. As the number of intervals considered or the window width increases, variance estimates better match the true stimulus variance (Fig. 8c,d). Because the HH model does not display spike frequency adaptation, we do not expect any ambiguity, as exemplified in Figure 1, and estimates from intervals and spike rate should converge for longenough windows (Fig. 8e). However, calculating spike rate in short windows severely discretizes the possible estimates of variance (Fig. 8a, red circles), and for short times the calculations based on intervals can give better variance estimates even when adaptation is not present (Fig. 8e).
Discussion
We have examined the ability of a code based on the local distribution of ISIs to convey information about stimulus variance in an identified fly visual neuron and in a simulated HH neuron. We present a specific example in the fly visual system in which adapting spike rates measured locally give insufficient information to determine stimulus variance, whereas an ISI distributional code allows this discrimination to be performed with reasonable accuracy. We find that this code can be decoded rapidly to convey accurate information about the local stimulus variance, that the precision of this code depends on the particular shape of the ISI distribution, and that this code is effective in the general and simple HH model neuron.
During adaptation in H1 the rate distributions do not provide a unique mapping to stimulus variance, whereas ISI distributions provide an approximately onetoone mapping (Fig. 1). This results from the observation that rate distributions evolve slowly with time (on the order of seconds), whereas ISI distributions quickly (∼200 ms) reach a steady state (Fig. 2). The ISI distributions permit variance discrimination as quantified by the Kullback–Leibler divergence, which depends on the shape of the ISI distribution (Figs. 4⇑–6). This can lead to asymmetries with respect to variance increases or decreases in the time to decode the variance change. Decoding with ISI distributions can be used to estimate a timevarying stimulus variance (Fig. 8).
As demonstrated in Figures 1⇑–3, this distributional code can yield stimulus variance information quickly that is unavailable from spike rate; qualitatively, this is based on the shapes of the ISI distributions, which reach steady state after a sudden variance change more quickly than does spike rate. That the ISI distributions may be very similar although the spike rate distributions are different and evolving may seem paradoxical, because the firing rate can be defined as the inverse of the mean ISI. However, here we have considered rate computed from spike counts in a given time window. Short intervals are characteristic of a given variance, and variance adaptation leads to a slow change in the proportion of short and long intervals. In the switch to a new variance there is very rapid change in the typical pattern of short intervals, although the relative probability of long intervals adapts more slowly. These small changes in the long interval tails of the distributions lead to significant variations in the rate, as we have observed during adaptation. These features typically render the rate unsuitable to estimate accurately over the short times relevant for the encoding of natural stimuli (Gautrais and Thorpe, 1998). Furthermore, different parts of the interval distribution adapt at different speeds, and we accentuate this feature by plotting log–ISI distributions.
The robustness of this distributional code is related to the shape of the steadystate distributions. The ISI distributions of both the fly H1 neuron and the HH model vary systematically with the variance of the driving stimulus. These variations can be analyzed quantitatively by using the Kullback–Leibler divergences (Figs. 4c,d, 5, 6). It has been noted previously that it is easier, both theoretically and experimentally, to discriminate a variance increase than a variance decrease (DeWeese and Zador, 1998; Fairhall et al., 2001; Snippe et al., 2004). However, here we have considered the speed and accuracy of decoding from a nonGaussian distribution, where this may not hold. As has been pointed out (DeWeese and Zador, 1998), the speed of recognizing a change in distribution depends on the occurrences of outliers after the change. In our results the distributions corresponding to different stimulus variances have distinct and nonGaussian forms; Figure 4, a and b, includes cases in which the occurrence of distinctive outliers happens more frequently for the lowvariance distribution than for the highvariance distribution, thus resulting in easier identification of the lowvariance rather than the highvariance source.
In addition to coding via spike rate, spike timing, and specific spike patterns, distributional coding may provide another means for the neuron to transmit information. Although information in spike rate typically is thought to be transmitted rapidly via averaging across populations of neurons (Shadlen and Newsome, 1998), we have shown that this distributional code provides a means of conveying ensemble information rapidly (∼10–100 ms) via the output of a single neuron. Other results also show a strong dependence of the interval statistics on the variance of the driving stimulus (Johannesma, 1969; Hunter et al., 1998; Wang and Wang, 2000; Hunter and Milton, 2003). Furthermore, results from the HH neuron demonstrate that this variance dependence is present for a simple conductancebased model neuron and is likely to be present for many neural types. The decoding scheme that we propose shows that, in principle, this information is available. A biophysical implementation of such a decoding may involve an array of tuned synapses showing shortterm depression and facilitation (Tsodyks and Markram, 1997; Varela et al., 1997) to provide sensitivity to interval sequence. Thus a library of probability distributions P(Δσ) (as in Fig. 4b) might be “stored” at synapses in which a preferred ISI sequence leads to a maximal postsynaptic response.
Adaptation is functionally very important for sensory systems, allowing the system to tailor its responses to the local stimulus and thereby increase information transmission. However, adaptation leads to ambiguity, whereby the relationship between stimulus and response is no longer onetoone but depends on the statistical context of the stimulus. There are three possible solutions to this problem. Information may be conveyed by other neurons in the network, or it may be discarded entirely, as may be the case for light/dark adaptation of photoreceptors. We suggest a third possibility: contextual information may be conveyed by the statistical properties of the spike train. Although the first two of these possibilities also may be true for H1, we have demonstrated that information about the statistical ensemble indeed may be extracted. The dependence of the interval statistics on the stimulus ensemble may be a generic property of neural response, because it is observed in neurons as simple as the HH model (Wang and Wang, 2000). It therefore is tempting to suggest that using this dependence to convey ensemble information may be a universal capability of neural computation.
Footnotes

This research was funded in part by Nippon Electric Company (NEC) Research Institute and by a Burroughs Wellcome Careers at the Scientific Interface Grant to A.L.F.; B.N.L. also was supported by the Medical Scientist Training Program, a fellowship from the National Institute of General Medical Sciences (T32 07266), as well as an Achievement Rewards for College Scientists Fellowship. We are grateful to Geoff Lewen and Robert de Ruyter van Steveninck for assistance with the experiments and to William Bialek, Michael Berry, Blaise Aguera y Arcas, Matt Higgs, and William Spain for discussions.
 Correspondence should be addressed to Adrienne Fairhall, Department of Physiology and Biophysics, University of Washington, Health Sciences Building G424, Box 357290, Seattle, WA 98195. fairhall{at}u.washington.edu