Abstract
We have used a combination of theory and experiment to assess how information is represented in a realistic cortical population response, examining how motion direction and timing are encoded in groups of neurons in cortical area MT. Combining data from several single-unit experiments, we constructed model population responses in small time windows and represented the response in each window as a binary vector of 1s or 0s signifying spikes or no spikes from each cell. We found that patterns of spikes and silence across a population of nominally redundant neurons can carry up to twice as much information about visual motion as does population spike count, even when the neurons respond independently to their sensory inputs. This extra information arises by virtue of the broad diversity of firing rate dynamics found in even very similarly tuned groups of MT neurons. Additionally, specific patterns of spiking and silence can carry more information than the sum of their parts (synergy), opening up the possibility for combinatorial coding in cortex. These results also held for populations in which we imposed levels of nonindependence (correlation) comparable to those found in cortical recordings. Our findings suggest that combinatorial codes are advantageous for representing stimulus information on short time scales, even when neurons have no complicated, stimulus-dependent correlation structure.
Introduction
The modern study of neural coding began with the discovery that neural responses to sensory stimuli are composed of discrete events: either a cell generates an action potential or it remains silent (Adrian, 1926). If we average, either over long periods of time or over many presentations of the same sensory inputs, then we can recover a continuous description of neural responses in terms of the firing rate or probability of spiking. But the brain does not have access to averages across long times or repeated presentations: behaviors occur in response to individual stimuli and often are evoked by changes in sensory input that occur on very short time scales. The mammalian brain can, however, average responses over a population of neurons and has access to many cortical neurons that are nominally redundant in the sense that they have nearly identical feature selectivity (Mountcastle, 1957, 1997; Hubel and Wiesel, 1962). In this study, we ask whether averaging is the best strategy for pooling information from a nominally redundant population, or whether the underlying discreteness of spikes and silences is informative even in large populations of neurons.
In general, populations of neurons with similar feature selectivity could function in two very different ways. In one view, the signals carried by the different cells are redundant and primarily independent, so that averaging over the population serves to reduce noise. At the opposite extreme, information is transmitted by a combinatorial code, defined as the pattern of spikes and silences across the population. If different combinations of spikes and silences stand for different sensory stimuli, then averaging would discard potentially large amounts of information present only in the combinatorial code. A number of groups have presented evidence for combinatorial coding, in the cortex (Gawne and Richmond, 1993; Panzeri et al., 1999; Reich et al., 2001; Narayanan et al., 2005; Montani et al., 2007) and elsewhere (Perkel and Bullock, 1968; Schneidman et al., 2006b), but it seems fair to say that the view of averaging for noise reduction continues to be more prominent in discussions of sensory coding in the cortex.
Here we reexamine the issue of combinatorial coding using neural responses to visual motion recorded in the middle temporal (MT) area of visual cortex (Dubner and Zeki, 1971; Maunsell and van Essen, 1983; Albright et al., 1984). In MT, as in other cortical areas, it is difficult to record simultaneously from more than a few neurons with similar feature selectivity. We have extended previous work by using sequential recordings from many MT cells to construct models that are faithful to the measured properties of the individual cells but allow us to extrapolate to the behavior of larger populations. Even when we assume that cells respond independently to their sensory inputs, the situation when averaging works best, we find that particular combinatorial patterns of spiking and silence in such populations frequently carry substantial information about the sensory input, information that is lost by averaging across neurons. This coding advantage is robust over a reasonable range of correlation strengths and arises because neurons can have very different response dynamics, even when they are nominally redundant in the sense that they possess similar feature selectivity. Our central conclusion is that combinatorial coding becomes possible as soon as cells having “similar feature selectivity” along conventional axes (such as motion direction and speed in MT) possess some diversity of response properties along another axis.
Materials and Methods
Experimental methods.
Experimental data have been published previously (Osborne et al., 2004). To acquire these data, extracellular single-unit microelectrode recordings were made in three sufentanil-anesthetized, paralyzed monkeys (Macaca fascicularis) according to a protocol that had been approved in advance by the Institutional Animal Care and Use Committee at the University of California at San Francisco. Using random dot texture stimuli presented on a high-resolution analog oscilloscope display, we mapped receptive field location, determined the preferred direction and speed of the neuron under study, and sized the stimuli to maximally excite each neuron. The random dot texture was moved behind a stationary aperture, creating a moving stimulus at a fixed retinal location.
Visual stimuli were presented in discrete trials. Each stimulus appeared and remained stationary for 256 ms, then stepped to a constant velocity for 256 ms, and was again stationary for 256 ms. A brief pause separated successive trials, and directions of motion were pseudorandomly interleaved. A typical experiment included 13 motion directions that spanned ±90° around the neuron's preferred direction in 15° increments, all presented at the neuron's preferred speed. Each stimulus was presented up to 222 times. Spike times were recorded with 10 μs resolution.
Constructing a model population.
We consider first a model population in which each cell responds independently to its sensory inputs. Mathematically, this means that the probability of responses from the population can be decomposed as a product of probabilities for each individual cell, as in Equation 1 below, where these single-cell properties have been estimated directly from the experimental data. Examples of the responses from the population can be drawn out of this distribution, but for small groups of cells, we can also draw samples directly from the raw data. These approaches are mathematically equivalent.
To create a model population of 10 or fewer neurons with nominally redundant feature selectivity, we aligned all cells by their preferred direction and assumed that all neurons had the same preferred speed. Then, we resampled the rasters of individual cells at Δτ = 8 ms resolution, labeling the occupancy of each time bin with a “1” if there had been one or more spikes in the time interval or a “0” if there had not. At 8 ms resolution, very few neurons emitted multiple spikes in a single bin; this occurred in fewer than 10% of the bins in our analyses. Thus, 8 ms was a good compromise between filling bins with spikes but not overfilling them with multiple spike events. Rather than focus on the pattern of spikes and silences across time in individual neurons, we created binary population “words,” patterns of 1s and 0s across neurons in a group, and used the words to represent whether each neuron had fired or not in a single time bin. Hence, at each time point during the response, we drew the N letters of each word randomly from the collection of stimulus repetitions from all N cells in our sample in the appropriate bin. Each neuron in our sample corresponded to a fixed position in the word, and we could construct many different words by random draws from the many repetitions of each stimulus for each neuron. The probability of observing a particular word, P(n ≡ {n_{i}}), where n_{i} labels the count (0 or 1) in neuron i, was then measured by estimating the frequency of occurrence of that pattern of 1s and 0s within the entire dataset.
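The word-construction step can be illustrated with a short sketch. It assumes spike times in milliseconds and a hypothetical data layout in which each cell contributes a list of binarized trials for one stimulus; the function names are illustrative and are not taken from the original analysis code.

```python
import random

BIN_MS = 8.0  # bin width used in the text


def binarize(spike_times_ms, t_start_ms, t_stop_ms, bin_ms=BIN_MS):
    """Return one 0/1 flag per bin; 1 means one or more spikes fell in the bin."""
    n_bins = int((t_stop_ms - t_start_ms) // bin_ms)
    bins = [0] * n_bins
    for t in spike_times_ms:
        k = int((t - t_start_ms) // bin_ms)
        if 0 <= k < n_bins:
            bins[k] = 1  # multiple spikes in a bin still count as a single "1"
    return bins


def draw_word(binary_trials_per_cell, bin_index, rng):
    """Build one population word for a time bin by picking, independently
    for each cell, one of its recorded trials at random."""
    return tuple(rng.choice(trials)[bin_index] for trials in binary_trials_per_cell)


def word_frequencies(binary_trials_per_cell, bin_index, n_draws, rng):
    """Estimate word probabilities as frequencies over repeated random draws."""
    counts = {}
    for _ in range(n_draws):
        word = draw_word(binary_trials_per_cell, bin_index, rng)
        counts[word] = counts.get(word, 0) + 1
    return {word: c / n_draws for word, c in counts.items()}
```

Because each letter is drawn from a different, independently chosen trial, any trial-by-trial correlation between cells is absent by construction, which is the independence assumption stated above.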
Directly sampling P(n) from the independently recorded neural responses in this way is tractable for small populations of fewer than 10 neurons. For population sizes >10, however, the intuitive method of resampling from the data requires generation of millions of samples to avoid introducing biases in the estimation of information-theoretic quantities. Therefore, rather than sampling the rasters directly for groups larger than 10, we construct the probability distributions for the responses using the trial-averaged firing rates of the individual neurons.
On the assumption that our neurons respond independently to their sensory inputs, we can write the conditional probability distribution for the population's response n ≡ {n_{i}} at some moment of time t (defined relative to stimulus onset) as follows:
P(n|t) = ∏_{i=1}^{N} P_{i}(n_{i}|t). (1)
In terms of the spiking probabilities of each neuron, in the case in which either one spike (n_{i} = 1) or no spikes (n_{i} = 0) are emitted in each time bin, this distribution takes the following form:
P(n|t) = ∏_{i=1}^{N} [q_{i}(t)]^{n_{i}} [1 − q_{i}(t)]^{1 − n_{i}}, (2)
where q_{i}(t) denotes the probability of a spike from the ith neuron at time t. The expression can be rewritten as follows:
P(n|t) = [1/Z(t)] exp[Σ_{i=1}^{N} φ_{i}(t) n_{i}], (3)
where Z(t) is a normalization constant and φ_{i}(t) = ln{q_{i}(t)/[1 − q_{i}(t)]}. In the limit Δτ→0, we can identify the time-dependent firing rate of each neuron r_{i}(t) = q_{i}(t)/Δτ. Thus, the instantaneous firing rates of our cells fully determine the φ_{i}(t)s, and we can compute the word probabilities, P(n), using Equation 3 without any further assumptions. This computational approach works because the spike rate of each neuron, at each moment of time, is well determined by our experimental data with small error bars. Using this method allows us to estimate the word distribution for large population sizes in which direct sampling would be difficult both computationally and experimentally. We checked that the method yielded the same answers as the sampling approach for values of N where the calculations were tractable.
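As a minimal sketch of the independent model, the code below computes word probabilities from per-cell spike probabilities in two ways: as a product of Bernoulli terms, and in the exponential form with a field for each cell. It assumes that the field is the log odds of spiking, φ_{i} = ln[q_{i}/(1 − q_{i})], and that q_{i} = r_{i}Δτ; the function names are illustrative only.

```python
import itertools
import math


def word_prob_product(word, q):
    """Product of per-cell Bernoulli probabilities for one binary word."""
    p = 1.0
    for n_i, q_i in zip(word, q):
        p *= q_i if n_i == 1 else (1.0 - q_i)
    return p


def word_prob_exponential(word, q):
    """Same probability written as exp(sum_i phi_i * n_i) / Z."""
    phi = [math.log(q_i / (1.0 - q_i)) for q_i in q]
    log_z = sum(math.log(1.0 + math.exp(f)) for f in phi)
    return math.exp(sum(f * n_i for f, n_i in zip(phi, word)) - log_z)


def word_distribution(q):
    """Probabilities of all 2^N words for independent cells."""
    n = len(q)
    return {w: word_prob_product(w, q) for w in itertools.product((0, 1), repeat=n)}


# Example: firing rates in spikes/s converted to per-bin spike probabilities.
rates_hz = [12.5, 37.5, 6.25]
bin_s = 0.008
q_example = [r * bin_s for r in rates_hz]
dist_example = word_distribution(q_example)
```

Exhaustive enumeration over 2^N words is only practical for modest N, so this sketch is meant to show that the two forms agree rather than to scale to large populations.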
Constructing a correlated population.
We continue to work in small time windows, of duration Δτ = 8 ms, such that the response of each neuron i consists either of one spike (n_{i} = 1) or silence (n_{i} = 0). In the independent model (Eq. 3), there are no correlations between the responses of different neurons, n_{i} and n_{j}, once we know the stimulus. We are able to construct correlations among neurons by adding an explicit term to the exponential that couples the responses of the different cells:
P(n|t) = [1/Z(t)] exp[Σ_{i=1}^{N} φ_{i}(t) n_{i} + (J/2) Σ_{i≠j} n_{i} n_{j}]. (4)
Equation 4 defines a family of models with a range of correlations between our neurons while preserving their experimentally observed time-varying firing rates. For small J, the strength of these correlations is proportional to J. This coupling is constant in time and stimulus independent. Equation 4 is the least structured or maximum entropy model that generates some average level of correlations among all the pairs of neurons (Jaynes, 1957; Schneidman et al., 2003; Schneidman et al., 2006a).
If we choose some value of the coupling, J, which ultimately will set the strength of correlations in the population, we still need to be sure that, at every moment, the probability of spiking from the individual cells matches the observed firing rates as a function of time r_{i}(t). To do this, we solve for the φ_{i}(t)s in Equation 4, subject to the following constraints:
〈n_{k}〉 = r_{k}(t)Δτ, k = 1, …, N,
where the r_{k}(t)s are measured single-cell firing rates, averaged over a small time window, Δτ = 8 ms. Because these equations are not coupled in time, we can solve for the fields at each time point independently. We have an analytical solution for the fields with J = 0, and we can proceed from this solution using perturbation theory, from which we obtain an equation relating small changes in J to their effect on the fields, φ: where α and β index neurons in the group of N cells and χ is the connected part of the two-point correlation function, χ = 〈n_{i}n_{k}〉 − 〈n_{i}〉〈n_{k}〉. In practice, we solved for the fields at very small increments, ΔJ/J = 0.001, checking satisfaction of the constraints on the firing rates at each step. This perturbative approach is fast but accumulates errors. To correct for the accumulated errors, we performed local function minimization whenever the fractional error in the single-cell rates exceeded 10^{−8} and then returned to the perturbative stepping until the error bound again was reached.
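The constraint problem can be sketched for a small population. The code below is not the perturbative stepping described above: it swaps in a simpler technique, iterative proportional fitting with exhaustive enumeration of all 2^N words, and it assumes that Equation 4 has a uniform pairwise form with weight exp[Σ_{i}φ_{i}n_{i} + JΣ_{i<j}n_{i}n_{j}]. All names are hypothetical.

```python
import itertools
import math


def coupled_distribution(phi, J):
    """P(word) proportional to exp(sum_i phi_i n_i + J * sum_{i<j} n_i n_j)."""
    words = list(itertools.product((0, 1), repeat=len(phi)))
    weights = []
    for w in words:
        k = sum(w)
        pairs = k * (k - 1) / 2  # number of co-active pairs in this word
        weights.append(math.exp(sum(f * n for f, n in zip(phi, w)) + J * pairs))
    z = sum(weights)
    return {w: wt / z for w, wt in zip(words, weights)}


def spike_probs(dist, n_cells):
    """Marginal spike probability of each cell under a word distribution."""
    return [sum(p for w, p in dist.items() if w[i] == 1) for i in range(n_cells)]


def logit(p):
    return math.log(p / (1.0 - p))


def fit_fields(q_target, J, tol=1e-10, max_sweeps=1000):
    """Adjust the fields until each cell's spike probability matches q_target.

    Shifting phi_i by d multiplies the odds of a spike in cell i by exp(d),
    so each single-cell update fixes that cell's marginal exactly; sweeps are
    repeated because the update also perturbs the other cells.
    """
    phi = [logit(q) for q in q_target]  # exact solution when J = 0
    n = len(q_target)
    for _ in range(max_sweeps):
        for i in range(n):
            q_now = spike_probs(coupled_distribution(phi, J), n)[i]
            phi[i] += logit(q_target[i]) - logit(q_now)
        q_now = spike_probs(coupled_distribution(phi, J), n)
        if max(abs(a - b) / b for a, b in zip(q_now, q_target)) < tol:
            return phi
    raise RuntimeError("fields did not converge")
```

Enumeration limits this sketch to small N; it is intended only to show that the fields can be tuned so that the single-cell rates are preserved while J introduces positive pairwise correlations.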
Once we created a model population response, we summed the spike counts across the full time window of the response to motion and computed the correlation coefficients between counts in all pairs of neurons in our model population. The mean of these coefficients provides an index for the overall strength of the correlations. Experimentally, for neurons in MT, the correlation coefficients are in the range of 0.1–0.2 (van Kan et al., 1985; Gawne and Richmond, 1993; Zohary et al., 1994; Lee et al., 1998; Bair et al., 2001; Averbeck and Lee, 2003; Kohn and Smith, 2005), which corresponds to J = 0.11 to 0.16 in our models.
Estimating information.
To estimate the information carried by population words about the stimulus, we first computed the probability of observing each particular N-neuron word from our dataset P(n ≡ {n_{i}}), where n_{i} labels the count (0 or 1) in neuron i. Word probabilities were determined by the frequency of occurrence across all motion directions and times. The total entropy of the words is given by the following:
S_{total} = −Σ_{n} P(n) log_{2} P(n).
The probability of observing a word for a particular stimulus, P(n|θ,t), was estimated in a similar manner to P(n), but now at each particular time t relative to the onset of motion in a direction θ. The entropy of the conditional distributions is given by the following:
S_{noise}(θ,t) = −Σ_{n} P(n|θ,t) log_{2} P(n|θ,t).
The average amount of information that words carry about the stimulus is given by the difference between the total entropy and the average noise entropy:
I_{words} = S_{total} − [1/(13T)] Σ_{θ} Σ_{t} S_{noise}(θ,t),
where T represents the total number of time bins in the response and θ indexes the 13 motion directions. We computed the information from counts in a similar way, using the same P(n) as before, but collapsing over words with the same number of spikes, such that P_{count}(n) = Σ_{{n_{i}}} P({n_{i}}) · δ(n,Σ_{i}n_{i}), where the sum runs over all words, {n_{i}}, that have a count equal to n, enforced by the δ function, δ(n,Σ_{i}n_{i}), which is unity when the sum equals n and zero otherwise. Similarly, P_{count}(n|θ,t) = Σ_{{n_{i}}} P({n_{i}}|θ,t) · δ(n,Σ_{i}n_{i}). With P_{count}(n) in hand, we computed I_{counts} in a completely analogous manner to the calculation of I_{words}.
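The information calculation can be sketched as follows, assuming that every (direction, time) condition is equally likely and that the conditional word distributions are available as dictionaries. The toy example uses two hypothetical cells whose spike probabilities swap between two conditions, so that the spike count distribution is identical in both conditions while the words differ.

```python
import itertools
import math
from collections import defaultdict


def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def information_bits(cond_dists):
    """Total entropy minus average noise entropy.

    cond_dists: list of dicts, one per stimulus condition, each mapping a
    response (word or count) to P(response | condition).
    """
    n_cond = len(cond_dists)
    marginal = defaultdict(float)
    for dist in cond_dists:
        for resp, p in dist.items():
            marginal[resp] += p / n_cond
    noise = sum(entropy_bits(d.values()) for d in cond_dists) / n_cond
    return entropy_bits(marginal.values()) - noise


def collapse_to_counts(dist):
    """Pool all words with the same number of spikes."""
    out = defaultdict(float)
    for word, p in dist.items():
        out[sum(word)] += p
    return dict(out)


def independent_words(q):
    """Word distribution for independent cells with spike probabilities q."""
    dist = {}
    for w in itertools.product((0, 1), repeat=len(q)):
        p = 1.0
        for n_i, q_i in zip(w, q):
            p *= q_i if n_i else 1.0 - q_i
        dist[w] = p
    return dist


cond_a = independent_words([0.5, 0.1])  # cell 1 active, cell 2 mostly silent
cond_b = independent_words([0.1, 0.5])  # roles swapped
i_words = information_bits([cond_a, cond_b])
i_counts = information_bits([collapse_to_counts(cond_a), collapse_to_counts(cond_b)])
```

In this toy case the counts carry no information about which condition occurred, whereas the words do, which is the same qualitative effect as diversity of response dynamics among similarly tuned cells.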
When we computed word probabilities directly from Equation 3, we propagated the errors in measured firing rates to obtain errors in the derived information measures. In cases in which we generated samples of population words directly from observed spike trains, all entropy estimates were corrected for finite sampling effects by taking multiple random samples of fractions of the dataset and then performing a linear extrapolation to infinite sample size (Strong et al., 1998). Errors in information quantities were estimated by extrapolating the SD of values computed from half the sample in the same manner.
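A minimal sketch of the finite-sampling correction is given below, assuming a naive plug-in entropy estimate and an ordinary least-squares line fitted against inverse sample size. The subsample fractions and the number of resamples are illustrative choices, not values taken from the analysis described above.

```python
import math
import random


def naive_entropy_bits(samples):
    """Plug-in entropy estimate from observed frequencies."""
    counts = {}
    for s in samples:
        counts[s] = counts.get(s, 0) + 1
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def linear_intercept(xs, ys):
    """Ordinary least-squares intercept of y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx


def extrapolated_entropy_bits(samples, fractions=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5),
                              n_resamples=20, rng=None):
    """Estimate entropy on random fractions of the data, then extrapolate
    linearly in 1/size to infinite sample size (the intercept at 1/size = 0)."""
    rng = rng or random.Random(0)
    inv_sizes, estimates = [], []
    for f in fractions:
        size = max(1, int(round(f * len(samples))))
        reps = 1 if size == len(samples) else n_resamples
        vals = [naive_entropy_bits(rng.sample(samples, size)) for _ in range(reps)]
        inv_sizes.append(1.0 / size)
        estimates.append(sum(vals) / reps)
    return linear_intercept(inv_sizes, estimates)
```

Here `samples` must be a list of hashable responses, for example word tuples drawn for one stimulus condition.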
All of our information quantities depend only on patterns of spiking and silence in single time bins. Thus, for the purpose of our calculations, correlations between time bins (and hence the question of whether the spike trains approximate modulated Poisson processes) are irrelevant. As a test of our computations, we created shuffled spike trains with exact Poisson statistics and reproduced all of our results.
Testing the effects of population response diversity: tuning versus dynamics.
After analyzing the information available from different groups of real neural responses, we asked how the information from words changed when the responses of different populations of neurons were altered artificially to have the same direction tuning width and/or the same time-varying firing rate. To do so systematically, we repeated the analysis many times, using the direction tuning and time-varying firing rate of each neuron, in turn, as a template. For each group of 10 cells, we randomly chose another neuron from the population as the template and computed its normalized tuning curve, r̄(θ)* = 〈r(t,θ)〉_{t}/〈r(t,θ = 0)〉_{t}, where the asterisk indicates the template neuron, the bar indicates normalization, and 〈…〉_{t} indicates a time average over the 256 ms presentation of one motion direction, θ. We also computed the shape of the selected neuron's temporal modulations in rate at the preferred direction, r̄(t)* = r(t,θ = 0)/〈r(t,θ = 0)〉_{t}. We then used these two functions to serve as templates for fixing the tuning or firing rate dynamics of the group. To fix the tuning of the population, we allowed each cell to retain its own temporal dynamics, but rescaled each trace, denoted r̄(t,θ), by a constant factor that forced the cell's tuning curve to have the same shape as r̄(θ)*, such that r̄(t,θ) = [(r(t,θ)/〈r(t,θ)〉_{t})·〈r(t,θ = 0)〉_{t}]·r̄(θ)*. Note that rescaling fixes the tuning curve shape (i.e., the bandwidth) but allows each neuron's tuning curve to retain its original peak amplitude. To fix the firing rate dynamics, each cell retained its own directional tuning curve, but temporal dynamics were set by the template r̄(t)*, so that r̄(t,θ)* = 〈r(t,θ)〉_{t}·r̄(t)*.
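The two rescaling operations can be written out directly from the formulas above. This sketch assumes that firing rates are stored as a nested list indexed as r[direction][time bin], with a known index for the preferred direction, and that every direction has a nonzero mean rate; the layout and function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def tuning_template(r, pref):
    """Normalized tuning curve: time-averaged rate at each direction divided
    by the time-averaged rate at the preferred direction."""
    peak = mean(r[pref])
    return [mean(row) / peak for row in r]


def dynamics_template(r, pref):
    """Time course at the preferred direction divided by its time average."""
    m = mean(r[pref])
    return [x / m for x in r[pref]]


def fix_tuning(r, pref, tuning_star):
    """Keep the cell's own dynamics and peak amplitude, but impose the
    template's tuning curve shape."""
    peak = mean(r[pref])
    return [[x / mean(row) * peak * s for x in row]
            for row, s in zip(r, tuning_star)]


def fix_dynamics(r, dyn_star):
    """Keep the cell's own tuning curve, but impose the template's time course."""
    return [[mean(row) * d for d in dyn_star] for row in r]


# Rows are directions (row 0 is the preferred direction); columns are time bins.
cell = [[10.0, 30.0, 20.0], [5.0, 10.0, 15.0]]
template_cell = [[8.0, 4.0, 0.0], [1.0, 1.0, 1.0]]
same_tuning = fix_tuning(cell, 0, tuning_template(template_cell, 0))
same_dynamics = fix_dynamics(cell, dynamics_template(template_cell, 0))
```

After `fix_tuning`, the cell's normalized tuning curve equals the template's while its peak rate is unchanged; after `fix_dynamics`, its time-averaged rate at each direction is unchanged while every trace follows the template's time course.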
Spike-silence synergy.
To measure the synergy between spikes and silences in our population words, we took the difference between the stimulus information that a particular word captured and the sum of the information from each component spike and silence (Brenner et al., 2000; Schneidman et al., 2006b):
I_{synergy} = I(word) − Σ_{i} I(n_{i}).
The stimulus information, I, was computed as described by Brenner et al. (2000) and is given by the following:
I = (1/T) ∫_{0}^{T} dt [r(t)/〈r〉_{t}] log_{2}[r(t)/〈r〉_{t}],
where r(t) is the rate for a given event (the occurrence of a given word, or a spike or silence from a particular cell) and 〈r〉_{t} is its time average.
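The synergy measure can be sketched for independent cells as follows. The sketch assumes the per-event information of Brenner et al. (2000), with the time integral replaced by an average over equally likely stimulus conditions, and it assumes that the probability of a word in each condition is the product of its letters' probabilities. The spike probabilities in the example are invented for illustration.

```python
import math


def event_information_bits(p_event):
    """Information per event: average over conditions of (p/<p>) * log2(p/<p>)."""
    mean_p = sum(p_event) / len(p_event)
    total = 0.0
    for p in p_event:
        if p > 0.0:
            ratio = p / mean_p
            total += ratio * math.log2(ratio)
    return total / len(p_event)


def letter_prob(n_i, q_i):
    """Probability of a spike (n_i = 1) or a silence (n_i = 0)."""
    return q_i if n_i == 1 else 1.0 - q_i


def word_synergy_bits(word, q_by_condition):
    """I(word) minus the summed information of its component spikes and silences.

    q_by_condition[s][i] is the spike probability of cell i in condition s.
    """
    p_word = [math.prod(letter_prob(n, q) for n, q in zip(word, qs))
              for qs in q_by_condition]
    i_word = event_information_bits(p_word)
    i_parts = sum(
        event_information_bits([letter_prob(n, qs[i]) for qs in q_by_condition])
        for i, n in enumerate(word)
    )
    return i_word - i_parts


# Three hypothetical stimulus conditions, two cells.
q_conditions = [(0.5, 0.5), (0.5, 0.05), (0.05, 0.5)]
synergy_10 = word_synergy_bits((1, 0), q_conditions)
```

In this example the word (1, 0) is synergistic even though the cells are independent given the stimulus: a spike in the first cell together with silence in the second points to the second condition more sharply than the two letters do separately.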
Plotting response–conditional stimulus distributions.
We measured the distribution of stimuli giving rise to a particular response pattern directly from our neural data. We then smoothed the result with a Gaussian kernel, the parameters of which were chosen to approximate the size of a “pixel” determined by the direction spacing used in our experiments and the chosen temporal resolution. In this kernel approach, we approximate the distribution as a (normalized) sum of terms, in which each term is a small “blob” centered on each stimulus that gives rise to the particular neural response pattern. Here we are estimating a distribution of motion direction and time since motion onset, and we used a two-dimensional Gaussian kernel, so that our estimate becomes as follows:
P(θ,t|n) ≈ (1/M) Σ_{m=1}^{M} [1/(2πσ_{t}σ_{θ})] exp[−(t − t_{m})^{2}/(2σ_{t}^{2}) − (θ − θ_{m})^{2}/(2σ_{θ}^{2})],
where the sum runs over the M stimuli (θ_{m}, t_{m}) that gave rise to the response pattern n, σ_{t} = 20 ms, and σ_{θ} = 10°. The values of σ_{t} and σ_{θ} were chosen to be about the size of one time or direction bin so that we obtained a smoothed estimate of the conditional distribution but did not lose resolution.
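A minimal sketch of the kernel estimate is shown below, assuming a separable two-dimensional Gaussian with the widths quoted above and a list of (direction, time) pairs at which the response pattern of interest was observed. The function name and data layout are hypothetical.

```python
import math


def kernel_density(theta, t, samples, sigma_theta=10.0, sigma_t=20.0):
    """Smoothed estimate of P(theta, t | response).

    samples: list of (theta_m, t_m) pairs, in degrees and milliseconds, at
    which the response pattern occurred. Each sample contributes one
    normalized Gaussian blob, and the blobs are averaged.
    """
    norm = 2.0 * math.pi * sigma_theta * sigma_t * len(samples)
    total = 0.0
    for theta_m, t_m in samples:
        z = ((theta - theta_m) / sigma_theta) ** 2 + ((t - t_m) / sigma_t) ** 2
        total += math.exp(-0.5 * z)
    return total / norm
```

Evaluating `kernel_density` on a grid of directions and times yields a map that can be displayed as a color plot; the density has units of 1/(degree·ms) and integrates to one over the direction-time plane.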
Results
Given a population of neurons that, by conventional criteria, have the same feature selectivity, we want to know whether averaging over the population makes the best use of the neural responses, or whether averaging discards information that is embedded in a combinatorial code. Our data comprise recordings from extrastriate cortical area MT, which provides the sensory inputs that mediate both the perception of visual motion and behaviors like smooth pursuit eye movements (Newsome et al., 1985; Groh et al., 1997; Born et al., 2000). Guided by the time scales characterizing the dynamics and precision of smooth pursuit (Osborne et al., 2005, 2007; Osborne and Lisberger, 2007), we focus on neural responses in small time windows, during which time each cell typically generates either a single action potential (1) or not (0); patterns of response across the population thus consist of binary words, whereas the averaged or pooled response of the population is given simply by the total number of spikes. There are many plausible but distinct definitions of “combinatorial coding.” In this section of the paper, we leave the abstract definition of combinatorial coding aside and focus on a series of concrete questions about the relationship of words and counts to the sensory input: Do the words carry information about the stimulus that is lost when we keep only the spike counts? Why? Do different words with the same number of spikes stand for different sensory inputs? Is the information provided by the pattern of spikes and silences greater than the sum of the contributions of each spike or silence in the individual cells? In the Discussion, we argue that the answers to these questions give us a new perspective on the problem of combinatorial coding in cortical area MT and in neural populations more generally.
Extra information about stimulus properties from patterns of spikes
To evaluate the possible existence of a combinatorial code, we consider a population of N MT neurons. Experimentally, we have observed many responses to each of a finite set of different stimuli (Osborne et al., 2004). Each of the cells in our sample was directionally tuned, with relatively similar selectivity and bandwidth when responses were normalized and preferred directions were aligned (Fig. 1A); by conventional measures, our sample is a redundant population of neurons. If we assume that each neuron responds independently to its sensory inputs, we can draw a single trial response from each neuron in our database to create a model population response, although the samples were recorded sequentially from many different neurons (see Materials and Methods for details). Armed with the model population responses, we can answer the questions outlined above.
If we look in a small window of time Δτ, then the ith cell generates n_{i} spikes, with i = 1, 2, …, N. For small values of Δτ, we will almost never see two spikes from a single cell. Thus, the response of the population {n_{i}} at that moment in time can be treated as an N-letter binary word, n (representing a “vector” pattern of 1s and 0s across labeled cells), as shown in Figure 1B. By keeping track of the appearance of combinations of spiking and silence across the population over all times in the response, we can ask how much information these code words carry about the stimulus. At each instant of time, the stimulus in our experiments is specified by the direction of motion θ and the time from the onset of motion, t − t_{onset}. The calculations described in Materials and Methods allow us to use the experimental data to estimate the information that the pattern or number of spikes provides about the stimulus, I({n_{i}};θ, t − t_{onset}) or I(Σ_{i}n_{i};θ, t − t_{onset}), respectively.
The results in Figure 1C demonstrate that the information provided by binary code words n increases as a function of the number of cells N that contribute to the word, exceeding 1 bit for a population of 16 neurons. If we use the same draws from the experimental data to estimate the information that the (scalar) spike count n ≡ Σ_{i=1}^{N}n_{i} provides about the stimulus, we find that the total amount of information from spike counts is smaller than the total information from words and never exceeds 1 bit even when spike counts are pooled across all neurons. The combinations of spiking and silence in the model MT population provide more than twice as much information as the pooled spike counts, although the cells in the pool have nominally redundant feature selectivity. Our first finding extends the previous analysis of Reich et al. (2001) from 6 neurons to 22 neurons, showing that a combinatorial code could be advantageous even for large neural populations. In this analysis, it is important that the stimulus is defined both by the direction of motion and by the time since motion onset; even with a limited number of motion directions, the availability of information about timing avoids the possibility that information will saturate because the set of inputs is too impoverished, as discussed by Rieke et al. (1997) and by Reich et al. (2001).
The fact that patterns of activity provide extra information about the stimulus compared with counting spikes means that the neurons are not fully redundant. Although they have very similarly shaped tuning curves and we have treated them as preferring the same direction of stimulus motion, they vary in response amplitude, in tuning bandwidth, and in the time course of the trial-averaged firing rate during the response to a step of target velocity (Lisberger and Movshon, 1999). As illustrated in Figure 2A, the firing rate can rise quickly or slowly, and different neurons might show a small transient, a larger transient, or a purely tonic response. To ascertain which feature of the neural response provides the extra stimulus information available from words versus pooled spike counts, we next created a number of carefully contrived populations of 10 model neurons that preserved either the diversity of time-varying firing rates or the diversity of direction tuning curves, or that eliminated all diversity (see Materials and Methods). For each population, we then performed the same set of information calculations that led to Figure 1C.
For a population of model units that preserved the diversity of firing rate dynamics, r(t), but forced all the neurons to have the same direction tuning width (Fig. 2B, filled circles), the amount of extra information from words was the same as that for the draws from the experimentally observed spike trains of MT neurons (open circles). If we contrived each unit to have the same time-varying trajectory of firing rate r(t), but retained the diversity of directional tuning bandwidths, then about half of the extra information from words was lost (Fig. 2B, open triangles). Thus, the diversity of firing rate dynamics, and not diversity of tuning curve bandwidth, accounts for much of the information advantage of words over counts. The extra information that remains reflects the fact that differences in the response amplitude at the peak of the tuning curve across neurons (Maunsell and van Essen, 1983) impose different time-averaged absolute firing rates across the population, even if the shape of the trajectory of the trial-averaged firing rate r(t) was the same for each model neuron. We note that similar results on this latter point were obtained by Shamir and Sompolinsky (2006), examining the effects of simulated heterogeneities in static tuning on population codes. Finally, if we created populations of fully redundant model units with one uniform trajectory, r(t), and the same direction tuning curves, then the extra information from words was lost, as expected (Fig. 2B, filled triangles). Analysis of information as a function of time revealed that the extra information in the combinatorial code was concentrated near the time of the onset transients of the neural response, where the diversity of response dynamics is greatest (data not shown).
What does the extra information tell us?
The results of the previous section tell us how much information the patterns of spiking and silence can convey about the stimulus. The next step is to understand what these patterns are telling us about the stimulus. To focus our attention on a manageable set of patterns, we first computed the information carried by words and counts for different total spike counts in populations of N = 2…16 cells, drawing 100 populations for each condition from our 36 cortical neurons. Figure 3A shows that, for these population sizes, most of the extra information carried by words versus counts comes from words with relatively few spikes (i.e., from analysis windows when only a few of the neurons in the population emitted spikes and the rest were silent). Furthermore, most words had zero, one, or two spikes, and increasing numbers of spikes were progressively less common (Fig. 3B). Combining these two effects shows that the dominant term in the extra information provided by words typically comes from instances when only one neuron fired a spike. Even when the size of the population was increased to 16, most of the extra information still arose from words of only one or a few spikes (Fig. 3C). To understand what features of the stimulus are represented by different binary words, we therefore focused on words with only one spike.
Our next step was to construct response conditional ensembles (de Ruyter van Steveninck and Bialek, 1988): the distribution of stimuli that were associated with a particular neural response. We can think of these ensembles as “receptive fields” for the population word defined by the occurrence of a particular pattern of spiking and silences in a given time bin across the population. Given that we have just observed a particular word, we ask in what direction the stimulus was moving and when (relative to the time of our observation) it started moving. The distribution of these stimulus parameters is collected across all observations of the particular population response in the entire experiment, and the resulting probability density is displayed as a color map in Figure 4 for a sample of nine neurons. We see in Figure 4A that the occurrence of n = 1 spike in a population of N = 9 neurons is highly ambiguous in terms of the stimulus that elicited it. The red ring shows that there was a wide range of stimulus directions and times from motion onset that had high probabilities for a count of one spike, although the neurons all had very similar direction tuning.
The event that contains one spike from nine neurons is composed of nine possible binary words, from 100000000 through 000000001, in which a single neuron in the group spikes and all others are silent. Figure 4B shows that each binary word points to a different distribution of stimuli and that each word actually represents a quite narrow range of stimulus directions and times from motion onset. Importantly, the binary words go a long way toward resolving the ambiguity between motion direction and motion onset time that is present when we look only at the pooled spike count in Figure 4A, an ambiguity that is not present in a behavior driven by the MT population response, namely smooth pursuit eye movements (Osborne et al., 2005). Notice that if the neurons really were redundant, as one might have thought from their tuning curves, then each of the events would have to point to the same distribution of stimuli, and each word would be associated with the same distribution of stimuli found whenever the spike count across the population was one. However, the small differences in tuning and response dynamics lead to regions of non-overlap in the word-triggered receptive fields. If neuron 2 fires, for example, and all the rest are silent, then the most likely stimulus is the one that lives in the small region of non-overlap between the receptive field of neuron 2 and the other neurons in the group. The stimulus that causes a different neuron in the group to fire and the rest of the group to remain silent could be quite different, creating the diversity of response-conditioned stimuli shown in Figure 4B. A similar plot to Figure 4 could be constructed for any group of cells, implying that the extra information in one-spike words versus counting one spike is a general property of our MT population.
Figure 4 demonstrates that different patterns of activity across a population of MT neurons can represent different stimuli, but not necessarily that the combination of spiking in one neuron and silence in all the others is telling us anything that the labeled spike alone does not. We tested the importance of such combinations directly in Figure 5 by constructing a set of response conditional ensembles based on keeping track of the spike from a single cell and progressively discarding knowledge of silence in other cells. Although the combination of spiking in neuron 1 and silence in the rest of the population (100000000) (Fig. 5, top left color map) points to a specific, small area in the space of stimuli, specificity declines in the representation as we throw away the knowledge of silence in more and more cells. Finally, the occurrence of a spike in neuron 1 with no knowledge about the state of the other cells (1********) (Fig. 5, bottom right color map) points to a large area with tens of degrees of uncertainty about motion direction and hundreds of milliseconds of uncertainty about the time of motion onset. In the example illustrated in Figure 5, it is striking that the most uncertain large blob in the bottom right panel has almost no overlap with the original distribution of stimuli conditional on spiking in neuron 1 and silence in the others: combinations of spikes and silence not only carry more information than spike counts alone, but they also stand for very different events in the sensory input.
Returning to the quantity of information carried by combinations of spikes and silences, we found that a significant fraction of these combinations were synergistic: the information carried by the pattern of spikes and silences was larger than expected by summing the information carried by spikes and silences from the individual cells, as observed previously in the retina (Schneidman et al., 2006b). These synergistic words are the true hallmark of a combinatorial code: they cannot be read out by considering each component spike or silence independently. Whereas words generated from our sample could not code synergistically, on average, because the neurons are independent, we found that a substantial portion of the most commonly observed patterns of neural activity coded synergistically. For 10-cell groups, ∼30% of all one-spike words had significant spike-silence synergy, defined as I_{synergy} = I(word) − Σ_{i}I(n_{i}), where the n_{i} are the individual spikes and silences from each cell, labeled by i, that make up the word (see the Materials and Methods for details). The prevalence of synergy increased with N: >60% of 16-cell, one-spike words were synergistic (I_{synergy} > 0), suggesting that combinatorial coding is even more significant in larger cortical populations. For yet larger population sizes, we also find that many words with two spikes, for example, are synergistic, reinforcing the potential importance of combinatorial coding in large populations.
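The synergy definition can be made concrete with a small sketch. It assumes that the information carried by an event is the Kullback-Leibler divergence between the event-conditional stimulus distribution and the prior; this is one common choice, and the Materials and Methods remain the authority on the estimator actually used. The toy spike probabilities, the uniform prior, and all function names are hypothetical.

```python
from itertools import product
from math import log2

# Hypothetical toy population (not MT data): spike probability of each of
# 3 cells (rows) for each of 4 equally likely stimuli (columns).
P_SPIKE = [
    [0.30, 0.20, 0.10, 0.05],
    [0.05, 0.20, 0.30, 0.10],
    [0.05, 0.10, 0.20, 0.30],
]
N_STIM = 4


def likelihood(pattern, s):
    """P(pattern | stimulus s); '*' means the cell's state is ignored."""
    like = 1.0
    for cell, symbol in enumerate(pattern):
        p = P_SPIKE[cell][s]
        if symbol == '1':
            like *= p
        elif symbol == '0':
            like *= 1.0 - p
    return like


def event_prob(pattern):
    """P(pattern) under a uniform stimulus prior."""
    return sum(likelihood(pattern, s) for s in range(N_STIM)) / N_STIM


def info(pattern):
    """Assumed event information: KL(P(s | pattern) || uniform prior), in bits."""
    like = [likelihood(pattern, s) for s in range(N_STIM)]
    total = sum(like)
    return sum((l / total) * log2((l / total) * N_STIM) for l in like if l > 0)


def synergy(word):
    """I_synergy = I(word) - sum_i I(n_i) for a word of spikes and silences."""
    parts = 0.0
    for i, symbol in enumerate(word):
        single = '*' * i + symbol + '*' * (len(word) - i - 1)
        parts += info(single)
    return info(word) - parts


words = [''.join(bits) for bits in product('01', repeat=len(P_SPIKE))]
# Probability-weighted average of the synergy over all words.
average_synergy = sum(event_prob(w) * synergy(w) for w in words)

print("synergy of 100:", round(synergy("100"), 3))
print("average synergy:", round(average_synergy, 3))
```

With these toy numbers the one-spike word 100 is synergistic, whereas the probability-weighted average over all words is not positive, as it must be for conditionally independent cells.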
Effect of neuron–neuron correlations
So far, our discussion of the population responses in MT has assumed that the cells respond independently to sensory inputs, meaning that correlations between neurons are purely stimulus induced. We have ignored correlations generated by the brain itself, not only for simplicity but also to give the classical model of averaging over multiple redundant cells the greatest chance to succeed. We found that the diversity of temporal dynamics in neuron responses opens the possibility for a form of combinatorial coding, even among independent neurons. We now ask whether trial-by-trial correlations among neural responses alter the potential utility of a combinatorial code.
As explained in Materials and Methods, we constructed correlated populations by fixing the mean strength of the pairwise correlations among cells, but otherwise we left the responses as random as possible, so as not to build in any structure that might artificially enhance the opportunities for combinatorial coding. As before, the time-dependent firing rates of the individual cells were matched to what we observed experimentally. We then computed the statistics of the model population responses with different levels of mean correlation and examined the information content of these responses, as for the uncorrelated populations above.
In Figure 6, we illustrate the impact of correlations on the information encoded by populations of N = 10 neurons. As expected from previous work (Johnson, 1980; Britten et al., 1992; Seung and Sompolinsky, 1993; Abbott, 1994; Zohary et al., 1994; Shadlen et al., 1996; Abbott and Dayan, 1999), the information available from counting spikes is reduced when we add positive correlations among cells because such correlations increase the trial-by-trial variance of the total spike count. In contrast, negative correlations reduce the count variance and enhance information transmission. For coding based on patterns of spiking and silence across the population, small positive correlations also cause a slight drop in information that reverses as correlations become stronger, increasing the advantage of the combinatorial code over the spike count code at high levels of correlation. Across correlation levels, the extra information from a code based on words versus counts is greater than or approximately equal to that found in the independent population. Thus, the opportunities for combinatorial coding are robust across a wide range of correlation strengths, including those observed experimentally, which are usually in the range of 0.1–0.2 (van Kan et al., 1985; Gawne and Richmond, 1993; Zohary et al., 1994; Lee et al., 1998; Bair et al., 2001; Averbeck and Lee, 2003; Kohn and Smith, 2005). We conclude that combinatorial codes can exist without exotic correlations among neurons and that they are not disrupted by the modest levels of correlation consistent with available data.
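The effect of correlations on the pooled spike count follows from a standard identity. As a simplifying assumption that is not made in the analysis itself, suppose that in a given time window every cell has the same response variance \sigma^{2} and every pair has the same correlation coefficient \rho. The variance of the total count K across N cells is then:

```latex
\[
\operatorname{Var}(K)
  = \sum_{i=1}^{N} \sigma^{2} + \sum_{i \neq j} \rho\,\sigma^{2}
  = N\sigma^{2}\left[\,1 + (N-1)\rho\,\right],
\qquad \rho \ge -\frac{1}{N-1}.
\]
```

Under this assumption, for N = 10 the count variance is inflated by a factor of 1.9 at \rho = 0.1 and by 2.8 at \rho = 0.2 relative to independent cells, whereas negative \rho shrinks it.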
Discussion
In real life, our appreciation and use of sensory information depends on interpreting individual responses of many neurons, on a time scale that makes sense in relation to behavior. With the natural situation in mind, we have focused on the representation of visual motion by single trial responses across a population of neurons in cortical area MT, guided by the role that these responses play in driving smooth pursuit eye movements. Pursuit behavior has a short latency (∼100 ms) (Lisberger and Westbrook, 1985), reflects integration of motion signals over even shorter time scales (∼25 ms) (Osborne and Lisberger, 2007), and has a still higher temporal precision (<10 ms) in relation to the visual input (Osborne et al., 2005, 2007). These behavioral constraints suggest that we should look at “the neural population response” not as an integral over hundreds of milliseconds, but rather in small windows of time, on the order of ≤10 ms. In such small windows, it is rare for neurons in MT to generate more than one spike: the response of an entire population of cells thus consists of a binary vector describing as 1s and 0s the pattern of spikes and silences across the members of the population. The central question about the structure of the code, then, is whether the particular combinations of 1s and 0s actually matter, or whether maximum information is available from the pooled spike counts in subpopulations of redundant cells.
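The binary-vector description of the population response can be sketched as follows. The function name binary_words, the 10 ms default window, and the example spike times are illustrative assumptions; a window holding more than one spike from a cell is still coded as 1.

```python
def binary_words(spike_times_ms, t_start, t_stop, window_ms=10.0):
    """Convert per-cell spike times into one binary word per time window.

    spike_times_ms: list with one list of spike times (ms) per cell.
    Returns a list of strings such as '100', one per window, where each
    character is '1' if that cell spiked in the window and '0' otherwise.
    """
    n_windows = int((t_stop - t_start) // window_ms)
    words = []
    for w in range(n_windows):
        lo = t_start + w * window_ms
        hi = lo + window_ms
        word = ''.join(
            '1' if any(lo <= t < hi for t in cell) else '0'
            for cell in spike_times_ms
        )
        words.append(word)
    return words


# Hypothetical spike times (ms) for three cells on a single trial.
spikes = [[3.0, 27.5], [12.0], [14.0, 25.0]]
words = binary_words(spikes, t_start=0.0, t_stop=30.0)
counts = [word.count('1') for word in words]  # pooled spike count per window
print(words, counts)
```

The pooled count discards cell identity: the second and third windows both have a count of two but correspond to different words.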
The phrase “combinatorial coding” has acquired different meanings in different contexts. In recent discussions of the olfactory system, for example, the fact that single odors excite many neurons that express different receptors is taken as evidence that odors are represented by a combinatorial code (Malnic et al., 1999). In contrast, many discussions of combinatorial coding in cortex have focused on the question of whether pairs of neurons have correlations with strength or structure that depend on the particular stimulus being encoded (Aertsen et al., 1989). We believe that there is a widely shared intuition that, in a combinatorial code, the whole message carries information that is not simply conveyed by the sum of its parts. For example, the combination of letters “th” in written English has a meaning that is not decomposable into the separate meanings of “t” and “h.” The challenge is to find a quantitative characterization of neural responses that embodies this intuition.
A minimal quantitative requirement for combinatorial coding is that the combinations of spikes and silences should carry more information than is available simply by counting the spikes. Even in the case of a population of cells with very similar feature selectivity and independent responses, we find that this is true: binary words of spiking and silence across the population contain twice as much information about the sensory input as does the pooled spike count (Fig. 1). Because we did not record from our neurons simultaneously, our results cannot arise from the correlation-based mechanism proposed by Aertsen et al. (1989), for example. Rather, we find that extra information arises from differences in response dynamics among our sample of similarly tuned cortical neurons.
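The comparison between word-based and count-based codes can be illustrated with a minimal sketch that computes the mutual information between stimulus and response exactly for a small toy population. The spike probabilities, the uniform stimulus prior, and the function mutual_info are hypothetical; the sketch assumes conditionally independent binary cells and is not the estimator applied to the recordings.

```python
from itertools import product
from math import log2


def mutual_info(p_spike, code):
    """I(stimulus; code(word)) in bits for independent binary cells.

    p_spike[cell][s] is the spike probability of a cell for stimulus s;
    stimuli are assumed equally likely. code maps a word (tuple of 0/1)
    to the symbol that the decoder actually sees.
    """
    n_cells, n_stim = len(p_spike), len(p_spike[0])
    joint = {}  # (symbol, stimulus) -> joint probability
    for s in range(n_stim):
        for word in product((0, 1), repeat=n_cells):
            p = 1.0 / n_stim
            for cell, bit in enumerate(word):
                q = p_spike[cell][s]
                p *= q if bit else 1.0 - q
            key = (code(word), s)
            joint[key] = joint.get(key, 0.0) + p
    marginal = {}  # symbol -> probability
    for (symbol, _), p in joint.items():
        marginal[symbol] = marginal.get(symbol, 0.0) + p
    return sum(
        p * log2(p * n_stim / marginal[symbol])
        for (symbol, _), p in joint.items()
        if p > 0
    )


# Hypothetical populations: diverse response profiles versus identical cells.
diverse = [
    [0.30, 0.20, 0.10, 0.05],
    [0.05, 0.20, 0.30, 0.10],
    [0.05, 0.10, 0.20, 0.30],
]
redundant = [[0.30, 0.20, 0.10, 0.05]] * 3

i_words = mutual_info(diverse, lambda w: w)  # full binary word
i_count = mutual_info(diverse, sum)          # pooled spike count
r_words = mutual_info(redundant, lambda w: w)
r_count = mutual_info(redundant, sum)
print(round(i_words, 4), round(i_count, 4), round(r_words, 4), round(r_count, 4))
```

Because the count is a function of the word, it can never carry more information than the word. For truly identical independent cells the count is a sufficient statistic, so the two codes carry the same information, and the advantage of words appears only when the cells differ.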
It is possible to view the combinatorial code as a “labeled line” code for response dynamics in the same sense that neurons can be labeled with the preferred direction or speed to estimate those properties of the stimulus (Churchland and Lisberger, 2001). If the diversity of dynamics that we observe were to have a simple form (such as a class of cells with strongly transient responses and another class with more sustained responses), it might make sense to say that the catalog of features should be expanded to include these dynamics, perhaps generating labeled lines for constant velocities versus acceleration, which is an old idea. But the diversity that we find is not so easily categorized, and many of the subtle differences in dynamics do not correspond in any obvious way to different preferred stimuli. Even if we could meaningfully label each individual neuron in this way, the different labeled lines would have to interact at any locus that is extracting information from the code. Therefore, it is perhaps more appropriate at this stage to focus on the binary words themselves rather than the identity of the neurons that participate in the pattern of activity. A more stringent test for combinatorial coding, then, is to look at the interactions between the elements of individual words rather than just average quantities. For example, “th” may be a prototypical example of combinatorial coding, but “qu” is almost surely redundant; averages over all pairs of letters would obscure these differences. Correspondingly, in our population of neurons, we find that different words with the same spike count stand for qualitatively different sensory stimuli (Fig. 4) and that the information carried by these words cannot be simply decomposed into contributions from spiking and silence in the individual cells (Fig. 5).
It might seem paradoxical that neurons responding independently to their sensory inputs can embody a combinatorial code. Indeed, if neurons are independent in this sense, then, on average, a population of cells must convey less information about the stimulus than expected by summing the information carried by each cell individually (Brenner et al., 2000). However, as with the examples of “th” and “qu,” this does not mean that all combinations of spiking and silence will be redundant. Indeed, in populations of 10 cells, we find that approximately one-third of all words composed from one spike and nine silences are synergistic rather than redundant, even when the neurons are conditionally independent, and that the synergistic symbols point to qualitatively different stimulus features. Importantly, from a biological perspective, the different stimuli associated with different population words serve to disambiguate stimulus identity (motion direction) from stimulus timing, and thus could play a key role in driving precise motor behaviors.
The source of the combinatorial code that we observe is not mysterious, but rather results from the known diversity of responses of MT neurons. The extra information provided by patterns of spiking and silence arises from the diversity of dynamic response properties across neurons that otherwise have very similar tuning curves for the direction of motion. The variation of response dynamics has been characterized well by previous studies (Lisberger and Movshon, 1999; Schlack et al., 2007), but discussion of the impact of response dynamics has been restricted to questions of how to extract higher derivatives of target motion from the population response. Our results suggest that cell-to-cell temporal response diversity should be considered in a broader schema for neural coding, because the full distribution of variations in the temporal response dynamics allows the population of neurons to provide substantially more information about the sensory input.
Our findings on the potential of a combinatorial code extend the results of Reich et al. (2001), obtained from samples of up to six simultaneously recorded neurons in the primary visual cortex (V1). Similarly, Schneidman et al. (2006b) found that patterns of spiking and silence across simultaneously recorded groups of up to four retinal ganglion cells carry more information than the sum of the contributions from the individual neurons. In our analysis, we have been able to show that, as one might expect, the extra information in a combinatorial code grows with the size of the neural population. Furthermore, we have been able to show that this advantage exists even when all cells in the population have exactly the same normalized tuning curves. Closely related ideas have been explored in the theoretical work of Shamir and Sompolinsky (2006), albeit not in the limit of binary responses considered here, and focusing mainly on diversity of tuning curve shapes and not on variation in dynamic response properties.
Our approach might seem to be at a disadvantage relative to Reich et al. (2001) and Schneidman et al. (2006b), because we analyzed model populations created by assembling data recorded sequentially from different neurons, whereas they analyzed populations of neurons recorded simultaneously. However, by evaluating the possibility of a combinatorial code for conditionally independent neurons, we have created the least favorable situation for finding positive results. Furthermore, by combining theory and experiment, we are able to dissect the contributions to combinatorial coding in much larger populations and to demonstrate the robustness of the essential results to plausible levels of correlation among the neural responses.
To make use of a combinatorial code, downstream neurons must combine inputs from different neurons in a way that distinguishes the different combinations of spiking and silence that we found to stand for different sensory stimuli. The study of neural sensitivity to patterns of inputs can be traced to now classical work in invertebrate systems (Segundo et al., 1963; Jacobs and Miller, 1985). More recently, it has been emphasized that simple circuits in which signals are processed along converging inhibitory and excitatory paths can evaluate the logical functions required for cells to distinguish among combinatorial code words (Schneidman et al., 2006b). Nonlinear interactions among signals converging onto the same dendrite (Polsky et al., 2004) provide additional possibilities for biologically plausible combinatorial decoding schemes in cortex.
We have presented our analysis in the concrete context provided by the coding of visual motion in area MT of the primate cortex, but the results should apply more generally. The magnitude of the extra information carried in a combinatorial code depends on the details of the neural population we are considering, but we emphasize that there is nothing extreme about the population of cells we have analyzed here. Combinatorial coding does not depend on the existence of unusual structures in the spike trains, either of single neurons or of populations. Rather, the possibility of combinatorial coding is a natural consequence of well known dynamic response properties of neurons throughout the cortex.
Footnotes

This work was supported by National Institutes of Health Grants EY017210 and EY03878, the Howard Hughes Medical Institute, the Life Sciences Research Foundation, and the Swartz Foundation. We thank Stefanie Tokiyama and Karen McLeod for assistance with animal monitoring and maintenance, Scott Ruffner for software development, and Nicholas Priebe and Carlos Cassanelo for participating in the original experiments.
Correspondence should be addressed to Stephanie E. Palmer, Department of Physics, Princeton University, Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Room 238, Princeton, NJ 08544. sepalmer{at}princeton.edu