Abstract
Current understanding of many neural circuits is limited by our ability to explore the vast number of potential interactions between different cells. We present a new approach that dramatically reduces the complexity of this problem. Largescale multielectrode recordings were used to measure electrical activity in nearly complete, regularly spaced mosaics of several hundred ON and OFF parasol retinal ganglion cells in macaque monkey retina. Parasol cells exhibited substantial pairwise correlations, as has been observed in other species, indicating functional connectivity. However, pairwise measurements alone are insufficient to determine the prevalence of multineuron firing patterns, which would be predicted from widely diverging common inputs and have been hypothesized to convey distinct visual messages to the brain. The number of possible multineuron firing patterns is far too large to study exhaustively, but this problem may be circumvented if two simple rules of connectivity can be established: (1) multicell firing patterns arise from multiple pairwise interactions, and (2) interactions are limited to adjacent cells in the mosaic. Using maximum entropy methods from statistical mechanics, we show that pairwise and adjacent interactions accurately accounted for the structure and prevalence of multineuron firing patterns, explaining ∼98% of the departures from statistical independence in parasol cells and ∼99% of the departures that were reproducible in repeated measurements. This approach provides a way to define limits on the complexity of network interactions and thus may be relevant for probing the function of many neural circuits.
 vision
 information theory
 correlated variability
 neural coding
 synchrony
 retinal ganglion cell
Introduction
A central challenge in neuroscience is to understand how large circuits of neurons represent and process information. For decades, studies of neural function were restricted to recordings from single neurons, with the tacit assumption that the function of complex circuits could be deciphered with such measurements (Wandell, 1995). However, multineuron recordings have revealed substantial interactions that cannot be observed with singleneuron recording. For example, retinal ganglion cells (RGCs) exhibit strong stimulusindependent correlated activity; the circuits mediating such correlations and the consequences for visual processing are not fully understood (Arnett, 1978; Johnsen and Levine, 1983; Mastronarde, 1983b; Meister et al., 1995; DeVries and Baylor, 1997; Nirenberg et al., 2001; Schneidman et al., 2003a). Such findings suggest interesting possibilities for circuit function but at the same time raise a major concern: will it be necessary to record from all of the cells in a neural circuit, and analyze all possible interactions, to determine how the circuit works? If so, a deep understanding of many neural systems may be out of reach for a long time.
In this context, any simplifying principles that can make the problem more tractable are of great value. To illustrate this, consider the number of possible circuits terminating on a collection of n neurons. The number of distinct circuits is determined by the number of distinct input patterns that contribute to the circuit. Examples of distinct input patterns are illustrated schematically for n = 8 cells in Figure 1A. Even with this small value of n, the entire collection of distinct patterns is too large to depict easily; thus, only selected examples are shown. In general, the number of possible input patterns is ∼2^{n}, a prohibitive complexity: for the collection of several hundred RGCs depicted in Figure 2A, 2^{n} exceeds the number of stars in the known universe (Liske et al., 2003). However, two simple constraints on connectivity can dramatically simplify the problem. The first is “pairwise” connectivity, in which each input pattern contacts only two cells. All pairwise patterns may be depicted together in a single diagram (see Fig. 1B). The second is “adjacent” connectivity, in which each pattern contacts all cells within its extent. This constraint also restricts the possibilities to a set that can be depicted easily (see Fig. 1C). The pairwise and adjacent constraints each reduce the number of possible patterns to ∼n^{2}, a huge simplification. With both constraints, the number of possible patterns is reduced to ∼n (see Fig. 1D). The practical consequences of this simplification are profound: in principle, one can understand the function of the entire circuit simply by recording from individual cells and pairs of neighboring cells.
The importance of these simplifying principles is illustrated in the retina. Recent work suggests that synchronized firing among RGCs could originate in common input to multiple RGCs, forming a multiplexed neural code in which the large and distinctive electrical “footprint” of a presynaptic cell conveys a specific visual message to the brain (Meister et al., 1995; Schnitzer and Meister, 2003). Other studies have also suggested interactions in RGC light responses over large spatial scales (McIlwain, 1964; Olveczky et al., 2003). From such observations, a picture of retinal connectivity emerges in which all possible interactions between large numbers of cells must be probed before the function of the circuit can be understood (see Fig. 1A). Conversely, several lines of evidence indicate that synchronized firing can originate from pairwise, adjacent interactions: gap junction coupling between adjacent RGCs, electrical coupling of neighboring RGCs via an intermediate amacrine cell, or common synaptic inputs from bipolar or amacrine cells to neighboring RGCs (Mastronarde, 1983b; Dacey and Brace, 1992; Jacoby et al., 1996; Brivanlou et al., 1998; Hu and Bloomfield, 2003; Hidaka et al., 2004; Schubert et al., 2005; Volgyi et al., 2005). Thus, interactions among multiple RGCs over large spatial scales may simply reflect the combined effect of pairwise, adjacent interactions (see Fig. 1D), which can be characterized using readily available experimental methods.
In this study, we test whether multineuron firing patterns are consistent with purely pairwise–adjacent connectivity or instead imply more complex circuitry. Until recently, two major challenges have precluded such an investigation. First, one must be able to record simultaneously from all cells in a circuit over a substantial area (Segev et al., 2004; Frechette et al., 2005). Second, one must have a framework within which to assess the significance of higherorder interactions among the neurons in a circuit (Amari, 2001; Schneidman et al., 2003b, 2006). We approach these issues with a combination of novel techniques. We apply new 512electrode electrophysiological recording (Litke et al., 2004; Frechette et al., 2005) to the primate retina, which contains 1–2 dozen distinct RGC types that convey complete, parallel images of the visual scene to distinct targets in the brain and which closely resembles the human retina (Rodieck, 1998). In these recordings, distinct cell types such as the ON and OFF parasol cells are easily identified (see Fig. 2A) (Polyak, 1941; Watanabe and Rodieck, 1989). The regular mosaic arrangement of these cells in the retina (Peichl and Wassle, 1981; Wassle et al., 1983; DeVries and Baylor, 1997), combined with recordings that sample almost all of the cells of each type, make possible a direct test of pairwise and adjacent connectivity. We perform this test by examining the spatial patterns of electrical activity in different cells using the maximum entropy framework borrowed from statistical mechanics (Jaynes, 1957a,b), as described previously (Martignon et al., 2000; Amari, 2001; Schneidman et al., 2003b, 2006). The results indicate that patterns of electrical activity in groups of parasol cells may be understood almost entirely based on pairwise interactions, restricted to adjacent cells in the mosaic. This finding substantially simplifies our understanding of the retinal circuit.
Materials and Methods
Recordings.
Preparation and recording methods were described previously (Chichilnisky and Kalmar, 2002; Litke et al., 2004; Frechette et al., 2005). Briefly, eyes were obtained from deeply and terminally anesthetized macaque monkeys (Macaca mulatta) used by other experimenters in accordance with institutional guidelines for the care and use of animals. Immediately after enucleation, the anterior portion of the eye and vitreous were removed in room light, and the eye cup was placed in a bicarbonatebuffered Ames’ solution (Sigma, St. Louis, MO) and stored in darkness at 32–34°C, pH 7.4, for ≥20 min before dissection. Under infrared illumination, pieces of peripheral retina 3–5 mm in diameter, isolated from the retinal pigment epithelium, were placed flat against a planar array of 512 extracellular microelectrodes, covering an area of 1800 × 900 μm. The present results were obtained from 30–60 m segments of recording. The preparation was perfused with Ames’ solution bubbled with 95% O_{2}, 5% CO_{2} and maintained at 32–34° C, pH 7.4.
Spike sorting.
The voltage on each electrode was digitized at 20 kHz and stored for offline analysis. Details of recording methods and spike sorting have been given previously (Litke et al., 2004). Briefly, spikes were identified using a threshold of three times the voltage SD. For each spike, the waveform of the spike and the simultaneous waveforms on six adjacent electrodes were extracted. Three to five waveform features were identified using principal component analysis. A mixtureofGaussians model was fit to the distribution of features using expectation maximization (Duda et al., 2001). The number of clusters and initial conditions for the model were determined automatically using an adapted watershed transformation (Castleman, 1996; Roerdink and Meijster, 2001). All clusters were visually inspected, and, when necessary, a mixtureofGaussians model was instead fit using manually selected initial conditions. Clusters with a large number of refractory period violations (>10% estimated contamination) or spike rates below 1 Hz were excluded from additional analysis.
Stimulation and receptive field analysis.
An optically reduced stimulus from a gammacorrected cathode ray tube computer display refreshing at 120 Hz was focused on the photoreceptor outer segments. The low photopic intensity was controlled by neutral density filters in the light path. The mean photon absorption rate for the long (middle, short) wavelengthsensitive cones was approximately equal to the rate that would have been caused by a spatially uniform monochromatic light of wavelength 561 (530, 430) nm and intensity 9200 (8700, 7100) photons/μm^{2}/s incident on the photoreceptors. For the collection of parasol cells shown in Figure 2A, the mean firing rate during exposure to a steady, spatially uniform display at this light level was 10.7 ± 3.3 Hz for ON cells and 17.1 ± 3.5 Hz for OFF cells.
Spatiotemporal receptive fields were measured using a dynamic checkerboard (white noise) stimulus in which the intensity of each display phosphor was selected randomly and independently over space and time from a binary distribution. Root mean square stimulus contrast was 96%, and stimulus duration was 30 ms. The pixel size (60 μm) was selected to accurately capture the spatial structure of parasol cell receptive fields. For each RGC, the spiketriggered average stimulus was computed; this summarizes how the cell integrates visual inputs over space and time (Marmarelis and Naka, 1972; Chichilnisky, 2001). An elliptical twodimensional Gaussian function was fit to the spatial profile; outlines in Figure 2A represent 1 SD boundary of these fits.
To analyze adjacent interactions, the mosaic structure of receptive fields was exploited. The modal distance between cells, D, was computed by examining a histogram obtained from all cell pairs by dividing the separation between cells in the pair by the mean 1 SD diameter of the cells and identifying the major mode near 0. For the ON and OFF parasol cells, this value was 2 (on average, 135 μm) and 2.4 (on average, 113 μm), respectively. Neighboring cells were identified as those for which the normalized distance was less than D(1 + )/2, a value halfway between the nearest neighbor and nextnearest neighbor separation in a perfect triangular mosaic with elementary spacing D.
In testing the model restricted to pairwise–adjacent interactions (Fig. 1), note that the firing of two cells separated by an intermediate cell would be expected to reflect pairwise interactions via the intermediate cell. If the latter cell is excluded from analysis (that is, there is a “gap” in the group), the analysis will tend to reject the pairwise–adjacent model. This effect can also occur in a path of pairwise interactions through multiple intermediate cells; thus, it cannot be entirely eliminated without recording from the entire retina. To minimize this effect, groups of cells covering an approximately convex area of retina were selected for analysis. In the selection algorithm, a group of size N was generated by starting with a single cell and, on each iteration, adding a cell randomly selected from the collection of all cells having the maximum number of neighbors in the existing group, until the group had a total of N cells. To avoid problems in sparsely sampled regions of the mosaic, groups were rejected if they did not conform to two criteria. First, every cell in the group must have at least one neighbor in the group. Second, for every pair of adjacent cells in the group, there must be at least one other cell in the group adjacent to both cells of the pair.
Maximum entropy.
Maximum entropy methods are used in statistical inference to identify an unknown distribution given several constraints that are insufficient to fully specify the answer. A parsimonious unique solution is to select the distribution with the greatest entropy consistent with the constraints (Jaynes, 1957a,b; Cover and Thomas, 1991). The entropy indicates the average number of bits required to transmit the identity of samples from a distribution (Cover and Thomas, 1991). Mathematically, maximizing the entropy is equivalent to selecting the maximum likelihood distribution consistent with the observed data (Berger et al., 1996).
Specifically, suppose the unknown distribution is P(x), where x is a vector. Assume that N constraints on the distribution are known:
where E[·] is the expectation over the unknown distribution, and f_{i}(·) is an arbitrary function of x. The maximum entropy solution is the distribution that maximizes entropy subject to the observed constraints: where H[P] = −Σ_{x}P(x)log_{2}P(x) is the entropy of a distribution P(x), and λ_{i} are Lagrange multipliers. The second derivative matrix (or Hessian) of this equation is negative definite for all x. Thus, no local maxima exist and the unique global maximum can be found with any constrained gradient ascent optimization technique (Darroch and Ratcliff, 1972; Press et al., 1988; Berger et al., 1996; Martignon et al., 2000; Malouf, 2002). The functional form of the maximum entropy solution with pairwise constraints is an exponential distribution, which matches the Ising model in statistical mechanics (Hertz et al., 1991) and the Hopfield model of neural networks (Hopfield, 1982). The multipliers {λ_{i}} measure the magnitude and sign of interactions between pairs of neurons; in the present work, the estimated multipliers were universally positive (data not shown).
For the present analysis, spike trains were binned at a resolution of 10 ms; bins with 2 or more spikes (<1%) were replaced with 1. The distribution P(x) is over all binary words x expressed by M cells; this is a 2^{M} element vector (see Fig. 5). For example, for M = 3 cells, P(x) is the probability distribution over the words {(0, 0, 0), (0, 0, 1), …, (1, 1, 1)}. The constraint is a linear function specifying which distribution values to sum to generate the observed marginal distributions, e.g., P(1, 1) = P(1, 1, 1) + P(1, 1, 0). In matrix form, all constraints can be expressed as a N × 2^{M} sparse matrix, with values of 1 selecting which elements of the joint distribution sum to form the observed marginals. The exponential growth of the number of words with the number of cells analyzed necessitates efficient algorithms to estimate the maximum entropy distribution (Berger et al., 1996; Malouf, 2002) and effectively limits the number of cells that can be analyzed simultaneously. Any constrained gradient ascent method can be used to find P̂(x), although in practice, specialized algorithms exploiting Jensen’s inequality can be faster (Berger et al., 1996). The search terminated when the fractional change in entropy was smaller than 10^{−8}.
Likelihood analysis.
Models were compared using likelihood analysis (see Fig. 6). The likelihood of a dataset under a given model is the probability of having observed that dataset under the assumption that the model is correct. In general, a more accurate model results in higher likelihood. In the present analysis, given a model probability distribution P over a set of firing patterns {f_{i}} and given corresponding observed counts {c_{i}} of each of these patterns, the log likelihood of the observed counts is as follows: where the leading term is a normalization to account for the fact that a given set of counts can arise from multiple different time series. Note that, in general, P(f_{i}) < 1; hence, longer data collection results in lower likelihood. Therefore, Figure 6 shows the average log likelihood, L̄ = L/Σ_{i} c_{i}. Using Stirling’s approximation for the leading term in Equation 3, it is easily shown that, for large datasets, −L̄ is an estimate of the Kullback Leibler (KL) divergence D between the observed probability distribution P_{obs}, whose entries are {c_{i}/(Σ_{i} c_{i})} and the model probability distribution {P(f_{i})} (Cover and Thomas, 1991, their section 12.1). Using terminology from recent theoretical work, the index (D_{ind} − D_{pair})/D_{ind} used to quantify the accuracy of the pairwise model is a crossvalidated version of the fraction of the “multiinformation” captured by the pairwise (or secondorder) “connected information” (Schneidman et al., 2003b).
Estimating the KL divergence in finite data are nontrivial and subject to severe biases, especially in highdimensional data (e.g., firing pattern distributions from large numbers of cells or over time). This estimation problem is intimately related to estimating the entropy of a distribution, as well as applications in universal source coding, and is the subject of intense investigation (Nemenman et al., 2002; Paninski, 2003; Orlitsky et al., 2004). In simulation (data not shown), the bias of KL estimates was not significant for firing pattern distributions obtained from M ≤ 7 cells, using standard techniques (Krichevsky and Trofimov, 1981). In the most extreme case, the mean occupancy of each bin in the firing pattern distribution was on the order of 100–1000, qualitatively indicating an estimate with minimal bias.
Results
Action potentials were recorded from RGCs in isolated macaque monkey retina perfused with physiological saline solution, using an array of 512 electrodes (Litke et al., 2004; Frechette et al., 2005). The receptive field of each RGC was identified using reverse correlation with a whitenoise stimulus. Cells were segregated into distinct classes according to their receptive field characteristics; ON and OFF parasol cells were identified by their size and response kinetics (Chichilnisky and Kalmar, 2002). The receptive fields of all ON and OFF parasol cells in one recording are shown in Figure 2A.
These form an orderly mosaic that tiles visual space with minimal overlap, as expected from previous work in primates and other species (Wassle et al., 1983; Dacey, 1993; DeVries and Baylor, 1997). The completeness of the mosaics indicates that the recordings sampled most of the parasol cells in the 4 × 8° region of retina.
Responses of nearby RGCs are not statistically independent
To quantify functional connectivity in the retinal circuit, spontaneous activity of ON and OFF parasol cells was recorded in the presence of steady, spatially uniform, photopic illumination. Interactions between pairs of cells were probed by examining the crosscorrelation in spiking activity, that is, the probability that one cell fired as a function of time relative to the time of a spike in the second cell. If the two cells fired independently, the crosscorrelation would be flat. Instead, nearby cells of the same type (ON or OFF) exhibited a strong tendency to fire nearly synchronously, as shown by the pronounced peak at the origin with a width of ∼10 ms (Fig. 2B). This synchronized firing in the absence of timevarying stimulation (also known as “noise correlations”) resembles observations in other species and indicates functional connectivity attributable to common inputs and/or reciprocal connections in the retina (Arnett, 1978; Johnsen and Levine, 1983; Mastronarde, 1983b; Meister et al., 1995; DeVries and Baylor, 1997; Brivanlou et al., 1998; Hidaka et al., 2004). Anticorrelation between ON and OFF parasol cells was also observed (data not shown) (Mastronarde, 1983a), but subsequent analysis will be restricted to cells of the same type.
Pairwise synchrony was restricted to nearby cells. To quantify this tendency, spike trains from pairs of cells were binned at a resolution of 10 ms, yielding binary spike counts A and B as a function of time. The probability of synchronous firing, P(A = 1, B = 1), was measured, where 1 denotes the occurrence of a spike. This probability was expressed relative to the probability expected from statistically independent firing using a “synchrony index:” The index is shown as a function of separation between pairs of cells in Figure 2C. Cells separated by less than several hundred micrometers, corresponding to approximately three times the average cell–cell separation in the mosaic, fired synchronously at rates well above chance (S = 0). Cells that were more widely separated did not. In summary, pairs of parasol RGCs in primate retina are functionally connected, and this connectivity is universal among cells of like type and spatially localized (Masronarde, 1983b).
The above observations provide significant constraints on functional connectivity between RGCs. However, the connectivity of the circuit cannot be understood with these measurements alone. First, the fact that pairwise synchrony extends over a distance of several cells in the mosaic (Fig. 2C) does not require that direct connections reach this far. Instead, synchrony might be mediated by pairwise interactions, for example, gap junction electrical coupling, propagated through several intermediate cells (Dacey and Brace, 1992; Jacoby et al., 1996; Hu and Bloomfield, 2003; Hidaka et al., 2004; Schubert et al., 2005; Volgyi et al., 2005). Second, the observed pairwise synchrony provides no information about higherorder interactions. To illustrate the distinction between pairwise and higherorder synchrony, consider two hypothetical mechanisms: (1) common input from a spiking amacrine cell to more than two RGCs (Meister et al., 1995; Schnitzer and Meister, 2003), and (2) direct gap junction electrical coupling between pairs of RGCs (note that these candidate mechanisms are not exhaustive but merely serve as examples). These mechanisms can produce very similar crosscorrelations in pairs of cells but very different concerted firing patterns in multiple cells. Specifically, in case 1, a single spike in an amacrine cell could induce a spike in most or all of its RGC targets simultaneously. However, in case 2, the probability of the entire collection of RGCs firing synchronously is relatively low. The distinction is fundamental: the mechanism of case 1 has been suggested as a way to convey distinctive, multiplexed visual messages to the brain (Meister et al., 1995; Schnitzer and Meister, 2003). This simple example illustrates that pairwise synchrony provides only a limited picture of circuit connectivity. Thus, probing the full extent of neural interactions requires measuring patterns of firing in many cells simultaneously.
Triplet synchrony is explained by pairwise interactions
Consider three adjacent ON parasol cells labeled A, B, and C in Figure 2A. Each cell pair exhibits strong synchronized firing (Fig. 3A). To characterize the joint activity of the three cells requires extending the notion of crosscorrelation to three dimensions. The left plot in Figure 3B shows the probability of spikes in all three cells as a function of time shifts ΔT between cells A and B and between cells A and C. The three extended ridges represent the preponderance of synchronous firing in each of the cell pairs, regardless of the activity of the third cell, similar to Figure 3A (for an alternative representation, see Perkel et al., 1975). However, the prominent peak at the origin, with a width of ∼10 ms, indicates that all three cells often fired synchronously. The key question is whether this multicell firing occurs at the rate expected from the known pairwise correlation (e.g., mechanistic hypothesis 2 above) or, alternatively, whether more complex patterns of interaction are required to explain triplet firing (e.g., mechanistic hypothesis 1 above).
To distinguish these possibilities requires a null hypothesis: the probability of all three cells firing in the same time bin (Δt = 10 ms), given only the pairwise interactions. This is analogous to the twocell case, in which the null hypothesis was statistical independence and a synchrony index much greater than 0 revealed a departure from the null hypothesis. However, in the case of three or more cells, the null hypothesis is not as simple, because the statistical independence hypothesis, does not account for known pairwise correlations. Ideally, one would compare the observed data with a null hypothesis that assumes nothing beyond pairwise correlations.
An approach borrowed from statistical mechanics, known as “maximum entropy,” solves the problem uniquely (Jaynes, 1957a,b; Berger et al., 1996; Martignon et al., 2000; Amari, 2001; Scheidman, 2003b) (see Materials and Methods). Intuitively, the approach is to pose a null hypothesis that assumes no prevalence of structure in firing patterns above and beyond the observed pairwise interactions. Specifically, denote the joint distribution of responses of the three cells by P(A,B,C), the pairwise joint distributions by P(A,B), P(B,C), and P(A,C), and the singlecell firing rates by P(A), P(B), and P(C). Our null model, P_{null}(A, B, C), must satisfy the singlecell and pairwise constraints. This problem is underconstrained because many different models for the joint activity are consistent with these marginal constraints. The solution is to compute a joint distribution, P_{pair}(A, B, C), that has the maximum possible entropy (i.e., least structure) given that it satisfies the marginal constraints. P_{pair}(A, B, C) is a complete null hypothesis for the observed firing patterns. Note that, in the twocell case, maximum entropy is equivalent to statistical independence: P_{null}(A, B) = P_{ind}(A, B). The right panel in Figure 3B shows the maximum entropy pairwise prediction for all three cells firing at various time offsets. This model accounts for the major features of the threedimensional crosscorrelation, qualitatively consistent with the idea that pairwise interactions alone can explain the temporal structure of synchronized firing in these three cells.
To test the maximum entropy pairwise prediction quantitatively, analysis was restricted to patterns of firing in the same time bin for all three cells (e.g., central peaks in Fig. 3B). Although this analysis does not account for the possibility of complex temporal interactions, it captures the spatial patterns of synchronous activity that dominate both the pairwise and triplet crosscorrelograms. A “pattern index” was computed that generalizes the synchrony index (Eq. 4) to all spike patterns in three cells: In this expression, P_{obs}(A, B, C) denotes the observed probability of a particular threecell firing pattern. P_{null} represents a null hypothesis for the prevalence of different firing patterns. The index Q indicates how frequently a particular firing pattern was observed, relative to the prediction of the null model P_{null}. Note that, in the reduced case of two cells firing at the same time (A = B = 1) and a null model of statistical independence (P_{null} = P_{ind}), Q is equivalent to the index S in Equation 4.
First, the degree of triplet synchronized firing was established by examining deviations from statistical independence. For this analysis, P_{null} = P_{ind}. The black symbols in the left panel of Figure 4A show the value of Q computed for the firing pattern (1, 1, 1) in many cell triplets randomly sampled from the mosaic of ON parasol RGCs shown in Figure 2A. The index is plotted as a function of the geometric mean of the distances between each pair of cells in the triplet. As with the pairwise synchrony index (Fig. 2C), Q is large for nearby triplets and approaches 0 for widely spaced triplets. Specifically, triplets of parasol cells fired synchronously up to 2^{Q} ≈ 10 times more frequently than expected from statistical independence. Note that triplets with a large mean distance can sometimes exhibit departures from independence if two of the cells in the triplet are near one another. Similar results are shown for OFF parasol cells in the right panel.
Second, to assess whether pairwise interactions alone account for triplet synchrony, the predictions of the maximum entropy model using pairwise constraints were examined. In this case, P_{null} = P_{pair}. The values of Q for this model (red symbols) cluster near the horizontal line at 0, that is, the probability of triplet synchrony is approximately equal to the probability expected from pairwise interactions. Thus, the excess of triplet firing above what is expected from statistical independence is accounted for by pairwise interactions.
All triplet firing patterns are explained by pairwise interactions
To test the maximum entropy pairwise prediction more fully, analysis was performed on all simultaneous firing patterns for triplets of recorded cells. For example, all cells firing synchronously is represented by (1, 1, 1), and no firing is represented by (0, 0, 0). The distribution of all eight possible firing patterns observed in a set of three ON parasol cells is shown in Figure 5A. For comparison, Figure 5C shows the firing pattern distribution expected from statistical independence (Eq. 5), which deviates substantially from the data. Figure 5B shows the maximum entropy pairwise prediction, which captures most of the structure in the data.
To quantify these observations for each firing pattern separately, values of Q were examined for all eight possible triplet firing patterns. In Figure 4B, distinct values of Q for each firing pattern in a triplet of cells are shown with different points. The band of points near the top reproduces the data of Figure 4A, that is, the firing pattern (1, 1, 1). Other patterns occur with greater or lesser frequency than predicted from statistical independence because strongly coupled cells tend to fire together or not at all (Fig. 5A,C) The departure from independence in all triplet firing patterns, evidenced by the large vertical spread of black points, is almost entirely eliminated by taking into account pairwise interactions (red symbols).
To provide a statistical summary of the performance of the pairwise model across all firing patterns, the average likelihood of the observed data, P_{obs}, under different models was computed. The likelihood indicates the probability of having observed the data given a particular model. In general, more (less) accurate models exhibit likelihood values approaching 1 (0). To express the likelihood in units that are invariant with respect to the length of the data sample, the average (geometric mean) likelihood per time bin was computed (see Materials and Methods). The average likelihood of the data in Figure 5A under the independent model (Fig. 5C) was 0.94959, whereas the likelihood under the pairwise model (Fig. 5B) was 0.99944. Although this difference seems modest, the multiplicative accumulation of probability over sequential time bins means that, from 60 s of data collection, one may conclude that the probability of the data having arisen from the pairwise model is >10^{133} times larger than the probability of the data having arisen from the independent model. In other words, the enormous difference between the predictive power of the pairwise and independent models is easily distinguished even with modest data sets.
As a benchmark for the likelihood, an empirical model P_{emp} was obtained from the recorded frequencies of all eight firing patterns in a separate segment of recording from the same retina (the same recording that was used to fit P_{ind} and P_{pair} above). P_{emp} represents an ideal model for the observed data P_{obs} given the intrinsic reproducibility of the measurement. For the data in Figure 5A, the average empirical likelihood was 0.99955. Compared with the ratio of likelihoods above, the ratio of empirical to pairwise model likelihood was smaller than 2 with 60 s of data. Note that the empirical likelihoods of the data are not 1, reflecting a combination of finite counting statistics and possible (small) nonstationarity in the neurophysiological recording. Also, note that the empirical likelihood is an upper bound for the performance of any model; as expected, it exceeds the likelihood associated with both the pairwise and independent models. The pairwise model likelihood is very close to this upper bound, whereas the independent model likelihood is far from it.
The average likelihood was examined for all triplets of cells recorded. The dark blue symbols in Figure 6A show the likelihood of the observed triplet firing patterns assuming statistical independence, P_{ind}, compared with the likelihood obtained with the empirical model, P_{emp}. Again, the likelihood under the independent model is much lower, confirming that the firing of RGCs departs substantially from statistical independence. In contrast, Figure 6B shows that the pairwise model P_{pair} produces likelihood values nearly identical to those produced by the empirical model P_{emp}. Thus, pairwise interactions explain the frequency of triplet firing patterns nearly as accurately as a repeated measurement.
Multicell firing patterns are explained by pairwise interactions
The maximum entropy framework is easily extended to test for interactions in larger groups of cells. For example, for fivecell firing patterns, the observed firing pattern distribution, P_{obs}(A, B, C, D, E), is compared with a maximum entropy null model, P_{pair}(A, B, C, D, E), and a statistically independent model, P_{ind}(A, B, C, D, E). Figure 5D–F shows these three firing pattern distributions for collections of five cells. As in the threecell case, the pairwise model prediction substantially captures the structure of the observed firing pattern distribution, whereas the statistically independent prediction does not.
As above, the pattern index was examined to quantify the quality of the predictions. Figure 4C shows values of Q for the independent and pairwise models, for groups of six cells randomly sampled from the ON and OFF parasol mosaics of Figure 2A. Black symbols reveal large departures from statistical independence: values of Q as high as ∼10 indicate firing patterns occurring ∼1000 more frequently than predicted by chance. Red symbols reveal that these departures are almost entirely accounted for by pairwise interactions. This finding is supported by comparing the likelihood of the observed firing pattern distribution P_{obs} under the models P_{ind} and P_{pair} with the likelihood obtained from the empirical model P_{emp} for groups of four, five, six, and seven cells. [For this analysis and what follows, results from larger groups are not reported because of potential biases attributable to high dimensionality of the data (see Materials and Methods).] In all cases, the large systematic failures of statistical independence (Fig. 6A) are explained by the pairwise model (Fig. 6B).
In summary, pairwise interactions explain almost all of the departures from statistical independence in parasol cell signals, with a precision comparable with the reproducibility of the measurements. This implies that the structure of multineuron firing patterns may be understood with high accuracy based on pairwise connectivity, without postulating more complex interactions. A possible mechanistic interpretation is that frequent multicell firing in the retina does not imply widely diverging common inputs (Schnitzer and Meister, 2003) but instead can arise from reciprocal connections between, or common inputs to, pairs of cells (see Discussion).
Multicell firing patterns are explained by pairwise, adjacent interactions
The next step in simplifying our understanding of the circuit is to test whether pairwise interactions are restricted to immediately adjacent cells or instead must span larger distances. The mosaic arrangement of receptive fields, which corresponds to the physical arrangement of RGCs in the retina (Wassle et al., 1983; DeVries and Baylor, 1997), provides a uniquely useful measure of adjacency (Fig. 2A), and the maximum entropy approach provides a straightforward test of the hypothesis.
Specifically, consider a more restrictive null model for the distribution of firing patterns, P_{adj}. This is the unique maximum entropy distribution subject to the singlecell constraints {P(A), P(B), … } as before and pairwise constraints {P(A, B), P(B, C), … } obtained only from immediately adjacent cells in the mosaic (see Materials and Methods). The pattern index obtained from many cell groups with this model are shown with blue symbols in Figure 4C. To reduce the effect of adjacent interactions present in the retina but missing in the group of cells analyzed, only groups that uniformly cover an approximately convex area of the retina were considered (see Materials and Methods). As a result, the points cover only a small range of distances. Trivial cases in which all pairs are adjacent were excluded. As with the more general pairwise model, the pairwise–adjacent model (turquoise symbols) accounts for almost all of the departures from statistical independence (black symbols). Figure 6C shows that the likelihood of the data P_{obs} under the pairwise–adjacent model P_{adj} is very similar to the likelihood under the empirical model from a repeated measurement P_{emp}. Comparison with Figure 6B shows that the pairwise and pairwise–adjacent model likelihoods are nearly indistinguishable. In summary, pairwise interactions between adjacent RGCs in the mosaic are sufficient to account for almost all multineuron firing patterns.
These findings imply that firing patterns in parasol cells can be understood with high accuracy using the simplest possible connectivity illustrated in Figure 1D, a dramatic reduction in complexity. A possible mechanistic interpretation of this finding is that pairwise synchrony between nonadjacent cells in the mosaic (Fig. 2B) do not necessarily imply longdistance contacts but instead may be explained by propagation via pairwise connections between adjacent cells (e.g., gap junctions). To test the propagation hypothesis, the synchrony index observed in pairs of cells was compared with the index predicted from the pairwise–adjacent model fitted to cell groups of size n = 7 (Fig. 8). Over the entire range of distances between cells, and across the range of synchrony index values observed, the observed and predicted synchrony indices were mostly similar, for both adjacent cell pairs and nonadjacent cell pairs. This is significant because, in nonadjacent cells, the pairwise–adjacent model only predicts synchrony as a consequence of synchrony with intermediate cells. Note, however, that systematic discrepancies are present, particularly for nonadjacent OFF cells, suggestive of subtle departures from the pairwise–adjacent model.
Measuring the accuracy of pairwise and pairwise–adjacent models
To quantify how accurately the pairwise and pairwise–adjacent models explain interactions between RGCs, the average log likelihood value L̄ shown in Figure 6 provides a natural measure. First, larger values of L̄ indicate that the observed data are more consistent with the model. Second, it is easily shown (see Materials and Methods) that the value −L̄ is an estimate of the Kullback Leibler divergence, D, between the observed firing pattern distribution, P_{obs}, and the model distribution. The divergence is an informationtheoretic quantity that measures the inefficiency of storing the observed firing patterns using a compression scheme optimized for the model probability distribution (Cover and Thomas, 1991). Thus, larger divergence values (smaller values of L̄) correspond to a less accurate model.
The divergence of the independent model, D_{ind}, quantifies the departures from statistical independence in the firing of different RGCs. A natural measure of the success of the models is the degree to which they capture these departures from independence. The index (D_{ind} − D_{pair})/D_{ind} expresses the fraction of the departures from independence accounted for by the pairwise model. On average, this value was ∼99% (Table 1). Similarly, the index (D_{ind} − D_{adj})/D_{ind} expresses the fraction of departures from independence accounted for by the pairwise–adjacent model. On average, this value was ∼98%. As a benchmark, the index for the empirical model, (D_{ind} − D_{emp})/D_{ind}, was ∼99%. The latter quantity represents the highest value that can be expected given the reproducibility of the data. The departure from independence accounted for by each model was ∼99% of this benchmark value. In summary, both models almost entirely account for the departures from independence observed in RGC firing.
To test whether these results depend strongly on the timescale of the analysis (10 ms time bins), analysis was repeated with bin sizes twofold larger and smaller. The results in Table 1 indicate that the predictive power of pairwise and pairwise–adjacent models is essentially constant across this range of time scales.
Multicell firing patterns in the presence of visual stimulation are explained by pairwise, adjacent interactions
The data presented so far were obtained with steady spatially uniform illumination of the retina and thus reflect the circuitry that mediates spontaneous synchronized firing in RGCs. However, it is possible that some contributions to synchronized firing, such as diverging inputs from amacrine cells, arise only in the presence of visual stimuli that vary over space and time. To test this possibility, the maximum entropy analysis was applied to data collected in the presence of the whitenoise stimulus used for the receptive field measurement (Fig. 7). This stimulus provides a wide range of spatial and temporal variations in an experiment of reasonable duration. Note that stimuli with a large spatial scale would be expected to introduce higherorder correlations by simultaneously activating multiple RGCs. This would confound the analysis because synchronized firing could be produced by the stimulus, the retinal circuitry, or both (Schneidman et al., 2003a). This problem was avoided by using a stimulus with independently modulating pixels that were small relative to the parasol cell receptive field. In these conditions, the pairwise model captured ∼98% of the departures from independence, the pairwise–adjacent model captured ∼98%, and the empirical model benchmark captured ∼99% (Table 1). Again, the models accounted for ∼99% of the departures from independence that were reproduced by the empirical benchmark. Thus, even in the presence of a dynamic, spatially varying stimulus, pairwise and adjacent models almost entirely account for the departures from independence observed in RGC firing.
Sensitivity of maximum entropy analysis for detecting nonpairwise, nonadjacent circuitry
Pairwise and adjacent connectivity clearly can explain most of the observed interactions between parasol RGCs. However, without direct experimental manipulation, it is impossible to draw firm conclusions about the structure of retinal circuits that underlies this statistical observation. A first step, however, is to ask, how sensitive is the maximum entropy analysis to departures from pairwise and adjacent connectivity? This question was approached by testing the sensitivity of the analysis to artificial perturbations of the data, in two ways.
First, the analysis was applied to artificial data obtained from a purely pairwise (or pairwise–adjacent) model, with varying amounts of common input added to simulate departures from the model. Specifically, a maximum entropy pairwise distribution, P_{pair}, was fitted to the observed data from a group of n = 7 RGCs. A hypothetical common input to all n cells was then simulated, occurring randomly with probability 0 < r < 1 in each time bin, and generating a spike in each RGC with efficacy expressed as a probability p. If (x_{1}, …, x_{n}) is the binary firing pattern for n cells, then define m = Σ_{i}x_{i} to be the total number of spikes in the firing pattern. The common input alone would produce a particular RGC firing pattern containing m spikes with probability: P_{common} is the probability distribution of firing patterns created by the nonpairwise common input. The effect of this common input was simulated by calculating the distribution of firing patterns expected from the logical or combination of spikes from the firing patterns produced by the common input distribution, P_{common}, and the pairwise distribution, P_{pair}. This resulting simulated firing pattern distribution contains structure with systematic departures from pairwise interactions.
To assess the sensitivity of the pairwise model to common input, the degradation in performance of the model was measured as a function of the putative input rate r in the simulation. Specifically, a new pairwise model was fitted to the simulated data, and the fraction of departures from statistical independence accounted for was computed. In the case of r = 0 (no common input), this value was 100%, because the pairwise model computed from the simulated distribution exactly reproduces the original P_{pair}. As Figure 9 demonstrates, the pairwise model is indeed sensitive to common input: this value systematically drops from 100% as the input rate r increases.
To assess the possible degree of common input in the recorded data, the difference between the fraction of departures from independence accounted for by the empirical and pairwise model was measured. This captures the failures of the pairwise model that are not accounted for by the lack of reproducibility in the data. This difference was then subtracted from 100%, and the input rate r that produced an equivalent reduction was identified from the curves in Figure 9. This procedure is diagrammed with dashed lines in Figure 9 and measures the rate of common input that is consistent with the observed data.
With a common input of strength p = 1, an input rate of r = 0.0034 (equivalent to an evoked spike rate rp/Δt in each target RGC of 0.34 Hz) produced a reduction in the fraction of departures of independence accounted for by the pairwise model equal to the value observed in the data (Fig. 9A). The data are not consistent with a common input occurring at a rate higher than r, because r subsumes any discrepancy between model and data. For comparison, the mean firing rate of the recorded ON parasol cells was 10.7 Hz. Similarly, for the pairwise–adjacent model, an input rate of r = 0.0028 (equivalent to an evoked RGC spike rate of 0.28 Hz) produced a value equal to the value observed in the data. As expected, for weaker common input (lower p), higher evoked rates were consistent with the values observed in the data (Fig. 9B,C). This is because lower values of p produce a much smaller proportion of multicell firing patterns. In summary, the data indicate that, at most, a small fraction of the observed spikes in RGCs could be produced by widely diverging common inputs in the retinal circuit.
A second test of the sensitivity of the maximum entropy approach was focused on the importance of adjacent interactions. The performance of the pairwise–adjacent model P_{adj} was compared with the performance of an alternate model. The latter was the maximum entropy distribution subject to the same singlecell constraints {P(A), P(B), … } as P_{adj} but with pairwise constraints {P(A, B), P(B,C), … } obtained from a randomly selected subset of k cell pairs, where k is equal to the true number of adjacent cell pairs in the group. This pairwise–random model uses the same number of constraints as the pairwise–adjacent model but ignores the true spatial layout of recorded cells. The fraction of the departures from statistical independence accounted for by the pairwise–adjacent model, (D_{ind} − D_{adj})/D_{ind}, is compared with the corresponding statistic for the pairwise–random model in Figure 10, for collections of n = 7 ON and OFF parasol cells. The pairwise–random model exhibits substantially reduced capacity to explain the departures from independence in the data, spanning a range of performance of ∼20–95% rather than the values of ∼97–99% exhibited by the pairwise–adjacent model. As expected, a few of the values approach parity with the pairwise–adjacent model: by chance, some of the random samples will coincide with the truly adjacent pairs. These results indicate that the accuracy of the pairwise–adjacent model is not an artifact of limitations in the analysis but instead hinges critically on the true spatial layout of recorded cells, confirming that adjacent interactions are of particular importance in understanding multineuron firing patterns.
Discussion
Our central finding is that multineuron firing patterns in parasol RGCs of primate retina can be explained accurately by purely pairwise interactions restricted to adjacent cells in the mosaic. This is consistent with the simplest possible model in Figure 1 and provides a parsimonious functional description of retinal network activity. A major practical implication for future work is that largescale visual signals conveyed from the primate retina to brain can be understood on the basis of measurements from individual cells and pairs of adjacent cells. However, the limits of this interpretation, and the implications for retinal circuitry, must be approached with caution. Below, we first discuss several implications and then return to consider the caveats.
First, the existence of multineuron synchrony in large collections of RGCs does not imply complex circuitry. Previous work suggested that such synchrony reflects widely diverging input from a presynaptic interneuron, such as an amacrine cell (Schnitzer and Meister, 2003). The present findings indicate that, at least in parasol cells, functional connections between pairs of adjacent cells can explain the observed synchrony. Note that this finding does not identify the mechanisms of synchrony. Previous evidence implicates a combination of mechanisms: direct gap junction coupling between neighboring RGCs (Brinvanlou et al., 1998; Hu and Bloomfield, 2003; Hidaka et al., 2004; Schubert et al., 2005; Volgyi et al., 2005), gap junction coupling through intermediate amacrine cells (Dacey and Brace, 1992; Jacoby et al., 1996; Vollgyi et al., 2005), and chemical synapses providing common input from bipolar or amacrine cells (Mastronarde, 1983a; Brivanlou et al., 1998) (and see DeVries, 1999; Hu and Bloomfield, 2003). The present findings do not refine this picture; instead, they reveal the circuit organization of these mechanisms in large groups of RGCs. Specifically, the following kinds of connectivity are not required to explain synchrony: (1) presynaptic common input to multiple RGCs or nonneighboring RGCs in the mosaic, or (2) reciprocal connections via intermediate cells that contact multiple or nonadjacent RGCs.
Second, the present results suggest that the spatial scale of connectivity between RGCs is smaller than the spatial scale of physiological interactions. Specifically, synchronized firing clearly extended to pairs of cells that are not adjacent in the mosaic (Fig. 2C), but this synchrony could be explained by interactions between adjacent cells. Thus, longrange contacts, such as gap junctions at the tips of dendritic arbors (which overlap considerably in parasol cells) (Dacey and Brace, 1992) or signals propagating through widefield amacrine cells, are not required to explain synchrony. Instead, synchrony could be caused by propagation of signals through a chain of adjacent cells in the mosaic, for example, through proximal gap junctions or narrowfield amacrine cells. Again, this finding does not uniquely identify the mechanism.
There are several caveats to the interpretations above. First, as with any model, the conclusions one may draw about retinal circuits are limited by the fact that small nonpairwise or nonadjacent interactions cannot be entirely excluded in any finite dataset. The degree to which the data quantitatively exclude such interactions are revealed by the sensitivity analysis presented in Results. Second, there are theoretical limits on what can be concluded about circuitry based on correlated firing patterns. For example, widely diverging Gaussian inputs can produce purely pairwise multineuron statistics. Third, only spontaneous activity and responses to a simple, finegrained visual stimulus (white noise) were examined. It remains possible that more complex interactions, over longer distances, occur in the presence of patterned visual stimulation with more natural structure, as has been suggested in previous work (McIlwain, 1964; Olveczky et al., 2003; Schnitzer and Meister, 2003). A test of this possibility will require shuffle correcting to control for multicell synchrony induced by stimuli covering multiple receptive fields. Fourth, analysis was restricted to parasol cells of the primate retina, which are efficiently sampled in the present recordings. It remains possible that different cell types, and retinas of different species, exhibit more complex interactions. Finally, the present work focused on spatial patterns of activity at a single point in time. It remains possible that complex temporal patterns are introduced by interactions that are not pairwise or not adjacent. These are important avenues for future work, and the maximum entropy framework can be extended to address many of these issues.
Although maximum entropy approaches have a long history in several fields, their application in neuroscience is fairly new. Recent theoretical work has set the stage (Martignon et al., 2000; Amari, 2001; Schneidman et al., 2003b), and one group has applied the approach to recordings from salamander and guinea pig retina and cultured cortical networks, concluding that pairwise interactions explain ∼90% of network interactions (Schneidman et al., 2003b, 2006). The present work provides several conceptual and technical advances. First, analysis was restricted to interactions between known morphological and functional types of RGC, in macaque monkey retina, and explained a substantially higher proportion of network interactions (∼98–99%). Second, analysis was restricted to neighboring cells in the mosaic, providing a spatial constraint on pairwise interactions within each cell type. Third, the model was crossvalidated (constrained by one dataset and evaluated on another) to avoid overfitting, which can produce accurate model fits that do not generalize. Fourth, responses were measured in the presence of a steady, uniform stimulus and finegrained whitenoise stimulus rather than naturalistic stimuli or modulating uniform stimuli; the latter have coarse spatial structure and thus would be expected to produce significant departures from pairwise statistics. Finally, sensitivity analysis provided a bound on interpretation of the model in terms of network connectivity and suggested that even fairly high prediction accuracy (e.g., 90%) can be consistent with substantial nonpairwise interactions in the retinal circuit.
The present findings suggest an approach to understanding the function of many large circuits in the brain, to the degree that recording technology permits. For example, the possibility of complex interactions between cells within and across columns in the neocortex (Ts’o et al., 1986; Cossart et al., 2003) or between many hippocampal neurons involved in storing spatial memories (Wilson and McNaughton, 1993) limits our capacity to understand these critical circuits. In the present work, an extremely simple pattern of connectivity sufficed to explain widespread synchrony in the retina. In some systems, such as gap junctioncoupled networks in the thalamus and cortex (Chuikshank et al., 2005) or inferior olive (Llinas and Yarom, 1981), the present approach may translate essentially unmodified to test whether electrical coupling can fully account for network interactions. In other systems, interactions are likely to be more complicated. For example, cortical neurons may exhibit highly specific interactions dependent on layer and cell type (Yoshimura and Callaway, 2005; Yoshimura et al., 2005), resulting in complex firing patterns. The maximum entropy method is readily extended to measure the complexity of the underlying circuits. Specifically, hypothesized constraints on connectivity that emerge from anatomical considerations can be included in computation of the maximum entropy distribution, allowing a direct test of whether they account for recorded multicell firing patterns. For example, the approach could be used to test whether interactions are restricted to small groups of cells (N = 2, 3, 4, …), or to cells over a specific length scale such as a cortical column. The approach factors out the influence of signals propagating through intermediate cells, which otherwise would confound the analysis of group size or spatial scale. Importantly, the methods can be used with measurements of spontaneous activity in circuits such as acute slices in which natural exogenous stimulation is not possible. With the increasingly widespread use of multielectrode recordings and optical methods in vivo and in vitro in many nervous system structures (Wilson and McNaughton, 1993; PerezOrive et al., 2002; Beggs and Plenz, 2003; Cossart et al., 2003; Briggman et al., 2005; Ohki et al., 2005; Womelsdorf et al., 2005; Ogawa et al., 2006), the importance of understanding the complexity of network interactions has grown tremendously. Thus, extensions of the present approach may prove useful for understanding functional connectivity in other neural circuits.
Footnotes
 Received March 24, 2006.
 Revision received June 27, 2006.
 Accepted July 1, 2006.

This work was supported by National Science Foundation (NSF) Integrative Graduate Education and Research Traineeship Grant DGE0333451, a BurroughsWellcome La Jolla Interfaces in Science Fellowship (J.S.), NSF Grant PHY0417175 (A.M.L.), a Sloan Research Fellowship, and National Institutes of Health Grant EY13150 (E.J.C.). We thank E. Simoncelli for key discussions; W. Dabrowski, A. Grillo, P. Grybos, P. Hottowy, and S. Kachiguine for technical development; R. Malouf for valuable technical suggestions; P. Latham for comments on this manuscript; E. Callaway, H. Fox, and K. Osborn for providing access to retinas; B. Kutka for technical assistance; and S. Barry for machining. We thank the San Diego Supercomputer Center and the National Science Foundation (Cooperative Agreements 05253071 and 0438741) for largescale data storage.
 Correspondence should be addressed to Jonathon Shlens, Systems Neurobiology, The Salk Institute, 10010 North Torrey Pines Road, La Jolla, CA 92037. Email: shlens{at}salk.edu