## Abstract

Sensory information is represented in the brain by the joint activity of large groups of neurons. Recent studies have shown that, although the number of possible activity patterns and underlying interactions is exponentially large, pairwise-based models give a surprisingly accurate description of neural population activity patterns. We explored the architecture of maximum entropy models of the functional interaction networks underlying the response of large populations of retinal ganglion cells, in adult tiger salamander retina, responding to natural and artificial stimuli. We found that we can further simplify these pairwise models by neglecting weak interaction terms or by relying on a small set of interaction strengths. Comparing network interactions under different visual stimuli, we show the existence of local network motifs in the interaction map of the retina. Our results demonstrate that the underlying interaction map of the retina is sparse and dominated by local overlapping interaction modules.

## Introduction

The nature of the population code with which large groups of neurons represent and transmit information depends on the network of interactions between them. The discussion whether neural population codes are redundant, independent, decorrelated, synergistic, or error correcting (Barlow and Levick, 1969; van Hateren, 1992; Gawne and Richmond, 1993; Dan et al., 1998; Gat and Tishby, 1999; Barlow, 2001; Petersen et al., 2001; Schnitzer and Meister, 2003; Schneidman et al., 2006) is to a large extent a question of the architecture of neuronal networks. Generally, the number of dependencies among neurons may be exponentially large and arbitrarily complex, and so understanding neuronal functional architecture relies on finding simplifying principles that govern the organization and activity of the network. Studies of functional dependencies in neural systems have focused on relations among brain regions (Sporns et al., 2004, 2005; Eguíluz et al., 2005; Achard et al., 2006; Livet et al., 2007; Stam et al., 2007) and global network properties such as small-world (Watts and Strogatz, 1998) and scale-free (Barabasi and Albert, 1999) properties, *in vivo* (Yu et al., 2008; Bonifazi et al., 2009) and *in vitro* (Bettencourt et al., 2007). The interplay between neuronal “architectural design” and functionality has been studied for wiring length and optimal coding (Chen et al., 2006), neuronal layout and dendritic morphology (Shepherd et al., 2005; Song et al., 2005; Lee and Stevens, 2007), and connectivity scaling laws (Clark et al., 2001; Chklovskii et al., 2002).

Analysis of the joint activity patterns in neural circuits has shown that, although typical pairwise correlations between cells are weak, even small groups of neurons can demonstrate strongly correlated population activity. In the retina, cortical cultures, cortical slices, and in cortical activity *in vivo*, it was shown that the minimal probabilistic models of the population that rely only on pairwise interactions between cells can be surprisingly accurate in capturing the network activity (Schneidman et al., 2006; Shlens et al., 2006, 2009; Tang et al., 2008; Yu et al., 2008). These maximum entropy models give a detailed, unique, functional interaction map of the network. The success of these models presents a huge reduction in the complexity of the network compared with the exponential number of potential network interactions of all orders. We ask here whether we can identify further simplifying design principles of the network organization.

The retina, because of its sensory role, organization, and experimental accessibility, has been a central experimental system to study neural population coding (Meister et al., 1995; Schnitzer and Meister, 2003; Segev et al., 2004). Recording from large populations of retinal ganglion cells, responding to natural and artificial stimuli, we studied the architecture of functional interactions in the retina. We examined how different approaches can reduce the complexity of the functional interaction network and which design principles are at the basis of these simplifications. We found that functionally the retinal code can be modeled as relying on a sparse interaction network, which is predominantly locally connected and comprises overlapping modules.

## Materials and Methods

##### Electrophysiology.

Experiments were performed on adult male and female tiger salamander (*Ambystoma tigrinum*). Before the experiment, the salamander was adapted to bright light for 30 min. Retinas were isolated from the eye and peeled from the sclera together with the pigment epithelium. Retinas were placed with the ganglion cell layer facing a multielectrode array with 252 electrodes (Ayanda Biosystems) and superfused with oxygenated (95% O_{2}/5% CO_{2}) Ringer's medium (in mm): 110 NaCl, 22 NaHCO_{3}, 2.5 KCl, 1 CaCl_{2}, 1.6 MgCl_{2}, and 18 glucose (at room temperature). The electrode diameter was 10 μm, and electrode spacing varied between 40 and 80 μm. Recordings of 24–30 h were achieved consistently. Extracellularly recorded signals were amplified (Multi Channel Systems) digitized at 10 kSamples/s on four personal computers and stored for offline spike sorting and analysis. Spike sorting was done by extracting from each potential waveform amplitude and width, followed by manual clustering using an in-house-written MATLAB program.

##### Visual stimulation.

Natural movie clips were acquired using a Sony (Handycam DCR-HC23) video camera at 30 frames/s. The stimulus was projected onto the salamander retina from a cathode ray tube video monitor (ViewSonic G90fB) at a frame rate of 60 Hz such that each acquired frame was presented twice, using standard optics (Puchalla et al., 2005). The original color movies were converted to grayscale, using a gamma correction for the computer monitor. The receptive field of each cell was mapped by calculating the average stimulus pattern preceding a spike under the random checkerboard stimulation. The center position of ganglion cell receptive field was found by fitting a two-dimensional Gaussian to the spatial profile of the response. The checkerboard stimulus was generated by selecting each checker (100 μm on the retina) randomly every 33 ms to be either black or white. In all cases, the visual stimulus covered the retinal patch that was used for the experiment entirely. “Natural pixel” stimuli were generated by selecting a random pixel from a natural movie and displaying the intensity of that pixel uniformly on the entire screen.

##### Exact solution for the maximum entropy pairwise distribution.

The maximum entropy pairwise distribution is known to take the form *x* is a binary vector that represents the spiking activity of each neuron in the network, α* _{i}* is related to the tendency to spike of the

*i*th neuron, β

*corresponds to the interaction between neurons*

_{ij}*i*and

*j*, and

*Z*is a normalization constant. The parameters α

*and β*

_{i}*can be found by maximizing the following function: where*

_{ij}*H*denotes the entropy function. This function is concave with the following derivatives

*denotes average with respect to the distribution*

_{P}*P*(similarly for β

*). The parameter values were found using gradient ascent.*

_{ij}##### Estimating maximum entropy distributions for large networks.

To find maximum entropy distributions using exact methods, one must calculate model expected values, denoted above *C*,*D* only, the threshold was relaxed to 2 and 11% because of smaller dataset size and higher uncertainty in empirical estimates), which is within the experimental error. To estimate the partition function, we generated samples from the desired distribution using Gibbs sampling and defined our estimate as *x* = (00…0)], because it holds that

##### Sparse interaction maximum entropy models.

To quantify the contribution of each of the simplifications of the full model, we limited ourselves to models of the following exponential form *E* ⊆ {1…*n*} × {1…*n*}. That is, *P*_{model} satisfies all the empirically measured firing rates but only a subset (denoted *E*) of the pairwise correlations, resulting in an interaction term β only for those interactions in the set *E*.

##### Discrete valued interaction models.

To generate a model with *K* different interaction clusters, each cluster having a single interaction value, we find the maximum entropy model that obeys the firing rates of each of the cells and a set of additional constraints of the form *C _{k}* represents a group of interactions that are forced to take the same value. The result is an exponential model:

*C*, which has an interaction strength identical to all pairs in that cluster, β

_{k}*.*

_{k}##### Measures of model performance.

In general, model accuracy was assessed by its similarity to the pairwise model, as quantified by the Kullback–Leibler divergence (Cover and Thomas, 1991). Because the number of possible network states grows exponentially with network size whereas experimental data is limited, empirical sampling becomes unreliable for large groups of neurons. Therefore, we used the pairwise model as a reference instead of the empirical distribution to overcome sampling noise and estimation errors (Ganmor et al., 2009, their Fig. 1*C*). Note that, for the family of models used in this study (i.e., maximum entropy models with a feature set that is a subset of the feature set of the pairwise model), we have that *D*_{KL}(*P*_{data} ‖ *P*_{model}) = *D*_{KL}(*P*_{data} ‖ *P*^{(2)}) + *D*_{KL}(*P*^{(2)} ‖ *P*_{model}); thus, the divergence from the true distribution is the *D _{KL}* we measure here plus a constant term

*D*

_{KL}(P

_{data}‖

*P*

^{(2)}) (Csiszar, 1975; Amari, 2001; Schneidman et al., 2003). For presentation purposes, we further scaled the measured divergence by

*D*(

_{KL}*P*

^{(2)}‖

*P*

^{(1)}) and present a scaled divergence

*d*(

*P*

^{(2)},

*P*

_{model}) =

*D*(

_{KL}*P*

^{(2)}‖

*P*

_{model})/

*D*(

_{KL}*P*

^{(2)}‖

*P*

^{(1)}) (

*d*= 1 means the model covered none of the distance between

*P*

^{(2)}and

*P*

^{(1)}, and is thus identical to the independent model, whereas

*d*= 0 means that the model covered the entire distance and is thus identical to the full pairwise model). Notice that this means that, regardless of the “true” underlying distribution, the shape of the curves we present (when we use the scaled divergence) will not change, but they may be shifted upward. When empirical sampling was involved, accuracy was measured using the Jensen–Shannon divergence (Lin, 1991) between the distributions. The Jensen–Shannon divergence was used instead of the more common Kullback–Leibler divergence (Cover and Thomas, 1991), because it is bounded and does not diverge in the case of empirical sampling. For comparison of empirical sampling to the different models, we randomly split the data into train and test sets (∼1 h for each set, noncontiguous). We quantify the performance of the models by the Jensen–Shannon divergence between the empirical distribution, the independent model, or the pairwise model (all estimated using the training data) and the empirical distribution estimated using the test data.

##### Markov blankets.

Given a network (or graph) with the set of random variables *X* = {*X*_{1}, …, *X _{N}*} as its nodes, the Markov Blanket (MB) of the variable

*X*is the set of its neighbors MB(

_{i}*X*). Given its Markov blanket, a node is independent of the rest of the nodes in the network, i.e.,

_{i}*P*(

*X*|

_{i}*X*\{

*X*}) =

_{i}*P*(

*X*| MB(

_{i}*X*)), where

_{i}*X*\{

*X*} denotes the whole network except for

_{i}*X*. Identifying the Markov blanket of a node

_{i}*X*is known to be a computationally hard problem, and so we use an approximation suggested by Abbeel et al. (2006). Briefly, we searched for a minimal set of nodes

_{i}*Y*for which the conditional entropy

*H*(

*X*|

_{i}*Y*) was very close to the value

*H*(

*X*|

*X*\{

*X*}). This was done by sequentially adding neurons to the set of neighbors (according to the distance of their receptive field from that of

_{i}*X*) and stopping when the conditional entropy was within 5% of its final value, estimated by conditioning on the 55 neurons with closest receptive field centers. For details about graphical representations of joint distributions and Markov blankets, see Koller and Friedman (2009).

_{i}## Results

To study the functional network architecture that underlies the population code of the retina, we recorded the joint activity of large groups of ganglion cells in the tiger salamander retina, responding to natural and artificial movies (Meister et al., 1994; Segev et al., 2004). We then constructed pairwise maximum entropy models of these neuronal networks (see below) and explored the nature of the interactions between neurons in these models. Figure 1*A* shows an example of the joint activity of 99 ganglion cells responding to a natural movie. If time is discretized into small enough bins, Δτ, then in each bin each of the neurons is either silent (“0”) or spiking (“1”), and we can represent the activity of the population at any given time bin by an *n*-bit binary “word” *x* = (*x*_{1}, *x*_{2}, …, *x _{n}*), where

*n*is the number of neurons. Here we used a time bin of Δτ = 20 ms, based on the typical width of the correlation function between cells, but this particular choice is not critical for the results we present (Schneidman et al., 2006; Ganmor et al., 2009).

In general, the distribution of activity patterns of *n* cells may require 2* ^{n}* − 1 parameters to describe. Although the typical correlation between pairs of neurons is weak, even small groups of neurons may be strongly correlated as a group (Bair et al., 2001; Schnitzer and Meister, 2003; Schneidman et al., 2006). Surprisingly, it was found for several different neural systems that the distribution of joint activity patterns of small- and intermediate-sized populations of neurons can be described with high accuracy using the minimal model that relies only on the firing rates of the cells and their pairwise correlations (Schneidman et al., 2006; Shlens et al., 2006, 2009; Tang et al., 2008). This minimal model is the maximum entropy distribution, which is the most random, or least constrained, model that has the same firing rates and pairwise correlations as in the empirical data but does not make additional assumptions about higher-order relations between neurons. The maximum entropy model is mathematically unique and is known to take the following form (Jaynes, 1957):
where the

*n*single-cell parameters {α

*} and the*

_{i}*n*(

*n*− 1)/2 pairwise interaction parameters {β

*} are found numerically such that*

_{ij}*x*〉

_{i}_{data}, 〈

*x*〉

_{i}x_{j}_{P(2)}= 〈

*x*〉

_{i}x_{j}_{data},

*i*,

*j*, = 1…

*n*, where 〈 〉

_{P(2)}denotes average over

*P*

^{(2)}, 〈 〉

_{data}denotes the empirical average, and

*Z*is a normalization factor or partition function.

Although the pairwise interactions in the model do not necessarily reflect a physical interaction between cells, they give a unique functional interaction map between the neurons in the network (supplemental Fig. S1, available at www.jneurosci.org as supplemental material) and represent statistical dependence between pairs of units. Importantly, the interactions in the maximum entropy model differ from the pairwise correlations between cells, which quantify the average tendency of cells to fire or be silent together. Pairwise correlations are notoriously problematic to interpret or analyze when more than two elements are involved. In particular, they are prone to “overcounting” of dependencies between elements, such that if neuron A has a strong influence on both cells B and C, then the latter two may be highly correlated even if they are not directly interacting at all. The maximum entropy approach, conversely, overcomes this problem by finding the minimal model that satisfies all correlations simultaneously. The resulting interactions correspond only to direct functional dependencies between cells (Besag, 1974). In the above example, the interaction value between neurons B and C (β_{BC}) will be zero, as one would intuitively require (Martignon et al., 2000; Schneidman et al., 2003, 2006; Bettencourt et al., 2007; Yu et al., 2008). Moreover, because of the equivalence between these maximum entropy models and Markov networks, the pairwise interactions provide a natural candidate for edges in the graphical representation of the network (Koller and Friedman, 2009).

### The accuracy and structure of the pairwise maximum entropy model for large neuronal populations

The maximum entropy pairwise model provides a very accurate description of neural activity for small groups of neurons (Fig. 1*B*,*C*), in agreement with previous studies (Schneidman et al., 2006; Shlens et al., 2006, 2009; Tang et al., 2008). In addition, pairwise maximum entropy models are much more accurate than independent models of neural population activity, which assume *P*^{(1)}(*x*) = Π* _{i}P*(

*x*), also for larger networks as is reflected by the accuracy of the model in predicting the observed activity patterns (Fig. 1

^{i}*D*) and the overall synchrony in the population (Fig. 1

*E*) (Tkacik et al., 2006; Shlens et al., 2009). For small groups of neurons, we can also directly quantify the contribution of pairwise correlations to the total network correlation, measured by

*I*

^{(2)}

*/I*(Schneidman et al., 2003), which we found to be ∼90% for nearby cells as well as for widespread populations (Fig. 1

_{N}*C*).

Importantly, for large groups of neurons, direct sampling of the entire joint distribution is not feasible because the number of network states grows exponentially with the number of neurons. However, even for large groups of 50–100 neurons, the precise empirical probability of many of the network activity patterns, or states, can still be estimated within tight confidence limits (Fig. 1*F*). This is because of the low entropy of the joint response, which can be bounded from above using the entropy of the pairwise model [22.33 ± 0.02 bits, SD over 20 Monte-Carlo estimates with 500,000 samples (Schneidman et al., 2003)]. This means that the patterns that account for most of the probability mass can be sampled empirically in reasonable time. In addition, the pairwise model accurately predicts the distribution of synchronous events, a statistic of the distribution that can be accurately estimated from the data even for large groups (Fig. 1*E*).

The second-order maximum entropy distribution *P*^{(2)} is not only more accurate than *P*^{(1)} but can give a better prediction of network activity patterns than what can be achieved by empirical sampling (as verified by cross-validation; see Materials and Methods). Figure 1*C* shows this is true even for small networks in which the number of samples is much larger than the number of network states (our data consist of ∼350,000 samples). The inaccuracy of empirical sampling is further exacerbated for large networks. The reason for the accuracy of the pairwise model is that it relies only on accurate sampling of pairwise correlations, which can be reliably estimated from a relatively small sample, and so does not suffer a drop in accuracy as network size is increased as a result of sampling errors (Ganmor et al., 2009).

### Local structure in the functional interaction network of large neural populations

Although the second-order maximum entropy model presents a huge simplification compared with the exponential number of potential interactions among cells, it still requires a quadratic number of parameters. Finding additional structural organization in the network may help us identify underlying principles governing the coding properties of the network. Figure 2, *A* and *C*, presents the functional interaction maps or β* _{ij}* between cells in two networks of 99 and 63 ganglion cells responding to natural movie stimuli, overlaid on the physical locations of the cells' receptive fields.

Ordering the interaction matrices between cells (Fig. 2*B*,*D*), such that nearby cells in the matrix are also physically close to one another, reflects the local nature of the pairwise interactions, in which strong positive interactions tend to aggregate near the diagonal, suggesting that neurons recorded from nearby electrodes tend to strongly interact.

The distribution of interaction values in the full pairwise model of ganglion cells responding to natural and to full-field natural pixel movies (see Materials and Methods) has many values that are close to zero with a long tail of strong positive interactions (Fig. 3*A*). Although correlations are generally larger when presenting the full-field stimulus (Fig. 3*A*) and the exact interaction values are affected by the stimulus (Fig. 3*B*), we find similar dependence of interaction strength on correlation and receptive field distance in both cases. Namely, interaction strength decays with the distance between cells (Fig. 3*C*), and weak pairwise correlations often imply weak pairwise interactions (Fig. 3*D*) (supplemental Fig. S1*A*, available at www.jneurosci.org as supplemental material).

We asked then how distant or weakly correlated neurons interact and whether they are acting almost independently from one another, given the neurons in their close vicinity (we emphasize this is conditional independence given other cells in the network and not conditional independence given the stimulus). Direct testing for this kind of near conditional independence is statistically hard, because it requires an accurate estimation of the joint distribution of all the neurons in the network. Still, two lines of evidence suggest that many neurons in the network are conditionally independent given a sufficiently large group of their neighbors. First, the pairwise interaction strength in the full pairwise model decays with the distance between receptive fields, implying a finite interaction range (Fig. 3*C*). Second, the conditional entropy of a neuron given a growing set of its neighbors seems to reach its minimum value after considering only ∼10–20 neighboring neurons (Fig. 4*A*,*B*), suggesting that neurons are nearly conditionally independent of all the other neurons in the network given a small set of neighboring cells; this is also known as the Markov blanket of each of the cells (Pearl, 1988; Koller and Friedman, 2009) (see Materials and Methods). We find a similar Markov blanket organization, also when the retina was presented with a natural pixel movie, which has different spatial stimulus statistics, and pairwise interaction values (Fig. 4*B*). Figure 4*C* shows an example of the layout of the Markov blankets of a subset of 20 neurons at the center of the recording array reflecting the high overlap between the Markov blankets of the neurons.

### A sparse interaction network accurately approximates the full pairwise model

The pairwise maximum entropy model presents a huge reduction in the dimensionality of the network model of large neuronal populations: for a network of *n* neurons, it uses an order of *n*^{2} parameters instead of the exponential number of parameters required in general to describe the distribution of activity patterns. It is not immediately clear, however, that all the pairwise interactions are necessary for the model to be accurate. In fact, the results presented in the previous section suggest that many of the pairwise interactions can be safely ignored.

We therefore asked whether we could further reduce the complexity of the model, or the number of its parameters, by removing pairwise interactions from the functional interaction network, without sacrificing much of its accuracy. Nearest-neighbor models offer such a simplified network model, which can still exhibit long-range correlations in the network. We first explored a simple nearest-neighbor model in which each element is only connected to the four elements with nearest receptive field centers, analogous to a two-dimensional Ising lattice system or first-order Markov random field used in image processing (Li, 2001). We found that for groups of 20 retinal ganglion cells responding to natural movies, which contain long range correlations, a nearest-neighbor model covers only ∼41 ± 19% of the distance between the independent model and pairwise model, leaving much room for improvement. These results differ from those of Shlens et al. (2009) who reported that nearest-neighbor models are very successful in describing the responses of similarly sized groups of ganglion cells in primate retina. Although differences between species may contribute to the difference in results, a recent study points toward the type of visual stimulation as the most likely source (Cocco et al., 2009). Importantly, while Shlens et al. (2009) use spatiotemporal white noise, which by construction lacks correlation structure, here we focused on natural movies that are known to contain long-range correlations.

To further explore the structure of the local interaction map, we used a heuristic approach to identify the important or dominant interactions in the network. We built a family of network models, starting from the independent model, by adding at each step the strongest interaction, which is not yet part of the network (interaction strengths were derived from the full model). In each step, we recalculate the appropriate maximum entropy model, for the new set of pairwise constraints (see Materials and Methods). Our results indicate that most interactions in the full pairwise network can be discarded, with only a very slight effect on the accuracy of the network model. For networks of up to 99 retinal ganglion cells responding to a natural movie, we used a specially tailored Monte Carlo algorithm to construct these maximum entropy models (Broderick et al., 2007) and found that using less than half of the potential interactions in the network we can cover >95% of the distance between the independent model and pairwise model (Fig. 5*A*,*B*).

We conclude that an accurate model does not necessarily require a fully connected network. However, to identify which edges can be removed from the network, we first had to construct a fully connected model.

Next we asked whether we could infer a priori which interactions are essential for building a successful model. We found that, by including only pairwise interactions according to the magnitude of correlation between the corresponding cells or by the proximity of their receptive field centers, we can construct a very accurate model using relatively few parameters, much like using only strong interactions (Fig. 5*A*). Importantly, selecting neuronal interactions based on correlation between cells could be implemented biologically, using Hebbian plasticity. Adding interactions according to the distance between the receptive field centers of the cells also allowed us to examine the number of neighbors required to achieve an accurate approximation. On average, >20 neighbors per node were required to cover 95% of the distance between *P*^{(1)} and *P*^{(2)} (Fig. 5*C*), suggesting that a simple lattice structure, or a first-order nearest-neighbor structure, does not suffice to accurately approximate the functional interaction network of retinal ganglion cells responding to natural stimuli. The average number of neighbors required is similar to the number calculated using the conditional entropy measure in Figure 4.

We further examined how the removal of edges from the network affected the overall structure and connectivity of the induced graph. We found that, by removing interactions according to their strength, the network breaks up into several disjoint components significantly faster than if edges are randomly removed. Nevertheless, to achieve an accurate description of network activity, we require 400–500 edges (Fig. 5*A*), which implies a nearly fully connected network (Fig. 6*A*).

Interestingly, when removing edges according to the magnitude of correlation between cells, the network disintegrates into disjoint components much more rapidly (Fig. 6*A*). Closer inspection reveals that the network initially breaks up into singleton neurons and one large connected component (Fig. 6*B*,*C*). An accurate description of network activity requires ∼1000–2000 interactions between highly correlated neurons (Fig. 5*A*). Thus, analysis based on the correlations implies a network with one large (80 neuron) component and 19 singletons. This stands in contrast to the analysis based on pairwise interactions in which a fully connected, but much sparser, representation was derived, providing an example in which these two similar types of analysis lead to qualitatively different results.

### Discrete interaction valued networks capture most of the full network model

Another simplified network model we considered is a uniform interaction model, in which the network is fully connected, with the same interaction strength between each of the pairs. This model has a single interaction parameter instead of the quadratic number of the full pairwise model and has clear computational and practical benefits (Bohte et al., 2000; Montani et al., 2009). We found that such a model is a poor approximation of the network and covers only 24 ± 2% of the distance between the independent and pairwise models.

The uniform connectivity model can be extended to allow for a small set of different interaction values: more than one, as in the uniform connectivity model, and less than the number of pairs, ∼*n*^{2}, as in the full pairwise model. Because there is no analytical solution for how to optimally cluster the interaction values into groups, we used a heuristic approach, by which we grouped together the interactions according to their value in the full model using the *k*-means algorithm. We found that such a model can capture most of the network behavior but greatly reduce the complexity of the model. For example, for 20 neurons, we found that typically only five different interaction values (for the 190 interactions in the full pairwise model in this case) are required to cover ∼95% of the distance between the pairwise and independent models (Fig. 7). Moreover, this clustering approach vastly outperforms the uniform connectivity model, the nearest-neighbor model, and even the sparse interaction model (with interactions chosen *post hoc* according to the interaction strength). Although it is not immediately clear how this approach may be implemented biologically, it provides a clear indication that the complexity of pairwise models for neural network activity can be greatly reduced.

### Motifs in the functional interaction networks of neurons

To test whether a more intricate local structure exists in the network, and in particular whether there are certain subgraphs (“motifs”), which occur more frequently than we would expect from a random network with no local structure, we compared the functional interaction network with an ensemble of random networks with similar overall interactions but devoid of any local structure, as in previous studies (Milo et al., 2002; Shen-Orr et al., 2002).

Network motifs have been studied in many biological and other systems (Alon, 2007), yet most studies, and algorithmic tools, have focused on unweighted network interactions. Thus, studying interaction networks such as transcriptional regulation networks or metabolic networks, which are fundamentally weighted interaction networks, has relied on transforming the interaction map into an unweighted representation. We first converted the functional interaction network into an unweighted network by removing all interactions with magnitude below some threshold and “coloring” each of the remaining interactions according to their sign (red for positive, blue for negative). Random networks were then generated using the switching algorithm in the study by Milo et al. (2003), such that the colored degree of each node (i.e., the number of blue/red edges connected to each node) was preserved.

We found that, if two neurons both interact with a third neuron, they are significantly more likely to interact with each other than expected by chance (see graphs 1 and 5 in Table 1), regardless of the number of edges retained in the graph. In addition, it is much more likely to find triplets for which the product of the pairwise interaction values is positive than is expected by chance from the distribution of interaction values (see graphs 1 and 3 in Table 1). In the equivalent model in physics, the Ising spin model, such a set of interactions among a triplet of spins (neurons) would be “unfrustrated,” because it has two stable configurations in which all edges are “satisfied” (the spins at both ends of the edge are aligned for positive edges and unaligned for negative ones). Triplets for which the product of pairwise interaction values is negative were much rarer than expected by chance (see graphs 2 and 4 in Table 1). Such interactions correspond to “frustrated” triangles in the Ising model. Frustrated triangles do not have stable states in which all edges are satisfied and are usually indicative of higher entropy because many states have relatively similar energy or probability.

Most of our conclusions hold regardless of the arbitrary threshold values used to remove edges (Table 1). Furthermore, although the exact value of the interactions is affected by stimulus statistics (Fig. 3*B*), the qualitative results regarding the simple building blocks presented in Table 1 are essentially unaffected when we repeat the procedure with the same retina presented with a different naturalistic full-field stimulus (natural pixel movie; see Materials and Methods) (Table 1, bottom rows).

To study the architecture of the weighted interaction network directly, we used the approach presented by Onnela et al. (2005). This study suggested to measure the “abundance” of a certain weighted subgraph in a weighted network using the intensity of the subgraph, defined as *V*(*g*) denotes the set of nodes in the subgraph]. The mean intensity of a motif is the sum of the intensities of all subgraphs isomorphic to that motif, divided by the number of subgraphs. These intensity measures can then be compared with those recovered from an ensemble of random weighted networks. To handle negative edge weight product [for which *I*(*g*) is not defined], we considered motifs with negative edge weight products separately, and their intensity was defined simply by

The results of the analysis of the weighted graph were consistent with those of the unweighted graph analysis, which we described above. Specifically, we found that the mean intensity of unfrustrated triangles was significantly higher than chance, whereas the mean intensity of frustrated triangles was significantly lower than chance (Fig. 8*A*,*B*). Moreover, a clear dependence between subgraph intensity and receptive field distance was observed (Fig. 8*C*). For subgraphs of size 5–7, the mean intensity in the functional interaction network was slightly but significantly greater than chance (for both positive and negative edge weight products, see Fig. 8*A*,*B*).

## Discussion

A mathematical description of the joint activity patterns of many interacting elements may require an exponential number of parameters. Pairwise maximum entropy models have been shown to accurately capture the joint activity patterns of small groups of neurons using only pairwise interactions between them. Analysis of the functional interaction network defined by the full pairwise model revealed that a reduced sparse pairwise interaction network can accurately approximate the fully connected one. These reduced models may be built by eliminating many of the weak pairwise interactions (*post hoc*) or by including only pairs that are highly correlated or have nearby receptive fields, suggesting that the important interactions in the network are those between neurons that are highly correlated or have nearby receptive fields. Alternatively, a very good approximation to the full pairwise model can be achieved by using only a very small number of different interaction values. In addition, our results show that the functional interactions are predominantly local and organized as a set of overlapping modules, whereas immediate nearest-neighbor interactions alone are not sufficient to fully capture the network responses to natural stimuli.

### Implications of structure for coding and learning

We found that the dominant functional interactions in the network are between neurons that are highly correlated or have adjacent receptive fields. In other words, assuming no interaction between neurons that are weakly correlated or have distant receptive fields incurs only a slight penalty on accuracy. However, this does not imply that the network can be simply broken up into independent components but rather should be viewed as a collection of overlapping modules, each of which is relatively small. This implies that the joint activity and dependencies of a set of neurons can be exactly inferred observing only that set and its neighbors (Abbeel et al., 2006), and thus learning the joint response of highly connected subnetworks may be achieved by observing only a small subset of the network. This suggests that the retinal code is organized as a collection of modules, with considerable overlap between them, in which each module can perform error correction to some extent because of the correlation among its units (Barlow, 2001; Schneidman et al., 2006). Interestingly, a recent study reported that the local field potential (LFP) signals in cortical cultures can be accurately modeled using a functional interaction network of non-overlapping yet connected clusters (Santos et al., 2010). The difference between the spiking patterns of large groups of neurons in the retina and the synaptic inputs in cortical cultures (reflected by the LFP signals) may result from biophysical sources (spiking vs LFP), anatomical organization, or the naturalistic stimuli we presented to the retina. However, it would be especially interesting if these differences reflect a fundamental difference in the structure of the neural code of the two systems studied.

From a computational perspective, the complexity of a model of the joint activity of a group of retinal ganglion cells can be approximately quantified by the number of model parameters. Simpler models have an obvious advantage for experimental and theoretical analysis, but they also imply more efficient learning and inference. Moreover, downstream structures are also likely to benefit from a compact representation of retinal response for decoding purposes. It remains to be seen whether downstream structures actually take advantage of the organization of the population code that we have shown and what is the role of higher-order interactions (Tkacik et al., 2006; Montani et al., 2009), especially in large networks. Notably, downstream neurons receiving input from retinal ganglion cells that are highly correlated or have adjacent receptive fields seems very biologically plausible and have been reported experimentally (Torborg and Feller, 2005).

### Network motifs

The small network motifs that we identified give an initial decomposition of the subgraph structures that were much more abundant or rare than observed in similar networks devoid of any local structure. We suspect, however, that these motifs may be subgraphs of larger recurring elements, which we did not identify. Still, because we reached qualitatively similar conclusions when presenting the retina with two very different stimuli, we speculate that these reflect a basic structure of neuronal functional interaction networks and are invariant to the external stimulus.

### Beyond the retina and temporal structure

Although this study focused on the retina, pairwise interaction models have been shown to provide an accurate description of the joint activity in other neural systems (Tang et al., 2008; Yu et al., 2008). We believe the approaches implemented here can be directly applied to other structures, as was done recently in analysis of the properties of the interaction network, derived from the same model, in cat visual cortex (Yu et al., 2008). Although the notion of receptive field is characteristic of early sensory stages, the rest of the properties we rely on, such as correlation, are common to all neural systems, and therefore similar analysis can be performed in different brain areas. Future studies may also be extended to examine temporal correlations as well as spatial ones. Although we focused on purely spatial interactions, a similar modeling approach can also be applied to spatiotemporal patterns as well (Marre et al., 2009).

## Footnotes

R.S. was supported by the Israel Science Foundation and the Horowitz Center for Complexity Science. E.S. was supported by the Israel Science Foundation, Minerva Foundation, the Horowitz Center for Complexity Science, Clore Center for Biological Physics, and the Peter and Patricia Gruber Foundation.

- Correspondence should be addressed to Dr. Elad Schneidman, Department of Neurobiology, The Weizmann Institute of Science, 76100 Rehovot, Israel. elad.schneidman{at}weizmann.ac.il