In the cortex, the interactions among neurons give rise to transient coherent activity patterns that underlie perception, cognition, and action. Recently, it was actively debated whether the most basic interactions, i.e., the pairwise correlations between neurons or groups of neurons, suffice to explain those observed activity patterns. So far, the evidence reported is controversial. Importantly, the overall organization of neuronal interactions and the mechanisms underlying their generation, especially those of high-order interactions, have remained elusive. Here we show that higher-order interactions are required to properly account for cortical dynamics such as ongoing neuronal avalanches in the alert monkey and evoked visual responses in the anesthetized cat. A Gaussian interaction model that utilizes the observed pairwise correlations and event rates and that applies intrinsic thresholding identifies those higher-order interactions correctly, both in cortical local field potentials and spiking activities. This allows for accurate prediction of large neuronal population activities as required, e.g., in brain–machine interface paradigms. Our results demonstrate that higher-order interactions are inherent properties of cortical dynamics and suggest a simple solution to overcome the apparent formidable complexity previously thought to be intrinsic to those interactions.
The mammalian cortex forms a complex network consisting of >1010 neurons. Interactions between these neurons exist at many different scales ranging from microcircuits in cortical columns to cortical areas across the whole brain. Consequently, during perceptive, cognitive, and motor functions, cortical dynamics are characterized by spatially distributed coherent activity patterns that reflect these neuronal interactions (Gray et al., 1989; Abeles et al., 1993; Bressler et al., 1993; Vaadia et al., 1995; Riehle et al., 1997; Rodriguez et al., 1999). During the past two decades, the simplest form of interactions, i.e., pairwise correlations, has been well identified (Singer, 1999). However, the study of higher-order interactions, i.e., the ones that manifest only in triplets, quadruplets, etc., has been challenging. The fundamental problem is the number of potential higher-order interactions, which grows exponentially with system size, e.g., the number of neurons considered, and makes the interaction structure quickly intractable. So far it is not clear how to generally overcome this “curse of dimensionality.” Consequently, the structure of higher-order interactions in cortical activity and its underlying mechanisms have remained largely unexplored.
Previous work suggested that pairwise interactions alone provide a rather complete picture of neuronal activity, potentially circumventing the necessity of identifying higher-order interactions. Schneidman et al. (2006) and Shlens et al. (2006) showed that pairwise interactions explain most of the spiking activity in small (≤10 neurons) retina networks, a finding that was soon extended to include larger systems (Cocco et al., 2009; Shlens et al., 2009; Ganmor et al., 2011a), in vivo cortical networks (Yu et al., 2008; Ohiorhenuan et al., 2010), and correlations among population activities measured in the local field potentials (LFPs) (Tang et al., 2008; Santos et al., 2010). In contrast, a number of more recent studies (Montani et al., 2009; Ohiorhenuan et al., 2010; Santos et al., 2010; Ohiorhenuan and Victor, 2011; Ganmor et al., 2011b) have identified significant higher-order interactions in neuronal spiking and LFP activities, e.g., third-order interactions among closely neighbored (<300 μm) cortical neurons (Ohiorhenuan et al., 2010; Ohiorhenuan and Victor, 2011) and interactions of up to the eighth order among retinal ganglion cells (Ganmor et al., 2011b). Clearly, the structure of higher-order interactions and their contributions to cortical dynamics are still open to debate. Here we show that pairwise interactions alone are not sufficient to identify neuronal avalanche dynamics (Beggs and Plenz, 2003), measured as ongoing cortical activity in awake monkeys based on the LFP. We then demonstrate that the incorporation of a specific structure of higher-order interactions that results from thresholding (Amari et al., 2003; Macke et al., 2011) improves the accuracy of reconstructing neuronal avalanche dynamics by up to two orders of magnitude. The same method is then shown to significantly improve the approximation of ongoing spiking activity in awake monkeys as well as visually evoked spiking responses in anesthetized cats. Our results characterize higher-order interactions at different scales of cortical dynamics with an unprecedented efficiency and robustness and suggest a possible mechanism underlying their generation.
Materials and Methods
Electrophysiological recordings in monkeys.
All experiments were performed in accordance with NIH guidelines for animal use and care. Ongoing LFP and spiking activity were recorded from two adult monkeys (Macaca mulatta; monkey A, female; monkey B, male). Multi-electrode arrays (MEA; 96 channels; 10 × 10 without corners, 400 μm interelectrode distance, 1 mm electrode length; Blackrock Microsystems) were chronically implanted in the arm representative region of the left premotor cortex (Fig. 1A). Approximately 30 min of ongoing LFP (1–100 Hz) and extracellularly recorded spiking activity (300–3000 Hz) were simultaneously obtained from each electrode while the animals were sitting alert in a primate chair with their heads fixed, but not engaged in a behavioral task. Spike sorting was performed off-line (Offline Sorter, Plexon). Putative single units were identified when (1) clearly isolated clusters were present in the PCA or waveform feature space and (2) the percentage of interspike intervals <1 ms was <0.2%. Based on these criteria, 56 (of 91 channels; monkey A) and 42 (of 78 channels; monkey B) putative single units were identified. Detailed results are shown for monkey A only. Qualitatively similar results for monkey B are summarized in Figures 7 and 10.
Electrophysiological recordings in cats.
To study evoked spiking responses in vivo, we used datasets of visually evoked activities recorded from cortical area 17 in two, adult anesthetized cats published previously (Yu et al., 2008). In short, the animals were artificially ventilated and anesthesia was maintained with a mixture of 70% N2O and 30% O2, supplemented with 0.5–0.6% halothane. Extracellular spiking activity was recorded by one or two silicon-based MEAs (4 × 4 electrode array, 200 μm interelectrode distance; Neural Nexus). The probes were inserted approximately perpendicular to the cortical surface to cortical depths of ∼1 mm. Visual stimuli (presented by ActiveStim) consisted of full-contrast, drifting sinusoidal gratings that spatially covered the receptive fields of all recorded neurons. Each trial was completed with 3–4 s presentations of a drifting grating with a direction randomly chosen from a set of 12 directions (0 to 360° range; steps of 30 degrees). Approximately 30 min of visually evoked responses were recorded for each cat. Putative single units were identified off-line using a customized, PCA-based program. Three datasets were analyzed (two probes for cat A and one probe for cat B). Detailed results are only shown for cat A, probe one. Qualitatively similar results for the other two datasets are summarized in Figure 10.
LFP and avalanche analysis.
Negative deflections in the LFP (nLFPs) were detected by applying a threshold at −2.5 SD of the LFP fluctuations estimated for each electrode separately (Fig. 1B). The nLFP peak times were then binned using a time window, Δt, to generate corresponding binary time series (1 for nLFP and 0 for the lack of it). Results shown are based on Δt = 2 ms. Similar results were obtained with other values of Δt ranging from 4 to 64 ms (data not shown). Spatiotemporal clusters of nLFPs, i.e., avalanches, were defined by consecutive bins that contained at least one nLFP at any of the recording sites (Beggs and Plenz, 2003). The size of a cluster was defined as the number of nLFPs in the cluster (Fig. 1E). Cluster size distributions were obtained using linear bins (i.e., size equals 1, 2, 3, …) and plotted in double-logarithmic coordinates for visual inspection. Shuffled time series of nLFPs were obtained by randomly switching interevent times between nLFPs for each site, i.e., electrode. This approach preserves the local average rates and interevent interval distributions of nLFPs, but removes spatiotemporal correlations among sites on the array.
Avalanche patterns and instantaneous nLFP patterns.
The analysis of avalanches was simplified by collapsing all time bins within an avalanche to obtain a corresponding spatial pattern x = (x1, x2, x3, … xn), where n is the number of recording sites included in the analysis and xi = 1 if at least one nLFP occurred at site i and xi = 0 otherwise (Fig. 1E). These patterns will be referred to as “avalanche patterns” hereafter. Collapsed bins were replaced with the all zeros pattern such that the total duration of the dataset was preserved. We note that our results hold even if this replacement strategy is not used, as only <5% of bins were lost without replacement and our results do not take into account interpattern intervals (see also the analysis on instantaneous nLFP patterns below).
We also studied single bin nLFP patterns, obtained after binning nLFP peak times at temporal resolution Δt, but without concatenating successive bins with active sites into an avalanche. That is, x = (x1, x2, x3, … xn), where xi = 1 if at least one nLFP occurred at site i and xi = 0 otherwise. These spatial patterns will be referred as “instantaneous nLFP patterns” hereafter.
Because of the computational demands of the Ising model (see main text), most of our analysis was restricted to patterns obtained from a subset of n = 10 electrodes if not stated otherwise. For spatially compact subsets (i.e., the neighboring cluster of electrodes), detailed results are shown. Qualitatively similar results for randomly chosen subsets are summarized in Figure 7.
Coherence potential analysis.
The method was adopted from Thiagarajan et al. (2010) with slight modifications. Coherence potentials were identified if nLFP waveforms (baseline-to-baseline excursion) recorded simultaneously from different cortical sites were highly similar. For individual sites, nLFP waveforms were categorized according to their absolute peak amplitudes, a, binned at a resolution of 0.5 SD, i.e., a0= [0, …, 0.5) SD, a1 = [0.5, …, 1) SD, …, a9 = [4.5, …, 5) SD. For every amplitude level ai, up to 300 randomly chosen nLFP waveforms were compared with simultaneously recorded waveforms from all other sites. The pairwise Pearson correlation coefficient, r, between nLFPs was computed to quantify waveform similarity. The fraction of sites, FRmin, which had waveform similarities r > Rmin was plotted against ai. The resulting function represents a density function of coherence potentials, i.e., the extent of spatial coupling as a function of nLFP peak amplitude. This approach differs slightly from Thiagarajan et al. (2010), where a minimal amplitude threshold was used to categorize nLFPs, i.e., >0.5, >1, … , >4.5 SD and the resulting sigmoid function represented cumulative probabilities of coherence potentials. We systematically varied Rmin from 0.1 to 0.9 to investigate the effects of its value on the results regarding spatial coupling and nLFP amplitude. To estimate the a priori probability of finding LFP waveforms with similarity larger than Rmin, for every nLFP analyzed, we randomly chose a length-matched LFP segment from the same channel and applied the identical analysis. The excess occurrence of nLFPs with high waveform similarity was quantified as the difference between the spatial coupling among nLFP waveforms and that of the length-matched, random LFP segments.
To obtain activity patterns for spikes, spike trains were binned at temporal resolution Δt. “Instantaneous spike patterns” were taken as the spatial distribution of spikes within Δt without further temporal concatenation, as for the “instantaneous nLFP patterns.” That is, x = (x1, x2, x3, …, xn), where xi = 1 if at least one spike generated by unit i and xi = 0 otherwise. We used a Δt = 20 ms for ongoing activity in the monkey and a smaller Δt = 4 ms for evoked activities in the cat, because of the relatively high firing rates during evoked responses compared with ongoing activity (Yu et al., 2008). The range of Δt used here is consistent with previous studies using the Ising model (Schneidman et al., 2006; Yu et al., 2008; Ohiorhenuan et al., 2010), and similar results were obtained by varying Δt by at least two folds, i.e., 10 ms for ongoing activity and 8 ms for evoked activity (data not shown).
To study model performances as a function of the average level of pairwise correlation, subsets of units with strong pairwise correlations were identified (Figs. 10B and 11). For three-unit groups, we calculated the average pairwise correlation for all possible three-unit combinations and chose those 100–250 triplets with the highest average correlation. For 10-unit groups, we calculated the average pairwise correlation for all possible 10-unit combinations or up to 106 randomly chosen ones. Among those, we analyzed 15–30 groups with the highest average correlation value. We note that the majority of units (>60%) were included in the analysis of strongly coupled groups.
Peri-event time histogram of spikes.
For each electrode, nLFPs within a given amplitude range a0= [0.1, …, 0.5) SD, a1 = [0.5, …, 1) SD, …, a9 = [4.5, …, 5) SD were selected and used as triggers to construct peri-event time histograms (PETHs) of units recorded at the same electrode. Histograms were binned at 2 or 32 ms, converted to rates, and ranged from −2 to 2 s. The difference in PETH peak value and the unit's mean firing rate was obtained for each nLFP amplitude range, normalized by the maximal difference and averaged over all units (Fig. 2C). A bin width of 32 ms was used for the analysis shown in Fig. 2C, which reduces the effect of systematically overestimating peak PETH values due to “leaked” spike waveforms in the LFP.
The Ising model (Schneidman et al., 2006) is expressed as P(σ) = Z−1 exp (Σi hiσi + Σij Jijσiσj), where P(σ) is the probability of the pattern σ = (σ1, σ2, …, σn). σi equals 1 or −1, representing the state of the ith element (1 for active and −1 for inactive). The normalization factor Z, intrinsic property hi, and interaction Jij were determined according to the experimentally observed rate 〈σi〉 and pairwise correlation 〈σiσj〉. The Ising model is a maximum entropy model with given 〈σi〉 and 〈σiσj〉. For 3- and 10-element systems, the Ising model was numerically solved using custom-written MATLAB code incorporating the MATLAB optimization toolbox function fsolve. For systems with up to 10 elements, the Ising model can be solved exactly with an error in fitting the rates and pairwise correlations smaller than 10−9. This results in almost identical lower-order statistics, i.e., event rates and pairwise correlations, between the Ising model and the data. For example, for the 10-electrode group shown in Figure 1G, the approximation error defined as | vIsing − vdata |/vdata,was <10−10, where vdata is the value of the rates or pairwise correlations calculated from the data, and vIsing is the corresponding value calculated from the pattern probabilities of the Ising model.
Dichotomized Gaussian model.
The dichotomized Gaussian (DG) model (Amari et al., 2003; Macke et al., 2009, 2011) has a threshold operation based on multivariate Gaussian variables: yi = 1 when ui > 0 and yi = 0 otherwise, where u = (u1, u2, …, un) ∼ N (γ, Λ) and γ is the mean and Λ is the covariance of the Gaussian variables, respectively. To match the rate, r, and covariance, Σ, of the observed binary variables, γ and Λ need to be adjusted according to γi = Φ−1 (ri) and Λij as the solution for Σij = Φ2 (γi, γj, Λij) − Φ (γi) Φ (γj), where Φ and Φ−1 are cumulative probability function of standard Gaussian distribution (Φ for 1D and Φ2 for 2D) and its inverse function, respectively. Implementation of the model with MATLAB can be found in Macke et al. (2009). The pattern probabilities for the DG model were calculated using the cumulative distribution of multivariate Gaussians (MATLAB function mvncdf) or running simulations (for large systems). In terms of event rates and pairwise correlations, the fitting precision of the DG model was usually 10−3, which reaches or exceeds the criteria used in related studies (Ganmor et al., 2011a,b), but is inferior compared with that of the Ising model. This limit is mainly due to the noise introduced in estimating the cumulative probability distributions of the multivariate Gaussian, for which the tolerance error was set to <10−4. Therefore, the superior performance of the DG model over the Ising model in approximating pattern probability and pattern size (see Results) cannot be attributed to a higher accuracy of the DG model in fitting event rates and pairwise correlations.
Model performance measures.
To quantify the total interdependence of the data that can be explained when only considering pairwise interactions, we used an entropy-based approach (Schneidman et al., 2006). The entropy (H) was calculated as H = Σ−P(x) logP(x), where x = (x1, x2, …, xn). xi = 1 or 0/−1 indicates the status of element i to be active or inactive, respectively. Then, the proportion of correlation that can be explained by pairwise interactions is represented by the ratio (HInd − HIsing)/(HInd − Hdata). Here, HInd is the entropy obtained from the “independent model” that matches event rates of the data, but assumes no correlation among elements, HIsing is the entropy obtained from the Ising model and Hdata is the entropy obtained from the original data.
The Jensen–Shannon (JS) divergence was used to quantify the performance of different models in approximating the data. The JS divergence is a symmetric, bounded measure of the distance between two distributions and has been used in related studies (Schneidman et al., 2006; Ganmor et al., 2011a). More precisely, the JS divergence between probability distribution p and q is noted as DJS (p‖q) and calculated as DJS(p‖q) = 0.5 DKL(p‖m) + 0.5 DKL(q‖m), where m = (p + q)/2 and DKL is the Kullback–Leibler divergence, calculated as DKL(p‖q) = Σ pi log (pi/qi). A two-fold cross-validation (Santos et al., 2010) was used for model comparison. That is, time bins of the original dataset were randomly assigned to one of two sets. Model parameters were determined from one set only and predictions were made for the second set. In addition to the DG, Ising, and independent models, a “half-data” model was also included for comparison. The half-data model simply uses the results, i.e., pattern probabilities or pattern sizes, measured in one-half of the dataset to predict the other half.
In estimating the JS divergence, a decision had to be made regarding the handling of patterns that were not observed empirically. Simply assigning a probability of zero to nonobserved patterns introduces a systematic bias in assessing a model's performance, because this approach underestimates the true probability of these patterns. To avoid such bias, only the common patterns that were actually observed in both data and all models (including the half-data model) were taken into account in calculating DJS.
Three controls were conducted to ensure that our conclusions about model performances were not affected by the exclusion of nonobserved patterns. In the first control, we analyzed the DG model's prediction about nonobserved patterns, an aspect that is not reflected in DJS. We reasoned that a prediction for nonobserved patterns is consistent with the experimental finding, if the predicted pattern probability is considerably smaller than 1/N, where N is the total number of patterns observed. Indeed, the average prediction of the DG model for the probability of nonobserved patterns was (0.46 ± 0.04)/N (mean ± SD; across 30 10-electrode groups analyzed in Fig. 4C). In the second control, we empirically tested how the number of nonobserved patterns affected our model comparisons. We systematically changed the sample size from 10% to 100% of the recording length of the original dataset and computed the ratio DJS(pdata‖pIsing)/DJS(pdata‖pDG). We found that this ratio was consistently larger than 1 for small sample sizes (e.g., with 10% recording length, the ratio is ∼4 for avalanche patterns and ∼1.4 for instantaneous spike patterns) and increased further or remained relatively stable as the sample size increased. For example, for the full, 100% recording length, the ratio was >10 for avalanche patterns and ∼1.6 for instantaneous spike patterns. This demonstrates the superiority of the DG model over the Ising model across varying proportions of nonobserved patterns. It further suggests that our findings are likely to hold even if sample size would be sufficiently increased to reliably estimate the probabilities of all potential patterns in a system. In our third and final control, we assigned zero probability to nonobserved patterns, which allowed us to include all patterns in the computation of DJS for the DG and Ising models. Unlike the case for KL divergence, the additional zero-probability patterns increase the magnitude of the JS divergence, given that the corresponding probabilities are nonzero in the model. However, we found that this approach did not change our conclusions about model performances (data not shown).
The statistical significance of performance differences were assessed with the paired-sample Wilcoxon signed rank test.
Calculating interactions of different orders.
Calculation of interactions in the three-element groups were based on full log-linear expansion (Amari, 2001; Nakahara and Amari, 2002): log P(x) = Σθixi + Σθij xixj + Σθijk xixjxk − ψ, where P(x) is the probability of the pattern x = (x1, x2, x3). xi equals 1 or 0, representing the state of the ith element. ψ is a normalization factor. θi, θij, θijk, and ψ were determined according to P(x), which is either experimentally observed (for the data) or directly calculated (for both the DG and Ising models). In principle, the calculations can be done for arbitrarily high orders, i.e., up to the system size. However, the amount of data needed for accurate calculation increases exponentially with higher order. Therefore, although interactions of higher orders larger than three exist in the DG model (Amari et al., 2003), and may be also in the data, we restricted our analysis to maximal third-order interactions given the limited length of our datasets.
In the current study, “correlation” and “interaction” have different meanings and are usually not interchangeable. For random variables x1,x2, …, xn, correlation measures the degree of apparent dependency among them. For example, pairwise correlation refers to the quantity like 〈xixj〉 (or 〈xixj〉−〈xi〉〈xj〉). In contrast, interaction refers to the quantity θ as defined in the previous section, which does not have a simple relation to correlations. We also note that θ is derived from the observed statistics and does not necessarily reflect any physical, causal relation such as synaptic interaction between two or more elements. For the interested reader, correlation and interaction are related to the η and θ coordinates in information geometry, respectively (Nakahara and Amari, 2002).
Higher-order interactions are essential for neuronal avalanche dynamics
We first examined the interaction structure in a well identified type of cortical dynamics—neuronal avalanches. Neuronal avalanches describe the spatiotemporal organization of synchronized activity in superficial layers of cortex. They have been demonstrated in spontaneous activity in vitro (Beggs and Plenz, 2003; Stewart and Plenz, 2006, 2008) and in ongoing activity in vivo (Gireesh and Plenz, 2008; Petermann et al., 2009; Hahn et al., 2010; Ribeiro et al., 2010). Importantly, avalanche size, s, distributes according to a power law, P(s) ∼ sα with exponent α close to −1.5, a hallmark of critical state dynamics (Plenz and Thiagarajan, 2007; Klaus et al., 2011). Both theoretical (Kinouchi and Copelli, 2006; Rämö et al., 2007; Tanaka et al., 2009) and empirical studies (Shew et al., 2009, 2011) suggest that avalanches optimize various aspects of information processing in cortical networks.
Using implanted 10 × 10 microelectrode arrays, we measured ∼30 min of the ongoing LFP in premotor cortex of two alert macaque monkeys (Fig. 1A). Negative LFP deflections (Fig. 1B), which correlated in our recordings with neuronal firing (Fig. 1C,D) (Petermann et al., 2009), were identified using an amplitude threshold amin of −2.5 SD of the LFP calculated for each electrode separately. Importantly, when nLFPs on the array were concatenated into spatiotemporal nLFP clusters (Fig. 1E; top, temporal resolution Δt = 2 ms; see also Materials and Methods), the cluster size s distributed according to a power law with an exponent of −1.5 and the distribution showed finite-size scaling, i.e., the cutoff changed systematically with array size (Fig. 1F) (Klaus et al., 2011). The power law and corresponding exponent identify the ongoing activity as neuronal avalanche dynamics. The power law indicates the presence of long-range correlations between cortical sites and, accordingly, is destroyed when nLFPs are shuffled randomly (Fig. 1F, broken lines).
To investigate whether pairwise interactions are sufficient to explain the power law in avalanche sizes, we first constructed a maximum entropy model (Schneidman et al., 2006) based on the neural activity. Also known in physics as the “Ising” model, this is a widely used method that only utilizes the observed first order, i.e., event rate, and second order, i.e., pairwise correlation statistics. It ensures that no higher-order interactions are taken into consideration for reconstruction of the observed activities. The exact solution of the Ising model is computationally highly demanding for large neuronal groups (Cocco et al., 2009; Ganmor et al., 2011a,b), which limits our reconstruction of avalanche dynamics to arrays of size 10. The Ising model intrinsically lacks temporal structure and interactions are assumed to be instantaneous, whereas activities at different times are independent. To study neuronal avalanches with this model, we mapped the spatiotemporal organization of avalanches into corresponding spatial avalanche patterns (Fig. 1E, bottom; see also Materials and Methods). This approach is justified because nLFPs at a single site rarely recur during an avalanche as indicated by the sharp cutoff in avalanche size distributions at system size in vitro as well as in vivo (Beggs and Plenz, 2003; Petermann et al., 2009; Klaus et al., 2011). Accordingly, size distributions of spatial avalanche patterns well matched the power law distribution of spatiotemporal avalanches (compare Figs. 1F,G and 4F).
In line with previous studies (Tang et al., 2008; Santos et al., 2010), the Ising model captured >80% of interdependence for 10-electrode avalanche patterns (see Materials and Methods). More specifically, ∼85% of interdependence was captured for the example shown in Figure 1G and on average 83 ± 2% (mean ± SD) was captured for 30 randomly chosen, spatially compact 10-electrode patterns (Fig. 1H). However, the Ising model failed to reconstruct the power law distribution of avalanche patterns. It significantly underestimated the probability of medium-sized patterns and overestimated the probabilities of small and large patterns (Fig. 1G,H). We conclude that neuronal avalanche dynamics contain significant higher-order interactions beyond that of pairwise interactions.
Coherence potentials identify an intrinsic threshold for nLFP pattern generation
To identify the higher-order interactions required for the reconstruction of neuronal avalanches, we analyzed in detail the emergence of nLFPs among spatially distributed channels using our previously introduced coherence potential analysis (Thiagarajan et al., 2010). Coherence potentials have been identified in ongoing neuronal avalanche activity as dynamics with an intrinsic threshold at the neuronal population level (Thiagarajan et al., 2010). A coherence potential consists of highly similar nLFP waveforms (baseline-to-baseline excursion) that occur almost simultaneously at different cortical sites (Fig. 2A,B). Importantly, the spatial extent of a coherence potential depends nonlinearly on the amplitude of the participating nLFPs. More specifically, while local unit firing increases linearly with nLFP amplitude (Fig. 2C), the spatial coupling, i.e., fraction of cortical sites, F, that engage in a coherence potential follows a nonlinear function. Figure 2D demonstrates this function in our recordings for ongoing avalanche activity in monkey A using a minimal similarity requirement Rmin = 0.8. Its steepest change occurs for nLFP amplitudes ≅ −2.5 SD (Fig. 2D, arrow). A systematic exploration of Rmin reveals that the nonlinearity in F is robustly found for Rmin > 0.6 (Fig. 2E). The inflection point of the function is also clearly visible when correcting for the a priori random expectation of coherence potentials (Fig. 2F; see Materials and Methods). Accordingly, at an amplitude threshold of −2.5 SD, the majority of nLFPs in our avalanche analysis in Figure 1 were identified as coherence potentials (86 ± 8%, Rmin = 0.8; average across all electrodes), i.e., there was at least one other site with a highly similar nLFP waveform. These results suggest a threshold at ∼ −2.5 SD intrinsic to the generation of the spatiotemporal avalanche patterns shown in Figure 1, above which the similarity in LFP waveforms rapidly increases.
Accurate approximation of avalanche patterns by a simple parametric model that utilizes a threshold function
Theoretical studies proposed nonlinear, e.g., threshold, operations as a potential origin for higher-order interactions (Amari et al., 2003; Macke et al., 2011). Therefore, to investigate whether higher-order interactions in neuronal avalanches can be explained by thresholding, we fit our data with the DG model (Amari et al., 2003; Macke et al., 2009, 2011). Like the Ising model, the DG model is a stochastic model without free parameters, i.e., all parameters are determined by the observed nLFP rates at each electrode and pairwise correlations between nLFPs at different sites. This model is based on multidimensional Gaussian variables followed by a threshold operation that results in stochastic output sequences of binary events (Fig. 3). The DG model is fit to the data by (1) adjusting the threshold level for each dimension to match the observed nLFP or spike rates and (2) adjusting the covariance for individual pairs of dimensions to ensure that the resulting binary time series have the same pairwise correlations as the original nLFP or spike sequences (Fig. 3) (see also Material and Methods and Macke et al., 2009 for details). Whereas the Ising model contains only pairwise interactions, the DG model introduces, in addition, higher-order interactions among the output binary variables due to the threshold operations.
We found that the power law in the size distribution of avalanche patterns was correctly reconstructed when using the DG model (Fig. 4A). More precisely, the JS divergence (see Materials and Methods) between the predicted distribution and the true distribution was reduced by ∼98% (0.002 for the DG model vs 0.085 for the Ising model). Moreover, the DG model predicted the probability of avalanche patterns much more accurately than the Ising model (10 sites; Fig. 4B; 1.3 × 10−4 JS divergence for the DG model vs 1.8 × 10−3 for the Ising model). When applying the two models to 30 randomly chosen, spatially compact 10-electrode groups, we found that the DG model consistently outperformed the Ising model (on average 68 × better for size distribution and 11 × better for pattern probability; Fig. 4C). With respect to pattern probability, the accuracy of the DG model was even significantly better than the prediction based on the data directly (Fig. 4C, right; see Materials and Methods). This is possible because the DG model requires relatively few parameters to be determined, thereby reducing estimation errors that originate from finite sampling (see below for more analysis regarding this property). Similar results were found for monkey B (Fig. 7A).
For a more detailed comparison between the two models, we directly calculated first-, second-, and third-order interactions in the original data and the corresponding predictions from the DG and Ising models. Indeed, the DG model accurately predicted these interactions for neuronal avalanches as shown in Figure 4D. Moreover, this analysis not only confirmed that the Ising model failed to predict any third-order interactions, as expected (Fig. 4D, ellipsoid, arrow), it also demonstrated that the Ising model was much less accurate in identifying second-order interactions when compared with the DG model (Fig. 4D, upper right, arrow). As a result of the estimation error in the interaction terms, we found that the probabilities of avalanche patterns predicted by the Ising model significantly deviated from the observed probabilities, even in a small system with only three electrodes (Fig. 4E). Importantly, the DG model can be easily applied to larger systems (≫10 elements), for which the Ising model is computationally difficult. In Figure 4F, we demonstrated accurate predictions of the DG model for neuronal avalanche dynamics, i.e., the pattern size distributions, which engage up to ∼100 sites and cover ∼16 mm2 of cortex. We note that some of the results for large systems are well above the “perturbation regime” (Roudi et al., 2009), which is characterized as Nr ≪ 1, where N is the number of elements and r is the average probability of event. Even for large Δt, when Nr > 1 (e.g., Nr = 1.8 and 3.5 for Δt = 32 and 64 ms, respectively), the DG model accurately captured the corresponding size distributions of the 91-electrode array (data not shown). Thus, our findings may generalize to even larger systems in which correlations become the dominant force in shaping the neuronal dynamics (Schneidman et al., 2006).
The contribution of temporal correlations to higher-order interactions in neuronal avalanches
To demonstrate the specific contribution of temporal correlations longer than Δt to higher-order interactions in avalanche dynamics, we used “bin-shuffling.” More specifically, the instantaneous nLFP patterns (see Materials and Methods) at temporal resolution Δt were randomized with respect to bin position and then concatenated to obtain spatiotemporal clusters, i.e., bin-shuffled clusters. This approach maintains the original spatial correlation structure at temporal resolution Δt, but destroys temporal correlations longer than Δt. As shown in Figure 5, bin-shuffled spatiotemporal clusters do not show a power law in cluster sizes, demonstrating the presence of temporal correlations beyond Δt in neuronal avalanche dynamics. We then compared third-order interactions in the original avalanche patterns, θijkavl, to those of bin-shuffled clusters, θijkbinsh. We expect that |θijkavl| > |θijkbinsh |, which implies that a sizable amount of third-order interactions can be attributed to the temporal correlations intrinsic to neuronal avalanche dynamics. We further expect third-order interactions of instantaneous nLFP patterns, θijkinst, to be in the range of θijkbinsh, which implies that the contribution to higher-order interactions from random temporal concatenation is small. Indeed, |θijkavl| = 3.67 ± 0.33 (average and SD across 100, randomly chosen three-electrode groups from monkey A; the same as below), which was larger than |θijkbinsh | = 2.07 ± 0.46 (by 78%; p ≪ 10−6; t test), while |θijkbinsh| was only slightly larger than |θijkinst | = 1.94 ± 0.48 (by 7%; p = 0.054; t test). These results demonstrate that higher-order interactions, exemplified here by third-order interactions, arise from spatial as well as temporal correlations in the original avalanche dynamics.
Accurate approximation of instantaneous LFP patterns by the DG model and possible mechanistic link between the model and the data
So far, we mapped temporal correlations beyond Δt into the spatial domain before applying the Ising or DG model. Here, we repeat our analysis for instantaneous nLFP patterns (see Materials and Methods), which do not involve temporal concatenation and which have been investigated in previous studies on neuronal interactions (Tang et al., 2008; Santos et al., 2010). As shown in Figures 6 and 7B, the performances of the two models and the demonstration of higher-order interactions were highly similar to our results obtained with avalanche patterns.
Using the instantaneous nLFP patterns, we then investigated to which extent the DG model might reflect the actual data structure. Specifically, we examined whether the hidden layer of Gaussian variables in the model well approximates the recorded continuous LFP. First, we found that the 2D joint distribution of LFP amplitude from pairs of electrodes was well fit by a 2D Gaussian distribution (R2 = 0.9989 ± 0.0003 for 50 randomly chosen pairs). A similarly good fit was found for the marginal, i.e., 1D, distribution of LFP amplitude (Fig. 8A, left for an example; R2 is 0.9997 ± 0.0001 for all 91 electrodes). These results suggest that the multidimensional Gaussian used in the DG model provides a reasonable approximation for the actual LFP amplitude distribution. However, closer scrutiny revealed that this approximation only held for a certain range of LFP amplitudes. In fact, the LFP amplitude distribution systematically and significantly deviated from a Gaussian for amplitudes larger than ±2–3 SD. This deviation is not captured in the overall goodness of fit, but is evident in semilog plots (Fig. 8A, right) (Thiagarajan et al., 2010, their supplemental Fig. S10). We conclude that the Gaussian approximation of the DG model holds for LFP amplitudes up to ∼±2–3SD.
We then examined the relation between the hidden variables of the DG model and the correlation structure of LFPs using the full array. We note that when constructing the DG model, the covariances of the Gaussian variables are only determined by the pairwise correlations of the binary variables, i.e., nLFP patterns. In Figure 8B, we showed that by fitting those correlations, the DG model also captured to a large degree the pairwise correlations among continuous LFP signals. More precisely, at a threshold of −2.5 SD as used in our analysis, the pairwise correlation in the LFP between two sites correlated well with that of the corresponding pair of Gaussian variables (Fig. 8B; Spearman's r = 0.92). On the other hand, the correlation between the hidden variables was systematically lower compared with the correlation of the corresponding LFP sites.
Next we asked whether the covariance matrix of the DG model estimated at nLFP threshold T can be used to predict nLFP patterns obtained with threshold T′ (with proper rate adjustment). In Figure 8C and D, T′ was fixed to −2.5 SD, while T was changed from −1.5 to −3.5 SD. In Figure 8E, T was kept constant at −2.5 SD, while T′ was changed from −1.5 to −3 SD. Indeed, DG models constructed from mismatched, i.e., T ≠ T′, covariance matrices provided fairly good fits (Fig. 8C) and yielded several-fold smaller JS divergence when compared with the Ising model (Fig. 8D,E). On the other hand, the fitting accuracy was highest for the matched, i.e., T = T′, DG model and deteriorated quickly when the distance between T and T′ increased (Fig. 8D,E). These results demonstrate that the covariance matrixes of the DG model are similar, but not identical across different nLFP thresholds. If mismatched and matched DG models had provided equally good fits, the DG model could have been considered a purely mechanistic model in which the hidden Gaussian variables represent the true LFP distribution and the nLFP patterns simply arise from thresholding. However, the threshold dependency suggests that this view is too simplistic. That is, while the analysis shows that the DG model can account for a large portion of the higher-order interactions, some systematic differences remain. This in turn suggests that a simple mechanism — local nonlinear transformation (e.g., thresholding) — plays a large role in the generation of higher-order interactions. However, the literal interpretation that a threshold is applied to an underlying multivariate Gaussian is likely an oversimplification for the generation of nLFP patterns (see Discussion for more details on this issue).
The nature of higher-order interactions generated by thresholding
To better understand the nature of the higher-order interactions that are introduced by thresholding in the DG model, we systematically varied the mean and covariance of a 3D Gaussian. The 3D Gaussian served as the layer of continuous variables in the model, before applying a fixed threshold (>0). We then investigated how these changes affected the third-order interactions in the resulting binary variables. As shown in Figure 9A, the thresholding procedure indeed introduced third-order interactions that were absent in the pre-thresholded Gaussian, especially when event probabilities were far above or below 0.5 and pairwise correlations were strong. We point out that an event probability r = 0.5 corresponds to a Gaussian mean γ = 0. We then varied the event probability and pairwise correlations for 10-element systems (binary variables) and constructed both the DG and the Ising models using identical constraints. The difference between the two models indicates the total amount of higher-order interactions introduced by thresholding and was expressed as entropy difference (Fig. 9B). Consistent with our results shown in Figure 9A and previous findings (Macke et al., 2009, 2011), this difference was more pronounced when rates were low and pairwise correlations were strong, which is typically the case for nLFPs in ongoing activity (Petermann et al., 2009).
Accurate approximation of the ongoing and evoked spiking activity
We then extended our analysis to spiking activity of both ongoing and evoked cortical dynamics. We first analyzed extracellularly recorded spiking activity of up to 56 units during ongoing avalanche activity in the two awake monkeys. Similar to our nLFP analysis, in Figure 10A we show that the DG model significantly outperformed the Ising model in predicting both instantaneous spike patterns and their corresponding sizes, i.e., number of concurrent units, also referred to as neuronal synchrony (Montani et al., 2009). In line with the analysis shown in Figure 9, the improvement by the DG model over the Ising model was less pronounced compared with the LFP analysis, given the higher rates (0.075 vs 0.002, monkey A) and weaker pairwise correlations (0.04 vs 0.35) of spiking activity compared with nLFPs. Accordingly, when separately analyzing neuronal groups with relatively strong pairwise correlations (see Materials and Methods), predictions of the DG model for spike patterns, pattern sizes, and third-order interactions further improved over the Ising model (Figs. 10B and 11A–C). We note that these groups included the majority of recorded units (>60%). Again, the DG model was readily extended to accurately predict the sizes of instantaneous spike patterns, i.e., spike synchrony, for large number of units (n = 56; Fig. 11D). These results were not limited to ongoing activity only. We obtained similar results when analyzing spiking activity recorded from area 17 in anesthetized cats in response to grating stimuli (Figs. 10 and 11E–H). The main difference between the two datasets referred to the range in pattern probabilities, which extended for the cat datasets down to 10−6 (compare Fig. 11A,E). We believe this difference is mainly due to the smaller bin width used in the analysis and high firing rate of evoked responses compared with the ongoing activity in the monkey. This analysis demonstrates the general applicability of the DG model to spontaneous and evoked spiking activities.
Efficient characterization of population activity
The approximation of higher-order interactions by a simple threshold operation allows for an efficient characterization of neuronal activity with relatively little data sampling. This is particularly desirable in studies of neural coding or brain–machine interfaces that require higher-order statistics of the population activity, such as pattern probabilities. While these statistics in principle can be directly obtained from the data (Fig. 4B), this approach becomes readily unfeasible for larger system sizes as outlined in the Introduction. The DG model only requires event rates and pairwise correlations to approximate those pattern probabilities, which can be obtained within relatively short durations (<30 min in our analysis), and, importantly, those durations are independent of system size. The DG model is also simple enough to be treated analytically (Amari et al., 2003; Burak et al., 2009; Tchumatchenko et al., 2010; Macke et al., 2011), which allows for direct calculation of pattern probabilities. This advantage holds even for small systems (n = 10 elements) as shown in Figure 12A. As typically done in brain–machine interface studies, we used a training dataset to make inferences about future data, i.e., the testing set. We randomly separated our 30 min recordings of nLFPs into two 15 min sets, which served as the testing and training set, respectively. We then systematically shortened the length of the training set and compared the performance in predicting the pattern probability for the testing set between the DG model and the direct sampling. The DG model increasingly outperformed direct sampling as the duration of the training sets shortened (Fig. 12A,C). To achieve the same prediction accuracy, the DG model required much shorter recording length (<1/3) compared with direct sampling (see Fig. 12B and corresponding legend for details). Similar advantages of using the DG model in characterizing pattern probabilities were also found for both ongoing and evoked spiking activities (Fig. 12C).
Here we show that higher-order interactions are essential for understanding neuronal interdependence in the cortex, at the population level of neuronal avalanches and at the level of ongoing and evoked activities of individual neurons, especially when pairs of neurons are strongly coupled. This is consistent with recent findings that the inclusion of higher-order interactions yielded better approximation of LFP patterns (Santos et al., 2010) as well as neuronal synchrony (Montani et al., 2009). Our findings are also in line with the shortcomings reported for the Ising model to predict correlated activity of nearby neurons (Ohiorhenuan et al., 2010; Ohiorhenuan and Victor, 2011) or of large network activity (∼100 retina ganglion cells) in response to natural stimuli (Ganmor et al., 2011b). Moreover, our analysis provides new insights into these recent findings. For example, the relation between the pairwise and higher-order interactions as revealed by the DG model (Fig. 9) provides a possible explanation for the phenomenon that the more pronounced third-order interactions are usually accompanied by strong pairwise interaction or correlation (Ohiorhenuan and Victor, 2011). In addition, the same mechanism may account for the specific failure of the Ising model to capture responses to natural stimuli (Ganmor et al., 2011b), as such responses are often characterized by sparse firing and strong pairwise correlations (Stüttgen and Schwarz, 2008; Jadhav et al., 2009). More generally, our results give a satisfactory account for the reported lack of higher-order interactions in many previous studies (Schneidman et al., 2006; Shlens et al., 2006; Tang et al., 2008; Yu et al., 2008), as those interactions are most pronounced in the regime of low rates and high pairwise correlation, which these studies often did not discriminate.
The significance of our study lies also in the clear demonstration that higher-order interactions in cortical dynamics are well approximated by a simple nonlinear operation, which has been suggested in a recent theoretical analysis (Macke et al., 2011). In the DG model, higher-order interactions are solely determined by the rates and pairwise correlations, so they do not serve as free parameters of the system and, therefore, they do not add extra complexity to the model. Importantly, the DG and Ising models have the same number of parameters. Thus, our results suggest a simple solution for taming the daunting complexity that was previously thought to be intrinsic to higher-order interactions.
Using only a quadratic number of parameters, the DG model provides a very compact description of data commonly expected to yield an exponential complexity of higher-order interactions. This indicates that the model captures, at least to some degree, the actual structure that generates the data (Baum, 2004). What would be the possible mechanistic link between the DG model and the neural process underlying the generation of spiking or nLFP patterns? For spiking activities, the correspondence is readily visible. The DG model assumes a Gaussian structure to which a threshold is applied. It approximates the membrane potential on which the cellular action potential generating mechanism operates. In fact, a Gaussian distribution of membrane potentials has been reported in intracellular recordings (Rudolph et al., 2004; Okun et al., 2010; Constantinople and Bruno, 2011), but see DeWeese and Zador (2006) for non-Gaussian distributions found under anesthesia. This finding is often used to support theoretical studies on neuronal processing that assume weak correlations between neurons: as a neuron usually receives a large number of synaptic inputs, central limit theorem indicates that the distribution of the membrane potential will be approximately Gaussian, provided that the correlation among those inputs is not too strong.
Our LFP and coherence potential analysis (Fig. 2; Thiagarajan et al., 2010) also suggest a potential mechanistic link between the DG model and nLFP patterns. LFP amplitudes are Gaussian distributed for small and median values, i.e., less than ±2–3 SD, but deviate from a Gaussian at larger values (Fig. 8A). The deviation corresponds with the nonlinear, threshold regime of extraordinary spatial coherence, the coherence potential (Fig. 2). These coherence potentials have been identified as spatially distributed all-or-none population events of high fidelity in analogy to the action potential at the cellular level (Thiagarajan et al., 2010). As shown previously, the coherence potential threshold likely reflects intrinsic neuronal dynamics because it cannot be simply explained by (1) enhanced signal-to-noise ratio with high LFP amplitudes, (2) stronger volume conduction, or (3) common subcortical inputs as the dynamics also emerge in cortical cultures with neuronal avalanches (Thiagarajan et al., 2010). The neuronal mechanisms underlying such threshold-like behavior have yet to be identified. Because the nLFP amplitude correlates with the activity of local neuronal groups (Figs. 1C,D and 2C; Petermann et al., 2009), the putative threshold-like operation may reflect a nonlinear, recurrent recruitment of local suprathreshold activity that subsequently recruits distant sites (Douglas et al., 1995; Thiagarajan et al., 2010). On the other hand, despite such important mechanistic similarity between the DG model and the nLFP data, our exploration of the mismatched model suggested that the structural correspondence between the data and the model is not exact. This is not surprising as we know that the model is not intentionally constructed to reflect every process in the generation of the data. For example, in the DG model, all superthreshold values will result in the same binary state (active state; Fig. 3) while for nLFP patterns, only the peak of the negative LFP deflections were used to indicate the active state (i.e., the nonpeak values, although they might be superthreshold, will be indicated as inactive state). Furthermore, in the generation of avalanche patterns, there is a process mapping the spatiotemporal clusters to spatial patterns (Fig. 1), which is not used in the model.
Together, our results suggest that the DG model captured some essential features about the mechanisms underlying the formation of neuronal dynamics, at both the single unit and LFP level. Specifically, in terms of the origin of higher-order interactions, our results suggest that they are probably introduced by intrinsic thresholding such as in spike and nLFP pattern generation. However, to prove such a causal relation is awaiting future experiments, e.g., in which the level of pairwise correlation of the membrane potentials can be manipulated and the resulting changes in higher-order interactions among spike trains can be demonstrated to be consistent with the model's prediction.
For understanding spike-based neural code, we suggest studying the joint distributions of membrane potentials, as it may help to reveal fundamental constraints in neuronal processing. The multivariate Gaussian assumption is simple enough to allow a thorough analysis (Burak et al., 2009; Tchumatchenko et al., 2010) while we demonstrate that it accurately describes the data. If the actual distribution significantly deviates from a multivariate Gaussian, it is likely that other forms of higher-order interactions will be present among the spike trains.
The present results have direct implications, both conceptually and methodologically, for studying functional networks of neuronal systems (Bullmore and Sporns, 2009). On one hand, for binary time series such as spike trains, we have shown before that using interaction, rather than correlation, is necessary to extract the correct network architecture (Yu et al., 2008, their supplemental Fig. 2). Our current results demonstrate that the true pairwise interactions can only be captured by a model if potential higher-order interactions are taken into account. On the other hand, if one is interested in studying the interaction structure that underlies the observed binary time series (e.g., membrane potentials underlying spike trains) while this deeper level cannot be directly observed, the present results suggest that the DG model may serve as a useful tool to reveal this underlying structure. Studying this structure will significantly simplify the analysis of neuronal functional networks as it dramatically reduces the number of interactions that need to be considered, which, to a first approximation, eliminates the need for considering any interactions above the second order. It allows the conventional graph theoretical analysis to be applied, which likely will yield more incisive results since the dimensionality of the problem has been much reduced.
This study was supported by the Intramural Research Program of the National Institute of Mental Health. Experiments with cat recordings were supported by a Deutsche Forschungsgemeinschaft Grant NI 708/2-1, Hertie Stiftung, Max-Planck Society, and the Frankfurt Institute for Advanced Studies. We thank Andrew Mitz and Richard Saunders for help in data collection and Steven Wise for his support during the initial phase of the project.
- Correspondence should be addressed to Dietmar Plenz, Section on Critical Brain Dynamics, Laboratory of Systems Neuroscience, National Institute of Mental Health, 6001 Executive Boulevard, Bethesda, MD 20892-9663.