Abstract
Signal correlation (rs) is commonly defined as the correlation between the tuning curves of two neurons and is widely used as a metric of tuning similarity. It is fundamental to how populations of neurons represent stimuli and has been central to many studies of neural coding. Yet the classic estimate, Pearson's correlation coefficient,
Significance Statement
Fundamental to how cortical neurons encode information about the environment is their functional similarity, that is, the redundancy in what they encode and their shared noise. These properties have been extensively studied theoretically and experimentally throughout the nervous system, but here we show that a common estimator of functional similarity has confounding biases. We characterize these biases and provide estimators that do not suffer from them. Using our improved estimators, we demonstrate a novel result, that is, there is a positive relationship between tuning curve similarity and amplitude for nearby neurons in the visual cortical motion area MT. We provide a simple stochastic model explaining this relationship and discuss how it would naturally regularize the dimensionality of neural encoding.
Introduction
Signal and noise correlation are fundamental metrics used widely by neurophysiologists to measure the relationship between stimulus-evoked responses of pairs of neurons (Cohen and Kohn, 2011). Signal correlation measures how similar the tuning of one neuron is to that of another across a set of stimuli (Gawne and Richmond, 1993), whereas noise correlation measures the relationship between trial-to-trial variability across two neurons. These metrics are often related—neuron pairs with higher signal correlation often have higher noise correlation (Lee et al., 1998; Bair et al., 2001; Averbeck and Lee, 2003; Cohen and Maunsell, 2009; Ecker et al., 2014). Furthermore, the interaction between signal and noise correlation is important for determining whether correlation increases or decreases information in population codes (Oram et al., 1998; Panzeri et al., 1999; Averbeck et al., 2006; Lyamzin et al., 2015; but see Moreno-Bote et al., 2014). By itself, signal correlation is used for functional clustering (Kiani et al., 2015; Power et al., 2011), measuring invariance (Nandy et al., 2013; Popovkina et al., 2019), and as a metric of coding redundancy (Gawne and Richmond, 1993; Gawne et al., 1996; Vinje and Gallant, 2000). However, the naive estimator of signal correlation has two types of bias, described below, that have not been corrected for in the literature but can lead to artifacts or obscure important relationships.
First, signal correlation is biased toward zero by independent trial-to-trial noise. This bias, mentioned by Gawne and Richmond (1993), arises when experimental estimates of two true underlying tuning curves (Fig. 1A, solid lines) are independently deformed by noise (dashed lines). The estimated relationship between the true tuning curves, for example, a positive correlation as demonstrated in Figure 1B, will weaken as the estimated means are spread separately along the two response axes (circles indicate independent spread). This bias is large when there are too few repeats of each stimulus or when recordings are excessively noisy. Second, signal correlation is biased toward noise correlation. Noted by Rothschild et al. (2010), this bias occurs when correlated noise induces spurious correlation between estimated tuning curves (Fig. 1C, dashed lines). Here, the trial-to-trial noise imparts a tilted elliptical distribution (Fig. 1D, ovals), strengthening the relationship between the estimated means. This bias can inflate signal correlation and spuriously induce the frequently reported positive relationship between naive estimators for signal and noise correlation. Here, we derive an equation that unifies these two biases, determine under what conditions the biases substantially influence results, and demonstrate how to correct the biases using a method for unbiasing Pearson's r² (Pospisil and Bair, 2021).
Using our corrected estimator, we discovered a novel characteristic of neuronal tuning while reanalyzing data from a study of interneuronal correlation in pairs of simultaneously recorded neurons in the cortical motion area MT (Bair et al., 2001). Area MT contains a preponderance of direction selective neurons (Zeki, 1974; Maunsell and van Essen, 1983; Albright, 1984; Born and Bradley, 2005), and nearby neurons tend to prefer the same direction of motion (Albright, 1984). Nevertheless, considerable unexplained diversity in signal correlation exists for neurons recorded at the same location in cortex (Bair et al., 2001). We found a significant positive relationship between signal correlation and tuning curve modulation: neuron pairs with higher signal correlation tended to have tuning curves with higher signal-to-noise ratios. This relationship would have been trivial had it been observed using the downwardly noise-biased
Materials and Methods
Stochastic model of neuronal responses
Here we describe stochastic models used in our derivations and simulations to generate spike counts for pairs of neurons responding to simple stimulus ensembles. Our derivations below and the corrected estimator of signal correlation (Pospisil and Bair, 2021) assume homoscedasticity, that is, equal variance of responses across experimental (stimulus) conditions. In the case of neuronal data, a variance stabilizing transform can be applied to make this assumption reasonable. For Poisson distributed spiking, which is a useful approximation to trial-to-trial variability in neuronal firing, the square root is a variance stabilizing transform. Other variance stabilizing transforms may be used, for example, the Box-Cox transformation (Box and Cox, 1964); from here on, all appropriately transformed responses are assumed to have equal variance. We modeled the variance stabilized spike counts of neurons X and Y as Xi,j and Yi,j for the presentation of the jth repeat
In most cases, we will use a fixed, sinusoidal tuning curve model. In this case, the tuning curve for neuron X is no longer a function of rs; it is simply a cosine, as in the following:
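The variance stabilizing step described in this section can be illustrated with a minimal sketch. It is not the tuning curve equation itself, and the mean spike counts and trial count below are hypothetical values chosen only for illustration. For Poisson counts the raw variance grows with the mean, whereas the variance of the square-rooted counts stays close to 1/4 once the mean is not too small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean spike counts (not taken from the datasets analyzed here).
mean_counts = np.array([5.0, 10.0, 20.0, 40.0])

# 100,000 simulated trials per mean count (rows: trials, columns: mean counts).
counts = rng.poisson(mean_counts, size=(100_000, mean_counts.size))

raw_var = counts.var(axis=0)            # Poisson: variance equals the mean
sqrt_var = np.sqrt(counts).var(axis=0)  # roughly constant, near 1/4

for mu, rv, sv in zip(mean_counts, raw_var, sqrt_var):
    print(f"mean={mu:5.1f}  raw variance={rv:6.2f}  sqrt variance={sv:5.3f}")
```

For very low mean counts the approximation degrades, which is one reason other transforms may be preferred in practice.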
Sample correlation and SNR estimates
Given two sets of responses for two neurons, Xi,j and Yi,j, for m stimuli and n repeats, the typical estimators,
For a single neuron, the SNR can be estimated as follows:
Simulation of correlation between signal and noise correlation
To model a relationship between signal and noise correlation, we assume that zn and zs, the Fisher z transformations,
We use the inverse Fisher transformation to ensure that the support of rn and rs is in [–1, 1]. The correlation of zn and zs, rNS, will give similar values to correlating rn and rs, especially if the bulk of the distribution is in [–0.7, 0.7]. Because Fisher's z is a variance stabilizing transform, the transformed correlations zn and zs will have the advantage of asymptotically known and equal variance across different levels of correlation. The estimate
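The sampling scheme described in this section can be sketched as follows, as a rough illustration rather than the code used for the figures. The means and standard deviations of the z-transformed values (0.5 and 0.5 for signal correlation, 0.1 and 0.3 for noise correlation) and the correlation of 0.8 are taken from the simulation figure legends; the number of pairs and the variable names are assumptions. The inverse Fisher transformation is the hyperbolic tangent.

```python
import numpy as np

rng = np.random.default_rng(1)

n_pairs = 5000                    # hypothetical number of neuron pairs
mu = np.array([0.5, 0.1])         # means of z_s and z_n (figure legend values)
sd = np.array([0.5, 0.3])         # SDs of z_s and z_n (figure legend values)
target = 0.8                      # correlation between z_s and z_n

# Covariance matrix of the bivariate normal over (z_s, z_n).
cov = np.outer(sd, sd) * np.array([[1.0, target], [target, 1.0]])
z = rng.multivariate_normal(mu, cov, size=n_pairs)
z_s, z_n = z[:, 0], z[:, 1]

# Inverse Fisher transformation maps z back into the open interval (-1, 1).
r_s, r_n = np.tanh(z_s), np.tanh(z_n)

corr_z = np.corrcoef(z_s, z_n)[0, 1]  # correlation of the z values
corr_r = np.corrcoef(r_s, r_n)[0, 1]  # correlation of the r values
print(f"corr(z_s, z_n) = {corr_z:.3f}, corr(r_s, r_n) = {corr_r:.3f}")
```

The two printed correlations should be close to each other, which is the sense in which correlating the z values gives similar results to correlating the r values.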
Signal correlation estimate for random stimuli
The calculation of the asymptotic value of signal correlation (as
With the simplifying assumption that the dynamic ranges and noise amplitudes are the same for each neuron, that is,
This can be rewritten in terms of the product of n and the SNR, defined above as
Signal correlation estimate for fixed stimuli
The derivation of the asymptotic value of signal correlation is more complex in the case of fixed stimuli. We take the approach of separately calculating the expected value of the numerator and denominator of estimated signal correlation (Equation 10) then taking the ratio as an approximation to the expected value of estimated signal correlation. For reference, we note the following relevant moments, which follow from the definitions above:
Now consider the numerator of estimated signal correlation, the sample covariance between
For the denominator of the sample signal correlation, we consider the product of the sample variances of
We first find
For
Thus, our approximation of the expected value as the ratio of the expected values of the numerator and denominator of
Electrophysiological data
We reanalyzed data taken from a previous electrophysiological study in which pairs of neurons were isolated and recorded on a single electrode in two awake, fixating macaques (Zohary et al., 1994; Bair et al., 2001). Specifically, we reexamine the data for eight-point direction tuning curves in response to coherently moving dots for 81 pairs of MT neurons. For a detailed description of the visual stimuli, electrophysiological methods, and dataset, see Bair et al. (2001).
We also analyzed the publicly available data (http://dx.doi.org/10.6080/K0NC5Z4X) from V1 of anesthetized macaque monkeys. This dataset includes the spiking activity from three monkeys (n = 106, 88, and 112 single- and multiunit recordings) in response to drifting sinusoidal gratings across 12 directions (1.28 s presentation).
In addition, we analyzed simultaneously recorded responses from the awake mouse (a detailed description can be found at http://observatory.brain-map.org/visualcoding/). In the analyses here, we examined single-unit spiking activity from 75 neurons recorded in VISp of mouse visual cortex in response to a set of 118 natural images. Each natural image was presented for 0.25 s in random order with no intervening blank to achieve ∼50 repeats. For details of electrodes and spike sorting, see the Allen Brain Observatory website (http://observatory.brain-map.org/visualcoding/).
Experimental design and statistical analysis
For experimental design, please see Electrophysiological data above and the original publications (Zohary et al., 1994; Bair et al., 2001). All statistical tests (Figs. 10, 12) are Spearman's rank-order r value two-tailed t tests (Zwillinger and Kokoska, 2000) as implemented in SciPy (Virtanen et al., 2020).
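As a minimal sketch of the test named above, `scipy.stats.spearmanr` returns the rank-order correlation and, by default, a two-sided p value. The data here are synthetic placeholders, not values from the recordings; only the sample size of 81 echoes the number of MT pairs.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)

# Synthetic stand-ins for per-pair SNR and signal correlation (hypothetical).
snr = rng.lognormal(mean=0.0, sigma=1.0, size=81)
signal_corr = np.tanh(0.5 * np.log(snr) + rng.normal(0.0, 0.5, size=81))

# Spearman's rank-order correlation with its two-sided p value.
rho, p_value = spearmanr(snr, signal_corr)
print(f"Spearman r = {rho:.2f}, p = {p_value:.2g}")
```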
Data availability
Code for calculating estimates of
Results
Our results are organized into five sections. First, we describe analytically the origin of the biases in signal correlation, and second, we describe how to correct for these biases. Third, we examine how the correlation between signal and noise correlation can become inflated spuriously. Fourth, we demonstrate the presence of these confounds in the analysis of neuronal data. Finally, we describe a novel result obtained by using an unbiased estimator of signal correlation.
Signal correlation confounds
The de facto estimators of signal (
Simulations using sinusoidal tuning curves provide intuition into the attenuation and inflation of estimated signal, that is, tuning curve, correlation (
Here we describe the analytic relationship between the typical estimator,
Analytically derived relationship between typical estimators of signal (rs) and noise correlation (rn) for electrophysiologically plausible parameters. The asymptotic value of
The nature of this bias is depicted in Figure 2, which plots
Corrected estimator of signal correlation
A simple strategy to remove the bias toward rn is to compute rs based on repeats that are not recorded simultaneously. After all, rs is a measure of tuning curve similarity, and tuning curves for different neurons do not have to be acquired at the same time. Specifically, the estimated tuning curves from the odd repeats for one neuron and those from the even repeats for the other can be used to measure
The above strategy means that rn is now zero, thus leaving
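The split-trial strategy can be sketched in simulation. This is only an illustration under stated assumptions: unit-variance Gaussian noise stands in for variance stabilized responses, the two true tuning curves are a cosine and a sine (so the true signal correlation is exactly zero), and the amplitude, noise correlation, and function names are hypothetical. The sketch covers only the split-trial step and does not include the additional correction for attenuation.

```python
import numpy as np

def simulate_pair(rng, m=8, n=10, amp=0.2, noise_corr=0.5):
    """Responses of two neurons, each with shape (m stimuli, n repeats)."""
    theta = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    mu_x = amp * np.cos(theta)  # true tuning curve of neuron X
    mu_y = amp * np.sin(theta)  # orthogonal to X: true signal correlation is 0
    cov = [[1.0, noise_corr], [noise_corr, 1.0]]
    # Noise is correlated across the two neurons within each trial.
    noise = rng.multivariate_normal([0.0, 0.0], cov, size=(m, n))
    return mu_x[:, None] + noise[..., 0], mu_y[:, None] + noise[..., 1]

def naive_rs(x, y):
    """Pearson correlation of tuning curves averaged over all repeats."""
    return np.corrcoef(x.mean(axis=1), y.mean(axis=1))[0, 1]

def split_rs(x, y):
    """Tuning curve of X from odd repeats, of Y from even repeats."""
    return np.corrcoef(x[:, 0::2].mean(axis=1), y[:, 1::2].mean(axis=1))[0, 1]

rng = np.random.default_rng(3)
pairs = [simulate_pair(rng) for _ in range(2000)]
naive_mean = np.mean([naive_rs(x, y) for x, y in pairs])
split_mean = np.mean([split_rs(x, y) for x, y in pairs])
print(f"mean naive estimate: {naive_mean:.3f}")
print(f"mean split estimate: {split_mean:.3f}")
```

With these settings the naive estimate is pulled toward the noise correlation even though the true tuning curves are uncorrelated, while the split estimate averages near zero. The cost is that each tuning curve is estimated from half the repeats.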
Simulation-based comparison of signal correlation estimators under varying degrees of SNR, noise correlation, and true signal correlation. The estimators are the naive measure of signal correlation
Whereas the naive r² is heavily biased downward by trial-to-trial variability,
Correction for the attenuation of correlation coefficients by measurement error has received considerable attention from fields outside neuroscience (Thouless, 1939; Beaton et al., 1979; Rosner and Willett, 1988; Adolph and Hardin, 2007). The most popular correction is given in Spearman (1904). An additional correction for noise correlation is given in Saccenti et al. (2020). Rothschild et al. (2010) to our knowledge are the first to recognize the confound of noise correlation inflating
In Figure 3 (top left) we plot the estimated signal correlation squared as a function of the true signal correlation squared (
When noise correlation is introduced (Fig. 3, top middle, rn = 0.25) we observe the expected increase in
One disadvantage of
Spurious correlation between signal and noise correlation
The inflation of
An example of a simulation where a spurious relation between
A simulation elucidating the inflating effect of noise correlation on the estimate of rNS (Equation 13). A, The true signal and noise correlation (z transformed) for neuronal pairs are randomly sampled from a bivariate normal distribution with correlation 0. The moments of the z transformed distribution of rs are σ = 0.5 and µ = 0.5. The moments of rn are σ = 0.3, µ = 0.1. The SNR of all neurons is low (SNR = 0.1). B, Estimates of signal correlation have a noisy linear relationship to rn (Equation 26, corresponding roughly to top row, middle, purple trace, Fig. 2) despite there being no relationship between the true parameters. C, Estimated signal correlation (y axis) has little relation to true signal correlation (x axis) because of low SNR and the influence of noise correlation. D, Noise correlation estimates (y axis) are themselves noisy, thus there is an imperfect relationship with true noise correlation (x axis) across the population and the estimate. E, Overall, there is a significant correlation between the simulated experimental estimates of signal and noise correlation, despite there being no true correlation between the parameters being estimated (compare to A). The correlation is lower than that of
A simulation demonstrating the attenuating effect of noise on the estimate of rNS. The moments of the z transformed distribution of rs are σ = 0.5 and µ = 0.5. The moments of rn are σ = 0.3, µ = 0.1. The SNR of the neurons is higher here (SNR = 1) than in the previous figure. A, Here the true rn and rs are jointly distributed to have a strong correlation (0.8). B, Signal correlation estimates (y axis) have a strong linear relationship to rn (Equation 26, corresponding roughly to row 1, column 2, orange trace of Fig. 2), yet this is not the result of the spurious correlation in Figure 3: the gray line shows the theoretically predicted linear relationship is weak compared with the relationship in the data points. C, Signal correlation estimates have a strong relation to true signal correlation because of high SNR and corresponding lack of influence of noise correlation. D, Estimates of noise correlation are as noisy as in Figure 3 because SNR does not reduce variance in the estimate
Simulation of experiment estimating rNS (Equation 13) as a function of m, n, and SNR. Default values of simulation are: m = 100, n = 10, and SNR = 0.1. Solid lines show
Alternatively, because the estimates
We performed simulations like those in Figures 4 and 5 over a wide range of the key parameters n, m, and SNR and plotted
In summary, split-trial estimation of signal correlation removes the upward bias of
Demonstration of attenuation of signal correlation by trial-to-trial variability and its correction by
Demonstration of confounds in neural data
We have demonstrated the confounds of signal correlation both theoretically and in simulation. Here we demonstrate these confounds in neural data. We first examine the downward bias of signal correlation caused by trial-to-trial variability and then the inflation of
To demonstrate the effect of trial-to-trial variability in attenuating
Demonstration of inflation of
Next, we demonstrate the effect of noise correlation inflating
In a previous comparison of SNRs across datasets (Pospisil and Bair, 2020), we observed that the SNR of the Allen Institute Neuropixel recordings from area VISp of mouse visual cortex was substantially lower than that of the MT data analyzed here (respective median SNRs = 0.16 and 4.0). We repeated the above analysis with this lower SNR dataset and again found that
See below, Discussion, in which we cover a variety of examples where controlling for SNR could potentially make a difference in the scientific conclusions of published studies.
Novel relationship between tuning strength and signal correlation in area MT
Applying the improved estimators described above, we reanalyzed direction-tuning data from a study of signal and noise correlation in simultaneously recorded pairs of well-isolated MT neurons in awake macaques (Zohary et al., 1994; Bair et al., 2001). The dataset from these studies showed wide variation in
Low
We begin by giving insight into our results on the basis of tuning curves for representative pairs of neurons. The direction-tuning curves for a pair of MT neurons are plotted in the top left panel A of Figure 9. Both neurons were strongly modulated by the stimulus (geometric mean SNR = 18.66) and have the same preferred direction (orange and blue curves both peak near 270°) and similar classic single-cycle sinusoidal tuning profiles. The similarity in tuning is reflected unambiguously by the high
Examples of estimated direction tuning curves for pairs of MT neurons (error bars show SEM; Zohary et al., 1994). A–D, Examples of pairs of tuning curves that have high joint SNR (geometric mean), large
To quantitatively test whether more strongly stimulus-modulated pairs tended to have more closely matched tuning curves, we examined the relationship between SNR and
A positive correlation between signal correlation and SNR. Our unbiased estimate,
We also used
To demonstrate that the SNR-rs relationship is not a trivial result of trial-to-trial variability, we conducted stochastic simulations of a concrete population of pairs of nearby cortical neurons. First, consider the simple scenario in which all MT neurons have sinusoidal direction-tuning curves, and only the amplitude and phase (i.e., preferred direction) can vary. Here we do not induce noise correlation. Figure 11A (left) shows example tuning curves drawn from a uniform distribution of tuning curve amplitudes and preferred directions that are chosen independently, with preferred direction limited to a narrow range (
Simulation of MT neuronal data under different assumptions of tuning statistics across a population. A, Plots of example tuning curves. Parameters of tuning curves are phase, which sets preferred direction, and amplitude. The distribution of phase is uniform between 135° and 225°, amplitude is uniformly distributed from 1 to 30 spikes, and the distributions are independent. Spike counts are drawn from Poisson distributions with means set by tuning curves. In right column, tuning curves are not smooth because noise is added. B, Results of simulation of an experiment where there are 8 directions of motion, 10 repeats, and 100 pairs of neurons. Spearman's ranked correlation is measured between the geometric mean of SNR and
In a second scenario, we made a simple addition to reflect our observation that real neurons in area MT with lower dynamic range often have more irregular tuning curves (e.g., Fig. 9E–H, blue curves). Specifically, we added constant SD Gaussian noise to each simulated tuning curve (thus we modified the tuning curve shape), independent of the amplitude or phase of the tuning curves. An example of a pair of tuning curves with moderate SNR (Fig. 11A, right) reveals that the underlying true tuning curves are no longer perfectly sinusoidal. Crucially, the relative deviation from sinusoidal is larger at lower SNR because the average magnitude of the deforming component is constant. Under this scenario, the correlation between SNR and signal correlation is positive when measured using our corrected estimate,
If this simple model is correct, it predicts that sinusoidal fits to tuning curves should improve with SNR. If we used the naive estimator,
Relationship between SNR and sinusoid model fit quality. Color keys give Spearman's rank-order r and p value of two-tailed t test. A, Fit quality (
Here we have uncovered a new relationship between SNR and signal correlation in area MT. If the relationship had been estimated using classic measures, it would be confounded. We proposed a simple model sufficient to explain this trend, and we demonstrated in simulation that the corrected estimator both avoids confounds and reveals the relationship between SNR and signal correlation. We provided evidence that the data were consistent with our model by showing that low SNR MT neurons were less well fit by a sinusoid. To test whether this generalizes beyond this one dataset, we conducted a similar analysis of V1 data (Kohn laboratory, http://dx.doi.org/10.6080/K0NC5Z4X) for three monkeys, but for orientation tuning, which is more common in V1, rather than for direction tuning (see Methods). Figure 12B shows that a similar trend held for all three animals; recordings with higher SNR tended to be better fit by sinusoidal tuning curves after factoring out the influence of noise.
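The mechanism of the simple model can be sketched directly on true (noise-free) tuning curves, setting aside trial-to-trial variability and estimation altogether. The phase range (135° to 225°), amplitude range (1 to 30 spikes), and eight directions follow the simulation figure legend; the SD of the fixed-amplitude deformation (`deform_sd`) is not given in the text and is a hypothetical value, as are the function name and the number of pairs.

```python
import numpy as np
from scipy.stats import spearmanr

M_DIRECTIONS = 8  # directions of motion, as in the simulation legend
THETA = np.deg2rad(np.arange(M_DIRECTIONS) * 360.0 / M_DIRECTIONS)

def amplitude_rs_rho(n_pairs, deform_sd, seed):
    """Spearman r between pair amplitude and true signal correlation."""
    rng = np.random.default_rng(seed)

    def true_tuning():
        amp = rng.uniform(1.0, 30.0)                   # spikes
        phase = np.deg2rad(rng.uniform(135.0, 225.0))  # preferred direction
        curve = amp * np.cos(THETA - phase)
        # Fixed-SD deformation, independent of amplitude and phase.
        return amp, curve + rng.normal(0.0, deform_sd, size=M_DIRECTIONS)

    amps, rs_true = [], []
    for _ in range(n_pairs):
        a1, t1 = true_tuning()
        a2, t2 = true_tuning()
        amps.append(np.sqrt(a1 * a2))                  # geometric mean amplitude
        rs_true.append(np.corrcoef(t1, t2)[0, 1])      # true signal correlation
    rho, _ = spearmanr(amps, rs_true)
    return rho

# deform_sd = 5.0 is an assumed value; 0.0 recovers purely sinusoidal tuning.
print("with deformation:   ", round(amplitude_rs_rho(2000, 5.0, seed=4), 3))
print("without deformation:", round(amplitude_rs_rho(2000, 0.0, seed=4), 3))
```

Without deformation the true signal correlation depends only on the difference in preferred direction, so it is unrelated to amplitude. With a fixed-amplitude deformation, low-amplitude curves are distorted proportionally more, which produces a positive relationship between amplitude and signal correlation.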
Discussion
We have examined two biases in the estimation of signal correlation—attenuation by trial-to-trial variability and a bias toward the noise correlation. To address these biases, we measured signal correlation across trials that were not simultaneously recorded (
Practical advantages and further applications of $\hat{r}^2_{\text{ER-split}}$
Neurophysiologists can, and have, tempered the biases examined here by averaging across stimulus repeats, but this never wholly removes the biases and it restricts the number of stimuli (m) that can be presented during an experiment. A major practical benefit of using the estimator
As the simultaneous recording of large neuronal populations becomes more common, accounting for the effects of correlated and independent trial-to-trial variability is increasingly important for analyses that seek to understand similarities and differences in tuning across neuronal populations. Signal correlation is the basis of several multivariate data analysis methods. Examples, often used in the neurosciences, include canonical correlation analysis (for review see Zhuang et al., 2020); representational similarity analysis (for review see Kriegeskorte and Kievit, 2013); and principal component analysis. Further work should aim to quantify the effect of correlated noise on the estimation of these quantities and to determine whether applying our estimator,
Prior work
The estimator
To our knowledge, we are the first to show a relationship between SNR and signal correlation in vivo, controlling for trial-to-trial variability. The confounds we have discussed have rarely been addressed in the neuroscience literature.
Reviewing prior experimental studies, we found the most common potential confound involves comparing signal correlation across conditions where SNR could plausibly differ, which is often the case when one stimulus set drives weaker tuning curve modulation than another. For example, Vinje and Gallant (2000) report less signal correlation in macaque V1 for full field stimuli compared with stimuli restricted to the RF; however, the suppressive surround could have driven down the dynamic range of responses (Angelucci et al., 2017). Gawne et al. (1996) reported higher signal correlation in V1 for bars than for Walsh patterns; however, they do not control for differences in dynamic range for the two stimuli. Averbeck and Lee (2003) studied macaque supplementary motor area, reporting that signal correlation increases with spike counting bin width; however, SNR would also likely increase with bin width. Solarana et al. (2019) examined the effects of sensory deprivation in mouse primary auditory cortex, finding a decrease in signal correlation with deprivation; however, they did not determine whether this could result from differences in dynamic range. These are just a few of the relevant examples where there is great potential value in controlling for trial-to-trial variability when studying signal correlation.
The bias of signal correlation by trial-to-trial variability that we have examined tends to be downward, with an appreciable upward bias only for low SNR and atypically high noise correlation (e.g., Fig. 3, top right, blue trace). On the other hand, the bias of
SNR-signal correlation relationship
We found a positive relationship between SNR and signal correlation in area MT for the robustly encoded stimulus dimension of motion direction. This raises the possibility of a potentially ubiquitous connection between signal correlation and stimulus choice. Specifically, nearby neurons robustly encoding the same stimulus dimensions would tend to have similar tuning along those dimensions, and those that weakly encode those stimulus dimensions would not. A consequence is that signal correlation for one set of stimuli may not generalize to other sets. Although variation in noise correlation as a function of the stimulus set has been well studied (Cohen and Kohn, 2011), variation in signal correlation, unconfounded by trial-to-trial variability, has not been explored.
We modeled this relationship as the result of fixed-amplitude noise added to tuning curves of varying amplitude. We do not specify the source of this tuning curve noise, but future theoretical work could focus on its relationship to the maintenance of synaptic connectivity (Lin and Koleske, 2010) or ongoing plasticity in response to variable inputs (Rokni et al., 2007; Duffy et al., 2019).
In terms of sensory encoding, our results can be interpreted with respect to the dimensionality of the population representation of stimuli. Our MT data suggest that tuning of neurons with higher SNR (more strongly modulated) can be well captured by a low dimensional model, whereas lower SNR (poorly modulated) neurons cannot. Specifically, strongly modulated MT neurons were well fit by a sinusoidal model and thus were constrained to the two-dimensional (2D) plane defined by amplitude and phase, whereas less modulated neurons diverged from this plane (Fig. 12). This has potential consequences for downstream computation. Consider a downstream neuron building a new tuning curve as a linear function of these inputs; if inputs were only high SNR neurons in the 2D plane, then the resulting tuning of the neuron would remain in the 2D plane. However, more complex tuning would require lower SNR inputs with more diverse tuning curves, and achieving high SNR for this would require combining more of the low SNR inputs. Assuming some cost to synapse formation, the steeper the relation between SNR and signal correlation, the greater the cost to increase dimensionality. Low- versus high-dimensional representations have different advantages. In low-dimensional representations, functions can be learned with fewer samples, and estimation of underlying input drive is more resistant to noise, whereas high-dimensional representations can be more flexible in the functions they compute. Further research is required to determine whether the form of dimensionality regularization for a given cortical region implied by a particular SNR-rs relationship has normative advantages for cortical encoding.
Footnotes
This work was supported by a National Science Foundation Graduate Research Fellowship DGE-1256082 (D.A.P.), National Institutes of Health Grant NEI R01-EY02999, and National Institutes of Health Grant NEI R01-EY027023 (W.B.). We thank Adam Kohn for reading and providing comments on the manuscript. We thank Ehud Zohary and William T. Newsome for sharing data and making it publicly available. We thank Greg Horwitz, Anitha Pasupathy, and Matthew Farrell for helpful discussions.
The authors declare no competing financial interests.
Correspondence should be addressed to Dean A. Pospisil at deanp3{at}uw.edu