## Abstract

Two sensory neurons usually display trial-by-trial spike-count correlations given the repeated representations of a stimulus. The effects of such response correlations on population-level sensory coding have been the focal contention in computational neuroscience over the past few years. In the meantime, multivariate pattern analysis (MVPA) has become the leading analysis approach in functional magnetic resonance imaging (fMRI), but the effects of response correlations among voxel populations remain underexplored. Here, instead of conventional MVPA analysis, we calculate linear Fisher information of population responses in human visual cortex (five males, one female) and hypothetically remove response correlations between voxels. We found that voxelwise response correlations generally enhance stimulus information, a result standing in stark contrast to the detrimental effects of response correlations reported in empirical neurophysiological studies. By voxel-encoding modeling, we further show that these two seemingly opposite effects actually can coexist within the primate visual system. Furthermore, we use principal component analysis to decompose stimulus information in population responses onto different principal dimensions in a high-dimensional representational space. Interestingly, response correlations simultaneously reduce and enhance information on higher- and lower-variance principal dimensions, respectively. The relative strength of the two antagonistic effects within the same computational framework produces the apparent discrepancy in the effects of response correlations in neuronal and voxel populations. Our results suggest that multivariate fMRI data contain rich statistical structures that are directly related to sensory information representation, and the general computational framework to analyze neuronal and voxel population responses can be applied in many types of neural measurements.

**SIGNIFICANCE STATEMENT** Despite the vast research interest in the effect of spike-count noise correlations on population codes in neurophysiology, it remains unclear how the response correlations between voxels influence MVPA in human imaging. We used an information-theoretic approach and showed that unlike the detrimental effects of response correlations reported in neurophysiology, voxelwise response correlations generally improve sensory coding. We conducted a series of in-depth analyses and demonstrated that neuronal and voxel response correlations can coexist within the visual system and share some common computational mechanisms. These results shed new light on how the population codes of sensory information can be evaluated via different neural measurements.

- functional magnetic resonance imaging
- multivariate pattern analysis
- population codes
- spike-count noise correlation
- visual cortex

## Introduction

It is well established that even simple sensory stimuli can activate a large group of neurons, a phenomenon called population codes in computational neuroscience (Averbeck et al., 2006; Kohn et al., 2016). The most distinctive feature of population codes, compared with single-unit recording data, is the trial-by-trial response correlations (RCs) between units. How correlated variability affects the fidelity of population codes has been a focal contention in neuroscience in recent years. On the one hand, theoretical studies have shown that the effects of neuronal RCs depend on many factors, such as the form of the RC, tuning heterogeneity, or its relevance to behavior (Zohary et al., 1994; Sompolinsky et al., 2001; Averbeck et al., 2006; Shamir and Sompolinsky, 2006; Haefner et al., 2013; Kohn et al., 2016). On the other hand, empirical physiological studies have consistently shown that improved behavioral performance is accompanied by a reduction of RC in a number of cognitive processes, such as locomotion (Vinck et al., 2015), task engagement (Downer et al., 2015), wakefulness (Reimer et al., 2014), and perceptual training (Gu et al., 2011), suggesting the detrimental role of RC in sensory coding.

In contrast to invasive neurophysiology, functional magnetic resonance imaging (fMRI) is a noninvasive technique and naturally measures the responses of many units in the brain, although the basic unit is voxel rather than neuron. Here, we use “population codes” as a unified term for both neurophysiology (i.e., a population of neurons) and fMRI (i.e., a population of voxels). In fMRI research, multivariate pattern analysis (MVPA) has become a mainstream approach to interpret sensory coding (Haxby et al., 2014). In the fMRI literature, correlations between the trial-by-trial responses (i.e., beta weights) of two voxels (Fig. 1*D*) are referred to as Beta-series correlation (Rissman et al., 2004; Di et al., 2021). Similarly, trial-by-trial spike-count correlations between neurons are called noise correlation in neurophysiology (Cohen and Kohn, 2011). We hereafter unify both Beta-series correlation in fMRI and noise correlation in neurophysiology as response correlation.

In fMRI, the quality of stimulus encoding is usually operationally defined as multivariate decoding accuracy; that is, a higher decoding accuracy indicates higher fidelity of stimulus encoding (Haynes and Rees, 2005; Kamitani and Tong, 2005). This multivariate decoding approach has been widely used to study the neural mechanisms of various cognitive processes, such as attention (Ling et al., 2015) and learning (Jehee et al., 2011). Compared with the rich theoretical and empirical evidence for the effects of RCs in neurophysiology, little is known about how voxelwise RCs affect MVPA decoding accuracy in fMRI. Unveiling the effects of RCs can provide insight into the mechanisms of many cognitive processes. For example, if decoding accuracy is elevated by a particular cognitive process (e.g., attention), it is unclear whether it warps response correlations (Fig. 2*C*) or simply reduces the response variability of individual voxels (Fig. 2*D*).

Here, we combine information-theoretic and conventional decoding approaches to analyze the fidelity of stimulus encoding in multivariate fMRI data. Specifically, we directly calculate the linear Fisher information of stimuli encoded by trial-by-trial multivariate voxel responses. Surprisingly, in contrast to the detrimental effects documented in neurophysiological literature, stimulus information follows a U-shaped function of RC strength and ultimately is higher than the scenario without RC. We built voxel-encoding models to show that these two apparently opposite effects can coexist in the visual system. Further computational analyses reveal that the apparent difference in the effects of RCs in neuronal and voxel populations share some common mechanisms; RCs reduce and simultaneously enhance information in high- and medium- or low-variance eigendimensions, respectively. This unified mechanism of RC not only helps to resolve the long-standing debate about the effects of RCs on sensory coding but also extends the conventional understanding of the computational nature of MVPA. Moreover, the methods used here to analyze population responses are general and can be applied to a wide range of scenarios for deciphering multivariate neural data.

## Materials and Methods

#### fMRI experiment

##### Stimuli and task

We analyzed the fMRI data collected by Sengupta et al. (2017). The datasets are publicly available on the OpenNeuro website (https://openneuro.org/datasets/ds000113/versions/1.3.0). Six subjects (five males) participated in the study. Briefly, two flickering sine-wave gratings (0.8°–7.6° eccentricity, 160° angular width on each visual hemifield with a 20° gap on the vertical meridian) were presented on both sides of the fixation point. The orientations of the two gratings varied independently across trials. The orientations were drawn from 0°, 45°, 90°, and 135° with equal probabilities. A stream of letters was presented at the center of gaze throughout each scanning run. Subjects were instructed to maintain their fixation and perform a reading task. To ensure subjects' task engagement, they were tested on a question related to the reading text at the end of each run.

On each trial, an orientation stimulus lasted 3 s and was followed by a 5 s blank. Each scanning run consisted of 30 trials. The 30 trials also included 10 randomized blank trials. The first trial could not be a blank trial, and there were no two consecutive blank trials. Blank trials could appear on either side, whereas the grating on the other side was intact. Each subject completed 10 scanning runs. Because of the setting of blank trials setting, each stimulus was presented for 60–70 trials in each subject.

##### MRI data acquisition and processing

A T2*-weighed echo-planar imaging (TR/TE = 2000/22 ms) sequence was used to acquire fMRI data. In the original experiment, the subjects were scanned at four different resolutions. Here, we only analyzed the 2 mm isotropic data because they gave the best decoding accuracy (Sengupta et al., 2017). One hundred twenty-one functional volumes were acquired in each run (FOV = 200 mm, matrix size 100 × 100, 37 slices, GRAPPA acceleration factor 3). The echo-planar images covered the occipital and parietal lobes. Each subject was also scanned for a high-resolution T1-weighted image (0.67 mm isotropic). Additionally, we conducted standard retinotopic scans to define low-level visual areas.

The pial and white surfaces were reconstructed based on the high-resolution T1-weighted images using the standard FreeSurfer software pipeline. All functional volumes underwent slice-timing correction, motion correction, and registration to T1 images. Retinotopic data were analyzed using the 3dRetinophase tool in Analysis of Functional NeuroImages (AFNI) software to generate polar angle and eccentricity maps on cortical surfaces.

Bilateral V1–V3 were defined on spherical cortical surfaces based on polar angle and eccentricity maps. The vertices in each region of interest (ROI) on cortical surfaces were then transformed back to EPI space using the AFNI 3dSurf2Vol function to locate corresponding voxels.

We used the AFNI 3dDeconvolve function to build general linear models (GLMs). Particularly, the -stim_times_IM option of the function was used to model each presentation of stimuli (i.e., trial) as an independent predictor. We implemented two GLMs separately for two hemispheres because of independent stimuli presentations for each hemisphere, and each GLM only included the stimuli presented contralaterally. In each GLM, demeaned head motion parameters and constant, linear, and quadratic polynomial terms were also included as nuisance predictors. In each subject, we concatenated the time series of 10 scanning runs and fitted a single GLM to the combined data.

#### Calculation of linear Fisher information in population responses

In computational neuroscience, Fisher information is a standard metric that assesses the amount of information carried by observed neuronal responses *r* with respect to stimulus variable *s*. Two signatures of *r* should be noted. First, the neuronal response *r* is high dimensional (i.e., a vector) because it represents the responses of many neurons or voxels. Second, because of trial-by-trial variability, the *r* across trials follows a multivariate response distribution in the high-dimensional space. If the response distribution belongs to the exponential family with linear sufficient statistics (i.e., information can be optimally decoded via a linear decoder), Fisher information can be further simplified as linear Fisher information (Beck et al., 2011). Hereafter, we call it “information” for short.

Suppose that in an experiment we attempt to discriminate two stimuli, *s _{1}* and

*s*

_{2}, based on the measured responses of

*N*voxels in

*T*trials for each stimulus. This is a typical binary classification scenario in fMRI (Haynes and Rees, 2005; Kamitani and Tong, 2005). Here, Fisher information can be understood as the extent to which the two stimuli can be discriminated based on population responses.

In the ideal case that we have an infinite number of trials *T*, the information delivered by population responses can be written as follows:
*N* × 1, indicating the standard deviation of responses of *N* units. C is the response correlation matrix.

Equation 1 has several important implications. First, stimulus information depends on (1) the Euclidean distance between the two high-dimensional response distributions [i.e.,

Any real experiment, however, must have a finite number of trials *T*. The estimations of the statistical properties (e.g., *i*th voxel across trials, and *i*th voxel across trials. We used Equation 4 to calculate stimulus information in the empirical data (Fig. 3*B*, puce bars) and Equation 5 to calculate stimulus information when RCs are removed (Fig. 3*B*, gray bars). Note that although linear Fisher information per se (Eq. 1) has no assumption of Gaussian variability, the analytical solution for bias correction assumes Gaussian variability (Eqs. 4, 5). But it has also been shown that this bias correction solution is also robust to non-Gaussian cases (Kanitscheider et al., 2015). Also, the bias-correction method is only valid when *T* > (*N* + 2)/2. In our experiment, each stimulus was presented for 60–70 trials. We thus chose *N* = 50 in all neuron- and voxel-encoding modeling below and also when analyzing empirical fMRI data.

For each orientation, we defined the information of each orientation as the averaged information of pairwise discriminating that orientation and the other three orientations. For example, the information of 0° is defined as follows:

##### Multivariate Gaussian variability of trial-by-trial voxel responses

Linear Fisher information has no assumption about Gaussian variability of trial-by-trial responses. The analytical solution of bias correction given a limited number of units and samples, however, rests on the assumption that the trial-by-trial population responses follow a multivariate Gaussian distribution. Although Gaussian variability of voxel responses has been either implicitly assumed or explicitly formulated in several previous studies (van Bergen et al., 2015; van Bergen and Jehee, 2018), it remains a presumption rather than a grounded finding.

We tested whether trial-by-trial population responses follow multivariate Gaussian distribution in our data. For each stimulus, the data can be represented as an *N* (voxels) × *T* (trials) matrix (Fig. 1*B*). Here, voxels and trials can be viewed as features and samples, respectively. We performed the Henze–Zirkler multivariate normality test (implemented by the multivariate_normality function from the pingouin Python package) on the data matrix. The Henze–Zirkler test has good power against alternatives to normality and is feasible for any dimension and sample size (Henze and Zirkler, 1990). We performed the Henze–Zirkler multivariate normality test for the voxels in each ROI, hemisphere, and subject. We found no significant violation (all *p* values > 0.05) of multivariate Gaussian variability in a total of 144 tests (6 subjects × 2 hemispheres × 3 ROIs × 4 stimuli), revealing multivariate Gaussian distributions as an appropriate approximation of trial-by-trial voxel responses.

#### The titration approach to derive information as a function of response correlation strength

We parametrically manipulated the strength of RCs between voxels. We multiplied the off-diagonal items in the covariance matrices *i*th row and the *j*th column in *i* = *j* and

#### Neuron- and voxel-encoding modeling

##### Neuron-encoding model

We simulated 50 orientation-selective neurons whose preferred orientations equally span within (0, π]. The von Mises tuning curve of the *k*th neuron is the following:

Two types of RC structures were investigated here. The first one is tuning-compatible RC (TCRC), indicating that the RC between a pair of neurons is proportional to the similarity of their tuning functions as follows:
*i*th and the *j*th neurons, and *i* = *j* and

We also investigated a control type of RCs called tuning-irrelevant response correlation (TIRC). We shuffled the pairwise RCs in

We also applied the titration approach (response correlation coefficient

##### Voxel-encoding model

The voxel-encoding model is built based on the neuron-encoding model. Given the 50 orientation-selective neurons, we further simulated 50 voxels using the voxel-encoding model as follows:
*i*th voxel, and *k*th neuron and the *i*th voxel. The weighting matrix *W* maps neuronal tuning curves to voxel tuning curves. Note that Equation 10 also suggests that voxel tuning curves look irregular (e.g., multimodal tuning curves) compared with the unimodal bell-shaped tuning curve of neurons (Zhang et al., 2019). However, orientation decoding per se does not require formal tuning curves as long as the voxels show differential responses to two orientations. Instead of Poisson noise, we assumed additive noise on voxel activity the response variance (*a* = 1.5 and *b* = 4 are the scale and the shape parameters corresponding to the gamma distribution with mean = 6, and variance = 24.

Similar to the neuron-encoding model, we also assumed tuning-compatible response correlations between voxels as follows:
*i*th and *j*th voxels. The voxel correlation coefficient

We manipulated *D*). Because the voxel tuning curves and response variance depend on the weighting matrix *W* (Eq. 10) and the gamma distribution, respectively. We run the simulation 100 times, and in each simulation, a new weighting matrix *W* is created by the following:

#### Eigen-decomposition of information

Equation 1 can be further reformulated as follows:
*i*th unit eigenvector and its corresponding eigenvalue; *i*th eigendimension; *i*th eigendimension. Thus *i*th eigendimension. Because eigendimensions are linearly uncorrelated, the total information should be the sum of information across all eigendimensions.

#### Theoretical derivation of tuning-compatible response correlations in voxel populations

To derivate the theoretical feasibility of tuning-compatible RCs in voxel populations, we simulated the RC matrix using the voxel-encoding model 1000 times. In each simulation, we randomly generated the weighting matrix *W* (using Eq. 12) and only considered the variability propagated from the neuronal to the voxel level. Tuning curves of voxels can be computed based on *W* (Eq. 10), and the tuning similarity between the *i*th and the *j*th voxels can be computed by *s*, because of Poisson variance, the covariance matrix (

And the RC matrix *A*. For the distribution of the *r* value of 1000 simulations, see Figure 6*B*.

### The link between linear Fisher information and the discrimination thresholds of the optimal linear decoder

Fisher information can be converted to orientation discrimination threshold (*I* is Fisher information, ACC is the accuracy to which the threshold corresponds, and ^{−2} information corresponds to the discrimination threshold

## Results

### Voxelwise response correlations enhance stimulus information in human early visual cortex

We analyzed the trial-by-trial voxel activity in early visual cortex when six human participants viewed gratings with four orientations (i.e., 0°, 45°, 90°, and 135°; see above, Materials and Methods; Fig. 3*A*). As in classical decoding experiments (Kamitani and Tong, 2005), each orientation was presented for several trials, allowing us to estimate the single-trial activity of individual voxels. Instead of the conventional decoding approach, we calculated linear Fisher information—how well two stimuli can be discriminated based on population responses—in early visual areas (i.e., V1–V3) of all subjects. The calculation of linear Fisher information requires estimates of the response mean and variance of individual voxels as well as the response correlation matrices between voxels (Eqs. 1–3). As we show later, this information-theoretic approach is advantageous because it allows one to symmetrically manipulate RC strength in the data and examine its consequences.

To demonstrate the effects of trial-by-trial RCs, we compared stimulus information in each brain region under two regimes. In one regime, linear Fisher information was computed directly from empirically measured voxel responses with all voxelwise RCs fully preserved. In the other regime, linear Fisher information was calculated with all voxelwise RCs being hypothetically removed. To do so, we manually set the RC matrices to identity matrices (Eqs. 1–3). Notably, this procedure forces the RCs between voxels to be zero (i.e., removes all RCs) without changing the marginal response distributions (i.e., mean and variance) of individual voxels. Intuitively, this is equivalent to warping the two response distributions in Figure 2, *B* and *C* (i.e., RCs are nonzero), to Figure 2*A* (i.e., RCs are zero). This method has been used to illustrate the effects of RCs in previous studies (Kanitscheider et al., 2015; Montijn et al., 2019). Note that the estimated statistical properties (e.g., mean, covariance) given a limited number of units and trials may introduce bias in the estimation of information. Therefore, we computed bias-corrected linear Fisher information under these two regimes (Kanitscheider et al., 2015).

We found that the information with RCs being preserved is significantly higher than that with RCs being hypothetically removed, suggesting that voxelwise RCs in general enhance stimulus information. Our finding here stands in contrast to the findings in neurophysiology, where the majority of animal studies demonstrated the detrimental effects of neuronal RCs. This discrepancy is explained later.

### Information as a U-shaped function of correlation strength in voxel populations

The above analyses demonstrate the potential beneficial effects of voxelwise RCs by contrasting two regimes; voxelwise RCs are completely preserved or removed. The result shows that voxelwise RCs seem to enhance stimulus information. However, according to our previous theoretical work (Di et al., 2021), the relationships between RCs and stimulus strength may not be monotonic. To gain a full picture of the possible effects of RCs, we further used a titration approach; we computed stimulus information while progressively manipulating RC strength. Specifically, all off-diagonal elements of empirically measured RC matrices were multiplied by an RC coefficient (i.e., *A*). In other words, the two regimes in Figure 3 can be viewed as the two special cases (i.e.,

Interestingly, we found that although RCs improve information when all RCs are preserved, and the effects of RCs are nonmonotonic; stimulus information first declines then rises with increasing RC strength, manifesting as a U-shaped function. The ebb and flow can reach half and double magnitude of that without RCs. This result held for all four orientations and in all three ROIs. Note that some previous studies examined the effects of RCs by shuffling population responses across trials to remove RCs and then comparing decoding accuracy between unshuffled and shuffled data (Cohen and Maunsell, 2009). That approach is similar to the analyses in Figure 3. However, the shuffling approach has two major shortcomings, namely, it has been proven to be less efficient when the number of trials and units are relatively small (Kanitscheider et al., 2015), and it cannot easily reveal the full changing trend of information when RC strength is progressively manipulated (e.g., the U-shaped function observed here).

### Voxel-encoding models explain the discrepancy between the effects of response correlations in neuronal and voxel populations

Why does stimulus information exhibit a U-shaped function as RC strength increases? More importantly, why do the effects of RCs manifest differently in neurophysiological and fMRI data? One notable difference between neuronal and voxel activity is that voxel activity is thought to reflect the aggregation of the activity of many neurons. We then show that this neuron-to-voxel activity summation naturally produces the observed U-shaped function.

We first sought to replicate the classical detrimental effects of neuronal RCs using a neuron-encoding model and applied the same data analyses to fMRI data. One well-established finding in neurophysiology is that the response correlation between two neurons is proportional to their tuning similarity, called tuning-compatible RC (Kohn et al., 2016). In the neuron-encoding model, we simulated 50 orientation-selective neurons with Poisson response statistics and assumed the tuning-compatible RCs between the neurons (Fig. 5*A*; Eq. 7). We also simulated a control type of RCs, called tuning-irrelevant response correlations. For tuning-irrelevant RC, the elements in RC matrices are identical to those in tuning-compatible RC matrices but are rearranged across columns and rows so that the RC between a pair of voxels bears no resemblance to their tuning similarity (van Bergen and Jehee, 2018). We found that tuning-compatible RCs always degrade information (Fig. 5*C*, green line), whereas tuning-irrelevant RCs always enhance information (Fig. 5*C*, orange line) in the neuron-encoding model. These results are consistent with a wide range of empirical or theoretical studies in neurophysiology (Zohary et al., 1994; Cohen and Kohn, 2011; Downer et al., 2015; Vinck et al., 2015).

Next, we turn to modeling the effects of voxelwise RCs on sensory information in fMRI data. Before doing so, we first investigated the structure of voxelwise RCs. It remains unclear whether tuning-compatible RCs also exist in fMRI data. To our best knowledge, only one study found evidence for the existence of tuning-compatible RCs in fMRI (van Bergen and Jehee, 2018). We thus quantified the relationship between tuning similarity and response correlations in our empirical data (see above, Materials and Methods). We found that there is a significantly positive relationship between tuning similarity and response correlations in all 144 tests (all correlation *p* values < 0.05, 6 subjects × 2 hemispheres × 3 ROIs × 4 stimuli; see above, Materials and Methods), supporting the existence of tuning-compatible RCs in realistic fMRI data. Furthermore, previous fMRI studies have suggested that spatial distance may be an another important factor mediating RCs between voxels (Haak et al., 2013; Ryu and Lee, 2018). Therefore, we run a regression analysis with tuning similarity and 3D spatial distance of two voxels as the predictors of their response correlations. The significant positive relationship between tuning similarity and response correlation was still observed in 138 of 144 tests, further supporting their functional links.

With tuning-compatible RCs being established in fMRI, we next established a voxel-encoding model that incorporates an additional layer of voxel activity on top of the neuron-encoding model and assumes voxel activity as linear combinations of underlying neuronal activity (see above, Materials and Methods; Fig. 5*B*). This encoding-model approach has been widely used in fMRI studies on a variety of topics, including attention (Sprague and Serences, 2013), memory (Ester et al., 2015), and learning (Chen et al., 2015). Note that in this voxel-encoding model the parameters (e.g., amplitude, baseline firing rate, tuning width) at the neuron-encoding stage are identical to those in the neuron-encoding model above. Similarly, we assumed that the voxelwise RCs were (1) either proportional to their tuning similarity (i.e., tuning-compatible RCs) or as a control and (2) unrelated to their tunings (i.e., tuning-irrelevant RCs). Interestingly, the model with tuning-compatible RCs can well reproduce the U-shaped information function (Fig. 5*D*, green line). Like neurons, the tuning-irrelevant RCs between voxels always improve stimulus information (Fig. 5*D*, orange line). This quantitative voxel-encoding modeling suggests that the detrimental effects of RCs at the neuronal level and the potential beneficial effects of RCs at the voxel level can coexist and that we obtain opposite results simply because brain activity is acquired at different spatial scales (i.e., neurons at the microscopic level and voxels at the mesoscopic level).

These results are also consistent with our previous theoretical work that voxel tuning heterogeneity contributes to the U-shaped function of the effects of voxelwise RCs (Zhang et al., 2020). In contrast to the relatively homogeneous tuning curves in the neuron-encoding model (i.e., similar peak, baseline of tuning functions) of individual neurons, voxel tuning curves computed by the aggregations of neuronal activity are remarkably heterogeneous (i.e., different baseline, response range). And such tuning heterogeneity attenuates the detrimental effects of RC. We show this more formally later.

Using the voxel-encoding model, we also investigated the nature of tuning-compatible RCs. Although we found tuning-compatible RCs in our empirical voxel responses, it remains theoretically elusive why they should exist at all. We hypothesized that the mapping from neuronal to voxel activity naturally produces tuning-compatible RCs between voxels. We first showed analytically that the weighting matrix *W* bridges the (co)variability from neurons to voxels (Eq. 13; van Bergen et al., 2015). Then, using the above voxel-encoding model and a weighting matrix *W*, we calculated the tuning similarity and response correlation between each pair of voxels in the voxel-encoding model and examined their relationships (Fig. 6*A*, example scatter plot). We repeated this calculation 1000 times and found a significant positive relationship between tuning similarity and response correlations between voxels (bootstrap test, *p* < 0.013; Fig. 6*B*). In other words, we theoretically proved that there should exist tuning-compatible RCs between voxels as long as voxel activity is considered as an aggregation of underlying neuronal activity.

### Opposite effects of response correlations on stimulus information in neuronal and voxel populations revealed by eigen information decomposition analysis

The above results only show that neuron- and voxel-encoding models, which assume voxel activity as aggregations of underlying neuronal activity, can well reproduce the effects of RCs observed in empirical data. However, what are the exact computational mechanisms underlying this discrepancy? Why does the transformation from the neuronal to the voxel level alter the effect of RCs? To address this question, we first introduce the method of information decomposition and then describe how it can serve as a unified mathematical framework for understanding information coding in multivariate data.

The key formulation of information decomposition (as shown below; Eqs. 1, 13) is to convert the computation of linear Fisher information (I) in the original neuron/voxel space (*C*, the red vector). Q is the averaged covariance matrix of the voxel responses toward two stimuli (Eq. 3); *i*th eigenvector and its corresponding eigenvalue (i.e., variance) of Q, respectively. This approach avoids the tedious matrix inversion (

The intuition of this formulation can be illustrated in the two-voxel scenario in Figure 7. In Figure 7*A*, the two voxels/neurons have substantial RCs, and the majority of the variance lies in the first eigendimension [i.e., principal component (PC)]. Here, *i*th PC, and

We first performed the information decomposition analysis to examine the effects of tuning-compatible RCs in the neuron-encoding model. Specifically, we calculated the eigenvalue (*A*), the squared projected signal (*B*), their ratio (i.e., the information on that PC, *C*) on each PC, the cumulated information across the first few PCs (Fig. 8*D*), and the total information (Fig. 8*E*) as a function of RC strength. We found that the variance of population responses is concentrated in the first few PCs as RC strength increases (Fig. 8*A*). This is not surprising because increasing RCs inevitably produce a low-dimensional manifold along which population activity fluctuates. Interestingly, increasing tuning-compatible RCs also heightens the projected signals *C*). Although there are some nonmonotonic effects (e.g., information increased and then decreased on the first PC; medium strength RCs enhance information on a few low-variance PCs), the overall effect of information reduction is much more dominant, resulting in less total information in neuronal populations (Fig. 8*D*). In other words, the detrimental effects of tuning-compatible RCs in neurons (Fig. 8*E*) are primarily driven by information reduction on high-variance PCs.

We then applied the same analyses to examine the effects of tuning-compatible RCs in the voxel-encoding model. Similar to neurons, tuning-compatible RCs between voxels induce higher variance (Fig. 8*F*) and higher projected signals (Fig. 8*G*) on the first few PCs. Thus, it is nontrivial to predict the changing trend of their ratio *H*, blue arrow) and low-variance PCs (Fig. 8*H*, green arrow), respectively. In contrast to the scenario of neurons, these two antagonistic effects together produce a U-shaped function (Fig. 8*I*,*J*). In addition, the information enhancements on low-variance PCs dominate, eventually resulting in an overall larger amount of information (Fig. 8*I*,*J*).

We further performed the same analyses on the empirical fMRI data in V1. We found similar results as predicted by the voxel-encoding model. Increasing RCs inevitably produces a few high-variance PCs (Fig. 8*K*) and stronger projected signals on these PCs (Fig. 8*L*). Although RCs reduce the information on those high-variance PCs (Fig. 8*M*,*N*, blue arrows), they also disproportionately enhance information on low-variance PCs (Fig. 8*M*,*N*, green arrows), producing a U-shaped function of total information (Fig. 8*O*). These results are consistent with the above voxel-encoding model. Analyses in V2 (Fig. 8*P*) and V3 (Fig. 8*Q*) show highly similar results to those in V1. As a control, we also performed the information decomposition analysis on the neuron- and voxel-encoding models assuming tuning-irrelevant RCs. The results showed that tuning-irrelevant RCs significantly attenuate the information reduction effect on high-variance PCs, and the overall enhanced information comes mainly from the information enhancements on low-variance PCs (Fig. 9).

In summary, despite the seemingly drastically opposite effects of tuning-compatible RCs in neuronal and voxel populations, we found a unified mechanism underlying the two scenarios. In both types of data, increasing tuning-compatible RCs has two antagonistic consequences—information enhancement on high variance PCs and information enhancement on low-variance PCs. The relative balance of these two effects determines the exact shape of the information function as RC strength increases because a continuum of possible effects (i.e., monotonic increasing/decreasing, U-shaped function) can occur. For neurons, the effect of information reduction dominates, and thus tuning-compatible RCs monotonically reduce information. For voxels, the effect of information enhancement is more pronounced, resulting in the U-shaped function of information. These effects are further validated in the empirical fMRI data.

These results complicate the interpretations of improved MVPA accuracy. We know that linear Fisher information is monotonically related to MVPA accuracy and sensory information is a U-shape function of RC here. Thus, enhanced sensory information (i.e., improved MVPA accuracy) thus can arise from many possibilities. For example, both increasing and decreasing RCs can lead to enhanced information as long as the baseline condition lies at the ridge part of the U-shaped function. It remains largely unclear how cognitive factors modulate population response properties in fMRI. More broadly, the brain can attain a better coding scheme by modulating tuning, noise, and response correlations, or combinations of all of these factors, a much more flexible mechanism than previously thought. This implication also invites deeper consideration of previous empirical studies using MVPA accuracy to characterize brain functions.

### U-shaped information function in larger voxel pools

It has been shown that detrimental response correlations may masquerade as beneficial given noisy measurements and a limited number of neurons in an empirical study (Fig. 10*A*; Kohn et al., 2016). We further systematically modeled the effects of correlation strength and pool sizes. As shown in Figure 10*C*, the U-shaped function held when the number of voxels increased up to 2000, which is much larger than the number of voxels in a typical MVPA analysis. These results are also consistent with our past theoretical work (Zhang et al., 2020). We cannot completely exclude the possibility that sensory information will eventually saturate at a lower asymptote level when pool size goes infinite (i.e., the signature of information-limiting correlation). But our results should hold for the majority of MVPA fMRI studies.

## Discussion

In this study, we investigate the effects of RCs in fMRI by calculating the bias-corrected linear Fisher information between the data where RCs are intact or hypothetically removed. We find higher stimulus information in the former case, supporting the beneficial role of RCs in fMRI data. We then systematically manipulate RC strength and find that stimulus information follows a U-shaped function of RC strength. These results stand in stark contrast to the monotonically detrimental effects of RCs documented in neurophysiology. Interestingly, this discrepancy between neurophysiology and fMRI can be well explained by a voxel-encoding model that bridges neuronal and voxel activity. Most importantly, information decomposition analyses further suggest that RCs reduce information on higher-variance PCs and, in the meantime, may enhance information on lower-variance PCs in most scenarios. The two antagonistic effects together may result in increasing, decreasing, or U-shaped information functions, producing a wide range of theoretical and empirical findings. This information decomposition approach also highlights the complexity of quantifying information and can serve as a unified mathematical framework that helps resolve debates in computational neuroscience.

### The effects of response correlations in neurophysiology

The effects of RCs on stimulus information have been a matter of debate over the decades (Kohn et al., 2016). One major contribution of our work is to use a unified framework—information decomposition—to quantify stimulus information in fMRI data. This method in theory can be used on different modalities (e.g., neuronal spikes, fMRI, EEG sensors). It has been shown that only a specific type of correlation—differential correlation—limits information but its presence might be hard to detect because of the limited number of trials and neurons recorded in empirical data (Moreno-Bote et al., 2014). The information decomposition analysis has been previously proposed as a feasible solution to circumvent this limitation and help detect differential correlations using a much smaller number of units and trials (Montijn et al., 2019). Instead of the reduced information on high-variance PCs, we find a significant contribution of enhanced information on low-variance PCs, which is usually ignored in the previous literature. This phenomenon indicates that careful quantification of covariance structure in fMRI data and formal calculation of information are needed in future empirical studies.

### The comparisons between MVPA and information-theoretic approaches

Fisher information is tightly linked to MVPA decoding accuracy because the inverse of Fisher information is defined as the lower bound of (co)variance of an unbiased maximum likelihood decoder (Abbott and Dayan, 1999).

Both Fisher information and MVPA have their respective advantages and disadvantages. Decoding employs a discriminative modeling approach requiring no assumption of the generative distributions of fMRI data. But decoding only provides a single accuracy value without showing sufficient details of representational geometry (Naselaris and Kay, 2015). In particular, decoding falls short in disentangling the effects of signal, noise, and response correlations on population codes (e.g., differentiate the cases in Fig. 2*C*,*D*). In contrast, linear Fisher information is based on the generative mechanism of data because by definition it is related to maximum likelihood decoding and thus guaranteed to be statistically optimal. However, using a limited number of units and trials may yield a biased estimation of linear Fisher information. The analytical solution to correct the bias is based on the assumption of Gaussian variability.

### The interpretation of linear Fisher information

Fisher information can be converted to the discrimination threshold of an optimal linear decoder. We emphasize that this information-behavior comparison in typical unit-recording studies is inappropriate in fMRI because of the intrinsic low signal-to-noise ratio of fMRI data. Unlike neurophysiology, decoding accuracy in fMRI rarely reaches human behavioral performance on the same stimuli. For example, to discriminate two high-contrast gratings with orthogonal orientations, decoding accuracy on fMRI data can only achieve ∼70−80% accuracy, but human behavior can easily reach nearly 100% (Haynes and Rees, 2005; Kamitani and Tong, 2005). Here, we obtained ∼1 rad^{−2} information (Fig. 3), and this corresponds to the ∼77° orientation discrimination threshold given the 75% accuracy. This is consistent with classical fMRI decoding results (Haynes and Rees, 2005; Kamitani and Tong, 2005) but far worse than the human behavioral threshold of orientation discrimination (∼1–2°). Like MVPA, we can only compare the relative change of linear Fisher information across different cognitive states (e.g., attention vs unattention).

The low signal-to-noise ratio of fMRI data is also a potential limitation of this study. Here, large orientation differences (i.e., 45° and 90°) are used in the fMRI experiment and for the calculation of linear Fisher information. This is indeed consistent with the majority of decoding studies in fMRI. However, Fisher information by definition only measures the local curvature of log-probability distributions, and neurophysiological studies typically use a pair of stimuli with small differences (e.g., <10°). Theoretically, the calculation of the covariance matrix via Equation 3 may be imprecise when a pair of stimuli with large disparities are used. We argue that this should not be a problem for fMRI because (1) the signal-to-noise ratio is intrinsically low in fMRI signals, and (2) Figure 3*B* shows that the empirically measured covariance matrices for different stimuli with large disparities are highly similar (i.e., stimulus-invariant covariance matrices). Equation 3 is therefore appropriate here. The calculation of linear Fisher information in the neuron-encoding model may be imprecise (Fig. 5*C*), but this does not affect the conclusion as here we only focus on the qualitatively detrimental effects of tuning-compatible noise correlations (Fig. 5*C*, decreasing line) in neuronal populations, a phenomenon that is consistent with the case of stimuli with slight differences.

Voxel trial-by-trial variability may reflect underlying neuronal variability as well as many other sources of noise. There exist at least three types of noise that can cause the variability of voxel activity. The first is the measurement noise related to fMRI data acquisition, such as thermal noise, electronic noise, and so on. These types of noise are unlikely related to voxel tuning and thus unlikely produce tuning-compatible RCs. How to best control and minimize the influences of head motion or measurement noise during data acquisition and processing remains an active field of research in fMRI (Kay et al., 2013). The second type of noise is related to some global brain functions (e.g., arousal), and this type of noise is unlikely related to voxel tuning either. The third type of variability comes from the underlying neuronal variability in a voxel and should produce tuning-compatible RCs as we show analytically and empirically above. The existence of tuning-compatible RCs also suggests that voxel variability contains a substantial fraction of underlying neuronal variability in addition to measurement noise and global brain functions. The third type of variability is of most interest to neuroscientists because it usually reflects important aspects of stimuli (Cohen and Kohn, 2011; Kohn et al., 2016). Unfortunately, current fMRI technology does not allow us to distinguish between different sources of noise, and it remains a challenge for future fMRI studies to analyze the effects of noise on empirical data.

### Implications for future fMRI studies

What are the implications for future fMRI practice? We would like to emphasize three possible areas where our framework is useful.

First, we manipulated the strength of RC and then calculated the resultant linear Fisher information of stimuli. Our results suggest that one should use the decoding method that takes into consideration the correlation structure in data if RCs in general improve decoding. For example, some recent efforts on Bayesian decoding require explicit modeling of the correlation structure. Thus, measuring and quantifying voxel correlations is an essential step toward more robust brain decoding (van Bergen and Jehee, 2018).

Second, although many factors (e.g., acquisition noise) may result in voxel RCs, and these RCs may help decoding, we still want to carefully control these factors during fMRI data acquisition and processing. This is because (1) they may lower the signal-to-noise ratio of individual voxels and (2) more importantly lead to inaccurate neuroscientific interpretations of fMRI data (Kay et al., 2019).

Third, cognitive states of the brain may have substantial impacts on RC structures. Despite the profound evidence of top-down modulations on RCs in neurophysiology (Ruff and Cohen, 2014), it remains unclear whether altered RC structures underpin cognitive processes in humans. For example, attention and perceptual training have been shown to enhance decoding accuracy in human visual cortex (Jehee et al., 2011; Chen et al., 2016). But it remains unclear whether the enhanced decoding accuracy arises because of altered RCs. The effects of RCs on stimulus information suggest that modulating RCs is an effective way to alter stimulus information in multivariate fMRI data. In particular, in this article we isolate the effects of RCs by keeping other aspects of voxel responses intact. In realistic experiments, cognitive processes (e.g., attention) can alter signals, noises, RCs, or combinations of these factors. It is the interactions among these factors that produce the outcome stimulus information. Future studies are needed to further dissect the computational mechanisms of altered decoding accuracy in human fMRI.

## Footnotes

This work was supported by National Natural Science Foundation of China Grant 32100901, Shanghai Pujiang Program Grant 21PJ1407800), Natural Science Foundation of Shanghai Grant 21ZR1434700, Research Project of Shanghai Science and Technology Commission and Fundamental Research Funds for the Central Universities Grant 20dz2260300 (to R.-Y.Z.). We thank the team of the StudyForrest project (http://www.studyforrest.org/) that acquired and shared the fMRI data. We thank Kendrick Kay and Peter Bandetinni for comments on the manuscript.

The authors declare no competing financial interests.

- Correspondence should be addressed to Ru-Yuan Zhang at ruyuanzhang{at}sjtu.edu.cn