## Abstract

Attention is commonly thought to improve behavioral performance by increasing response gain and suppressing shared variability in neuronal populations. However, both the focus and the strength of attention are likely to vary from one experimental trial to the next, thereby inducing response variability unknown to the experimenter. Here we study analytically how fluctuations in attentional state affect the structure of population responses in a simple model of spatial and feature attention. In our model, attention acts on the neural response exclusively by modulating each neuron's gain. Neurons are conditionally independent given the stimulus and the attentional gain, and correlated activity arises only from trial-to-trial fluctuations of the attentional state, which are unknown to the experimenter. We find that this simple model can readily explain many aspects of neural response modulation under attention, such as increased response gain, reduced individual and shared variability, increased correlations with firing rates, limited range correlations, and differential correlations. We therefore suggest that attention may act primarily by increasing response gain of individual neurons without affecting their correlation structure. The experimentally observed reduction in correlations may instead result from reduced variability of the attentional gain when a stimulus is attended. Moreover, we show that attentional gain fluctuations, even if unknown to a downstream readout, do not impair the readout accuracy despite inducing limited-range correlations, whereas fluctuations of the attended feature can in principle limit behavioral performance.

**SIGNIFICANCE STATEMENT** Covert attention is one of the most widely studied examples of top-down modulation of neural activity in the visual system. Recent studies argue that attention improves behavioral performance by shaping the noise distribution to suppress shared variability rather than by increasing response gain. Our work shows, however, that latent, trial-to-trial fluctuations of the focus and strength of attention lead to shared variability that is highly consistent with known experimental observations. Interestingly, fluctuations in the strength of attention do not affect coding performance. As a consequence, the experimentally observed changes in response variability may not be a mechanism of attention, but rather a side effect of attentional allocation strategies in different behavioral contexts.

## Introduction

Attention was traditionally thought of as acting by increasing the response gain of a relevant population of neurons (Reynolds and Chelazzi, 2004; Maunsell and Treue, 2006). More recent studies found that attention also reduces pairwise correlations between neurons (Cohen and Maunsell, 2009; Mitchell et al., 2009; Herrero et al., 2013). Based on a simple pooling model (Zohary et al., 1994), these authors argued that the benefits of increased gain are dwarfed by the effects of reduced correlations; therefore, attention is more appropriately viewed as shaping the noise distribution.

However, in an experiment, the subject's state of attention can be controlled only indirectly and is bound to vary from one trial to the next. As a consequence, measuring neuronal variability or correlations under attention has a fundamental caveat: it is unclear to what extent the observed neuronal covariability reflects interesting aspects of information processing in the neuronal population or simply trial-to-trial fluctuations in the subject's state of attention, which is unknown to the experimenter. Despite ample evidence that attention fluctuates from trial to trial (Cohen and Maunsell, 2010, 2011), the effects of such fluctuations on neuronal population activity have so far not been investigated.

Here we analyze a simple neural population model, where neurons with overlapping receptive fields encode the direction of motion of a stimulus (see Fig. 1*A*). We assume that neurons produce spikes independently according to a Poisson process with rate λ_{i} and treat attention as a process that modulates the neurons' gain (see Fig. 1*B*). The firing rate of neuron *i* is given by

$$\lambda_i = g_i \, f_i(\theta),$$

where *g*_{i} is the attentional gain (a combination of spatial and feature attention) and *f*_{i}(θ) is the direction tuning curve. We assume that there is always a stimulus in the neurons' receptive field, but this stimulus is not necessarily attended. Crucially, in our model, the subject's attentional state is not constant across trials, even within the same attentional condition. Thus, *g*_{i} is a random variable that varies from trial to trial (see Fig. 1*C*), and its precise value is unknown to the experimenter. As a consequence, the correlations in *g*_{i} across neurons will induce correlations between the observed neural responses.

In the following, we analyze this correlation structure in detail. We find that the correlations induced by attentional fluctuations resemble many experimentally observed aspects of correlated variability, such as correlations that increase with firing rates, limited range correlations, and differential correlations. In addition, we investigate the consequences of correlations induced by fluctuating attentional gain for reading out the direction of motion of the stimulus from the population response. We show that such correlations do not impair readout, even if the decoder does not have access to the attentional state. Finally, we show that our model can account for a number of nontrivial experimental findings on correlated variability in attention paradigms.

A preliminary account of these findings has been presented previously at the Cosyne Meeting 2012 (Ecker et al., 2012). Related ideas have been developed independently by another group, whose results have been published recently (Rabinowitz et al., 2015).

## Materials and Methods

This section contains a detailed description of the model and the derivations of the main results. In an effort to make the paper as accessible as possible, the Results section is self-contained. Readers not interested in the detailed derivations can skip ahead directly to Results.

### Notation

We use uppercase italic letters to denote matrices, lowercase italic letters for scalar values, and lowercase boldface letters for vectors. Thus, *M* is a matrix and *v*_{i} is the *i*^{th} element of vector **v**. We write the expectation of a random variable *x* as 〈*x*〉 and the conditional expectation of *x* given *y* as 〈*x* | *y*〉. By defining δ*x* = *x* − 〈*x*〉, we can write the variance of *x* compactly as 〈δ*x*^{2}〉. All probabilities, expectations, etc. used herein are conditioned on the stimulus θ, which we sometimes omit to simplify the notation.

### Model setup

We model a population of direction-selective neurons with identical receptive field locations and a diverse range of preferred directions φ_{i}. We use a simple model of spatial and feature attention, where λ_{i}, the firing rate of neuron *i*, is the product of a gain *g*_{i}(ψ) and a tuning function *f*_{i}(θ):

$$\lambda_i = g_i(\psi)\, f_i(\theta).$$

Here, ψ is the attended direction of motion and θ the direction of the stimulus that is shown. We assume bell-shaped tuning curves of the form

$$f_i(\theta) = \exp\bigl(\kappa \cos(\theta - \varphi_i) + \gamma_i\bigr),$$

where κ controls the tuning width, φ_{i} is the preferred direction of neuron *i*, and γ_{i} controls its mean firing rate. Although this choice of tuning curve simplifies the mathematical treatment considerably, the results do not depend on it qualitatively. Indeed, all results on fluctuations of the attentional gain hold for arbitrary tuning curves.

Neurons are assumed to produce spikes independently according to a Poisson process with rate λ_{i}. Thus, the only source for noise correlations in our model is the fluctuating attentional state, which comodulates the firing rates through the gain *g*_{i}.

The gain depends on whether attention is directed to the neurons' receptive field and on the attended direction of motion. For spatial attention, we use *g* = exp(α), which is the same for all neurons because they all have identical receptive field locations; we refer to α as the spatial gain (see Fig. 1). For feature attention, we use *g*_{i}(ψ) = exp(β*h*_{i}(ψ)), where β is the feature gain and *h*_{i}(ψ) the gain profile (see Fig. 3). We follow the feature similarity gain model (Treue and Martínez Trujillo, 1999), where a neuron's gain is enhanced if the attended feature matches the neuron's preference and suppressed otherwise. We use a cosine gain profile: *h*_{i}(ψ) = cos(ψ − φ_{i}).

From the perspective of the model, there is no fundamental difference between spatial and feature attention. However, because we consider a local population with identical receptive field locations, spatial attention is a special case with a constant gain profile *h*_{i} = 1 and, consequently, a single common gain *g* = exp(α). Thus, whenever we refer to spatial attention, our results apply to a situation where all neurons in the population under consideration share the same preferred feature (i.e., receptive field location in our case). Likewise, when we refer to feature attention, our results apply to any situation where the neurons in the population span a large range of preferred features (i.e., preferred direction in our case). We chose this somewhat arbitrary distinction because it reflects the typical situation encountered in experiments in areas such as V1 or MT, where neurons with similar retinotopic locations are recorded, which typically span a large range of preferred orientations or directions of motion.

### Effect of fluctuating gains on spike count statistics

To study the effect of a fluctuating attentional gain on the spike count statistics, we treat the more general case of feature attention (see Fig. 4); the results for spatial attention (see Fig. 2) follow as a special case with β = α and *h*_{i} = 1. We assume that the attended feature is fixed and that the experimenter does not have access to the attentional gain β on individual trials but can control only its average 〈β〉 over many trials (e.g., by cuing the subject). We denote the variance of the trial-to-trial fluctuations of β by 〈δβ^{2}〉.

To obtain the mean, variance, and covariance of the spike counts *y*_{i}, we need mean, variance, and covariance of the gain *g*_{i}. However, due to the exponential nonlinearity in *g*_{i}, the exact values depend on the distribution of β. We therefore simply assume that mean and variance of β are sufficiently small that we can linearize λ_{i} around 〈β〉:

$$\lambda_i \approx \bigl[1 + (\beta - \langle\beta\rangle)\, h_i(\psi)\bigr]\, e^{\langle\beta\rangle h_i(\psi)} f_i(\theta).$$

We note that this approximation is not strictly necessary. One could in principle obtain exact analytical results by, for example, assuming β to be Gaussian. However, because attentional modulations are usually relatively small (β ≈ 0.1), an exact treatment would add only complicated correction terms that would blur the key results without making any practical difference. We therefore favored the approximate framework due to the simplicity of its results.

We obtain for the average spike count:

$$\mu_i \equiv \langle y_i \rangle \approx e^{\langle\beta\rangle h_i(\psi)} f_i(\theta).$$

Variances and covariances are obtained by application of the Law of Total Covariance:

$$\begin{aligned}
\mathrm{Cov}[y_i, y_j] &= \bigl\langle \mathrm{Cov}[y_i, y_j \mid \beta] \bigr\rangle + \mathrm{Cov}\bigl[\langle y_i \mid \beta\rangle, \langle y_j \mid \beta\rangle\bigr] \\
&= \delta_{ij}\langle\lambda_i\rangle + \mathrm{Cov}[\lambda_i, \lambda_j] \\
&\approx \delta_{ij}\,\mu_i + \langle\delta\beta^2\rangle\, h_i h_j\, \mu_i \mu_j,
\end{aligned}$$

where the outer expectation (covariance) is taken over β and the inner covariance (expectation) over *y*_{i} and *y*_{j}, we plugged in the definitions of λ_{i} = 〈*y*_{i} | β〉, and used the assumption of conditionally independent Poisson spiking, Cov[*y*_{i}, *y*_{j} | β] = δ_{ij}λ_{i}.

By taking the ratio of variance to mean, we obtain the Fano factor:

$$F_i = \frac{\mathrm{Var}[y_i]}{\langle y_i \rangle} \approx 1 + \langle\delta\beta^2\rangle\, h_i^2\, \mu_i.$$
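The linearized spike count statistics can be checked against a direct simulation. The following is a minimal sketch, assuming NumPy, a Gaussian gain β, a cosine gain profile, and illustrative parameter values (`mean_beta`, `var_beta`, `kappa`, `gamma` and all function names are hypothetical and not taken from the paper). It compares the predicted covariance δ_{ij}μ_{i} + 〈δβ^{2}〉*h*_{i}*h*_{j}μ_{i}μ_{j} with the sample covariance of simulated counts.

```python
import numpy as np

def tuning_curve(theta, phi, kappa=2.0, gamma=1.0):
    """Bell-shaped tuning curve f_i(theta) = exp(kappa*cos(theta - phi_i) + gamma)."""
    return np.exp(kappa * np.cos(theta - phi) + gamma)

def predicted_covariance(mu, h, var_beta):
    """Linearized covariance: delta_ij*mu_i + <dbeta^2>*h_i*h_j*mu_i*mu_j."""
    u = h * mu
    return np.diag(mu) + var_beta * np.outer(u, u)

def simulate_counts(f, h, mean_beta, var_beta, n_trials, rng):
    """Draw one Gaussian gain per trial, then conditionally independent Poisson counts."""
    beta = rng.normal(mean_beta, np.sqrt(var_beta), size=(n_trials, 1))
    rates = np.exp(beta * h) * f            # lambda_i = exp(beta*h_i) * f_i
    return rng.poisson(rates)

rng = np.random.default_rng(0)
phi = np.linspace(-np.pi, np.pi, 8, endpoint=False)   # preferred directions
theta, psi = 0.0, 0.0                                  # stimulus and attended direction
f = tuning_curve(theta, phi)
h = np.cos(psi - phi)                                  # cosine gain profile
mean_beta, var_beta = 0.1, 0.01                        # illustrative values only

mu = np.exp(mean_beta * h) * f                         # approximate mean spike count
C_pred = predicted_covariance(mu, h, var_beta)
fano_pred = np.diag(C_pred) / mu                       # equals 1 + <dbeta^2>*h_i^2*mu_i

y = simulate_counts(f, h, mean_beta, var_beta, 200_000, rng)
C_emp = np.cov(y, rowvar=False)                        # sample covariance across trials
```

Because the prediction rests on a first-order expansion in β, small systematic deviations between `C_emp` and `C_pred` are expected and grow with the gain variance.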

### Fluctuations in attended feature induce differential correlations

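Calculating the means and covariances under fluctuations in the attended direction ψ follows the same approach as above. We start with the case where the variance 〈δψ^{2}〉 is small (see Fig. 5). Assuming that the subject attends to the direction that is shown (i.e., 〈ψ〉 = θ), we linearize λ_{i} around ψ = θ:

$$\lambda_i \approx \bigl[1 + \beta\, h_i'(\theta)\, \delta\psi\bigr]\, e^{\beta h_i(\theta)} f_i(\theta),$$

where *h*_{i}′ is the derivative with respect to ψ. Using this approximation, we obtain for the average spike count:

$$\mu_i \equiv \langle y_i \rangle \approx e^{\beta h_i(\theta)} f_i(\theta).$$

Again, by applying the Law of Total Covariance, we obtain the spike count covariance:

$$\mathrm{Cov}[y_i, y_j] \approx \delta_{ij}\,\mu_i + \beta^2 \langle\delta\psi^2\rangle\, h_i' h_j'\, \mu_i \mu_j = \delta_{ij}\,\mu_i + \frac{\beta^2}{\kappa^2} \langle\delta\psi^2\rangle\, \mu_i' \mu_j',$$

where μ_{i}′ = *d*μ_{i}(θ)/*d*θ and we used the definitions *h*_{i}(θ) = cos(θ − φ_{i}) and *f*_{i}(θ) = exp(κcos(θ − φ_{i}) + γ_{i}) to make the substitution *h*_{i}′μ_{i} = μ_{i}′/κ. Thus, fluctuations of the attended direction create differential correlations (Moreno-Bote et al., 2014), that is, response variability that is identical to variability induced by changes in the stimulus (sometimes also referred to as input noise).

Next, we treat the case where the attended direction fluctuates between two discrete alternatives ψ_{1} and ψ_{2}, as would be expected for a two-alternative forced-choice discrimination task (see Fig. 6). We define Δ**μ** as the difference between the expected spike counts for the two attention targets:

$$\Delta\mu_i = \mu_i(\psi_1) - \mu_i(\psi_2) = \bigl(e^{\beta h_i(\psi_1)} - e^{\beta h_i(\psi_2)}\bigr) f_0,$$

where we have assumed that there is no net motion in the stimulus and *f*_{0} is the neurons' firing rate for this zero-coherence condition. Again, applying the Law of Total Covariance, we obtain the covariance:

$$C = \mathrm{Diag}(\bar{\boldsymbol{\mu}}) + p(1-p)\, \Delta\boldsymbol{\mu}\, \Delta\boldsymbol{\mu}^{T}, \qquad \bar{\boldsymbol{\mu}} = p\, \boldsymbol{\mu}(\psi_1) + (1-p)\, \boldsymbol{\mu}(\psi_2),$$

where *p* denotes the probability that ψ_{1} is attended on a given trial, so that the prefactor equals 1/4 when both alternatives are attended equally often.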

### Derivation of Fisher information for modulated Poisson distribution

The joint distribution of spike counts *P*(**y** | θ) is a compound Poisson distribution, obtained by marginalizing over the latent gain β:

$$P(\mathbf{y} \mid \theta) = \int P(\mathbf{y} \mid \theta, \beta)\, p(\beta)\, d\beta = \int \prod_i \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}\; p(\beta)\, d\beta.$$

We assume that β is drawn from a normal distribution with mean 〈β〉 and variance 〈δβ^{2}〉. Approximating as above λ_{i} ≈ (1 + δβ*h*_{i})μ_{i}, we can solve the integral by collecting the terms related to δβ and completing the squares. The resulting *P*(**y** | θ) is in the exponential family with sufficient statistics *T*(**y**) = **y**. Therefore, the Fisher information with respect to the stimulus θ is given by (Beck et al., 2011)

$$J = \boldsymbol{\mu}'^{T} C^{-1} \boldsymbol{\mu}'.$$

This expression is sometimes also referred to as the linear Fisher information or *J*_{mean} because of its close relationship to both the variance of a locally optimal linear estimator and the linear discriminability of two nearby stimuli (*J* ∝ *d*′^{2}).

Despite the fact that the covariance matrix *C* depends on the stimulus, *J* is the full Fisher information of the population. This is unlike the Gaussian case, where a stimulus-dependent covariance matrix introduces a second term into the Fisher information (Kay, 1993). This term, sometimes referred to as *J*_{cov}, is absent in the modulated Poisson distribution, which means that (1) fine discrimination can be performed optimally using linear methods and (2) the linear Fisher information defines the Cramér-Rao bound (i.e., the minimum variance of any unbiased estimator).
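As a small illustration of the linear Fisher information, the sketch below evaluates **μ**′^{T}*C*^{−1}**μ**′ with NumPy for a population whose covariance is purely Poisson (diagonal), where it reduces to the sum of μ_{i}′^{2}/μ_{i}. The function name `linear_fisher_information` and the parameter values are assumptions made for this example.

```python
import numpy as np

def linear_fisher_information(mu_prime, C):
    """J = mu'^T C^{-1} mu', using a linear solve rather than an explicit inverse."""
    return float(mu_prime @ np.linalg.solve(C, mu_prime))

phi = np.linspace(-np.pi, np.pi, 32, endpoint=False)   # preferred directions
kappa, gamma, theta = 2.0, 1.0, 0.0                     # illustrative values only

mu = np.exp(kappa * np.cos(theta - phi) + gamma)        # mean spike counts
mu_prime = -kappa * np.sin(theta - phi) * mu            # derivative with respect to theta

# Independent Poisson neurons: the covariance equals Diag(mu).
J_poisson = linear_fisher_information(mu_prime, np.diag(mu))

# For two stimuli separated by a small dtheta, d' is approximately dtheta * sqrt(J).
dtheta = np.deg2rad(1.0)
d_prime = dtheta * np.sqrt(J_poisson)
```

Any other covariance matrix from this section can be passed as `C` to compare coding accuracy across noise models.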

### Coding accuracy under fluctuations of attentional gain

Here we show that fluctuations of attentional gains do not impair the coding accuracy of a population of neurons. We start by considering a population of conditionally independent neurons. The first ingredient to calculating the Fisher information is the inverse of the covariance matrix, which we obtain by applying the Sherman-Morrison formula to Equation 7:

$$C^{-1} = M^{-1} - \frac{\langle\delta\beta^2\rangle\, M^{-1} \mathbf{u} \mathbf{u}^{T} M^{-1}}{1 + \langle\delta\beta^2\rangle\, \mathbf{u}^{T} M^{-1} \mathbf{u}},$$

where *M* = Diag(**μ**(θ)) and *u*_{i} = *h*_{i}(ψ)μ_{i}(θ). Plugging into the formula for Fisher information, we obtain:

$$J = \sum_i \frac{\mu_i'^2}{\mu_i} - \frac{\langle\delta\beta^2\rangle \bigl(\sum_i h_i \mu_i'\bigr)^2}{1 + \langle\delta\beta^2\rangle \sum_i h_i^2 \mu_i}.$$

The first term *J*_{ind} in the above equation is the Fisher information of an independent population of neurons:

$$J_{\mathrm{ind}} = \sum_i \frac{\mu_i'^2}{\mu_i}.$$

It is therefore *O*(*N*), whereas the second term is zero for homogeneous populations of neurons, where *f*_{i}(θ) = *f*(θ − φ_{i}), and nonzero but *O*(1) for heterogeneous populations. To show that the second term above is *O*(1), we assume that the amplitudes of the neurons' tuning curves are independent random variables (Shamir and Sompolinsky, 2006; Ecker et al., 2011). In this case, the quantity of interest is the expected value with respect to different realizations of the heterogeneity:

$$\left\langle \frac{\langle\delta\beta^2\rangle \bigl(\sum_i h_i \mu_i'\bigr)^2}{1 + \langle\delta\beta^2\rangle \sum_i h_i^2 \mu_i} \right\rangle \approx \frac{\langle\delta\beta^2\rangle \bigl\langle \bigl(\sum_i h_i \mu_i'\bigr)^2 \bigr\rangle}{1 + \langle\delta\beta^2\rangle \sum_i h_i^2 \langle\mu_i\rangle}.$$

Here the approximation holds because for large *N* the denominator is *O*(*N*) and its SD becomes narrower relative to its mean. Therefore, the expected value of the ratio converges to the ratio of the expected values of numerator and denominator. For the numerator, we simplify:

$$\Bigl\langle \Bigl(\sum_i h_i \mu_i'\Bigr)^2 \Bigr\rangle = \mathrm{Var}\Bigl[\sum_i h_i \mu_i'\Bigr] = \sum_i h_i^2\, \mathrm{Var}[\mu_i'],$$

which holds because 〈∑*h*_{i}μ_{i}′〉 = 0, both for spatial attention (where *h*_{i} = 1) and feature attention when the correct feature is attended (where **h** is even and 〈**μ**′〉 is odd). The numerator is therefore *O*(*N*), like the denominator. Thus, fluctuations in attentional gains do not impair the coding accuracy of the population with respect to direction of motion, as they result in only an *O*(1) reduction of the Fisher information, which becomes irrelevant for large populations.
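The Sherman-Morrison step can be sanity-checked by comparing the closed-form Fisher information (independent term minus correction term) with a direct linear solve. The sketch below assumes NumPy, a cosine gain profile, exponential-cosine tuning curves, and heterogeneity introduced through random amplitudes γ_{i}; the function names and all numerical values are hypothetical.

```python
import numpy as np

def fisher_direct(mu, mu_prime, h, var_beta):
    """mu'^T C^{-1} mu' with C = Diag(mu) + <dbeta^2> u u^T and u_i = h_i*mu_i."""
    u = h * mu
    C = np.diag(mu) + var_beta * np.outer(u, u)
    return float(mu_prime @ np.linalg.solve(C, mu_prime))

def fisher_closed_form(mu, mu_prime, h, var_beta):
    """Return (J, correction), where J = J_ind - correction (Sherman-Morrison)."""
    j_ind = np.sum(mu_prime ** 2 / mu)
    correction = (var_beta * np.sum(h * mu_prime) ** 2
                  / (1.0 + var_beta * np.sum(h ** 2 * mu)))
    return float(j_ind - correction), float(correction)

rng = np.random.default_rng(1)
N = 64
phi = np.linspace(-np.pi, np.pi, N, endpoint=False)
theta = psi = 0.0                                   # the shown direction is attended
kappa, mean_beta, var_beta = 2.0, 0.1, 0.01         # illustrative values only
h = np.cos(psi - phi)

def population(gamma):
    """Mean counts and their derivative with respect to theta for amplitudes gamma."""
    mu = np.exp(mean_beta * h + kappa * np.cos(theta - phi) + gamma)
    mu_prime = -kappa * np.sin(theta - phi) * mu
    return mu, mu_prime

mu_hom, dmu_hom = population(np.full(N, 1.0))                        # homogeneous
mu_het, dmu_het = population(1.0 + 0.3 * rng.standard_normal(N))     # heterogeneous

J_hom, corr_hom = fisher_closed_form(mu_hom, dmu_hom, h, var_beta)
J_het, corr_het = fisher_closed_form(mu_het, dmu_het, h, var_beta)
```

In the homogeneous case the correction vanishes by symmetry, whereas the heterogeneous population shows a nonzero correction; repeating the computation for larger `N` is one way to inspect how that correction scales.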

To study the physiologically more realistic situation where the amount of information entering the brain is finite, we model input noise by treating the stimulus direction θ itself as a random variable with variance 〈δθ^{2}〉. In this case, the covariance of the spike counts is given by

$$C = C_0 + \langle\delta\theta^2\rangle\, \boldsymbol{\mu}' \boldsymbol{\mu}'^{T},$$

where *C*_{0} is the covariance matrix in the absence of input noise. The Fisher information is then (Moreno-Bote et al., 2014)

$$J = \frac{J_0}{1 + \langle\delta\theta^2\rangle\, J_0},$$

where *J*_{0} is the Fisher information in the absence of input noise. Because *J*_{0} is *O*(*N*), *J* → 1/〈δθ^{2}〉 for large *N*. Thus, in the presence of input noise, the *O*(1) correction term from above vanishes for large *N* and the Fisher information converges to the limit imposed by the input noise.
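The saturation of Fisher information under input noise can be illustrated numerically. The sketch below assumes NumPy, an independent Poisson population as the noise-free reference (so that *C*_{0} is diagonal), and a hypothetical input-noise SD of 2°; it evaluates *J* = *J*_{0}/(1 + 〈δθ^{2}〉*J*_{0}) for growing population sizes and cross-checks one value against a direct solve.

```python
import numpy as np

def independent_population(N, kappa=2.0, gamma=1.0, theta=0.0):
    """Mean counts and stimulus derivatives for N homogeneous Poisson neurons."""
    phi = np.linspace(-np.pi, np.pi, N, endpoint=False)
    mu = np.exp(kappa * np.cos(theta - phi) + gamma)
    mu_prime = -kappa * np.sin(theta - phi) * mu
    return mu, mu_prime

def fisher_with_input_noise(J0, var_theta):
    """Closed form J = J0 / (1 + <dtheta^2> * J0)."""
    return J0 / (1.0 + var_theta * J0)

def fisher_direct(mu, mu_prime, var_theta):
    """mu'^T C^{-1} mu' with C = Diag(mu) + <dtheta^2> mu' mu'^T."""
    C = np.diag(mu) + var_theta * np.outer(mu_prime, mu_prime)
    return float(mu_prime @ np.linalg.solve(C, mu_prime))

var_theta = np.deg2rad(2.0) ** 2          # hypothetical input-noise variance (rad^2)
sizes = [16, 64, 256, 1024, 4096]

# Fisher information without input noise grows linearly with N.
J0 = np.array([np.sum(dmu ** 2 / mu)
               for mu, dmu in map(independent_population, sizes)])

# With input noise it saturates at 1 / var_theta.
J = fisher_with_input_noise(J0, var_theta)
```

Plotting `J` against `sizes` would show the curve flattening toward `1 / var_theta`, whereas `J0` keeps growing.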

### Coding accuracy under fluctuations of attended feature

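We have shown above (Eq. 11) that fluctuations of the attended feature have the same effect as input noise. Including input noise with variance 〈δθ^{2}〉 as in the previous section, the covariance of the spike counts is therefore given by

$$C = M + \Bigl(\langle\delta\theta^2\rangle + \frac{\beta^2}{\kappa^2} \langle\delta\psi^2\rangle\Bigr) \boldsymbol{\mu}' \boldsymbol{\mu}'^{T},$$

where *M* = Diag(**μ**) as before. Analogous to above, the Fisher information is

$$J = \frac{J_{\mathrm{ind}}}{1 + \bigl(\langle\delta\theta^2\rangle + \beta^2 \langle\delta\psi^2\rangle / \kappa^2\bigr) J_{\mathrm{ind}}},$$

and for large *N*, it converges to

$$J \to \frac{1}{\langle\delta\theta^2\rangle + \beta^2 \langle\delta\psi^2\rangle / \kappa^2}.$$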

### Simulation of experimental results

To simulate the results of Cohen and Maunsell (2011) (see Fig. 8), we used populations of *N*^{2} neurons, tuned to both orientation and spatial frequency, with the preferred stimuli spaced regularly on an *N* × *N* grid. For the illustration of the correlation matrices, we used *N* = 50 and homogeneous tuning curves as in all other figures showing covariance or correlation matrices. For computing the relationship between firing rate changes and correlation changes, we used *N* = 64 and heterogeneous tuning curves. For convenience and symmetry, we modeled both variables as periodic (although not strictly correct, this simplification does not affect the results qualitatively). We defined the overall attentional gain *g̃*_{i} = α_{1}*h*_{i}(ψ_{1}) + α_{2}*h*_{i}(ψ_{2}). The neuronal gain is *g*_{i} = exp(*g̃*_{i}). Assuming attention gains α_{k} and features ψ_{k} are independent random variables, mean and covariance of *g̃* are given by:

$$\langle \tilde{g}_i \rangle = \sum_{k=1}^{2} \langle\alpha_k\rangle \langle h_i(\psi_k)\rangle,$$

$$\mathrm{Cov}[\tilde{g}_i, \tilde{g}_j] = \sum_{k=1}^{2} \Bigl( \langle\alpha_k^2\rangle \langle h_i(\psi_k) h_j(\psi_k)\rangle - \langle\alpha_k\rangle^2 \langle h_i(\psi_k)\rangle \langle h_j(\psi_k)\rangle \Bigr),$$

where 〈α_{k}^{2}〉 = 〈α_{k}〉^{2} + 〈δα_{k}^{2}〉 and we estimated the expectations over ψ_{k} via numerical integration. Without loss of generality, we assumed the first feature to be attended and set 〈α_{1}〉 = 0.1, 〈δα_{1}^{2}〉 = 0.05 and ψ_{1} Gaussian with 〈δψ_{1}^{2}〉 = (10°)^{2}. For model I (see Fig. 8*C*,*D*), we used 〈α_{2}〉 = 0, 〈δα_{2}^{2}〉 = 0.1 and ψ_{2}, as for the attended feature, Gaussian with 〈δψ_{2}^{2}〉 = (10°)^{2}. For model II (see Fig. 8*E*,*F*), we used 〈α_{2}〉 = 0.1, 〈δα_{2}^{2}〉 = 0.05 as for the attended feature but chose ψ_{2} as uniformly distributed. To obtain the spike count covariances, we linearized λ_{i} around 〈*g̃*〉 as above and obtained

$$\mathrm{Cov}[y_i, y_j] \approx \delta_{ij}\, \mu_i + \mathrm{Cov}[\tilde{g}_i, \tilde{g}_j]\, \mu_i \mu_j, \qquad \mu_i = e^{\langle \tilde{g}_i \rangle} f_i.$$

For the illustrations of the correlation matrices, we normalized the covariance matrices to correlation coefficients and marginalized over the irrelevant feature (i.e., for the attended condition [orientation], we averaged over all neurons with the same preferred orientation but different preferred spatial frequencies).

To simulate the results of Ruff and Cohen (2014) (see Fig. 9), we used a population of *N* neurons with receptive field locations arranged linearly in the range [0, 1]. We assumed that receptive fields have a Gaussian shape with SD 0.5 and a peak firing rate of 20 spikes/s. We placed the two stimuli at 0.35 and 0.65. For simplicity, we assumed that the firing rate in response to the two stimuli presented simultaneously is equal to the average firing rate elicited by the two stimuli presented individually. Because the receptive field locations vary within the population, we treat spatial location as a feature. For the gain profile, we used a Mexican hat:
which corresponds to an excitatory center with an SD of 0.5 and a suppressive surround with an SD of 1. We set the distribution of the attentional gain to 〈β〉 = 0.2 (reasoning that attentional effects are typically stronger when multiple stimuli compete in the receptive field; Treue and Maunsell, 1996) and 〈δβ^{2}〉 = 0.05 for both attended and unattended conditions. The difference between the two conditions is only the distribution of attended locations ψ. We assumed the attentional focus to be centered on 0.5 on average, but with a smaller SD for the attended condition (0.2) than for the unattended condition (0.5). As before, we obtained the covariance of the gain by numerical integration over the distribution of ψ and the covariance matrix by application of the Law of Total Covariance (see Eq. 34). We used a homogeneous population of 50 neurons for all illustrations and a heterogeneous population of 512 neurons for simulating the relationship between task tuning similarity (TTS) and correlations. TTS was computed as in Ruff and Cohen (2014) as the *d*′ between the responses to the two individual stimuli, assuming Poisson statistics (i.e., variance equal to the average spike count).
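The numerical integration over the distribution of the attended feature can be sketched as follows. This is only an illustration under stated assumptions: it uses a one-dimensional circular feature with a cosine gain profile rather than the two-dimensional grid or the Mexican hat profile of the actual simulations, Gaussian-distributed ψ only, and Gauss-Hermite quadrature via NumPy's `hermegauss`. The function names and the mean of the second attended feature are hypothetical. For each independent component, the covariance of α*h*_{i}(ψ) is computed as 〈α^{2}〉〈*h*_{i}*h*_{j}〉 − 〈α〉^{2}〈*h*_{i}〉〈*h*_{j}〉.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def profile_moments(phi, psi_mean, psi_sd, n_nodes=40):
    """<h_i> and <h_i h_j> for h_i(psi) = cos(psi - phi_i), psi ~ Normal(psi_mean, psi_sd^2)."""
    x, w = hermegauss(n_nodes)              # nodes and weights for the weight exp(-x^2 / 2)
    w = w / w.sum()                         # normalize so the weights sum to one
    H = np.cos((psi_mean + psi_sd * x)[:, None] - phi[None, :])   # shape (nodes, neurons)
    Eh = w @ H                              # first moment of the gain profile
    Ehh = H.T @ (w[:, None] * H)            # second moment matrix
    return Eh, Ehh

def gain_moments(phi, components):
    """Mean and covariance of the summed log gain for independent (alpha_k, psi_k) pairs."""
    mean = np.zeros(len(phi))
    cov = np.zeros((len(phi), len(phi)))
    for a_mean, a_var, psi_mean, psi_sd in components:
        Eh, Ehh = profile_moments(phi, psi_mean, psi_sd)
        mean += a_mean * Eh
        cov += (a_mean ** 2 + a_var) * Ehh - a_mean ** 2 * np.outer(Eh, Eh)
    return mean, cov

phi = np.linspace(-np.pi, np.pi, 24, endpoint=False)    # preferred features
sd = np.deg2rad(10.0)
# (mean gain, gain variance, mean attended feature, SD of attended feature);
# the attended-feature means 0 and pi/2 are placeholders chosen for this example.
components = [(0.1, 0.05, 0.0, sd), (0.0, 0.1, np.pi / 2, sd)]

g_mean, g_cov = gain_moments(phi, components)

# Linearized spike count covariance for a given tuning function f (here a flat response).
f = np.full(len(phi), 10.0)
mu = np.exp(g_mean) * f
C = np.diag(mu) + g_cov * np.outer(mu, mu)
```

A uniformly distributed attended feature would require a different quadrature rule (for example, an equally spaced grid over the circle) in place of `hermegauss`.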

### Generalized linear population model

Our population model can be recast as a generalized linear model (GLM) by a simple reparameterization of the stimulus. Consider the log firing rate

$$\log \lambda_i = \alpha + \beta \cos(\psi - \varphi_i) + \kappa \cos(\theta - \varphi_i) + \gamma_i,$$

where α and β are the spatial and feature gains as before. By representing angles as two-dimensional vectors of unit length (e.g., **x** = [cosθ, sinθ]^{T}), we can rewrite the log firing rate as a linear function of the attentional state and the stimulus:

$$\log \lambda_i = \alpha + \mathbf{k}_i^{T} (\mathbf{b} + \mathbf{x}) + \gamma_i.$$

Here, α and **b** = β/κ · [cosψ, sinψ]^{T} represent the state of spatial and feature attention, respectively, **x** is the stimulus, and **k**_{i} = κ · [cosφ_{i}, sinφ_{i}]^{T} is the neuron's preferred direction. This model is a GLM with Poisson observations and log(*x*) as the link function.
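The equivalence between the trigonometric form of the log firing rate and the linear form in the unit-vector parameterization follows from cos(*a* − *b*) = cos *a* cos *b* + sin *a* sin *b*. The sketch below checks it numerically with NumPy; function names and parameter values are illustrative assumptions.

```python
import numpy as np

def log_rate_trig(alpha, beta, psi, theta, phi, kappa, gamma):
    """Log rate as a sum of spatial gain, feature gain, and tuning terms."""
    return alpha + beta * np.cos(psi - phi) + kappa * np.cos(theta - phi) + gamma

def unit(angle):
    """Represent an angle as a two-dimensional unit vector."""
    return np.array([np.cos(angle), np.sin(angle)])

def log_rate_glm(alpha, b, x, K, gamma):
    """Linear predictor alpha + k_i^T (b + x) + gamma_i; the rows of K are the k_i."""
    return alpha + K @ (b + x) + gamma

phi = np.linspace(-np.pi, np.pi, 12, endpoint=False)   # preferred directions
alpha, beta, kappa = 0.2, 0.1, 2.0                      # illustrative values only
gamma = np.full(phi.shape, 1.0)
psi, theta = 0.4, -0.7                                  # attended and shown directions

K = kappa * np.stack([np.cos(phi), np.sin(phi)], axis=1)   # shape (N, 2)
b = beta / kappa * unit(psi)                                # feature attention state
x = unit(theta)                                             # stimulus

eta_trig = log_rate_trig(alpha, beta, psi, theta, phi, kappa, gamma)
eta_glm = log_rate_glm(alpha, b, x, K, gamma)
rates = np.exp(eta_glm)                                     # Poisson rates via the log link
```

In this form the attentional state enters as an additive offset to the stimulus vector, which is what allows standard GLM machinery to be applied.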

## Results

### Fluctuations in spatial attention

Our goal is to characterize the effect of fluctuating attentional signals on the population response in sensory areas. To simplify the exposition of the basic concepts and results, we start with the simplest possible case: that of spatial attention in a population of neurons with identical receptive field locations (Fig. 1*A*). We assume that neurons encode the direction of motion of a stimulus through bell-shaped tuning curves *f*_{i}(θ) (Fig. 1*B*, dotted line) and that their firing rates are modulated by a common gain *e*^{α}:

$$\lambda_i = e^{\alpha} f_i(\theta).$$

We assume that α fluctuates from trial to trial and is drawn from a normal distribution with mean 〈α〉 and variance 〈δα^{2}〉 (Fig. 1*C*). Using this parameterization, 〈α〉 = 0 corresponds to no attentional allocation and a neuronal gain of 1; we refer to this as the sensory response (Fig. 1*D*, dotted line). In contrast, when the stimulus is attended, 〈α〉 > 0 (Fig. 1*D*, solid line). Under this model, the average spike count of a neuron is approximately

$$\mu_i = e^{\langle\alpha\rangle} f_i(\theta).$$

Although we use homogeneous neural populations in the figures (all neurons have the same tuning curve up to a preferred direction φ_{i}, i.e., *f*_{i}(θ) = *f*(θ − φ_{i})), all results in this section hold more generally for arbitrary tuning curves.

Because the attentional state fluctuates from trial to trial, the underlying firing rate also fluctuates. These fluctuations are represented in the model by the variance of the gain term. By applying the Law of Total Covariance, we obtain the spike count variance (Fig. 2*A*):

$$\mathrm{Var}[y_i] = \mu_i + \langle\delta\alpha^2\rangle\, \mu_i^2,$$

where 〈δα^{2}〉 is the variance of the attentional gain. The first term is equal to the average spike count and results from the Poisson process assumption, whereas the second term is quadratic in the firing rate, which results from the multiplicative nature of the fluctuating gain α (Goris et al., 2014), causing the spike count variance to grow more quickly than the mean. Such an expanding mean variance relation has been observed in many experimental studies (Dean, 1981; Tolhurst et al., 1983; Britten et al., 1993; Goris et al., 2014). If the attentional gain does not fluctuate, we recover the Poisson process.

As the neurons' firing rates are comodulated by a common gain, the gain fluctuations induce correlations between the neurons. We find that the resulting covariance matrix *C* has a simple form. It can be expressed as the sum of a diagonal matrix *M* and a rank-one matrix (Fig. 2*C*):

$$C = M + \langle\delta\alpha^2\rangle\, \boldsymbol{\mu} \boldsymbol{\mu}^{T},$$

where *M* = Diag(**μ**) contains the independent variances resulting from the Poisson noise and the second term results from the gain fluctuations. (The assumption of conditional independence could be relaxed without affecting any of the major results qualitatively: the diagonal matrix in the equation above would simply be replaced by an alternative, nondiagonal covariance matrix.)

Hence, the covariance between two neurons is proportional to the product of the firing rates, with the constant of proportionality given by the variance of the attentional gain (Fig. 2*B*). There is no such simple expression for the correlation coefficient, which is more typically quantified in experimental studies. We find that spike count correlations induced by a fluctuating attentional gain increase with firing rates (Fig. 2*D*), as observed in numerous experimental studies (Cohen and Maunsell, 2009; Mitchell et al., 2009; Smith and Sommer, 2013; Ecker et al., 2014). This effect arises because the independent (Poisson) variability is linear in the firing rate, whereas the covariance induced by gain fluctuations is quadratic and therefore dominates for large firing rates. However, although correlations increase with the geometric mean firing rate, there is no simple one-to-one mapping between the two quantities: it also depends on the ratio of the firing rates (Fig. 2*C*). Thus, our analysis suggests that in the presence of gain fluctuations covariances are more appropriate to consider when analyzing experimental data than correlation coefficients. Alternatively, it would be appropriate to normalize the covariance by the product of the firing rates μ_{i}μ_{j} if some kind of normalization is desired.

In addition, the correlation structure induced by gain fluctuations is nontrivial even if all neurons share the same gain (Fig. 2*E*,*F*) (see also Ecker et al., 2014). Because of the nonlinear shape of the tuning function and the nonlinear way the neurons' tuning functions affect spike count correlations, the correlations decrease with increased difference in two neurons' preferred directions (Fig. 2*E*). The slope of the decay depends mainly on the dynamic range of the tuning curve (Fig. 2*F*). If neurons have a high baseline firing rate compared with their peak firing rate, correlations decrease only marginally with preferred direction. In contrast, sharply tuned neurons with close to zero baseline firing rates exhibit strong limited-range structure. This limited-range correlation structure has been observed in numerous experimental studies (Zohary et al., 1994; Bair et al., 2001; Smith and Kohn, 2008; Cohen and Maunsell, 2009; Ecker et al., 2010) and has been hypothesized to reflect shared input among similarly tuned neurons. However, our simple model shows that these seemingly structured correlations can arise from a very simple, nonspecific mechanism: a common fluctuating gain that drives all neurons equally, regardless of their tuning properties.
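The emergence of limited-range correlations from a single shared gain can be reproduced in a few lines. The sketch below assumes NumPy, a homogeneous population with exponential-cosine tuning curves, a covariance of the form Diag(**μ**) plus 〈δα^{2}〉 times the outer product of **μ** with itself, and hypothetical parameter values. It averages the correlation coefficient between one reference neuron and every other neuron over all stimulus directions.

```python
import numpy as np

def correlation_matrix(mu, var_alpha):
    """Correlations implied by C = Diag(mu) + <dalpha^2> * mu mu^T."""
    C = np.diag(mu) + var_alpha * np.outer(mu, mu)
    sd = np.sqrt(np.diag(C))
    return C / np.outer(sd, sd)

N = 32
phi = np.linspace(-np.pi, np.pi, N, endpoint=False)     # preferred directions
kappa, gamma = 2.0, 1.0                                  # illustrative tuning parameters
mean_alpha, var_alpha = 0.1, 0.01                        # illustrative gain statistics

# Average the correlation between neuron 0 and neuron k over all stimulus directions.
avg_corr = np.zeros(N)
for theta in phi:
    mu = np.exp(mean_alpha) * np.exp(kappa * np.cos(theta - phi) + gamma)
    avg_corr += correlation_matrix(mu, var_alpha)[0] / N

# Circular difference in preferred direction relative to neuron 0, in radians.
delta_pref = np.abs(np.angle(np.exp(1j * (phi - phi[0]))))
```

Plotting `avg_corr` against `delta_pref` (excluding the self-correlation at index 0) gives a decaying curve, even though every neuron receives exactly the same gain on each trial.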

### Fluctuations of feature attention

Feature attention is different from spatial attention in that the sign of the modulation depends on the similarity of the attended direction to the neuron's preferred direction of motion (Fig. 3). Following the feature-similarity gain model (Treue and Martínez Trujillo, 1999), we model feature attention by

$$\lambda_i = e^{\beta h_i(\psi)} f_i(\theta),$$

where we refer to β as the feature gain that controls how strongly the feature ψ (in this case, direction of motion) is attended on the given trial and *h*_{i}(ψ) is the gain profile that determines the sign and relative strength of modulation for each neuron depending on the similarity of its preferred direction to the attended direction (Fig. 3*B*).

In this model, we can think of feature attention as a prior on the direction of motion. The attentional term *e*^{β*h*_{i}} is a population hill centered on the attended direction of motion. The gain β controls its width and amplitude (the strength of the prior), whereas the profile *h*_{i}(ψ) controls its location. Thus, feature attention biases the population response toward the attended direction by enhancing the response of neurons with preferred directions close to the attended direction and suppressing those with opposite preferred directions (Fig. 3*B*). As a result, unlike in the case of spatial attention, the shape of the population response is no longer identical to that of the individual neurons' tuning curve (Fig. 3*D*).

We start by assuming that the subject always attends the same direction (i.e., ψ is constant) and consider the effect of fluctuations in the strength of attention, that is the gain β. We will come back to fluctuations in the attended direction below.

Similar to spatial attention, fluctuations in feature attention lead to overdispersion of the spike counts relative to a Poisson process. This means that the ratio of variance to mean (the Fano factor) is >1. The degree of overdispersion not only increases with the neuron's firing rate but also depends on the neuron's preferred direction relative to the attended direction (Fig. 4*A*). Spike counts are most overdispersed at the preferred and the null directions (Fig. 4*A*, red and blue). Moreover, the feature similarity gain model predicts that neurons with preferred directions close to orthogonal to the attended direction should be the least overdispersed (Fig. 4*A*, purple).

As feature attention induces both increases and decreases in neuronal gain, the induced correlation structure is different from that induced by spatial attention. However, because each neuron's gain is driven by a common feature gain, the covariance matrix can again be decomposed into a diagonal matrix *M* plus a rank-one component:

$$C = M + \langle\delta\beta^2\rangle\, \mathbf{u} \mathbf{u}^{T},$$

where *u*_{i} = *h*_{i}(ψ)μ_{i}(θ). The sign of the covariance is determined by the product of *h*_{i} and *h*_{j}, which depends on the attended direction and the preferred directions of the two neurons (Fig. 4*B*). The covariance is always positive for two neurons with identical preferred directions, whereas it is always negative for two neurons with opposite preferred directions. For any pair of neurons in between, it can be both positive and negative depending on the stimulus (Fig. 4*B*).

As for spatial attention, averaging correlations over multiple stimulus conditions to represent the correlation structure as a function of the neurons' tuning similarity misses much of the underlying structure (Fig. 4*C*): spike count correlations are positively correlated with tuning similarity (Fig. 4*D*), but the stimulus dependence (Fig. 4*C*) is again ignored. As before, the exact shape of the decay depends on the tuning width: for narrow tuning curves, neurons with opposite preferred directions are only weakly anticorrelated, whereas for broad tuning curves, those neurons are strongly anticorrelated (Fig. 4*D*, blue to yellow lines).

So far, we have assumed that the attended direction of motion is constant and only the strength of attention fluctuates from trial to trial. We now turn to the case where the attended direction itself fluctuates from trial to trial. Assuming that the subject attends on average to the correct direction, 〈ψ〉 = θ, but with some variance 〈δψ^{2}〉, we find that the covariance matrix can again be written as diagonal plus rank one, with the rank-one component proportional to **μ**′**μ**′^{T} and scaled by 〈δψ^{2}〉β^{2}/κ^{2} (Fig. 5; for the derivation, see Materials and Methods). Here β is the strength of the attentional modulation, κ the tuning width of the neurons, and **μ**′ the derivative of the average firing rate with respect to the stimulus. This pattern of correlations is known as differential correlations (Moreno-Bote et al., 2014), as the variability resembles that induced by small changes in the stimulus. This result is indeed expected: as we mentioned above, feature attention can be thought of as a prior on direction of motion, therefore biasing the population response toward the attended direction. If the attended direction changes from trial to trial, this will perturb the population response in the same direction in multineuronal space as small changes in the stimulus itself.
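The link between fluctuations of the attended direction and small changes in the stimulus can be illustrated with a short numerical check. The sketch assumes von Mises tuning with width parameter κ and a multiplicative gain exp(β cos(ψ − φ)); this parameterization is an assumption of the example, and all parameter values are hypothetical. Under it, the derivative of the rate with respect to ψ, evaluated at ψ = θ, equals β/κ times the derivative with respect to θ.

```python
import math

def rate(theta, psi, phi, kappa=2.0, beta=0.2, peak=20.0):
    """Mean rate: von Mises tuning times an exponential-cosine feature gain (assumed)."""
    tuning = peak * math.exp(kappa * (math.cos(theta - phi) - 1.0))
    gain = math.exp(beta * math.cos(psi - phi))
    return gain * tuning

def direction_derivatives(phi, theta=0.0, kappa=2.0, beta=0.2, eps=1e-6):
    """Central-difference derivatives w.r.t. stimulus theta and attended direction psi,
    both evaluated at psi = theta."""
    psi = theta
    d_theta = (rate(theta + eps, psi, phi, kappa, beta)
               - rate(theta - eps, psi, phi, kappa, beta)) / (2.0 * eps)
    d_psi = (rate(theta, psi + eps, phi, kappa, beta)
             - rate(theta, psi - eps, phi, kappa, beta)) / (2.0 * eps)
    return d_theta, d_psi

if __name__ == "__main__":
    for phi_deg in (-90, -30, 0, 30, 90):
        d_theta, d_psi = direction_derivatives(math.radians(phi_deg))
        print(phi_deg, d_theta, d_psi)
```

Because the two derivative vectors are proportional, trial-to-trial jitter of ψ with variance 〈δψ^{2}〉 moves the population response along **μ**′, as a stimulus jitter of variance 〈δψ^{2}〉β^{2}/κ^{2} would.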

Interestingly, when plotted as a function of tuning similarity, the correlation structure resembles that induced by gain fluctuations (Fig. 5*C*), except for very narrow tuning curves. This finding is striking because the correlation matrices look quite different (compare Fig. 4*C* with Fig. 5*B*) and, as we will see below, the two correlation structures have dramatically different effects on the population code. Note, however, that the correlations induced by fluctuations in the attended direction are substantially weaker than those induced by gain fluctuations (Figs. 2, 4), even if the distribution of attended directions is fairly wide (SD: 10° in Fig. 5).

A second example with relevance to experimental studies is the situation where feature attention fluctuates between two discrete alternatives (Fig. 6). Consider the classic random dot motion discrimination paradigm (Newsome and Paré, 1988), where the subject has to decide whether the net motion in the display is rightward or leftward (Fig. 6*A*). Most interesting to the experimenter are the trials where the net motion is zero (zero coherence). On average, the population response is flat on those trials (Fig. 6*B*, dashed line), as there is no net motion signal in the stimulus. However, on any given trial, the subjects may have expectations about the stimulus that is to come, for example, because of the past stimuli they have observed. They may therefore decide to attend to one direction of motion or the other. As a consequence, in trials where the subject attends to leftward motion, neurons with preferred directions around leftward (rightward) motion will be enhanced (suppressed) and vice versa (Fig. 6*B*, red and blue). That is, the population response will fluctuate between attend-left and attend-right. The covariance structure induced by such fluctuations (Fig. 6*C*) is very similar to that observed before, except that the derivative of the response, **μ**′, is replaced by the difference between the responses when attending left versus right, Δ**μ**, so that the rank-one component is proportional to Δ**μ**Δ**μ**^{T}.
In contrast to the case above, the correlation matrix (Fig. 6*D*) is here more or less a scaled version of the covariance matrix: because the population response is flat, the normalization does not affect its shape very much.
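The two-alternative case follows directly from the law of total covariance. The sketch below assumes conditionally independent Poisson counts given the attentional state and a hypothetical probability `p_left` of attending left; the rates in the usage example are illustrative only.

```python
def two_state_covariance(mu_left, mu_right, p_left=0.5):
    """Spike-count covariance when attention switches between two states.

    Cov[y] = E[Cov[y | state]] + Cov[E[y | state]]
           = diag(mean rate) + p * (1 - p) * delta_mu delta_mu^T
    assuming independent Poisson counts given the state.
    """
    n = len(mu_left)
    mean = [p_left * l + (1.0 - p_left) * r for l, r in zip(mu_left, mu_right)]
    delta = [l - r for l, r in zip(mu_left, mu_right)]
    weight = p_left * (1.0 - p_left)
    return [[weight * delta[i] * delta[j] + (mean[i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    # Two neurons with opposite preferences: one enhanced, one suppressed per state.
    for row in two_state_covariance([12.0, 8.0], [8.0, 12.0]):
        print(row)
```

Neurons modulated in the same direction by the attentional switch covary positively, and neurons modulated in opposite directions covary negatively, with magnitudes set by the product of their entries in Δ**μ**.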

### Effects of attention-induced correlations on population coding

How interneuronal correlations affect the representational accuracy of neuronal populations has been a matter of immense interest (and debate) in recent years, and changes in the correlation structure have been suggested to underlie the improved behavioral performance under attention (Cohen and Maunsell, 2009; Mitchell et al., 2009). Thus, we want to briefly consider how correlations induced by attentional fluctuations affect the coding accuracy of a population code.

Before doing so, we need to make a choice: does the downstream readout have access to the state of attention or not? If it does, the picture is fairly simple: attentional fluctuations do not affect the readout accuracy because the attentional state can be accounted for and there is no additional noise compared with a scenario without attentional fluctuations. The only downside is a potentially more complex readout. In contrast, if we assume that the readout does not have access to the attentional state, the situation becomes more interesting. In this case, the attentional fluctuations act like additional (internally generated) noise, which could impair the readout. In the following, we consider this latter scenario.

To quantify the accuracy of a population code, we use the Fisher information (Kay, 1993) with respect to direction of motion. Fisher information is proportional to the square of *d*′ (Berens et al., 2011), which quantifies the detectability of small changes in a stimulus parameter from the resulting changes in the responses of a population of neurons. For a population of neurons with independent noise, the Fisher information of individual neurons adds up linearly.

We start by considering spatial attention. Because the gain is the same for all neurons, gain fluctuations should not affect the coding accuracy of the population with respect to the direction of the stimulus, which is encoded in the differential activation pattern of the neurons. This is indeed the case. As shown in Materials and Methods, the Fisher information of a population of Poisson neurons whose firing rates are comodulated by a common fluctuating gain equals *J*_{ind} minus a constant correction term, where *J*_{ind} is the Fisher information of an independent population without fluctuating gain (i.e., 〈δα^{2}〉 = 0). Because this correction does not grow with the population size, it is negligible for reasonably large populations (Fig. 7*A*, solid blue line vs circles). This result can be understood intuitively by considering the structure of the covariance matrix (Eq. 41): the dominant eigenvector points in the direction of the neural response μ, which is orthogonal to changes in the response due to changes in the stimulus, μ′. Therefore, gain fluctuations do not impair the readout of the direction of motion. This result concerns only the fluctuations in the gain. The increased gain due to attention still leads to a higher Fisher information in the attended condition compared with the unattended condition (Fig. 7*A*, solid vs dashed blue line).
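The orthogonality argument can be illustrated with linear Fisher information, **μ**′^{T}*C*^{−1}**μ**′, evaluated for a covariance of the form diag(**μ**) + *s***μμ**^{T}. This is a minimal sketch under stated assumptions, not the expression derived in Materials and Methods: it uses a Gaussian (linear) approximation, von Mises tuning curves that tile the circle uniformly, and hypothetical parameter values. The inverse of the rank-one-updated matrix is obtained with the Sherman-Morrison formula.

```python
import math

def population(n, theta=0.0, kappa=2.0, peak=20.0):
    """Mean rates and stimulus derivatives for n uniformly tiled von Mises neurons."""
    phis = [2.0 * math.pi * i / n for i in range(n)]
    mu = [peak * math.exp(kappa * (math.cos(theta - p) - 1.0)) for p in phis]
    dmu = [-kappa * math.sin(theta - p) * m for p, m in zip(phis, mu)]
    return mu, dmu

def linear_fisher(mu, dmu, u, scale):
    """dmu^T C^{-1} dmu for C = diag(mu) + scale * u u^T (Sherman-Morrison)."""
    a = sum(d * d / m for d, m in zip(dmu, mu))        # dmu^T D^{-1} dmu
    b = sum(d * x / m for d, x, m in zip(dmu, u, mu))  # dmu^T D^{-1} u
    c = sum(x * x / m for x, m in zip(u, mu))          # u^T D^{-1} u
    return a - scale * b * b / (1.0 + scale * c)

if __name__ == "__main__":
    mu, dmu = population(200)
    print("independent:", linear_fisher(mu, dmu, mu, 0.0))
    print("common gain:", linear_fisher(mu, dmu, mu, 0.05))
```

In this approximation the gain term drops out because the cross term reduces to the sum of the tuning-curve derivatives, which vanishes for a uniformly tiled population.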

The same result holds for fluctuations in the feature gain, so long as the attended direction matches the one shown and does not fluctuate from trial to trial (Fig. 7*A*, blue pluses). A fluctuating gain sharpens or broadens the population hill from trial to trial but leaves its peak unchanged. Again, the dominant eigenvector (*u*_{i} = *h*_{i}μ_{i}, Eq. 43) points in a direction that is orthogonal to changes in the stimulus (for details, see Materials and Methods).

The situation changes if the focus of attention (i.e., the attended direction) fluctuates from trial to trial. Because feature attention biases the population response toward the attended direction of motion (Fig. 3*D*), such attentional fluctuations induce differential correlations: the dominant eigenvector is the derivative of the neural response with respect to the stimulus, μ′. Therefore, the Fisher information saturates at a finite value (Fig. 7*A*, red lines).
Thus, for sufficiently large populations, the Fisher information is determined only by the degree of attentional modulation β and the variance of the attended direction 〈δψ^{2}〉 relative to the tuning width κ (for derivation, see Materials and Methods). We call the term 〈δψ^{2}〉β^{2}/κ^{2} the effective stimulus variance, as it has exactly the same effect as unobserved variability in the stimulus of the same magnitude.
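The saturation can be reproduced with a few lines of arithmetic. The sketch assumes a covariance of the form diag(**μ**) + ε**μ**′**μ**′^{T} with ε = 〈δψ^{2}〉β^{2}/κ^{2}, linear Fisher information, and uniformly tiled von Mises tuning curves; the parameter values (β = 0.2, κ = 2, SD of ψ of 10°, peak count of 20) are hypothetical. Applying the Sherman-Morrison formula to this covariance gives *J* = *J*_{ind}/(1 + ε*J*_{ind}), which approaches 1/ε as the population grows.

```python
import math

def fisher_with_direction_jitter(n, sd_psi_deg=10.0, beta=0.2, kappa=2.0, peak=20.0):
    """Linear Fisher information (rad^-2) with differential correlations.

    C = diag(mu) + eps * dmu dmu^T, eps = (beta / kappa)**2 * var(psi),
    which yields J = J_ind / (1 + eps * J_ind) via Sherman-Morrison.
    Returns (J, J_ind, 1 / eps).
    """
    eps = (beta / kappa) ** 2 * math.radians(sd_psi_deg) ** 2
    j_ind = 0.0
    for i in range(n):
        phi = 2.0 * math.pi * i / n
        mu = peak * math.exp(kappa * (math.cos(phi) - 1.0))  # stimulus at 0
        dmu = kappa * math.sin(phi) * mu                      # derivative w.r.t. stimulus
        j_ind += dmu * dmu / mu
    return j_ind / (1.0 + eps * j_ind), j_ind, 1.0 / eps

if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        j, j_ind, bound = fisher_with_direction_jitter(n)
        print(n, round(j, 1), round(j_ind, 1), round(bound, 1))
```

The independent-population information grows linearly with the number of neurons, whereas the information under attended-direction jitter levels off just below the inverse of the effective stimulus variance.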

So far, we have assumed that there is no limit on the amount of information entering the brain. However, in practice, the amount of information is finite due to sensory noise (e.g., in the photoreceptors). In some experimental settings, this sensory noise is likely to be very small (e.g., orientation discrimination tasks with high-contrast gratings), whereas in other situations it can be substantial (e.g., random dot motion paradigms at low coherence). Therefore, we also briefly considered how input noise with variance 〈δθ^{2}〉 affects the Fisher information in the presence of attentional fluctuations. The two main results from above also hold in this case: whereas gain fluctuations (both spatial and feature gain) have a negligible effect on the Fisher information (Fig. 7*B*, solid blue line, circles and pluses), fluctuations of the attended direction do impair the code (Fig. 7*B*, red lines).

A few observations are noteworthy. First, when the input noise is small (e.g., 75% threshold of an ideal observer: 0.5°), the results are qualitatively similar to the approximation of no input noise (Fig. 7*B*, left) for population sizes of a few hundred neurons. Second, if the population response is dominated by the amount of input noise (threshold: 3°), we can consider the asymptotic value at which the Fisher information saturates:
This value depends only on the tuning width κ, the amount of input noise 〈δθ^{2}〉, the degree of attentional modulation β, and the variance of the attended direction 〈δψ^{2}〉. Thus, gain fluctuations (both spatial and feature gain) also have no effect in large populations when the Fisher information is bounded by the input noise (Fig. 7*B*, right); the *O*(1) correction term from Equation 46 disappears. Moreover, in the absence of fluctuations of the attended direction, the Fisher information reduces to 1/〈δθ^{2}〉, the inverse variance of the input noise, which is the bound given by the data processing inequality (Moreno-Bote et al., 2014). Third, fluctuations of the attended direction further reduce the Fisher information below the limit imposed by the data processing inequality (Fig. 7*B*, right, red lines).
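The asymptote described in this paragraph can be written out explicitly under one additional assumption, namely that input noise and fluctuations of the attended direction are independent, so that their contributions along **μ**′ add. This is a sketch consistent with the limiting cases stated in the text, not a substitute for the derivation in Materials and Methods; the symbol $J_\infty$ is introduced here for illustration.

```latex
% Both noise sources add a rank-one term along \mu' to the covariance:
%   C = M + \left( \langle\delta\theta^2\rangle
%       + \tfrac{\beta^2}{\kappa^2}\langle\delta\psi^2\rangle \right) \mu'\mu'^{\top}
% For large populations the Fisher information then saturates at
\[
  J_\infty \;=\; \frac{1}{\langle\delta\theta^{2}\rangle
      + \dfrac{\beta^{2}}{\kappa^{2}}\,\langle\delta\psi^{2}\rangle}
\]
% Limiting cases:
%   \langle\delta\psi^2\rangle = 0   \Rightarrow  J_\infty = 1/\langle\delta\theta^2\rangle
%   \langle\delta\theta^2\rangle = 0 \Rightarrow  J_\infty = \kappa^2 / (\beta^2 \langle\delta\psi^2\rangle)
```

In this form, the effective stimulus variance 〈δψ^{2}〉β^{2}/κ^{2} simply adds to the input noise variance, which is why fluctuations of the attended direction push the information below 1/〈δθ^{2}〉.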

In summary, attentional gain fluctuations generally do not impair coding accuracy, but fluctuations of the attended feature can have a major effect, in particular when the input noise is small. Importantly, this finding does not apply only to voluntary variability in the attended direction. Even if the animal tries to attend to the same direction on every trial, 〈δψ^{2}〉 is non-zero in any realistic scenario because the attended direction ψ is represented by a finite number of neurons in the brain. Therefore, the very existence of an attentional mechanism places a limit on how accurately a stimulus can be represented, and this limit can be substantially lower than that imposed by the information in the feedforward signal (see also Discussion).

### A new view on correlated variability under attention

There is ample experimental evidence that attention fluctuates from trial to trial (Cohen and Maunsell, 2010, 2011), and we showed in the previous sections that such fluctuations induce patterns of (correlated) variability that are highly consistent with the reported data on attention (Cohen and Maunsell, 2009; Mitchell et al., 2009; Herrero et al., 2013). Interestingly, in our model, both the magnitude of overdispersion in single neurons' spike counts and the average level of correlations depend on the variance of the attentional gain 〈δα^{2}〉, which in the model is completely independent of the average modulation 〈α〉. This observation suggests that the average attentional modulation between an attended and an unattended condition (which can be reliably measured based on average responses) may not predict the level of correlations in either condition because the latter is controlled by an independent variable.

This dissociation between attention effects on firing rates and correlations is indeed a central experimental finding for which our model can account. In many cases, directing spatial attention to a certain location increases the average responses of neurons whose receptive fields represent this location, but reduces their individual and shared variability (Cohen and Maunsell, 2009; Mitchell et al., 2009; Herrero et al., 2013). Thus, if our model is correct, then the data suggest that attention not only increases response gain but also reduces the trial-to-trial fluctuations of the gain.

Attentional fluctuations can also account for more recent experimental results on modulation of correlations under attention, two of which (Cohen and Maunsell, 2011; Ruff and Cohen, 2014) we reproduce with our model in the following.

In the first study, Cohen and Maunsell (2011) investigated how feature attention modulates firing rates and interneuronal correlations. In their paradigm, monkeys have to attend to either the orientation of a grating or its spatial frequency. Their findings resemble the patterns observed for spatial attention: correlations are reduced for neurons whose firing rates increase when attending to orientation compared with attending to spatial frequency, and vice versa (Fig. 8*D*).

To reproduce this pattern of results in our model, we consider a population of neurons that is tuned to both orientation and spatial frequency (Fig. 8*A*; see Materials and Methods). We assume, without loss of generality, that the monkey attends to orientation. Moreover, we assume that he attends to (approximately) the correct value (0°) and that the gain fluctuates moderately from trial to trial (Fig. 8*B*). The resulting correlation structure with respect to the neurons' preferred orientation resembles what we showed earlier (compare Fig. 4*C*). Interestingly, the pattern of results reported by Cohen and Maunsell (2011) (Fig. 8*D*,*F*) can be reproduced by two entirely different scenarios with respect to the unattended feature, spatial frequency. The first possibility is to assume that the attentional gain for spatial frequency fluctuates more strongly from trial to trial, but the attentional focus remains on the stimulus that is shown (albeit with a lower average gain). This hypothesis would lead to essentially the same correlation structure as for orientation, but with higher magnitude (Fig. 8*C*). The second possibility is not to invoke increased gain fluctuations, but instead assume that the focus of attention fluctuates from trial to trial (Fig. 8*E*; i.e., a random spatial frequency is attended). This hypothesis would lead to a different correlation structure with respect to the neurons' preferred spatial frequency, but the same pattern of results (Fig. 8*F*) when plotted as a function of firing rate changes as in Cohen and Maunsell (2011). Resolving which of these two hypotheses (if any) is correct would advance our understanding of how attentional resources are allocated in the brain. Unfortunately, how correlation changes relate to firing rate changes is uninformative in this respect.

This dissociation between attention effects on firing rates and correlations is further supported by a second recent study showing that attention-induced increases in firing rates can be associated with either increased or decreased correlations, depending on how similarly two neurons respond to a pair of nearby gratings in a contrast discrimination task (Ruff and Cohen, 2014). The authors' similarity measure (TTS) depends largely on the degree to which the neurons' receptive fields overlap, such that spike count correlations between neurons with overlapping receptive fields decrease in attended relative to unattended conditions, whereas those between neurons with nonoverlapping receptive fields increase in attended conditions (Fig. 9*E*).

To reconcile this pattern of results, we model a population of neurons whose receptive field centers cover both stimulus locations. We chose the receptive field sizes large enough such that all neurons respond to both stimuli, albeit with different intensities (Fig. 9*A*). When considering nearby locations, spatial attention behaves much like feature attention in that the central enhancement of the attended location is accompanied by a suppressive surround (Bahcall and Kowler, 1999; Intriligator and Cavanagh, 2001; Müller and Kleinschmidt, 2004; Hopf et al., 2006; Sundberg et al., 2009). We therefore incorporate a center-surround gain profile as in our discussion of feature attention (Fig. 9*B*). Now, if we assume that in the attended condition the spatial focus of attention is less variable than in the unattended condition, we can reproduce the pattern of results obtained by Ruff and Cohen (2014). Specifically, the interaction of fluctuations in the attended feature with the gain profile leads to the unique pattern of correlation changes (Fig. 9*E*): although all neurons increase their firing rates under attention (Fig. 9*A*), pairs with strongly overlapping receptive fields have decreased correlations while pairs with less overlapping receptive fields have increased correlations (Fig. 9*D*,*E*).

Notably, this model reproduces not only the qualitative changes of correlations, but also a number of more subtle patterns in the results of Ruff and Cohen (2014): the crossing of the two lines in Figure 9*E* is shifted slightly to negative values; firing rates increase much more modestly for neurons with negative TTS than for those with positive TTS; and the changes in correlation are stronger for neurons with positive TTS. Interestingly, fluctuations in the focus of spatial attention were the only mechanism that we found could account for the observed pattern of correlation changes as a function of TTS. Increased fluctuations of the attentional gain (as above) could not account for the pattern of results in this study. However, from the perspective of a single neuron (or multiple neurons with identical receptive fields), increased fluctuations of the attended features look identical to increased gain variability.

### Identifying attentional fluctuations in experimental data

We saw above that fluctuations in attentional state can introduce interesting patterns of correlation in neural activity, which are consistent with the published literature on attention. However, as long as one considers only single neurons and pairwise statistics, any result can be consistent with many hypotheses. For instance, attentional fluctuations induce correlations that depend on firing rates (Fig. 2*C*), but the same result is also predicted by the thresholding nonlinearity of neurons (de la Rocha et al., 2007) and therefore need not result from attentional fluctuations. Similarly, all types of attentional fluctuations considered above lead to correlations that decrease with the difference of two neurons' stimulus preferences (limited range correlations; Figs. 2*E*, 4*D*, 5*C*), but this correlation structure can also arise from shared sensory noise (Shadlen et al., 1998). Finally, changes in the correlation structure between attended and unattended conditions could either arise from attention-induced changes in effective connectivity between neurons or, as our model suggests, from the fluctuating attentional state unknown to the experimenter.

How would one go about identifying attentional fluctuations in experimental data? To do so, we have to consider the response patterns of simultaneously recorded populations of neurons rather than just pairwise correlations. In the following, we discuss some predictions our model makes for the structure of the neural population response.

A first approach is suggested by our analyses above: we showed that, in all cases we analyzed, the covariance matrix induced by attentional fluctuations is diagonal plus rank one. Thus, each type of attentional fluctuation is restricted to a one-dimensional subspace (sometimes referred to as the attention axis; Cohen and Maunsell, 2010), which could be inferred from simultaneously recorded neurons by Factor Analysis or perhaps directly measured by appropriately cuing the animal. However, because attentional modulation is multiplicative, this subspace depends on the stimulus: Figure 10*A* shows how the three attention axes defined by attentional fluctuations (corresponding to spatial gain, feature gain, and attended feature) change for different stimulus directions (left: θ = 0°; right: θ = 60°). As these axes are not simply shifted versions of each other, one cannot pool data over multiple stimulus conditions. Moreover, if the attended direction of motion does not match the stimulus direction, the attention axes related to feature attention do not peak at either neurons tuned to the stimulus or the attended direction, but somewhere in between (Fig. 10*A*, right, blue lines, where ψ = 0° and θ = 60°). Thus, it is nontrivial to recover the quantities of interest to the experimenter: the attended feature (direction) and the degree of attention allocated (the gain).

A model that could directly extract attentional gains (spatial and feature gain) and the attended feature would be desirable. Fortunately, it turns out that such models exist and are relatively straightforward to apply. As shown in Materials and Methods, we can convert our model into a GLM by a simple reparameterization of the stimulus. Essentially, both the stimulus and the attentional modulation affect the log firing rate additively and independently. As a consequence, we can infer the linear subspace (attention axes) corresponding to attentional fluctuations from population activity using methods such as Exponential Family Principal Component Analysis (Collins et al., 2001; Mohamed et al., 2009) or Poisson Linear Dynamical Systems (Macke et al., 2011; Buesing et al., 2012). Moreover, this subspace is independent of the stimulus (Fig. 10*B*). Fluctuations of the spatial attention gain correspond to an additive offset common to all neurons (Fig. 10*B*, red), whereas the subspace spanned by fluctuations in the attended direction and its gain is given by [cos φ_{i}, sin φ_{i}] (Fig. 10*B*, light and dark blue).
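The reparameterization can be illustrated as follows. The sketch assumes a log firing rate of the form α + κ cos(θ − φ_{i}) + β cos(ψ − φ_{i}), which is an assumption of this example (the exact model is given in Materials and Methods); the function names are hypothetical. Expanding the cosines shows that the log rate is linear in the neuron-specific loadings [1, cos φ_{i}, sin φ_{i}], with a three-dimensional regressor that carries the stimulus and the attentional state.

```python
import math

def log_rate_direct(alpha, beta, kappa, theta, psi, phi):
    """Log rate written in terms of angles (assumed model form)."""
    return alpha + kappa * math.cos(theta - phi) + beta * math.cos(psi - phi)

def log_rate_linear(alpha, beta, kappa, theta, psi, phi):
    """Same log rate as a dot product: stimulus-independent loadings times a
    low-dimensional regressor shared by all neurons."""
    loadings = [1.0, math.cos(phi), math.sin(phi)]  # depend only on the neuron
    regressor = [
        alpha,                                               # spatial gain: common offset
        kappa * math.cos(theta) + beta * math.cos(psi),      # stimulus + feature attention
        kappa * math.sin(theta) + beta * math.sin(psi),
    ]
    return sum(w * x for w, x in zip(loadings, regressor))

if __name__ == "__main__":
    args = dict(alpha=0.1, beta=0.2, kappa=2.0,
                theta=math.radians(60.0), psi=math.radians(0.0))
    for phi_deg in (0, 90, 180, 270):
        phi = math.radians(phi_deg)
        print(phi_deg, log_rate_direct(phi=phi, **args), log_rate_linear(phi=phi, **args))
```

Because the loadings do not depend on θ, the subspace spanned by attentional fluctuations is the same for every stimulus in this parameterization, in contrast to the stimulus-dependent attention axes obtained from the covariance of the raw spike counts.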

## Discussion

We have presented a simple model of neuronal responses under attention, which is built on just two key ingredients: that attention acts as a multiplicative gain factor on neuronal responses (Maunsell and Treue, 2006) and that the state of attention fluctuates from trial to trial (Cohen and Maunsell, 2010, 2011). Although both assumptions are fairly uncontroversial, the importance of their combined effects when studying correlations in neuronal population responses has not been fully appreciated. We have shown that such a simple model can account for a range of empirically observed phenomena, such as super-Poisson variability (Ecker and Tolias, 2014; Goris et al., 2014) as well as patterns of (correlated) variability under attention (Mitchell et al., 2009; Cohen and Maunsell, 2010, 2011; Herrero et al., 2013; Ruff and Cohen, 2014).

Our results argue that it is likely that some fraction of variability in the neuronal response can be attributed to fluctuations in behaviorally relevant, internally generated signals, such as attention, rather than shared sensory noise (Nienborg and Cumming, 2009; Ecker et al., 2010, 2014; Ecker and Tolias, 2014; Goris et al., 2014; Haefner et al., 2014). However, exactly what fraction of correlated variability observed in experimental studies can be attributed to such attentional fluctuations remains an empirical question that cannot be answered based on the available published data. We have suggested ways to address this question by identifying attentional fluctuations directly from simultaneous population recordings using latent variable models (Collins et al., 2001; Mohamed et al., 2009; Macke et al., 2011; Buesing et al., 2012; Pillow and Scott, 2012).

We stress that our model incorporates a number of simplifications and therefore cannot capture all aspects of interneuronal correlations. First, we have deliberately ignored any correlations arising from common feedforward inputs or recurrent connectivity, mostly because we expect them to be small for the majority of pairs (Ecker et al., 2010; Renart et al., 2010). However, we expect our analysis to remain valid at least qualitatively even if there are substantial correlations in the data that are due to other sources. Second, by modeling attention on the phenomenological level and treating it as a common gain, we have ignored the question of how such a gain modulation may be implemented in a neural network (Bejjanki et al., 2011) and reduced attentional fluctuations to modulations in one-dimensional subspaces. Although this simplification will miss any changes in the correlation structure that are due to the underlying network mechanisms, we note that there are very few experimental data available to constrain more mechanistic, network-level models. We therefore favored the more simplistic approach, which can already account for a remarkable variety of nontrivial experimental findings. Third, we have considered each type of attentional fluctuation (spatial gain, feature gain, attended feature) individually. However, in practice, all types of fluctuations, as well as many others, are likely to occur at the same time. The combined effects of different types of attentional fluctuations will depend on the correlations between different attentional processes. Although there is some experimental evidence that spatial attention is uncorrelated between hemispheres (Cohen and Maunsell, 2010) and that spatial and feature attention are uncorrelated (Cohen and Maunsell, 2011), the correlations between different attentional processes are likely to depend on the task in general. 
Because the variability in the population response that is due to attentional fluctuations will lie within the subspace spanned by the individual components, the dependencies within this subspace can inform us about the (in)dependence of different attentional processes.

Further, although it is generally accepted that both spatial and feature attention act as gain-modulating signals (McAdams and Maunsell, 1999; Maunsell and Treue, 2006), they are not the only factors capable of modulating neuronal gain. As such, our results apply more generally to any gain-modulating signal, rather than exclusively to attention. It should also be cautioned that, in certain contexts, the effects of attention may extend beyond gain modulation (e.g., shifting contrast response functions) (Reynolds et al., 2000). Whether such effects can be attributed to differences in stimulus parameters and how such differences interact with attentional and other task strategies that subjects use (Reynolds and Heeger, 2009) or whether certain paradigms engage internal signals in addition to attention is an important empirical question in need of conclusive resolution. Our gain model of attention does not make any assumption about what mechanisms are causing gain fluctuations. Instead, it serves as a parsimonious model that is sufficient to reproduce a number of the main neurophysiological effects described in the attention literature and that also accounts for a number of findings on neuronal variability that have not been sufficiently appreciated before.

Our model can also be interpreted in the context of the hypothesis that the brain performs probabilistic inference (Mumford, 1991; Lee and Mumford, 2003). In this view, neuronal populations represent not the sensory stimulus itself, but instead the subject's belief about certain features of the stimulus. In other words, the neural response depends not only on the stimulus, but also on the subject's prior, which could be implemented by the attentional gain. Haefner et al. (2014) further developed this idea and proposed a model in which the brain implements Bayesian inference by neural sampling. Their model makes predictions for the correlation structure during discrimination tasks that are very similar to those of our model under fluctuations in the attended feature (Fig. 6). However, the sources of correlation differ between the two models. In our model, attention is treated as a prior, and the correlations between neurons arise from trial-to-trial variability of the prior. In their model, in contrast, differential correlations arise because neural activity represents samples from the posterior. The magnitude of differential correlations is therefore directly related to the width of the subject's posterior, and their timescale is determined by the dynamics of the sampling process. The difference between the two models is seen most clearly if we assume that the prior was constant across trials (i.e., no attentional fluctuations). In this case, their model predictions would not change at all, whereas our model would not predict any correlations. Thus, our model puts the emphasis on the trial-to-trial fluctuations of the prior, which, despite being suboptimal, seem to be present. 
For instance, it has long been known that there are serial dependencies in subjects' responses (Fernberger, 1920; Senders and Sowards, 1952), which indicate that subjects bias their estimates depending on the past stimuli they have seen, despite the fact that there is no real dependence in the stimuli that are shown. Our model also resonates well with recent behavioral data showing that noise in the prior is an important component for models of human probabilistic inference (Acerbi et al., 2014). Therefore, in our model, the timescale of correlations corresponds to the timescale at which subjects adapt their prior expectations about the stimuli they see. Ultimately, we expect that both processes (the mechanisms of the inference process itself and the variability in the subject's priors from trial to trial) create correlated variability. Separating these aspects through their underlying timescales and/or manipulations in the task contingencies is an interesting avenue for further theoretical and experimental work.

In addition to offering a parsimonious account of neuronal variability and covariability, our model has implications for how we should interpret the effect of attention as it relates to improvements in perceptual performance. Recent studies argue that spatial attention improves behavioral performance primarily by reducing correlations (Cohen and Maunsell, 2009; Mitchell et al., 2009). However, if the reduction of correlations observed under attention is indeed due to a suppression of attentional gain fluctuations, as our model would suggest, this reduction of correlations is irrelevant for the coding accuracy of the population and cannot be the mechanism improving behavioral performance. In terms of Fisher information, the only difference that matters in this case is the increase in response gain, which leads to a proportional increase in Fisher information, at least so long as performance is not limited by input noise.

Although gain fluctuations are irrelevant for coding, this is not true for all types of attentional fluctuations: when the fluctuations in attention occur around a specific feature value rather than in the gain, they introduce differential correlations, a pattern of correlations that leads to information saturation (Moreno-Bote et al., 2014). Thus, our model leads to an interesting, but at the same time puzzling, observation: because the attentional state is represented by a finite number of neurons, there is necessarily some trial-to-trial variability in the attended feature itself. Such fluctuations indeed impair the readout because it cannot have exact (i.e., noiseless) access to the attended feature when implemented in neural hardware. Thus, the precision of the attentional mechanism itself places a limit on how accurately a stimulus can be represented by a sensory population, and this limit can at least in principle be substantially lower than the amount of sensory information entering the brain through the eye.

If attentional fluctuations can limit behavioral performance, one may ask why there should be an attentional mechanism in the first place. We will give two speculative answers in the following.

First, as discussed earlier, we can think of attention as the subject's prior. Using prior information to bias an estimate toward more likely solutions improves the estimate on average (over all possible stimuli) if the distribution of stimuli is nonuniform. In the real world, in situations where the sensory input is noisy or ambiguous and decisions have to be made quickly, such a bias is usually beneficial and outweighs the small extra noise added due to variability in the prior.

Second, the primary goal of attention may not be to improve the sensory representation, contrary to what recent studies characterizing neural responses suggest (Cohen and Maunsell, 2009; Mitchell et al., 2009). In typical laboratory experiments involving two-alternative forced-choice discrimination at perceptual threshold, there is no prior information that could be used to improve performance. The feedforward sensory signal contains all the information available to the organism, so it is unclear where additional information should come from. In the real world, however, the sensory signal usually contains many nuisance variables, which are irrelevant to the task and thus act as a crucial noise source. Therefore, the goal of attention may instead be, as suggested by the traditional view (Broadbent, 1958), to select the relevant pieces of information and suppress the irrelevant ones. This selection is precisely what the gain profile implements: it enhances attended stimuli and suppresses unattended ones. The behavioral improvement due to attention may therefore not result from an improved representation at the level of sensory populations but from the fact that the decision is not corrupted by irrelevant information (noise or nuisance) from the distractors.

## Notes

Supplemental material for this article is available at http://bethgelab.org/code/ecker2015. It contains MATLAB code to reproduce all figures and numerical simulations in this paper. This material has not been peer reviewed.

## Footnotes

This work was supported by the Bernstein Center for Computational Neuroscience, Tübingen, Germany (FKZ 01GQ1002), the German Excellency Initiative through the Centre for Integrative Neuroscience Tübingen (EXC307), National Eye Institute–NIH Grants R01-EY018847 and P30-EY002520-33 to A.S.T., and the National Institutes of Health Pioneer Award DP1-OD008301 to A.S.T. We thank Ralf Haefner and Philipp Berens for helpful discussions and comments.

The authors declare no competing financial interests.

This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

Correspondence should be addressed to Alexander S. Ecker, Centre for Integrative Neuroscience, University of Tübingen, Otfried-Müller-Strasse 25, 72076 Tübingen, Germany. alexander.ecker{at}uni-tuebingen.de

This article is freely available online through the *J Neurosci* Author Open Choice option.