On the structure of neuronal population activity under fluctuations in attentional state

Attention is commonly thought to improve behavioral performance by increasing response gain and suppressing shared variability in neuronal populations. However, both the focus and the strength of attention are likely to vary from one experimental trial to the next, thereby inducing response variability unknown to the experimenter. Here we study analytically how fluctuations in attentional state affect the structure of population responses in a simple model of spatial and feature attention. In our model, attention acts on the neural response exclusively by modulating each neuron’s gain. Neurons are conditionally independent given the stimulus and the attentional gain, and correlated activity arises only from trial-to-trial fluctuations of the attentional state, which are unknown to the experimenter. We find that this simple model can readily explain many aspects of neural response modulation under attention, such as increased response gain, reduced individual and shared variability, increased correlations with firing rates, limited range correlations, and differential correlations. We therefore suggest that attention may act primarily by increasing response gain of individual neurons without affecting their correlation structure. The experimentally observed reduction in correlations may instead result from reduced variability of the attentional gain when a stimulus is attended. Moreover, we show that attentional gain fluctuations – even if unknown to a downstream readout – do not impair the readout accuracy despite inducing limited-range correlations.

1 Introduction

[…] situation where the neurons in the population span a large range of preferred features. We chose this (somewhat arbitrary) distinction because it reflects the typical situation in an experiment, where neurons with similar retinotopic locations are recorded, which typically span a large range of preferred orientations or directions. Throughout this paper we assume that spatial and feature attention are independent processes and consider them in isolation.

We further assume that the experimenter does not have access to the attentional state on individual trials, but can only control its average over many trials, $\mathrm{E}[\alpha] = \mu$. In addition, the attentional state fluctuates from trial to trial with unknown variance, $\mathrm{Var}[\alpha] = \sigma^2$. The spike count covariance then follows from the law of total covariance,

$$\mathrm{Cov}[y_i, y_j] = \mathrm{E}\big[\mathrm{Cov}[y_i, y_j \mid \alpha]\big] + \mathrm{Cov}\big[\mathrm{E}[y_i \mid \alpha],\, \mathrm{E}[y_j \mid \alpha]\big],$$

where the outer expectation (covariance) is taken over $\alpha$ and the inner covariance (expectation) over the spike counts given $\alpha$.

For reasonably small $q^2$ we can approximate the gain profile by its first-order Taylor expansion […] where $h_i'$ is the derivative with respect to $\psi$. Using this approximation we can write $\mathrm{E}[h_i(\psi)] \approx$ […]

[…] information about the orientation of the stimulus. For simplicity we assume that neurons produce spikes conditionally independently given the stimulus orientation $\theta$ and the attentional gain $g$:

$$y_i \mid \theta, g \sim \mathrm{Poisson}\big(g\, f_i(\theta)\big).$$

The attentional gain $g$ is shared among all neurons and drawn from a Gamma distribution with shape $\mu^2/\sigma^2$ and scale $\sigma^2/\mu$, which implies $\mathrm{E}[g] = \mu$ and $\mathrm{Var}[g] = \sigma^2$.
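The generative model just described can be checked numerically. The following is a minimal sketch, assuming Poisson spiking given the shared gain; the tuning-curve values `f`, the parameter values, and the helper names `gamma_gain_params` and `sample_counts` are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np

def gamma_gain_params(mu, sigma2):
    """Shape and scale of a Gamma distribution with mean mu and variance sigma2."""
    shape = mu**2 / sigma2
    scale = sigma2 / mu
    return shape, scale

def sample_counts(f, mu, sigma2, n_trials, rng):
    """Draw spike counts (trials x neurons) with one shared gain per trial."""
    shape, scale = gamma_gain_params(mu, sigma2)
    g = rng.gamma(shape, scale, size=n_trials)      # shared attentional gain
    return rng.poisson(g[:, None] * f[None, :])     # conditionally independent Poisson counts

rng = np.random.default_rng(0)
f = np.array([5.0, 10.0, 20.0])   # hypothetical tuning-curve values f_i(theta)
mu, sigma2 = 1.2, 0.05            # hypothetical gain mean and variance
gain_shape, gain_scale = gamma_gain_params(mu, sigma2)
counts = sample_counts(f, mu, sigma2, n_trials=200_000, rng=rng)

# Empirical moments, to compare with mu*f and mu*f + sigma2*f**2
print(counts.mean(axis=0))
print(counts.var(axis=0))
```

Under these assumptions the sample mean should approach $\mu f_i$ and the sample variance should exceed it by roughly $\sigma^2 f_i^2$, which is the overdispersion expected when a Poisson rate is scaled by a fluctuating gain.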
Assuming that the experimenter does not know the attentional gain, the distribution $P(y \mid \theta)$ obtained by marginalizing over $g$ is a multivariate negative binomial distribution: […]

For the Fisher information $J = -\mathrm{E}\left[\frac{\mathrm{d}^2}{\mathrm{d}\theta^2} \log P(y \mid \theta)\right]$ we need the derivatives of the log-likelihood: […]

Plugging into the formula for Fisher information, re-ordering the summations over $y$ and $i$, and using the facts $\sum_y P(y \mid \theta) = 1$ and $\sum_y P(y \mid \theta)\, y_i = \mathrm{E}[y_i] = \mu f_i$, we obtain

$$J = \mu \sum_i \frac{f_i'^2}{f_i} - \frac{\sigma^2 \left(\sum_i f_i'\right)^2}{1 + \frac{\sigma^2}{\mu} \sum_i f_i},$$

where $f_i' = \mathrm{d}f_i/\mathrm{d}\theta$. The first term in the above equation is the Fisher information of an independent population of neurons and therefore $O(N)$, while the second term is $O(1)$: for homogeneous population of neurons, […]

Here the approximation holds because for large $N$ the width of the distribution of $f_i$ becomes narrower relative to its mean and therefore the expected value of the second term converges to the ratio of the expected values of numerator and denominator. The equality holds because […] and […] where $F_{ii} = (1 + \nu h_i(\psi)) f_i(\theta)$ and $u_i = h_i(\psi) f_i(\theta)$ as above. Plugging in and simplifying we obtain […]

As above for spatial attention, the $O(1)$ correction term is exactly zero for homogeneous populations and the derivation for heterogeneous populations follows the same line of argument as above.

Fluctuations of the attended feature create differential correlations, i.e. response variability that is identical to variability induced by changes in the stimulus. Here we derive this result using a Generalized Linear Model formulation (see also Eqs. 52, 53 in Results): […]

Since $x$ is independent of the neurons, it is obvious that attention has exactly the same effect as a change in the stimulus.
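The split of the Fisher information into an independent-population term and a correction term can be illustrated numerically. The sketch below assumes the Gamma-Poisson model with mean $\mu f$ and covariance $\mu\,\mathrm{diag}(f) + \sigma^2 f f^\top$, and it uses linear Fisher information with a Sherman-Morrison closed form as a stand-in; it is not the paper's derivation. The von Mises tuning curves, all parameter values, and the function names are hypothetical.

```python
import numpy as np

def fisher_info_closed_form(f, df, mu, sigma2):
    """Independent-population term minus a rank-one correction (Sherman-Morrison)."""
    j0 = mu * np.sum(df**2 / f)
    correction = sigma2 * np.sum(df)**2 / (1.0 + sigma2 * np.sum(f) / mu)
    return j0 - correction, j0, correction

def fisher_info_linear(f, df, mu, sigma2):
    """Linear Fisher information computed directly from mean and covariance."""
    cov = np.diag(mu * f) + sigma2 * np.outer(f, f)
    dmean = mu * df                      # derivative of the mean response mu*f
    return dmean @ np.linalg.solve(cov, dmean)

def von_mises_tuning(theta, pref, amp, kappa=2.0, base=1.0):
    """Hypothetical tuning curves f and their derivatives df with respect to theta."""
    z = amp * np.exp(kappa * (np.cos(theta - pref) - 1.0))
    return base + z, -kappa * np.sin(theta - pref) * z

n = 64
pref = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
mu, sigma2, theta = 1.2, 0.05, 0.0

# Homogeneous population: identical tuning curves, evenly spaced preferences
f_hom, df_hom = von_mises_tuning(theta, pref, amp=20.0)
j_hom, j0_hom, corr_hom = fisher_info_closed_form(f_hom, df_hom, mu, sigma2)

# Heterogeneous population: random amplitudes
rng = np.random.default_rng(1)
f_het, df_het = von_mises_tuning(theta, pref, amp=rng.uniform(5.0, 40.0, size=n))
j_het, j0_het, corr_het = fisher_info_closed_form(f_het, df_het, mu, sigma2)

print(j0_hom, corr_hom)
print(j0_het, corr_het)
```

In the homogeneous case the derivatives sum to zero by symmetry, so the correction vanishes up to floating-point error; with random amplitudes it is nonzero but stays small relative to the independent-population term in this example.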

Assuming $\mathrm{E}[\psi] = \theta$, $\mathrm{Var}[\psi]$ is small, and (without loss of generality) $\theta = 0$, we have […]

Moreover, we can write the attention-perturbed stimulus $\tilde{\theta}$ as […]

For large $N$ the Poisson noise averages out and therefore the resulting Fisher information is simply the inverse of the variance of the (attention-perturbed) stimulus: […]

[…] where $\alpha > 0$ is the amount of spatial attention allocated to the stimulus in the neurons' receptive field. We do not require any distributional assumptions on $\alpha$, except for its mean $\mathrm{E}[\alpha] = \mu$ and variance $\mathrm{Var}[\alpha] = \sigma^2$ (Fig. 1C). Under this model, the average spike count of a neuron is given by

$$\mathrm{E}[y_i] = \mu f_i(\theta).$$

By convention we refer to the case of $\mu = 1$ as the sensory response, which is the neural response to the stimulus in the absence of any attentional modulation. In experimental conditions where the stimulus is attended, $\mu_a > 1$ (Fig. 1D). When attention is directed towards a different stimulus […]

Because the attentional state fluctuates from trial to trial, the underlying firing rate also fluctuates. By applying the law of total variance we obtain the spike count variance (Fig. 2A):

$$\mathrm{Var}[y_i] = \mu f_i + \sigma^2 f_i^2.$$

Similar to the variances, we can compute the covariance between two neurons, which is given by the product of the firing rates and the variance of the attentional gain (Fig. 2B):

$$\mathrm{Cov}[y_i, y_j] = \sigma^2 f_i f_j, \quad i \neq j.$$

Recall that neurons are assumed to be conditionally independent given the attentional gain. Thus, any covariability arises exclusively from gain fluctuations. As a result, the covariance matrix (Fig. 2D) can be expressed as a diagonal matrix plus a rank-one matrix:

$$C = \mu\, \mathrm{diag}(f) + \sigma^2 f f^\top.$$

Note that the assumption of conditional independence could be relaxed without affecting any of the major results qualitatively: the diagonal matrix in the equation above would simply be replaced by the (non-diagonal) point process covariance matrix.
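The diagonal-plus-rank-one covariance structure described above is easy to build explicitly. This is a minimal sketch, assuming Poisson spiking given the gain, so that the diagonal part is $\mu f_i$; the firing rates, parameter values, and function names are hypothetical.

```python
import numpy as np

def spatial_attention_cov(f, mu, sigma2):
    """Covariance under a shared fluctuating gain: diagonal plus rank one."""
    return np.diag(mu * f) + sigma2 * np.outer(f, f)

def corr_from_cov(cov):
    """Convert a covariance matrix into a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

f = np.array([2.0, 8.0, 32.0])    # hypothetical firing rates f_i(theta)
mu, sigma2 = 1.0, 0.1             # hypothetical gain mean and variance
cov = spatial_attention_cov(f, mu, sigma2)
corr = corr_from_cov(cov)

print(cov)
print(corr)
```

Converting to correlations shows the rate dependence directly: in this example the pair with the higher firing rates has the larger spike count correlation, even though a single gain variance drives all pairs.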
Experimental studies more typically quantify spike count correlations rather than covariances.
The spike count correlations induced by a fluctuating attentional gain increase with firing rates […]

[…] where $\beta$ is the feature gain that controls how strongly the feature $\psi$ (in this case direction of […]

Figure: Attending to a direction of motion biases the population response towards this attended stimulus. While each neuron's tuning curve is gain-modulated as a whole (panel A), the population response is no longer equal to the individual neurons' tuning curves, but instead sharpened/broadened and its peak is moved.

[…] shape of the population response is no longer identical to that of the individual neuron's tuning curve. We start by assuming that the subject always attends the same direction (i.e. $\psi$ is constant) and consider the effect of fluctuations in the strength of attention, that is the gain $\beta$. We will come back to fluctuations in the attended direction below.

Similar to spatial attention, fluctuations in feature attention lead to overdispersion of the spike counts relative to a Poisson process (because rate variability is added).
Abbreviating $h_i \equiv h_i(\psi)$ and $f_i \equiv f_i(\theta)$, the spike count variance is

$$\mathrm{Var}[y_i] = (1 + \nu h_i) f_i + \tau^2 h_i^2 f_i^2,$$

where $\nu = \mathrm{E}[\beta]$ and $\tau^2 = \mathrm{Var}[\beta]$ are the mean and the variance of the feature attention gain, respectively. The degree of overdispersion not only increases with the neuron's firing rate, but also depends on the neuron's preferred direction relative to the attended direction (Fig. 4A). Interestingly, spike counts are more overdispersed at the null direction than at the preferred direction (Fig. 4A: compare blue vs. black and green vs. yellow). The Fano factor (variance/mean) is given by

$$\mathrm{FF}_i = 1 + \frac{\tau^2 h_i^2 f_i}{1 + \nu h_i},$$

and the covariance between two neurons by

$$\mathrm{Cov}[y_i, y_j] = \tau^2 h_i h_j f_i f_j, \quad i \neq j.$$

The sign of the covariance is determined by the product of $h_i$ and $h_j$, which depends on the attended direction and the preferred directions of the two neurons (Fig. 4B). For two neurons with identical preferred directions, the covariance is always positive while for two neurons with orthogonal preferred directions it is always negative. For any pair of neurons in between, it can be both positive and negative, depending on the stimulus (Fig. 4B). Again, the covariance matrix can be written as diagonal plus rank one:

$$C = F + \tau^2 u u^\top,$$

where $F_{ii} = (1 + \nu h_i(\psi)) f_i(\theta)$ and $u_i = h_i(\psi) f_i(\theta)$.
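The feature attention covariance can be assembled from the quantities $F_{ii}$ and $u_i$ defined above. The sketch below assumes Poisson spiking given the feature gain; the gain-profile values `h`, the firing rates, the parameter values, and the function name are hypothetical and chosen only so that all mean rates stay positive.

```python
import numpy as np

def feature_attention_cov(f, h, nu, tau2):
    """Return covariance F + tau2 * u u^T and the per-neuron Fano factors."""
    mean = (1.0 + nu * h) * f          # F_ii, the mean spike count
    u = h * f                          # u_i = h_i * f_i
    cov = np.diag(mean) + tau2 * np.outer(u, u)
    fano = np.diag(cov) / mean         # variance / mean
    return cov, fano

f = np.array([10.0, 10.0, 4.0, 4.0])   # hypothetical tuning-curve values f_i(theta)
h = np.array([1.0, 0.8, -0.5, -1.0])   # hypothetical gain profile values h_i(psi)
nu, tau2 = 0.2, 0.01                   # hypothetical mean and variance of the feature gain
cov, fano = feature_attention_cov(f, h, nu, tau2)

print(np.sign(cov))   # off-diagonal signs follow the sign of h_i * h_j
print(fano)
```

Pairs whose gain-profile values share a sign covary positively and pairs with opposite signs covary negatively, while every Fano factor is at least one because the gain fluctuations only add variance.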

As for spatial attention, averaging correlations over multiple stimulus conditions to represent […]

[…] where $h_i' = \frac{\mathrm{d}}{\mathrm{d}\psi} h_i$ and we have abbreviated $h_i \equiv h_i(\theta)$ and $f_i \equiv f_i(\theta)$. As before, we can write the covariance matrix as diagonal plus rank one: […] where […]

We can combine the two cosine terms and obtain: […] where $J_0$ is again the information in an independent population and $\varepsilon = \mathrm{Var}$ […]

[…] that depend on firing rates (Fig. 2C), but the same result is also predicted by the thresholding non- […]

[…] which can be rewritten as a linear function of the attentional state and the stimulus: […] where $\alpha$ and $b = \beta \cdot [\cos\psi, \sin\psi]^\top$ represent the state of spatial and feature attention, respectively, […]

Our model leads to a second interesting observation: It is likely that not only the attentional gain fluctuates from trial to trial, but also the attended feature itself. Such fluctuations introduce differential correlations, which indeed impair the readout (unless it has exact access to the attended feature). Thus, the attentional mechanism itself places a limit on how accurately a stimulus can be represented by a sensory population, and this limit can at least in principle be substantially lower than the amount of sensory information entering the brain through the eye. This insight may trigger the question: why, then, should there be an attentional mechanism in the first place? There are a number of possible answers to this question.
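The limit that differential correlations place on information can be illustrated with a small numerical sketch. It assumes a covariance of the form $C_0 + \varepsilon f' f'^\top$ with an independent Poisson baseline $C_0 = \mathrm{diag}(f)$ and uses linear Fisher information; `eps` stands in for the variance of the attended feature, and its value, the tuning curves, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def linear_fisher(cov, df):
    """Linear Fisher information df^T cov^{-1} df."""
    return df @ np.linalg.solve(cov, df)

def info_with_differential_corr(cov0, df, eps):
    """Information after adding the rank-one term eps * df df^T to the covariance."""
    return linear_fisher(cov0 + eps * np.outer(df, df), df)

eps = 0.01      # hypothetical strength of the differential correlations
kappa = 2.0     # hypothetical tuning width parameter
results = []
for n in (8, 64, 512):
    pref = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    z = 20.0 * np.exp(kappa * (np.cos(pref) - 1.0))   # tuning evaluated at theta = 0
    f = 1.0 + z                                       # rates with a baseline of 1
    df = kappa * np.sin(pref) * z                     # derivative with respect to theta
    cov0 = np.diag(f)                                 # independent Poisson baseline
    j0 = linear_fisher(cov0, df)
    j = info_with_differential_corr(cov0, df, eps)
    results.append((n, j0, j))

for n, j0, j in results:
    print(n, j0, j, j0 / (1.0 + eps * j0))
```

In this sketch the independent-population information keeps growing with population size, whereas the information with the rank-one term follows $J_0 / (1 + \varepsilon J_0)$ and stays below $1/\varepsilon$, which is the saturation behaviour the paragraph above describes.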

First, we can think of attention as a prior. Using prior information to bias an estimate towards more likely solutions will on average improve the estimate. In situations where the stimulus is noisy and decisions have to be made fast, such a bias is most beneficial and outweighs the small extra noise added due to variability in the prior. Conversely, in situations where there is lots of sensory evidence, the full information content present in the eye is rarely necessary in real-world situations, and, therefore, the noise added due to attentional fluctuations does not matter either.

Second, it should be noted that for change-detection paradigms that are typically employed in attention experiments, the estimation framework that asks how well a stimulus value can be reconstructed (e.g. Fisher information) is not quite appropriate. In such tasks the subject never judges the absolute direction (or any other feature) of the stimulus, but instead has to detect a small change, that is the difference between two subsequent stimuli. In this case any errors introduced due to fluctuations in the attended direction cancel out, since they affect both stimuli roughly equally, at least so long as attentional fluctuations occur at a timescale that is slow enough, such that the attentional state is approximately the same for both the pre- and post-change stimulus.