Attention Induced Gain Stabilization in Broad and Narrow-Spiking Cells in the Frontal Eye-Field of Macaque Monkeys

Top-down attention increases coding abilities by altering firing rates and rate variability. In the frontal eye field (FEF), a key area enabling top-down attention, attention-induced firing rate changes are profound, but the effect of attention on different cell types is unknown. Moreover, FEF is the only cortical area investigated in which attention does not affect rate variability, as assessed by the Fano factor, suggesting that task engagement affects cortical state nonuniformly. We show that putative interneurons in FEF of Macaca mulatta show stronger attentional rate modulation than putative pyramidal cells. Partitioning rate variability reveals that both cell types reduce rate variability with attention, but more strongly so in narrow-spiking cells. The effects are captured by a model in which attention stabilizes neuronal excitability, thereby reducing the expansive nonlinearity that links firing rate and variance. These results show that the effects of attention on different cell classes and different coding properties are consistent across the cortical hierarchy, acting through increased and stabilized neuronal excitability.

Significance Statement
Cortical processing is critically modulated by attention. A key feature of this influence is a modulation of "cortical state," resulting in increased neuronal excitability and resilience of the network against perturbations, lower rate variability, and an increased signal-to-noise ratio. In the frontal eye field (FEF), an area assumed to control spatial attention in human and nonhuman primates, firing rate changes with attention occur, but rate variability, quantified by the Fano factor, appears to be unaffected by attention. Using recently developed analysis tools and models to quantify attention effects on narrow- and broad-spiking cell activity, we show that attention alters cortical state strongly in the FEF, demonstrating that its effect on the neuronal network is consistent across the cortical hierarchy.


Introduction
Attention alters neuronal firing rates (Moran and Desimone, 1985; Treue and Maunsell, 1996; McAdams and Maunsell, 2000; Reynolds et al., 2000; Roberts et al., 2007; Herrero et al., 2008), reduces trial-to-trial rate variability (Mitchell et al., 2007; Hussar and Pasternak, 2010; Falkner et al., 2013; Herrero et al., 2013), affects correlations of rate variability (Mitchell et al., 2009; Ruff and Cohen, 2014), and alters the strength of oscillatory coupling in specific frequency bands within and between areas (Fries et al., 2001; Gregoriou et al., 2009; Chalk et al., 2010; Buschman et al., 2012). These alterations increase the ability of the network to encode sensory information and form sensory representations. In many areas of the visual and frontal cortex, different cell types are differently affected by attention and task engagement (Mitchell et al., 2007; Diester and Nieder, 2008; Johnston et al., 2009; Hussar and Pasternak, 2010; Kaufman et al., 2010; Viswanathan and Nieder, 2015). In visual cortex, narrow-spiking cells (putative inhibitory interneurons) show stronger attention-induced rate modulations and stronger rate variability reduction than broad-spiking cells (putative pyramidal neurons) (Mitchell et al., 2007). Changes in rate variability are often analyzed through quantification of the Fano factor (FF) (Tolhurst et al., 1983; Vogels and Orban, 1991; Mitchell et al., 2007), defined as the rate variance divided by the mean rate. Reduction in rate variability occurs not only with attention, but also prominently with sudden stimulus onset and movement planning, and it is generally linked to a change in the state of the processing network (Harris and Thiele, 2011; Ecker et al., 2014). Therefore, the effect of attention on rate variability should be consistent across cortical areas.
Indeed, in most occipital, parietal, and frontal cortical areas, stimulus onset or motor- and attention-related activity changes reduce the FF, with the notable exception of the FEF (Chang et al., 2012; Purcell et al., 2012). In the FEF, stimulus presentation reduces the FF, but spatial attention or visual search (a form of successive spatial attention) do not (Chang et al., 2012; Purcell et al., 2012). FEF is the source of attentional signals, affecting upstream sensory processing, and effects of attention could thus differ from those seen in sensory areas. Alternatively, the discrepancy could be due to a nonlinear relationship between the variance and the mean of the firing rate (Vogels and Orban, 1991; Gur et al., 1997; Zinke et al., 2006) that cannot be captured by the FF. A nonlinear relationship between mean rate and rate variance could cause the FF to change as firing rates change, whereby the details depend on the form of the nonlinearity. Recently developed methods that partition rate variability (Goris et al., 2014) address this, aiming to quantify the sources of variability that arise from fluctuations in cell excitability. The analysis assumes that neuronal firing is subject to a doubly stochastic process, in which variability arises from a component inherent to the spike generation process (Poisson in nature) and a component arising from overall changes in excitatory drive to the neuron that occur over time. Gain variance quantifies the latter (Goris et al., 2014).
Here, we report the effects of attention on firing rate modulations and on gain variance in narrow-spiking and broad-spiking cells in area FEF. Narrow-spiking cells have higher firing rates, stronger attentional rate modulation, lower gain variance, and stronger attention-induced gain variance reduction than broad-spiking cells. However, both cell classes show reduced gain variance, a signature of altered cortical state, when attention is directed to their receptive field. The results are captured by a simple model in which attention reduces the exponent of the expansive nonlinearity that links firing rate and rate variance. This demonstrates that altered cortical state through task engagement is a general feature of the cortical architecture regardless of the hierarchical level.

Materials and Methods
Procedures and animals. All procedures were performed in accordance with the European Communities Council Directive RL 2010/63/EC, the National Institutes of Health's Guidelines for the Care and Use of Animals for Experimental Procedures, and the UK Animals Scientific Procedures Act. In the present investigation, two adult awake male macaques (Macaca mulatta, age 5-9 years, weight 11-15 kg) were used.
Surgical preparation. The monkeys were implanted with a head post and recording chambers over area FEF under sterile conditions and under general anesthesia. Surgery and postoperative care were identical to those published in detail previously.
Identification of recording sites. Area FEF was initially identified by means of structural MRI, targeting the anterior bank of the arcuate sulcus. It was then confirmed by means of neuronal response properties: visual receptive field (RF) size and topography (Bruce et al., 1985), memory-guided saccade responses (persistent activity during the memory period), saccade-related responses to the visual/motor field, and low-current (50 µA) electrical saccade induction (Bruce et al., 1985). Finally, at the end of the experiments, one monkey was killed with an overdose of pentobarbital and perfused through the heart. Details of the perfusion and histological procedures have been described previously (Distler and Hoffmann, 2001). The location of the recording sites in area FEF in that monkey was verified in histological sections stained for cytoarchitecture and myeloarchitecture.
Receptive field (RF) and saccade field (SF) mapping. The location and size of the RF were measured by a reverse correlation method. A black square (1-3° size, 100% contrast) was presented at pseudorandom locations on a 9 × 12 grid (5-25 repetitions for each location, 100 ms stimulus presentation, 100 ms interstimulus interval) while monkeys kept fixation on a central fixation point (FP). The RF mapping procedure has been described in detail previously (Gieselmann and Thiele, 2008). RF eccentricity in this study ranged from 2° to 13°, and RFs were largely confined to the contralateral visual field.
After RF mapping, we mapped SFs. Monkeys had to achieve fixation and, 500 ms thereafter, a saccade target was briefly flashed in one of nine possible locations. If a clear visual RF had been discernible, one saccade target would appear centered on the RF, whereas the remaining eight locations would be equally spaced on a circle around the RF. If no clear visual RF was present, the SF was mapped by placing saccade targets at various eccentricities on a circle around the fixation spot, thereby initially determining the approximate (memory-guided) SF location. If online inspection of neuronal activity revealed a saccade direction preference, then additional saccade mapping was performed, with the previously determined preferred saccade location chosen as the center of a further mapping grid of nine saccade targets with a homogeneous spacing of ~1-5°, depending on main target eccentricity. If this assessment yielded a clear preference (as assessed from online thresholded spiking activity) for a specific saccade target location, this location was defined as the SF. If no such preference could be determined, then the next cell was searched for. For simplicity, both RF and SF will be referred to as RF below.
Behavioral task and stimuli. Monkeys were trained to fixate a white FP (0.1° diameter) on a gray background (1.72 cd/m²) presented centrally on a 20 inch analog cathode ray tube monitor (110 Hz, 1600 × 1200 pixels, 57 cm from the animal). Eye position was monitored with an infrared-based system (Thomas Recording, 220 Hz) with a fixation window of ±0.7-1.5°. Each trial was initiated when the monkey held the touch bar and fixated on the central point (Fig. 1). Then, 500 ms after fixation onset, three stimuli appeared on the screen equidistant from the fixation spot. One stimulus was always centered on the RF of the recorded neuron. The other stimuli were presented equidistant on an invisible circle centered on the fixation spot (Fig. 1). These were always presented in the same location while a cell was recorded (i.e., across trials), but could differ between recordings of different cells. Stimuli were square wave gratings of 2-6° diameter (depending on the RF size and the RF eccentricity). Gratings differed in color: one was always red/gray (red: 17.08 cd/m², gray: 1.72 cd/m²), one green/gray (green: 12.43 cd/m², gray: 1.72 cd/m²), and one blue/gray (blue: 13.20 cd/m², gray: 1.72 cd/m²). The locations of the colors were pseudorandomly assigned on a daily basis, but the color locations were fixed for a given recording session. Grating orientation was set at a random angle to the vertical meridian on a daily basis, but the angle was fixed for every neuron recorded. Gratings moved perpendicular to the orientation, whereby the direction of motion was pseudorandomly assigned for every trial. After a randomly selected time of 300-1400 ms, a central cue appeared. The cue was green, blue, or red, indicating which of the three gratings would be behaviorally relevant on the current trial (the cue color matched the color of the relevant grating). Cue selection occurred pseudorandomly.
After 600-1750 ms, one pseudorandomly selected grating changed luminance (luminance after dimming: blue: 2.66 cd/m²; green: 2.8 cd/m²; red: 2.0 cd/m²). If the cued grating had changed luminance, the monkey had to release a central touch bar within 600 ms to obtain a fluid reward. If an uncued grating had changed luminance, the animal had to ignore it and wait for the cued grating to change luminance. This could happen after another waiting time of 600-750 ms or after an additional waiting time of 1200 -on the central fixation spot. The task had no catch trials; that is, the cued grating always changed luminance, but the order thereof was unpredictable up to the point when the second grating had changed luminance. The timing of the dimming was also unpredictable within the time period indicated above.
Data acquisition. Neurons were recorded with tungsten-in-glass electrodes (fabricated in house, impedance of 0.5-2 MΩ measured at 1 kHz), which were lowered into FEF by means of Narishige microdrives (Mo-95). Neuronal data were acquired with Neuralynx preamplifiers and a Neuralynx Digital Lynx amplifier. Unfiltered raw data were written to disk, sampled with 24 bit resolution at a sampling rate of 32.7 kHz. Data were replayed offline and band-pass filtered at 0.6-9 kHz for offline spike sorting. Spikes were sorted manually using SpikeSort3D (Neuralynx).
Data analysis. Only correct trials were analyzed in the context of this study. Neuronal activity was aligned to the stimulus, to the cue, and to the first or second dimming onset. For the purposes of this study, activity was analyzed quantitatively from -500 to 0 ms before the first dimming and from -500 to 0 ms before the second dimming (the latter only for trials in which the stimulus located at the RF had not changed luminance during the first dimming). The latter gave qualitatively identical results to the first dimming period and the relevant analyses are thus only explicitly reported in a few cases. Given that there were three attention conditions (attend-RF and two attend-away conditions) and two different stimulus motion directions, there were six conditions in total. A two-factor ANOVA was calculated for the predimming activity to determine whether attention and direction of motion had a significant effect on neuronal activity and whether there was a significant interaction. Cells that showed a significant main effect of attention or a significant interaction (p < 0.05) were classified as attention modulated.
Analysis of cell type. To classify cells as broad or narrow spiking, spline interpolation was performed on the original waveforms because these had a temporal resolution of 30.5 µs between sampling points. Spline interpolation was done to obtain a resolution of 5.4 µs (Mitchell et al., 2007), which allows for better estimates of peak-to-trough times. Two slightly different methods were used to classify cells as narrow or broad spiking. First, peak-to-trough time was used as a classification criterion. Broad-spiking cells have longer peak-to-trough times than narrow-spiking cells, and commonly used cutoffs are ~200-220 µs (Mitchell et al., 2007; Hussar and Pasternak, 2009; Jacob et al., 2013). In our sample, a cutoff of 250 µs was more appropriate because it separated the significantly bimodal distribution (Hartigan's dip test, p = 0.016; calibrated Hartigan's dip test, p < 0.001) of peak-to-trough times. At the same time, it was on the conservative side of classifying narrow-spiking cells as broad because the cutoff was located on the narrow-spiking side of the bimodal separation (Fig. 2). The difference in cutoff times used between different laboratories is likely due to different filter design and settings during spike recording because the bandwidth and the type of filter used affect the spike shape and width. The band-pass filter of 600-9000 Hz that is implemented in the Neuralynx data acquisition software was used; the exact filter settings used in previous studies are unknown.
A recently published method (Ardid et al., 2015) was also applied, which classifies cells based on peak-to-trough times and on repolarization times, and the ensuing principal component distribution thereof (https://bitbucket.org/sardid/waveformanalysis). This approach equally resulted in a significantly bimodal distribution of principal components (p < 0.027, calibrated Hartigan's dip test) and classified cells into narrow, fuzzy, and broad spiking. Cells classified as narrow and broad spiking using this method yielded virtually identical results for all analyses reported.
Quantification of attentional modulation. To investigate effects of attention on neuronal firing rates, a mean rate-based approach and an ideal-observer-based approach were used. Specifically, an attention modulation index relative to precue (MIprecue) activity and a modulation index between attend-RF and attend-away (attMI) conditions were calculated as follows:

$$\mathrm{MIprecue} = \frac{\text{attend RF activity} - \text{precue activity}}{\text{attend RF activity} + \text{precue activity}} \quad (1)$$

$$\mathrm{attMI} = \frac{\text{attend RF activity} - \text{attend away activity}}{\text{attend RF activity} + \text{attend away activity}} \quad (2)$$

Quantification of attentional modulation using an ideal-observer-based approach was done by calculating the area under the receiver operating characteristic (AUROC). This method is based on signal detection theory and calculates the overall probability that a random sample of neuronal activity (i.e., spikes/s) selected during one attention condition is larger than a sample selected in the alternative attention condition (Green and Swets, 1966; Tolhurst et al., 1983). The fidelity of this judgment depends on the degree of overlap between the attend-RF-elicited activity distribution and the attend-away-based activity distribution (i.e., the less overlap, the more reliable the judgment).
FF and gain analysis. For each cell, the FF (variance of rate/mean rate) was calculated for the attend-RF and the attend-away conditions. The FFs for the two attend-RF conditions and the FFs for the four attend-away conditions were then averaged. To determine whether attention significantly altered the FF at the population level, a Wilcoxon signed-rank test was used. FF modulation indices (FFMI) were calculated as follows:

$$\mathrm{FFMI} = \frac{\mathrm{FF}_{\text{attend RF}} - \mathrm{FF}_{\text{attend away}}}{\mathrm{FF}_{\text{attend RF}} + \mathrm{FF}_{\text{attend away}}} \quad (3)$$

Figure 1. Diagram of the task and the relevant events. Monkeys fixated centrally. Then, 500 ms after fixation onset, three colored gratings were presented equidistant from the fixation spot. One of the gratings was placed in the RF of the neuron under study. After a variable time (300-1400 ms), a central colored cue indicated which stimulus was behaviorally relevant on the current trial. The animal had to covertly monitor this stimulus and wait for it to change luminance (referred to as dimming in the figure). The target dimming could occur first, second, or third in the sequence of dimming events (left to right in the figure). Distracter dimming had to be ignored by the monkey. Detection of target dimming was indicated by releasing a hand-held touch bar. For additional details, see the Materials and Methods.
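The modulation indices of Equations 1 and 2 and the AUROC can be illustrated with a short sketch. This is a minimal example rather than the analysis code used in the study: the function names and the single-trial rates are hypothetical, and the AUROC is computed directly from its definition as the probability that a randomly drawn attend-RF sample exceeds a randomly drawn attend-away sample, with ties counted as one half (an assumed convention).

```python
from itertools import product


def modulation_index(a, b):
    """(a - b) / (a + b), the form shared by MIprecue and attMI."""
    return (a - b) / (a + b)


def auroc(attend_rf, attend_away):
    """Probability that a random attend-RF sample exceeds a random
    attend-away sample; ties count one half (assumed convention)."""
    wins = 0.0
    for x, y in product(attend_rf, attend_away):
        if x > y:
            wins += 1.0
        elif x == y:
            wins += 0.5
    return wins / (len(attend_rf) * len(attend_away))


def mean(values):
    return sum(values) / len(values)


# Hypothetical single-trial rates (spikes/s), not data from the study.
rf_rates = [22.0, 30.0, 26.0, 34.0, 28.0]
away_rates = [18.0, 24.0, 20.0, 26.0, 16.0]

att_mi = modulation_index(mean(rf_rates), mean(away_rates))
roc = auroc(rf_rates, away_rates)
print(f"attMI = {att_mi:.3f}, AUROC = {roc:.3f}")
```

An AUROC of 0.5 indicates fully overlapping distributions, whereas values approaching 1 indicate that attend-RF activity almost always exceeds attend-away activity.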
excitability from trial to trial are assumed to follow a gamma distribution, which makes the resulting spike counts follow a negative binomial distribution (Goris et al., 2014). Fitting the single-trial rate (count data) with a negative binomial yields a gain variance term, which captures the magnitude of the change in excitability from trial to trial. The two attend-RF conditions were used to obtain an estimate of the attend-RF gain variance and the four attend-away conditions were used to obtain an attend-away gain variance estimate. A gain variance (GV) modulation index was calculated as follows:

$$\mathrm{GVMI} = \frac{\mathrm{GV}_{\text{attend RF}} - \mathrm{GV}_{\text{attend away}}}{\mathrm{GV}_{\text{attend RF}} + \mathrm{GV}_{\text{attend away}}} \quad (4)$$

The rate data were also fitted with a Poisson model to compare the quality of fits between the two models using maximum likelihood estimators. Model comparison was based on Akaike and Bayesian information criteria (AIC and BIC, respectively) and the associated AIC and BIC weights (Burnham and Anderson, 2004). AIC corresponds to the following (Burnham and Anderson, 2004):

$$\mathrm{AIC} = -2\ln(L) + 2k \quad (5)$$

where L is the maximized likelihood function of the model fit and k corresponds to the number of free parameters in the model; that is, 1 and 2 for the Poisson and negative binomial models, respectively. This yielded AIC_1 and AIC_2 for the two models, respectively. Finally, model comparison was based on Akaike weights (w_i) as follows:

$$w_i = \frac{\exp(-\delta_i/2)}{\sum_{j=1}^{2} \exp(-\delta_j/2)} \quad (6)$$

where δ_i corresponds to AIC_i - AIC_min and AIC_min is the smallest AIC obtained for the model fits. The larger the w_i, the more evidence in favor of model i.
The BIC is often used instead of AIC. BIC applies a larger penalty on free parameters in the model and is calculated according to the following:

$$\mathrm{BIC} = -2\ln(L) + k\ln(n) \quad (7)$$

where L is the maximized likelihood function of the model fit for each neuron and model, k corresponds to the number of free parameters in the model (i.e., 1 and 2, respectively), and n corresponds to the number of data points (i.e., trials for each neuron and condition). The BIC weights are calculated as in Equation 6. The larger the BIC weight, the more evidence in favor of a given model.
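The model comparison described above can be sketched in a few lines. The following is an illustration under stated assumptions, not the fitting code used in the study: the negative binomial is parameterized by its mean and a gain variance term so that variance = mean + gain variance × mean², a coarse grid search stands in for a proper optimizer, and the spike counts and all function names are hypothetical.

```python
import math


def poisson_loglik(counts, mu):
    """Log-likelihood of spike counts under a Poisson model with mean mu."""
    return sum(n * math.log(mu) - mu - math.lgamma(n + 1) for n in counts)


def negbin_loglik(counts, mu, gain_var):
    """Log-likelihood under a gamma-Poisson (negative binomial) model,
    parameterized so that variance = mu + gain_var * mu**2."""
    r = 1.0 / gain_var  # shape parameter of the gamma-distributed gain
    return sum(
        math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
        + r * math.log(r / (r + mu)) + n * math.log(mu / (r + mu))
        for n in counts
    )


def fit_gain_variance(counts, grid_points=2000):
    """Coarse ML fit: the mean is the sample mean (its ML estimate in this
    parameterization); the gain variance is found on a log-spaced grid."""
    mu = sum(counts) / len(counts)
    grid = [10 ** (-4 + 5 * i / grid_points) for i in range(grid_points + 1)]
    gain_var = max(grid, key=lambda g: negbin_loglik(counts, mu, g))
    return mu, gain_var


def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)


def ic_weights(values):
    """Akaike-style weights from a list of AIC (or BIC) values."""
    best = min(values)
    rel = [math.exp(-0.5 * (v - best)) for v in values]
    total = sum(rel)
    return [x / total for x in rel]


# Hypothetical spike counts from one condition (not data from the study).
counts = [3, 9, 14, 5, 21, 7, 12, 2, 17, 10]
mu, gain_var = fit_gain_variance(counts)
ll_pois = poisson_loglik(counts, mu)
ll_nb = negbin_loglik(counts, mu, gain_var)

n = len(counts)
aic_w = ic_weights([aic(ll_pois, 1), aic(ll_nb, 2)])
bic_w = ic_weights([bic(ll_pois, 1, n), bic(ll_nb, 2, n)])
print(f"gain variance ~ {gain_var:.3f}")
print(f"AIC weights (Poisson, NB): {aic_w[0]:.3f}, {aic_w[1]:.3f}")
print(f"BIC weights (Poisson, NB): {bic_w[0]:.3f}, {bic_w[1]:.3f}")
```

Because the example counts are clearly overdispersed relative to a Poisson process, both sets of weights favor the negative binomial model despite its extra parameter.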

Results
We recorded 349 cells from area FEF in two monkeys (148 from Monkey 1 and 201 from Monkey 2) while monkeys performed the covert cued visual top-down attention task. We followed published methods (Mitchell et al., 2007; Ardid et al., 2015) to determine the spike width of each cell. The distribution of spike widths in our sample was significantly bimodal (Hartigan's dip test: p < 0.001; Fig. 2). Using a conservative estimate of the trough of the bimodal spike-width distribution (250 µs cutoff) to cluster cells into two classes resulted in 256 cells being classified as broad-spiking cells and 93 cells being classified as narrow-spiking cells (Monkey 1 broad: n = 82, narrow: n = 66; Monkey 2 broad: n = 174, narrow: n = 27). Using the classification method suggested by Ardid et al. (2015) slightly altered absolute numbers assigned to a given cell class, but had no effect on the overall outcome reported below.
Figure 3 shows the normalized population activity for the attend-RF and the attend-away condition separately for broad- and narrow-spiking cells recorded in the two monkeys. Normalization was done by dividing each bin of the single-cell histograms by the maximum bin that occurred for this cell in any of the four periods shown. Attentional effects were absent during the initial stimulus phase (Fig. 3A) because the attention cue had not been presented at that time. After cue onset (Fig. 3B), the average activity for attend-RF conditions gradually increased, whereas the average activity for the attend-away conditions gradually decreased. This pattern continued until the point of first dimming (Fig. 3C). The evolution of activity during the second dimming period was somewhat different. Initially (after the first dimming), the activity differences decreased slightly, possibly related to a brief waning of attentional focus, followed by a slight increase in attentional modulation (Fig. 3D).
Figure 3 shows that narrow-spiking cells on average had higher stimulus-driven firing rates than broad-spiking cells and suggests that narrow-spiking cells had stronger attentional modulation. Assessment of spontaneous activity also showed that narrow-spiking cells had higher baseline firing rates than broad-spiking cells (p = 0.016, rank-sum test).

Modulation of trial-averaged firing rates with attention
To assess attentional modulation at the single-cell level, we compared the distributions of single-trial activity in the two attention conditions during the period of 500-0 ms before the first dimming. A total of 287 cells (287/349, p < 0.05, 2-factor ANOVA, see Materials and Methods) were significantly affected by attention (broad-spiking, n = 213/256; narrow-spiking, n = 74/93; Monkey 1: broad-spiking, n = 67; narrow-spiking, n = 48; Monkey 2: broad-spiking, n = 146; narrow-spiking, n = 26). The data presented hereafter are from attention-modulated cells (n = 287). Extending the analyses to cells not significantly affected by attention did not change any of the conclusions reported. Performing the analysis on the second dimming period did not qualitatively change any of the results reported here.
To analyze to what extent narrow-spiking cells and broad-spiking cells alter their firing rate with attention, we plotted absolute firing rates for attend-RF and attend-away conditions for the two cell types. Figure 4A shows that the majority of cells show higher firing rates in the attend-RF condition than the attend-away condition, but some cells showed the opposite pattern. The latter seemed to occur more often in broad-spiking than in narrow-spiking cells. The firing rate change induced by attention (relative to precue activity) for the two cell types is shown in Figure 4B.
A mixed-model ANOVA revealed that the neuronal activity was significantly affected by the time period analyzed (p < 0.001; poststimulus, postcue, and predimming periods) and by attention (p = 0.002, attend-RF vs attend-away), whereas cell type (broad vs narrow) on its own did not show significant rate differences. However, we found a significant interaction between cell type and time period analyzed (p < 0.001) and a significant triple interaction among cell type, analysis period, and attention (p = 0.002). Specifically, narrow-spiking cell activity during the predimming period was higher than broad-spiking cell activity for the attend-RF condition (p < 0.001, post hoc testing). When analyzed relative to precue activity, we found a significant interaction between cell type and attention (mixed-model ANOVA, p < 0.001). Narrow-spiking cells showed higher activity relative to precue activity than broad-spiking cells for the attend-RF condition (p = 0.017, post hoc testing; Fig. 4B) and lower activity for the attend-away condition (p = 0.028, post hoc testing; Fig. 4B).
Figures 3, B-D, and 4B suggest that attention to the RF increased firing rates for attend-RF conditions and attention away from the RF decreased firing rates relative to precue activity levels for the overall population. However, we found heterogeneity in terms of the changes relative to precue activity in our cell sample. To quantify this, we calculated modulation indices relative to mean precue activity for the two attention conditions for each cell (MIprecue). Not all cells showed positive MIprecue values; that is, some reduced their activity levels below precue levels during attend-RF conditions and some increased their activity during attend-away conditions to levels above the precue activity. This was the case for some broad-spiking cells and for some narrow-spiking cells. Although the medians of MI distributions for the attend-RF activity relative to precue activity did not differ between narrow-spiking and broad-spiking cells (Fig. 5C, p = 0.481, rank-sum test), narrow-spiking cells nevertheless showed MIprecue < 0 less often than broad-spiking cells did (p = 0.02, χ² test); that is, they were less likely to reduce their attend-RF associated activity below the precue activity than broad-spiking cells were.
Figures 3, B-D, and 4B suggest that attentional modulation was stronger for narrow-spiking cells than for broad-spiking cells. We quantified the strength of attentional modulation by calculating the AUROC. We found that narrow-spiking cells had significantly larger AUROC (i.e., larger attentional modulation) than broad-spiking cells (Fig. 5, p < 0.001, rank-sum test). Moreover, narrow-spiking cells less often showed AUROCs < 0.5 (p = 0.011, χ² test) compared with broad-spiking cells; that is, fewer narrow-spiking cells showed attention-induced activity reduction below attend-away levels than broad-spiking cells did (Fig. 5). An equivalent result was obtained using the attentional modulation index attMI instead of the ROC. AttMI indices were significantly larger (p = 0.003, rank-sum test) in narrow-spiking (attMI = 0.355 ± 0.027 SEM) than in broad-spiking cells (0.236 ± 0.019 SEM) and significantly fewer narrow-spiking cells showed attMIs < 0 (p = 0.015, χ² test).

Modulation of trial-to-trial neural variability with attention
The AUROC analysis showed that an ideal observer could decode attend-RF and attend-away conditions from single-trial activity of single cells with 70-80% accuracy in many cells, reaching close to 100% decoding accuracy in some cells. The ability of an ideal observer to decode the two task conditions (attend-RF and attend-away) depends on two factors. First, attention can increase the neural signal by enhancing the differences in trial-averaged responses across attention conditions. Second, it can reduce the neural noise by quenching the trial-to-trial variability of neural responses. The differences in attentional modulation between the two cell classes may result from differences in the effect of attention on neural signal, noise, or both. We now investigate whether attention affects rate variability in FEF and whether this differs between cell classes. However, we first wished to determine whether stimulus onset reduces rate variability in narrow- and broad-spiking cells, as shown previously in many different studies (Chang et al., 2012; Purcell et al., 2012). This was indeed the case, as shown in Figure 6.
Previous studies investigated trial-to-trial variability in FEF by using the FF, a measure of neural noise defined as the ratio between variance and mean of spike count distributions. Surprisingly, these studies failed to show a rate variability reduction with attention (or, more generally, task engagement) when assessed with the FF (Chang et al., 2012; Purcell et al., 2012). Therefore, attention might not stabilize the neuronal network in FEF even though it strongly affects firing rates.
However, FFs are measures of variability that are only accurate if the variance is proportional to the mean. If this is not the case (because of network excitability fluctuations, for example), the variance of spike counts may not be linearly related to the mean of the firing rate. Indeed, mean activity and variance appear to be linked by an expansive nonlinearity (Vogels and Orban, 1990; Zinke et al., 2006), which can be captured by a power function in which the variance grows as the mean rate raised to an exponent x. The exponent x has been reported to be in the range of ~1.05-1.25 (Vogels and Orban, 1990; Zinke et al., 2006). A consequence of the expansive nonlinearity is that the FF increases with increasing firing rates. Therefore, if attention affected the strength of the expansive nonlinearity (decreasing the exponent), then the FF might decrease, not change, or even increase depending on the strength of the attentional rate modulation and on the strength by which attention alters the expansive nonlinearity. This is illustrated for a simple example in Figure 7, A and B, modeling a condition in which attention reduced the exponent of the expansive nonlinearity from 1.08 to 1.06 while at the same time having a variable effect on the firing rate. In one case, attention increased the neuronal firing rate by a small amount (circles in Fig. 7A); in the other case, it increased the neuronal firing rate by a large amount (squares in Fig. 7A). The former resulted in attention-induced FF reductions, whereas the latter did not cause an FF reduction (Fig. 7B). To explore the possible dissociation between attention-induced firing rate changes and FF changes more systematically, we assumed an exponent of x = 1.15 for the attend-away condition. We systematically allowed attention to reduce the exponent x in steps of 0.002. Moreover, we assumed that attention to the RF increases firing rates relative to a baseline condition by the same amount as attention away from the RF reduces firing rates relative to the baseline condition. This scenario approximately captures the data shown in Figure 3. The attention-induced changes (increases/decreases) in firing rate occurred in 1 spike/s increments for attend-RF and 1 spike/s decrements for attend-away conditions, starting from no attentional modulation (i.e., attend-RF and attend-away yield identical rates). The results are shown in Figure 7C, which demonstrates that attention can increase or decrease the FF, depending on the firing rate changes it causes and on the extent to which it reduces the expansive nonlinearity that links firing rates to rate variance. Given these results, FFs of narrow-spiking cells, which had higher firing rates and larger attentional modulation, might be differently affected by attention than FFs of broad-spiking neurons, which had lower overall firing rates and smaller attentional modulation.
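The dissociation between rate changes and FF changes can be reproduced with a few lines of arithmetic. The sketch below is a minimal illustration, assuming that the nonlinearity has the form variance = rate^x with no additional scale factor (an assumption made here for simplicity), so that FF = rate^(x - 1). The exponents 1.08 and 1.06 are taken from the example in the text; the firing rates and function names are hypothetical.

```python
def fano_factor(rate, exponent):
    """FF = variance / mean under the assumed power law
    variance = rate ** exponent (no scale factor)."""
    return rate ** exponent / rate


def break_even_rate(away_rate, x_away, x_attend):
    """Attend-RF rate at which the FF equals the attend-away FF,
    given the exponents of the two conditions."""
    return away_rate ** ((x_away - 1.0) / (x_attend - 1.0))


# Hypothetical rates (spikes/s); exponents follow the example in the text.
away_rate = 5.0
ff_away = fano_factor(away_rate, 1.08)   # attend-away condition
ff_small = fano_factor(6.0, 1.06)        # attention: small rate increase
ff_large = fano_factor(10.0, 1.06)       # attention: large rate increase
threshold = break_even_rate(away_rate, 1.08, 1.06)

print(f"FF attend-away:              {ff_away:.3f}")
print(f"FF attend-RF, small change:  {ff_small:.3f}")
print(f"FF attend-RF, large change:  {ff_large:.3f}")
print(f"break-even attend-RF rate:   {threshold:.2f} spikes/s")
```

Under these assumptions, the FF falls with attention only while the attend-RF rate stays below the break-even rate; larger rate increases offset the reduced exponent and the FF no longer decreases.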
We indeed found that the FF was significantly reduced in broad-spiking cells by attention (Fig. 8A, p = 0.024, Wilcoxon signed-rank test), but not in narrow-spiking cells (Fig. 8A, p = 0.898, Wilcoxon signed-rank test), which is consistent with predictions from the analysis shown in Figure 7C. To understand whether the discrepancies between cell types arose because cell types differ in how attention affected the neuronal firing rate or if they reflected cell-type-specific features, we subdivided cells of each type based on the strength of attentional rate modulation. The median of the attentional rate modulation served as the division border for each cell type population separately. We quantified attention-induced alterations of rate variability by calculating the attentional FF modulation index (FFMI). We found that the attention-induced FFMI reduction in broad-spiking cells varied with the strength of attention-induced rate modulation. Broad-spiking cells with an attentional rate MI smaller than the broad-spiking cell population median showed larger FF reduction with attention compared with broad-spiking cells with a larger attentional rate MI (p = 0.03, rank-sum test). Performing the same analysis for narrow-spiking cells led to similar findings, although the differences in attention-induced FF reduction only showed a trend for significance, probably due to the smaller sample size (p = 0.094, rank-sum test). Interestingly, however, narrow-spiking cells that showed the largest rate increases when attention was directed to the RF on average showed an increase in FF with attention, whereas narrow-spiking cells with smaller attention-induced rate increases on average showed a reduction of FF with attention. These results are consistent with the predictions shown in Figure 7, suggesting that the strength of attention-induced rate change is the critical factor that determines how cognitive states affect FFs.

Supralinear modulation of trial-to-trial neural variability with attention arising from gain variations
Nonlinearity between rate variance and mean suggests that there are factors modulating neural responses over scales longer than typical interspike intervals. We hypothesized that they may arise from slow fluctuations of the gain of neural responses. To understand where these nonlinearities might arise, we applied a recently developed method that partitions response variability into a component modeling spike emission noise (described as a Poisson process) and a component that reflects fluctuations induced by trial-to-trial variations in gain (assumed to follow a gamma distribution; Goris et al., 2014). Under these assumptions, a negative binomial probability model (Goris et al., 2014) can be fitted to single neuron spike count data to obtain an estimate of the gain variance. Using this model, we found that attention significantly reduced gain variance in broad- and narrow-spiking cells (Fig. 8B, p < 0.001, Wilcoxon signed-rank test). Interestingly, these reductions were larger in narrow-spiking cells than in broad-spiking cells (p = 0.043, mixed-model ANOVA, interaction between cell type and attention on gain variance). These conclusions differ from the FF analysis, which is affected by the described confounds. Estimates of FF or gain variance aligned to dimming could be inflated by the variable time of dimming relative to cue onset and the gradual rate changes that occur over time (Fig. 3). Aligning data to dimming would thus sometimes use data from 600 ms after cue onset, when attention has not resulted in large activity increases/decreases relative to precue activity, and sometimes use data from up to 1700 ms after cue onset, when attention would have generated much larger activity increases/decreases relative to precue activity. Pooling across these different activity levels would artificially increase rate variability. This in turn could affect the comparison between the attend-RF and attend-away conditions. To control for this potential caveat, we repeated the same FF and gain variance analysis on our data aligned to cue onset, using the period from 500 ms to 1000 ms after cue onset (trials in which dimming occurred earlier than 1000 ms after cue onset were eliminated). This ensured that the attention-induced activity increases/decreases were not affected by time relative to cue onset. As expected, FF and gain variance values were reduced, but the overall results did not change. FFs were reduced by attention in broad-spiking cells (p = 0.005, Wilcoxon signed-rank test), but not in narrow-spiking cells (p = 0.677, Wilcoxon signed-rank test), whereas gain variance was significantly reduced in both cell types (p < 0.001, Wilcoxon signed-rank test, data not shown).

Both cell types show a brief strong reduction of FF after stimulus onset, which is consistent with previous reports. Narrow-spiking cells showed larger FF during sustained stimulus-driven responses than did broad-spiking cells. This is likely due to their overall higher firing rate and the expansive nonlinearity that links firing rate variance and mean firing rate. Shaded/bounded areas represent SEM.
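To make the idea of gain variance concrete, the sketch below estimates it from single-trial spike counts. It is a simplified stand-in and not the fitting procedure used here: instead of a maximum-likelihood negative binomial fit, it uses a method-of-moments estimate that follows from the variance relation of the variable gain model (variance = mean + gain variance × mean²). The function name and the spike counts are hypothetical.

```python
from statistics import mean, variance


def gain_variance_moments(counts):
    """Method-of-moments gain variance estimate from spike counts.

    Solves variance = mean + g * mean**2 for g. Estimates below zero
    (count variance smaller than the mean) are clipped to 0, because
    the gamma-Poisson model cannot describe underdispersed counts.
    """
    m = float(mean(counts))
    if m <= 0.0:
        raise ValueError("mean spike count must be positive")
    v = float(variance(counts))  # unbiased sample variance
    return max((v - m) / (m * m), 0.0)


# Hypothetical single-trial spike counts for one cell (illustration only).
attend_rf = [22, 28, 17, 25, 20, 30, 19, 26]
attend_away = [10, 30, 18, 25, 8, 35, 15, 27]

print("gain variance, attend-RF:  ", round(gain_variance_moments(attend_rf), 4))
print("gain variance, attend-away:", round(gain_variance_moments(attend_away), 4))
```

A maximum-likelihood fit is generally preferable for low counts and small trial numbers; the moment estimate is shown only because it exposes the relationship between mean, variance, and gain variance directly.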
It could be argued that the changes in gain variance with attention are induced by alterations in firing rates due to attention if gain variance itself depends on firing rates. The large majority of cells showed higher firing rates in the attend-RF condition than the attend-away condition, which in turn might affect the gain variance distributions. To investigate this, we focused on our sample of cells that showed AUROC values <0.5; that is, cells that had lower attend-RF rates than attend-away rates. In this population, we found the same overall pattern of results (and a trend for significance); that is, the majority of cells had lower gain variance in the attend-RF condition than in the attend-away condition (n = 41, p = 0.08, Wilcoxon signed-rank test). Twenty-six cells showed lower gain variance with attend-RF, whereas 15 cells showed higher gain variance with attend-RF conditions. These 41 cells were mostly broad-spiking cells because narrow-spiking cells rarely showed reductions in firing rates during the attend-RF relative to the attend-away conditions (see, e.g., Fig. 5). This argues against the possibility that the effects were simply a consequence of differences in firing rate associated with attention conditions.

Figure 7. Effect of attention on firing rates, on the exponent of the expansive nonlinearity, and on the ensuing changes in the FF. A, Example case in which attention causes a systematic reduction of the exponent of the expansive nonlinearity (red and blue curves), but has either a small effect on firing rate increases (blue vs red dots) or has a more substantial effect on firing rate increases (blue vs red squares). B, Effect of the changes shown in A on the FF. C, Systematic exploration of attentional effects on the FF (color coded are FF differences between attend-RF and attend-away conditions) for different attentional rate modulations (relative to precue activity, assumed to be 50 spikes/s; x-axis) and different attentional reductions of the expansive nonlinearity (the exponent x in variance = mean rate^x; y-axis). Attention-induced activity differences (attentional modulation) were assumed to occur in 2 spikes/s steps.

Figure 8. Effect of attention on FFs and on gain variance in broad- and narrow-spiking cells. A, FFs for the attend-RF (x-axis) and attend-away (y-axis) condition for narrow-spiking (gray) and broad-spiking (black) cells. p-values indicate whether attention significantly affects FF (Wilcoxon signed-rank test). B, Gain variance for the attend-RF (x-axis) and attend-away (y-axis) condition for broad (gray) and narrow (black) spiking cells. p-values indicate whether attention significantly affects gain variance (Wilcoxon signed-rank test).

As an additional control, we calculated the correlation between attentional rate modulation index and attentional gain variance modulation index. For this, we subdivided our sample into cells in which attention increased firing rates (positive attentional rate modulation, as shown in Fig. 4) and cells in which attention decreased firing rates (negative attentional rate modulation). The latter was only possible for broad-spiking cells due to the small number of narrow-spiking cells with negative attentional rate modulation index. For broad-spiking cells with a positive attentional rate modulation index, we found a significant negative correlation (r = −0.241, p = 0.003, Spearman rank correlation) between the attentional rate modulation index and the attentional gain variance modulation index. For narrow-spiking cells with a positive attentional rate modulation index, we did not find a significant correlation between attentional rate modulation index and attentional gain variance modulation index (r = −0.169, p = 0.182, Spearman rank correlation). Broad-spiking cells with negative attentional rate modulation indices had a positive correlation between attentional rate modulation and attentional gain variance modulation, but this only showed a mild trend for significance (r = 0.308, p = 0.099). Two main conclusions arise from this analysis: (1) the link between attentional rate modulation and attentional gain variance modulation, even though significantly present in broad-spiking cells, yielded only a small negative correlation, explaining ~5% of the variance; and (2) the sign of attentional rate modulation does not predict the sign of attentional gain variance modulation; that is, broad-spiking cells suppressed by attention nevertheless show reduced gain variance when attention was directed to the RF of the neurons, corroborating the results reported above.
We also investigated whether changes in gain variance with attention can be found in cells that showed no attentional rate modulation. Narrow-spiking cells that did not show attentional rate modulation also did not show changes in gain variance with attention (p = 0.375, Wilcoxon signed-rank test). The same was true for broad-spiking cells (p = 0.483, Wilcoxon signed-rank test). Therefore, only the population of cells with significant attentional rate modulation showed significant reduction in gain variance with attention.
Our experimental design also entailed temporal expectation and changing uncertainty in relation to the location of target dimming as time progressed. Before the first dimming occurred (regardless of whether it was a target or distractor dimming), the target had a 33% probability of dimming, which changed to a probability of 50% for the second dimming. For the third dimming, this probability changed to 100%. Changes in gain variance might be affected by this expectation component or they might be due solely to the spatial orienting/attention component of the task. To investigate this, we also analyzed gain variance for attend-RF and attend-away conditions for the second dimming period. The third dimming period could not be analyzed in the same way because a dimming will have happened in the RF under attend-away conditions for the third dimming period. Therefore, an analysis before the third dimming would compare activities elicited by different stimulus intensities. The gain variance changes with attention in the second dimming period were virtually identical to those in the first dimming period. We found a significant main effect of attention on gain variance (p < 0.001), a mild trend for an effect of cell type (p = 0.090), and no interaction between the two factors (p = 0.246). Therefore, the spatial orientation/attention component affects the gain variance changes in a similar manner before the first and the second dimming period, even though the spatial uncertainty and the temporal expectancy have changed.
The above data show that attention reduces gain variance as assessed from the negative binomial fit, but they leave open the possibility that the negative binomial model is overfitting the data and a simple Poisson model would be sufficient to relate rate variance to mean rate. To assess this, we calculated the likelihoods associated with fitting a negative binomial and a Poisson distribution to the distributions of single-trial spike counts for each cell and used AIC and BIC and AIC and BIC weights (wAIC and wBIC) for model evaluation. Weights close to one indicate very strong support for one model over the alternative model. Of 287 cells, 235 had larger wAIC values for the negative binomial fits than the Poisson fits in the attend-RF condition. A total of 234/287 cells had larger wAIC values for the negative binomial fits than the Poisson fits in the attend-away condition. The respective numbers for BIC were 220/287 and 227/287 cells. In most cases, the wAIC (wBIC) values in favor of the negative binomial model were >0.95 (AIC: 192/287, BIC: 187/287). This demonstrates that for ~80% of the cells, the negative binomial model was the better model to describe the variance of the firing rate. The 20% of cells that were better described by the Poisson model in at least one of the attention conditions usually had very low firing rates (<5 Hz). The difference in cell numbers that were better fit with the Poisson model in the attend-RF versus the attend-away condition raises the question of whether most cells were better fit with the Poisson model under both attention conditions or if the two sets overlapped only loosely. The overlap was not very strict. Of the 52/53 cells better fit with the Poisson model, 14 were better fit under both attention conditions. Most of these (13/14) had low firing rates (<5 spikes/s) under both attention conditions.
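The model comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: the negative binomial is parameterized by its mean and the gain variance, the mean is fixed at the sample mean, the gain variance is found by a coarse grid search (a simplification of a proper maximum-likelihood optimizer), and the spike counts are hypothetical. Weights are computed in the standard way from differences in AIC or BIC between the two models.

```python
import math


def poisson_loglik(counts, lam):
    """Log-likelihood of spike counts under a Poisson model."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, mu, gain_var):
    """Log-likelihood under a gamma-Poisson (negative binomial) model
    with mean mu and variance mu + gain_var * mu**2."""
    r = 1.0 / gain_var          # shape of the gamma-distributed gain
    p = r / (r + mu)
    return sum(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p) for k in counts)


def ic_weights(scores):
    """Convert AIC (or BIC) scores into weights that sum to one."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [x / total for x in rel]


def compare_models(counts):
    """Return gain variance and [Poisson, negative binomial] weights."""
    n = len(counts)
    mu = sum(counts) / n
    ll_pois = poisson_loglik(counts, mu)
    # Coarse log-spaced grid from 1e-4 to 10 (assumption, for brevity).
    grid = [10 ** (-4 + 5 * i / 500) for i in range(501)]
    ll_nb, gain_var = max((negbin_loglik(counts, mu, g), g) for g in grid)
    aic = [2 * 1 - 2 * ll_pois, 2 * 2 - 2 * ll_nb]
    bic = [1 * math.log(n) - 2 * ll_pois, 2 * math.log(n) - 2 * ll_nb]
    return {"gain_var": gain_var,
            "wAIC": ic_weights(aic),
            "wBIC": ic_weights(bic)}


# Hypothetical overdispersed spike counts (illustration only).
counts = [10, 30, 18, 25, 8, 35, 15, 27, 12, 40, 20, 5]
result = compare_models(counts)
print("gain variance:", round(result["gain_var"], 3))
print("wAIC [Poisson, NB]:", [round(w, 3) for w in result["wAIC"]])
print("wBIC [Poisson, NB]:", [round(w, 3) for w in result["wBIC"]])
```

For strongly overdispersed counts such as these, the negative binomial weight approaches one; for counts whose variance does not exceed the mean, the extra parameter is penalized and the Poisson model receives the larger weight.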
To further investigate how attention affects the dependency between mean rate and variance of the rate, we fitted the function of Equation 8 to our rate versus variance population data using χ² error minimization and determined the exponent for the attend-RF and for the attend-away conditions (Fig. 9A,B). The exponent was reduced in broad- and narrow-spiking cells when attention was directed to the neuron's RF, as hypothesized and illustrated in Figure 7. To determine whether the use of separate fitting procedures for the two attention conditions was justified, we also fitted the data with a single function and performed χ² error minimization. We performed model comparison based upon AIC and BIC weights. AIC and BIC weights for the separate fitting were >0.999 for narrow-spiking and broad-spiking cells, lending strong support to the idea that attention reduces the exponent that links mean rate to rate variance. The doubly stochastic model of neural firing as Poisson spike probability with variable gain (Goris et al., 2014) predicts that, in the presence of gain variance, the count variance of each cell depends on the mean firing rate in a supralinear way, according to the following formula:

$$\text{Var} = \text{mean} + \text{gain variance} \times \text{mean}^{2} \qquad (9)$$

We found that Equation 9 yielded variance estimates that were highly correlated with the recorded variance of our individual neurons, in which the correlation coefficients were 0.991 (p < 0.001) for broad-spiking cells and 0.815 (p < 0.001) for narrow-spiking cells. We thus used the prediction of the variable gain model (Eq. 9) to reproduce, at the population level, the above-reported expansive nonlinearity between rate and variance. To do so, we plotted predicted variance against mean count for the two attention conditions and the different cell types and fitted these data with Equation 8. The results are shown in Figure 9, C and D, for broad- and narrow-spiking cells. 
The exponents obtained from the fitting are similar to those obtained from the measured data. Finally, using AIC and BIC weights for model comparison also gives strong support for separate fits (attend-RF and attend-away fitted separately) over an approach in which the exponent is best described by a single fit (AIC and BIC weights for separate fits Ͼ0.99 for both cell types). The difference between Equation 8 (power law) and Equation 9 (linear plus scaled quadratic) might suggest that the two models are incompatible. However, they are two ways of quantifying phenomenologically the link between rate variability and mean firing rate. The power law (Eq. 8) is useful to quantify the link at the population level, when each cell only contributes a single rate estimate, whereas the gain variance analysis allows estimating the fluctuations in firing rate for a single cell given a certain condition (and also across conditions). In fact, the power law can also be used to describe the link for single cells, provided different conditions (e.g., different stimuli) are used that give rise to different mean firing rates (Goris et al., 2014). All in all, these results suggest that both cell types show a reduction of variability with attention and that this reduction can be described as a reduction of the supralinearity in the growth of variance with mean rate that is due to a decrease in the trial-to-trial variance of neural gain.
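The compatibility of the two descriptions can be made explicit with a short worked derivation. It is a sketch for a single cell with fixed gain variance, written here as $\sigma_G^2$ with mean count $\mu$ (both symbols are introduced for this illustration); at the population level, cells differ in gain variance, so the fitted exponent of Equation 8 is only approximated by this expression.

```latex
% Fano factor implied by Eq. 9
\mathrm{FF} = \frac{\mathrm{Var}}{\mu}
            = \frac{\mu + \sigma_G^2 \mu^2}{\mu}
            = 1 + \sigma_G^2 \mu

% Local exponent of an equivalent power law (slope in log-log coordinates)
x_{\mathrm{local}}
  = \frac{d \ln \mathrm{Var}}{d \ln \mu}
  = \frac{\mu \left(1 + 2 \sigma_G^2 \mu\right)}{\mu + \sigma_G^2 \mu^2}
  = 1 + \frac{\sigma_G^2 \mu}{1 + \sigma_G^2 \mu}
```

The local exponent lies between 1 (no gain variance, Poisson-like) and 2, and at a given mean count it decreases as the gain variance decreases. Under these assumptions, a reduction of gain variance therefore appears as a reduction of the exponent of the power law.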

Discussion
We found that broad- and narrow-spiking cells in macaque area FEF differ in how they are affected by visual stimulation and by attention.
Broad-spiking cells are often assumed to be pyramidal cells, whereas narrow-spiking cells are often assumed to be fast-spiking interneurons (Mitchell et al., 2007; Shin and Sommer, 2012). However, a one-to-one mapping between the two is an oversimplification (Vigneswaran et al., 2011), at least for the motor cortex, in which spinal-cord-projecting neurons with the shortest latencies exhibited the thinnest spikes. Moreover, FEF cell types defined by function on average differ in their distributions of spike width, but still show considerable overlap in their spike width distributions. Importantly, all of these different distributions overlap with the traditional distinction of narrow and of broad spikes. Given these findings, we do not claim that narrow-spiking cells invariably equate to putative interneurons or broad-spiking cells equate to putative pyramidal cells. Regardless, the separation based on spike widths results in significant differences between the two cell groups in firing rates and in firing rate variability. Narrow-spiking cells on average have higher stimulus-induced firing rates. Moreover, they are more strongly affected by attention, more strongly increase their firing rate for attend-RF conditions relative to the precue stimulus-driven firing rate, and more strongly reduce their activity for attend-away conditions. Importantly, both cell types reduce their gain variability when attention is directed to the RF compared with when it is directed away from the RF, but the reduction in gain variance was stronger in narrow-spiking cells. The overall effect is that narrow-spiking cells show stronger attentional modulation as quantified by the AUROC; that is, their responses allow for greater discriminability/separability between attention conditions. Previous studies in visual (occipital) cortical areas have reported effects similar to those reported here for area FEF. 
In area V4, narrow-spiking cells have higher firing rates and are more strongly affected by attention (Mitchell et al., 2007). In V4 (and V1), attention reduces the rate variability quantified by FF, and a recent study reported that attention reduces shared gain variability between V4 neurons, explaining why attention reduces rate variability and noise correlations (Rabinowitz et al., 2015). Related results have been reported for the prefrontal cortex. In DLPFC, task engagement differently affects firing rates in narrow- and broad-spiking cells (Hussar and Pasternak, 2012) and affects rate variability in broad-spiking cells (Hussar and Pasternak, 2010). However, unlike in area V4, broad-spiking cells were more strongly rate modulated by task engagement in DLPFC (Hussar and Pasternak, 2012). The effects that we see in area FEF regarding rate modulations are more reminiscent of the data from area V4. To the best of our knowledge, the effect of task engagement on rate variability in different neuron types of the frontal cortex has not been investigated previously. Our data additionally show that the excitability stabilization is specific for task-relevant cells; that is, cells that alter their firing rate with the allocation of spatial attention. FEF cells not showing attentional rate changes also did not show gain variance reduction with attention.
The finding that attention more strongly affects firing rates in narrow-spiking neurons than in broad-spiking neurons is puzzling under the assumption that narrow-spiking neurons largely correspond to inhibitory interneurons (see also Mitchell et al., 2007, but see the discussion of narrow-spiking cells mapping onto putative interneurons above). If attention serves to increase the representation of the attended object at the level of firing rates, why would inhibition get upregulated? Potential explanations could be sought within the framework of the biased competition models of attention (Desimone and Duncan, 1995;Reynolds et al., 1999) or the related normalization models of attention (Lee and Maunsell, 2009;Reynolds and Heeger, 2009;Ni et al., 2012;Sanayei et al., 2015). Both assume that inhibition is an integral part of enhancing the representation of the attended object through competitive interactions between neuronal populations. It helps to suppress irrelevant representations, which in turn relieves neurons representing the attended object from reciprocal inhibition. The increased inhibition could also help to stabilize the network against slow (modulatory) influences; that is, the increased inhibition could be a key contributor that reduces the gain variance and noise correlation (Rabinowitz et al., 2015). Increasing inhibition may also be a necessity in a network that works in a balanced excitation-inhibition regime to preserve an asynchronous network state and to enhance the information capacity and discrimination accuracy in the network (Deco et al., 2014). It may at the same time serve to enable task-dependent enhancement of increased coherence in specific frequency bands to improve efficacy of communication between neuronal ensembles (Fries et al., 2001;Chalk et al., 2010;Bosman et al., 2012;Buschman et al., 2012). Future studies may give detailed answers to these speculations.
Rate variability is strongly affected by stimulus onset (Churchland et al., 2010) and by task engagement or global cognitive factors, all of which are assumed to alter the state of the cortical network (Harris and Thiele, 2011; Ecker et al., 2014). Therefore, it was puzzling that previous reports did not find an effect of attention (Chang et al., 2012), saccade preparation (Chang et al., 2012), or visual search (Purcell et al., 2012) on rate variability in FEF, as assessed by the FF. As discussed here (see, e.g., Fig. 7), this is possibly related to the dependency of the FF on overall firing rates and on the strength of the expansive nonlinearity that links firing rate to rate variability (Vogels and Orban, 1991; Gur et al., 1997; Zinke et al., 2006; Goris et al., 2014). We propose that attention (and, more generally, task engagement) reduces the exponent that links the two. Using a recently established method to decompose different sources of rate variability (Falkner et al., 2013; Goris et al., 2014; and see Churchland et al., 2011 for a slightly different approach to the problem), we found that gain variance was reduced by attention in narrow- and broad-spiking cells. These results support the notion that task engagement affects cortical state regardless of the cortical hierarchy, stabilizing the network to optimize information processing, and allows for efficient information exchange between cell groups in a task-dependent manner.