Abstract
Despite ever-growing knowledge of how parietal and prefrontal neurons encode low-level spatial and color information or higher-level information, such as spatial attention, an understanding of how these cortical regions process neuronal information at the population level is still missing. A simple assumption would be that the function and temporal response profiles of these neuronal populations match those of their constituent cells. However, several recent studies suggest that this is not necessarily the case and that the single-cell approach overlooks dynamic changes in how information is distributed over the neuronal population. Here, we use a time-resolved population pattern analysis to explore how spatial position, spatial attention and color information are differentially encoded and maintained in the macaque monkey prefrontal (frontal eye fields) and parietal cortex (lateral intraparietal area). Overall, our work yields three novel observations. First, we show that parietal and prefrontal populations operate in two distinct population regimens for the encoding of sensory and cognitive information: a stationary mode and a dynamic mode. Second, we show that the temporal dynamics of a heterogeneous neuronal population provide information complementary to that of its functional subpopulations. Thus, both need to be investigated in parallel. Last, we show that identifying the neuronal configuration in which a neuronal population encodes given information can serve to reveal this same information in a different context. Altogether, this work challenges common views on neural coding in the parietofrontal network.
Introduction
The neurons of the frontal eye fields (FEF) and the lateral intraparietal area (LIP) encode the spatial position of visual stimuli (Bruce and Goldberg, 1985; Barash et al., 1991; Ben Hamed et al., 1997, 2001, 2002; Ben Hamed and Duhamel, 2002), as well as the spatial position of the locus of attention (Gottlieb et al., 1998; Armstrong et al., 2009; Farbod Kia et al., 2011; Ibos et al., 2013; Suzuki and Gottlieb, 2013; Astrand et al., 2014a). Accordingly, the reversible inactivation of these functional cortical areas leads to reliable deficits in the selection of visual information and in the orientation of spatial attention (Wardak et al., 2002, 2004, 2006; Liu et al., 2010; Suzuki and Gottlieb, 2013). Despite this accumulated knowledge, it is still unclear how these cortical regions process neuronal information at the population level. A simple assumption would be that the functional neuronal populations mirror the temporal response profiles of their individual cells. However, recent studies indicate that this is not necessarily the case and that the single-cell approach overlooks dynamic changes in how information is distributed over the neuronal population (Meyers et al., 2008, 2012; Barak et al., 2010; Crowe et al., 2010; Kadohisa et al., 2013; Stokes et al., 2013).
Here, our focus is on how local cortical networks encode information in time. Schematically, information can be encoded by a stationary neuronal population, i.e., by a population in which the contribution of each of its individual cells and their interconnection weights remain constant in time. As a result, a classifier designed to extract information from the neuronal response of this population at a given time point will reliably extract information at any other time. Alternatively, information can be encoded dynamically, i.e., by a population in which the contribution of each of its individual cells and their interconnection weights constantly change in time. As a result, a classifier designed to extract information from the neuronal response of such a population at a given time will not be able to extract this very same information at other times. Averaged population response profiles describing overall changes in spiking rate cannot distinguish these two distinct dynamic population regimens. We use a time-resolved population pattern analysis to explore how spatial position, spatial attention and color information are differentially encoded and maintained in the primate prefrontal (FEF) and parietal cortex (lateral intraparietal area; LIP). Overall, our work brings about three novel observations. First, we show that parietal and prefrontal populations operate in two distinct population regimens for the encoding of sensory and cognitive information: a stationary mode and a dynamic mode. Second, we show that the temporal dynamics of a heterogeneous neuronal population brings about complementary information to that of its functional subpopulations. Thus, both need to be investigated in parallel. Last, we show that identifying the neuronal configuration in which a neuronal population encodes given information can serve to reveal this same information in a different context. 
Altogether, this work challenges common views on neural coding in the parietofrontal network.
Materials and Methods
Surgical procedure and FEF and LIP mapping.
All procedures were approved by the local animal care committee in compliance with the guidelines of the European Community on Animal Care. All experimental procedures are identical to those described by Ibos et al. (2013). Briefly, standard surgical procedures were used to place an MRI-compatible head restraint device, and two PEEK recording chambers were positioned over the LIP and the FEF of one female (left hemisphere, Monkey M, 7 kg) and one male (right hemisphere, Monkey Z, 10 kg) monkey (Macaca mulatta; Wardak et al., 2004). Gas anesthesia was performed using Vet-Flurane (0.5–2%) following an induction with Dormitor (medetomidine at 0.85 mg/ml, 0.025 mg/kg and ketamine 1000: ketamine at 100 mg/ml, 7 mg/kg). Postsurgery pain was controlled with a morphine pain killer (Buprecare, buprenorphine at 0.3 mg/ml, 3 injections at 6 h intervals; first injection at the beginning of the surgery, 0.01 mg/kg, i.m.) associated with a nonmorphine pain killer (Tolfedine 4%: tolfenamic acid at 40 mg/ml; 4 mg/kg), and full antibiotic coverage was provided (a long-action large-spectrum antibiotic, Terramycine, oxytetracycline at 200 mg/ml; one injection during the surgery and one 5 d later, 0.1 mg/kg, i.m.). FEF sites were defined as the sites in the anterior bank of the arcuate sulcus at which low-threshold microstimulations (<50 μA) evoked systematic eye movements. This characterization was confirmed by the visuomotor response patterns on a classic memory-guided saccade task at these sites (Bruce and Goldberg, 1985). Similarly, the LIP sites were characterized based on their visuomotor responses in a memory-guided saccade task (Gnadt and Andersen, 1988; Colby et al., 1996). In both areas, we targeted the cortical positions at which evoked saccade amplitude and visual or motor receptive field (RF) position ranged between 10° and 15°. Recordings on the main task started when neurons had visual, saccadic, and/or delay responses in a memory-guided saccade task. 
Our neuronal dataset can thus be considered as heterogeneous, unbiased toward a certain functional type compared with other studies (Premereur et al., 2011).
Behavioral task.
The data analyzed in the present work were collected while monkeys performed a cued target detection task based on a rapid serial visual presentation (Fig. 1). This design made it possible to dissociate in time the processes related to the orientation of attention from those related to target detection (Ibos et al., 2009). In particular, the cue is a nonspatial abstract cue that informs the monkeys to which visual hemifield they should direct their attention. Briefly, the monkeys had to fixate a central point on the screen throughout each trial. Two streams of visual objects were presented, one in the visual receptive field of the neuron being recorded and the other in the contralateral hemifield. The visual streams corresponded to a rapid succession of 150 ms long visual items with no intervening blanks. The onsets of the two streams were offset by 300 ms (2 stimuli). The first stream was randomly presented either inside or outside of the receptive field of the neuron being recorded. The cue, which instructed the monkeys about the position of the target, was always presented in the first stream of stimuli, 300–600 ms following second-stream onset (2–4 stimuli). The cue could be green (or red, respectively), predicting that the target would appear in the same (or other, respectively) stream. In the following, the green cue will be called a “Stay” cue and the red cue a “Shift” cue. The monkey had to combine the information related to the physical attributes of the cue (its location and identity) to find out where the target was likely to appear. The monkey had to release a lever to report the presence of the target. The target could appear 150, 300, 600, or 900 ms following the cue, so as to avoid automatic responses. A maximum of 17 visual items was presented on each trial. 
In 67% of the trials, the target appeared in the instructed stream (valid trials), in 17% of the trials, it appeared in the opposite stream (invalid trials), and in 16% of the trials it did not appear at all (catch trials) to discourage systematic responses. The monkeys were rewarded for releasing the lever 150–750 ms following target onset on valid and invalid trials and holding it on catch trials. Invalid trials were used to check that the monkey used the predictive information provided by the cue to optimize its behavior (Ibos et al., 2013). Sessions in which this was not the case were discarded from the analysis.
Neural recordings.
Recordings were performed using both single tungsten electrodes (Frederick Haer) and platinum/tungsten tetrodes (Thomas Recording). Electrodes were lowered with two independent NAN electrode microdrives placed over each recording chamber. Electrophysiological signals were amplified and spikes were digitized at 20,000 Hz with National Instruments cards (Plexon) and experimental control was achieved by custom data acquisition software. Single units were identified offline using Offline Sorter software (Plexon).
Cell populations.
The spiking activity of 131 FEF neurons and 87 LIP neurons was recorded in independent recording sessions from two macaque monkeys. Of these, 123 FEF neurons and 80 LIP neurons were characterized as task dependent. The neurons of these two neuronal populations have previously been characterized based on the specific pattern of activity evoked in the 300 ms following cue presentation (Ibos et al., 2013).
Cue-related cells.
Briefly, cue-related cells were defined as those cells in which activity changed significantly with respect to the baseline following cue presentation (multiple successive bin-wise ANOVAs on the number of spikes in 2 adjacent 100 ms time windows in steps of 1 ms, p < 0.01 for at least 30 of 35 ms in the time interval between 30 and 300 ms following cue onset). Cue-related cells responded to at least one of the four possible cue configurations (Shift or Stay cue, inside or opposite the RF).
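The duration criterion above can be sketched as follows. This is a minimal illustration, assuming that "at least 30 of 35 ms" means that at least 30 of the 1 ms steps within some 35 ms stretch are significant; the bin-wise ANOVAs themselves are not reproduced, and the function name and input format are hypothetical.

```python
def is_cue_related(significant, min_hits=30, span=35):
    """Return True if any `span`-long stretch of 1 ms steps contains
    at least `min_hits` significant steps.

    significant: sequence of booleans, one per 1 ms step in the
    30-300 ms post-cue interval (True where p < 0.01).
    """
    flags = [1 if s else 0 for s in significant]
    if len(flags) < span:
        return False
    hits = sum(flags[:span])  # count in the first window
    if hits >= min_hits:
        return True
    # Slide the window one step at a time, updating the count.
    for t in range(span, len(flags)):
        hits += flags[t] - flags[t - span]
        if hits >= min_hits:
            return True
    return False


# Hypothetical significance time course (not data from the study).
example = [False] * 100 + [True] * 30 + [False] * 141
print(is_cue_related(example))
```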
Neuronal selectivity to cue position and to spatial attention allocation.
The selectivity of each of these cue-related cells to cue position or spatial attention allocation can be assessed using a receiver operating characteristic (ROC) nonparametric analysis that provides a quantitative estimate of the degree of overlap of two distributions of firing rates without any a priori assumption about normality or homoscedasticity (Green and Swets, 1966; Swets, 2014). Specifically, we calculated, for each cue-related cell, and at each 1 ms time step, a ROC value comparing, in the 100 ms window centered on this time step, the trial-by-trial spike counts for the following: (1) a cue located in the receptive field versus outside (Fig. 1Bi) and (2) attention oriented toward the receptive field versus outside (Fig. 1Bii). ROC values could vary around 0.5 (by up to ±0.5). For the sake of clarity, these values were rectified so that the ROC values in time varied between 0.5 and 1 (see Ibos et al., 2013 for additional details).
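As an illustration, a rectified ROC value for one time window can be computed as sketched below. The rectification is assumed here to be a simple folding around 0.5 (the exact procedure is given in Ibos et al., 2013), and the function names and spike counts are hypothetical.

```python
def roc_area(counts_a, counts_b):
    """Area under the ROC curve, computed as P(a > b) + 0.5 * P(a == b)
    over all pairs of trials from the two conditions."""
    wins = 0.0
    for a in counts_a:
        for b in counts_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(counts_a) * len(counts_b))


def rectified_roc(counts_a, counts_b):
    """Fold the ROC area around 0.5 so that it lies between 0.5 and 1
    (assumed rectification)."""
    area = roc_area(counts_a, counts_b)
    return max(area, 1.0 - area)


# Hypothetical spike counts in one 100 ms window (not data from the study).
cue_in_rf = [5, 7, 6, 8]
cue_out_rf = [2, 3, 5, 1]
print(rectified_roc(cue_in_rf, cue_out_rf))
```

With this folding, a cell that fires more for cues outside the receptive field obtains the same value as a cell that fires equally more for cues inside it.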
Classification of cue-related response profiles.
Because cells with a strong response to a single specific cue, for example a Shift cue to the left, can contribute both to the coding of a left cue and to the coding of attention to the right, we further used a bootstrap analysis to compute an attention index in time (|(attention contra − attention ipsi)/(attention contra + attention ipsi)|) and a position index in time (|(position contra − position ipsi)/(position contra + position ipsi)|). We then calculated the difference between these attention and position indices (IAP; Fig. 1Biii) and assessed the statistical significance of this difference using a permutation test (Ibos et al., 2013). Cells with a positive significant IAP were classified as “attention neurons” (Fig. 1B, light gray shading), and cells with a negative significant IAP were classified as “position neurons” (Fig. 1B, dark gray shading). The remaining cells were classified as “cue identity cells” (Fig. 1B, intermediate gray shading). A subset of neurons in FEF (n = 21) and in LIP (n = 4) reliably encoded the final position of attention, while providing no information about cue location or cue color (Ibos et al., 2013). These cells discriminated between cues instructing attention toward the receptive field (e.g., for a neuron with a contralateral receptive field: contralateral Stay cues and ipsilateral Shift cues) and cues instructing attention away from the RF (e.g., for a neuron with a contralateral RF: ipsilateral Stay cues and contralateral Shift cues). Furthermore, in the FEF, 40 cells reliably encoded the color of the cue and 11 cells reliably encoded the position of the cue. In LIP, 27 cells reliably encoded cue identity (its color) and nine cells reliably encoded the position of the cue. In the present paper, we compare dynamic population information coding, both in the entire neuronal population and in the specific neuronal populations described in this section.
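The two indices and their difference can be written out as in the sketch below, for a single time step. The bootstrap estimation and the permutation test are omitted, and the function names and firing rates are hypothetical.

```python
def contrast_index(contra, ipsi):
    """|(contra - ipsi) / (contra + ipsi)| for two mean responses."""
    return abs((contra - ipsi) / (contra + ipsi))


def iap(att_contra, att_ipsi, pos_contra, pos_ipsi):
    """Attention index minus position index at one time step.
    Positive values point to attention coding, negative values to
    position coding (significance testing is not shown here)."""
    attention_index = contrast_index(att_contra, att_ipsi)
    position_index = contrast_index(pos_contra, pos_ipsi)
    return attention_index - position_index


# Hypothetical mean firing rates in spikes/s (not data from the study).
print(iap(att_contra=30.0, att_ipsi=10.0, pos_contra=22.0, pos_ipsi=18.0))
```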
Data preprocessing.
For each cell and each trial, the spiking data were smoothed by averaging the spiking activity over 100 ms sliding windows (resolution of 1 ms). This window width corresponds to a tradeoff between performance and decoding speed, as narrower filtering windows result in lower performance, whereas wider filtering windows decrease temporal resolution (Farbod Kia et al., 2011; Astrand et al., 2014a). The neuronal populations were constructed by concatenating the cells of interest (all cells, cue position cells, cue identity cells or attention cells). Trials in which the delay between cue and target was equal to 150 or 300 ms were excluded from the analysis so as to avoid any confound between cue and target processing. The available number of trials varied from one cell to the other (alignment on first visual stream, FEF: mean = 148 trials, standard deviation (SD) = 54; LIP: mean = 163 trials, SD = 79; alignment on cue, FEF: mean = 40 trials, SD = 12; LIP: mean = 41 trials, SD = 19). For each cell, 40 trials were randomly selected per condition (first-stream position, cue identity, cue position, and spatial position of attention). For a minority of cells, some trials were randomly duplicated to achieve the requirement of 40 trials per condition. Although this trial duplication procedure can potentially induce an artificial inflation of performance, this has little impact on the present work as we are analyzing general temporal dynamic patterns rather than comparing absolute performance rates across conditions. In addition, 95% confidence intervals for performance statistical significance are defined using a random permutation procedure (see Nonparametric random permutation tests, below). This procedure is expected to adjust the confidence intervals to the biasing effects of trial duplication, thus further minimizing the impact of this duplication. 
This random selection of 40 trials per cell was repeated 10 times, thus defining 10 different population activity seeds (of more than 131 to the power of 40 possible FEF population activities and 87 to the power of 40 possible LIP population activities, respectively). Each seed corresponded to a 3D matrix, with a first dimension corresponding to the number of cells in the population of interest, a second dimension corresponding to the number of trials (here 40) and a third dimension corresponding to the time around the event of reference (the cue or first visual stream onset). Thus, element (i, j, k) of this 3D seed matrix corresponded to the smoothed estimate of the response of neuron i, on trial j, at time stamp k. Note that these population responses are free of the correlations that would be found in simultaneous recordings.
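The preprocessing steps can be sketched in Python with NumPy as follows. This is a simplified illustration for a single condition: the function names are hypothetical, sampling with replacement stands in for the trial duplication used when a cell has fewer than 40 trials, and the convolution zero-pads the edges of each trial.

```python
import numpy as np


def smooth_spikes(spike_train, window=100):
    """Average a 1 ms binned spike train over a sliding window.
    The output has the same length as the input; edges are zero-padded."""
    kernel = np.ones(window) / window
    return np.convolve(spike_train, kernel, mode="same")


def build_seed(cells, n_trials=40, rng=None):
    """Stack independently recorded cells into one population seed of
    shape (n_cells, n_trials, n_times).

    cells: list of arrays of shape (n_available_trials, n_times).
    """
    rng = np.random.default_rng() if rng is None else rng
    seed = []
    for trials in cells:
        # Assumption: duplicate trials only when too few are available.
        too_few = trials.shape[0] < n_trials
        picked = rng.choice(trials.shape[0], size=n_trials, replace=too_few)
        seed.append(trials[picked])
    return np.stack(seed)


# Hypothetical spike trains for two cells (not data from the study).
rng = np.random.default_rng(0)
raw = [rng.poisson(0.02, size=(55, 300)).astype(float),
       rng.poisson(0.02, size=(30, 300)).astype(float)]
smoothed = [np.apply_along_axis(smooth_spikes, 1, cell) for cell in raw]
seed = build_seed(smoothed, n_trials=40, rng=rng)
print(seed.shape)  # (cells, trials, time stamps)
```

Because each cell draws its trials independently, the trial axis of the seed pairs responses that were never recorded together, which is why such pseudo-populations carry no simultaneous-recording correlations.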
Neural decoder.
To quantify the amount of information in the data, a regularized linear regression was applied. This procedure minimizes the mean square error for the equation C = W · R, where R is the time course of the neuronal response of each of the n neurons of the population of interest for each of the t available trials, W is the vector of synaptic weights that adjust the contribution of each cell to the final readout and is as a consequence a 1 by n vector, and C is a 1 by t vector, the sign of the elements of which describes the two possible classes taken by the binary variable of interest.
The first approach is to invert the above equation as W = C · R†, where R† denotes the Moore–Penrose pseudoinverse of R. R† was determined on a subset of the data (Atrain = training dataset) and the resulting W vector was applied to compute C = W · R on the rest of the data (Atest = testing dataset). As the Moore–Penrose pseudoinverse is prone to overfitting, we used a Tikhonov-regularized version of it: this solution minimizes the compound cost norm(W · R − C) + λ · norm(W), where the last term is a regularization term added to the original minimization problem. The scaling factor λ was chosen to allow for a good compromise between learning and generalization (Astrand et al., 2014a).
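A regularized readout of this kind can be sketched with NumPy as follows. The sketch assumes squared norms in the cost, for which the minimizer has the closed form W = C · Rᵀ · (R · Rᵀ + λI)⁻¹; the function names, the value of λ, and the synthetic data are illustrative and are not those of the study.

```python
import numpy as np


def fit_weights(R, C, lam=1.0):
    """Tikhonov-regularized least-squares solution of C = W . R.

    R: (n_cells, n_trials) population responses.
    C: (n_trials,) class labels coded as +1 / -1.
    Returns W with shape (n_cells,).
    """
    n_cells = R.shape[0]
    # (R R^T + lam I) is symmetric, so solving it against R C gives W.
    return np.linalg.solve(R @ R.T + lam * np.eye(n_cells), R @ C)


def predict(W, R):
    """The class guess is the sign of the linear readout."""
    return np.sign(W @ R)


# Synthetic example: 5 cells, 200 trials, each cell shifted by the label.
rng = np.random.default_rng(0)
C = np.repeat([1.0, -1.0], 100)
R = 2.0 * C + rng.normal(size=(5, 200))
W = fit_weights(R, C, lam=1.0)
accuracy = np.mean(predict(W, R) == C)  # accuracy on the training data
print(W.shape, accuracy)
```

As λ approaches zero, this solution tends toward the pseudoinverse solution W = C · R†; larger values of λ shrink the weights and trade training fit for generalization.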
Classification studies often use a linear discriminant analysis (LDA). It is important to note that even though LDA and multilinear regression analysis (used here) are based on distinct classification procedures, in the case of two-class classification problems, LDA and linear regression are formally identical. Indeed, LDA relies on Bayes theorem and assumes the following: (1) that the conditional probability density functions of the input variables (i.e., the neuronal response) are normal and (2) that the class covariances are identical. Under these assumptions, the two-class problem can be expressed as a linear discriminant function c = w⃗ · r⃗ + b, which is identical to the equation used in a linear regression (Duda et al., 2000). As a result, the only difference between LDA and the classification procedure used here is the use of a regularization term, which we have previously shown to improve classification accuracy (Astrand et al., 2014a).
Decoding procedure.
We trained the classifier on 70% of the trials and tested it on the remaining 30% of the trials so that the testing is performed on a naive set of trials, never experienced by the classifier (Ben Hamed et al., 2003, 2007). During training, the decoder was simultaneously presented with single-trial population activities (corresponding to the observed inputs) and the state of the decoded variable (corresponding to the associated outputs: visual stream on the left or on the right/Red or Green cue/Cue on the left or on the right/Attention instructed to the left or to the right). During testing, the decoder was presented with the test set and produced its guess for the state of the decoded variable. The readout performance on each decoding run corresponds to the percentage of trials on which the classifier provided the correct guess for the state of the decoded variable. This training/testing procedure was repeated for each data seed (i.e., 10 times in all) to yield an average readout performance.
Dynamic information coding analysis.
The decoding procedure described above defines the linear function that best accounts for the relationship between the population response and the variables of interest at a given time step. This linear function thus reflects the weighted contribution of each neuron to the output variable and captures how the population is encoding the information of interest at this specific time stamp. A stationary population is expected to continue encoding the information of interest at other time stamps with this very same specific configuration. Conversely, in a population encoding given information dynamically, the linear function linking the population response to the variable of interest is expected to change as a function of time. To distinguish between stationary and dynamic information coding within the target neuronal populations defined above (see Cell populations), we used a full cross-temporal classification analysis, which defined the optimal decoder at each time step on a subset of the available trials (training trials) and evaluated the decoder's performance on the remaining trials (testing trials) at all other time stamps. We limited the procedure to every 10 ms, a temporal resolution sufficient to capture the encoding dynamics without loss of information (as the spiking data were smoothed by averaging the activity over 100 ms sliding windows). As a result, the classification performance at all time stamps is free of potential within-trial cross-temporal correlations. When decoding cue properties, the train- and test-windows ranged from the cue onset (0 ms) to 600 ms following cue onset, thus covering the delay period before target presentation. During this period, monkeys interpreted the cue and oriented their attention to the most probable target location. Trials on which the target appeared 150 or 300 ms following cue presentation were thus excluded from the analysis. 
When decoding first visual stream position, the train-window ranged from first visual stream onset (0 ms) to 600 ms after first visual stream onset and the test-window ranged from 100 ms before first visual stream onset to 600 ms after first visual stream onset. Second visual stream onset was presented 300 ms following first visual stream onset, in the opposite hemifield.
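The cross-temporal procedure can be sketched as follows, reusing a ridge readout of the form described under Neural decoder. This is a minimal illustration under stated assumptions: the function names, λ, the single 70/30 split, and the two synthetic populations (one stationary, one in which a different cell carries the signal at each time stamp) are hypothetical, and the time axis is taken to be already subsampled.

```python
import numpy as np


def ridge_fit(R, C, lam=1.0):
    """Regularized least-squares weights for C = W . R."""
    return np.linalg.solve(R @ R.T + lam * np.eye(R.shape[0]), R @ C)


def cross_temporal_accuracy(seed, labels, train_frac=0.7, lam=1.0, rng=None):
    """Train a readout at each time stamp and test it at every time stamp.

    seed: (n_cells, n_trials, n_times); labels: (n_trials,) of +1 / -1.
    Returns acc[train_time, test_time], computed on held-out trials only.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_cells, n_trials, n_times = seed.shape
    order = rng.permutation(n_trials)
    n_train = int(round(train_frac * n_trials))
    train, test = order[:n_train], order[n_train:]
    acc = np.zeros((n_times, n_times))
    for t_train in range(n_times):
        W = ridge_fit(seed[:, train, t_train], labels[train], lam)
        for t_test in range(n_times):
            guess = np.sign(W @ seed[:, test, t_test])
            acc[t_train, t_test] = np.mean(guess == labels[test])
    return acc


# Two synthetic populations of 6 cells, 40 trials, 6 time stamps.
rng = np.random.default_rng(1)
labels = np.repeat([1.0, -1.0], 20)
noise = rng.normal(size=(6, 40, 6))
stationary = noise + 3.0 * labels[None, :, None]  # same code at all times
dynamic = noise.copy()
for k in range(6):
    dynamic[k, :, k] += 3.0 * labels  # cell k is informative at time k only
acc_stationary = cross_temporal_accuracy(stationary, labels, rng=rng)
acc_dynamic = cross_temporal_accuracy(dynamic, labels, rng=rng)
print(acc_stationary.round(2))
print(acc_dynamic.round(2))
```

In the stationary case, accuracy stays high across the whole train-by-test matrix; in the dynamic case, it is high on the diagonal and falls toward chance away from it.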
Because this paper primarily focuses on how a given variable is dynamically encoded in time, it is important to understand the impact of different changes in the neuronal response characteristics on the decision boundary of a linear regression classifier model. Figure 2 illustrates the impact on the decision boundary and the classification rate of three different changes in neuronal response characteristics: a change in (1) the baseline firing rate, (2) the difference in spike rate between two stimuli (i.e., selectivity), and (3) the signal-to-noise ratio of the neuronal response (i.e., the reliability). A change in the average firing rate of the neurons impacts the localization of the hyperplane in the n-dimensional neuronal response space, but not the classification rate (Fig. 2A; the solid line boundary obtained for a given baseline response of neuron 2 and the dashed line boundary obtained following a change in this baseline response define the same number of false classification elements). As a result, when the average neuronal firing rates change in time (due to a change in the sensory environment or in the cognitive requirements of the task), a hyperplane defined at a given moment in time does not necessarily apply at other times, leading to a dynamic temporal encoding pattern. Likewise, an increase (respectively, a decrease) in neuronal selectivity leads to a change in the definition of the weights, thus changing the localization of the hyperplane, as well as to an increase (respectively, a decrease) of the correct classification rates (Fig. 2B). As a result, when neuronal selectivity changes in time, a hyperplane defined at a given moment in time no longer applies at other times, also leading to a dynamic temporal encoding pattern. 
In contrast, a change in the signal-to-noise ratio of the neuronal response does not impact the localization of the decision boundary hyperplane, though such a change does impact classification rates, higher signal-to-noise ratios (Fig. 2C, sharp colors) leading to higher correct classification rates than lower signal-to-noise ratios (Fig. 2C, lighter colors). As a result, correct classification rates highly correlate with the cell's response reliability. In addition, the weights defined at a given moment in time still apply at other times, though a partial drop in correct classification rates will be observed (Fig. 2C; King and Dehaene, 2014), leading to stationary temporal encoding patterns (see last result section for a data-driven perspective on the relationship between the population temporal dynamics and the underlying individual neuronal responses).
Nonparametric random permutation tests.
To determine the chance level against which to discuss the reported decoding performance, we defined a 95% confidence interval performance limit as follows. Using a sampling with replacement procedure, we randomly reassigned, for each cell, the condition label of each trial (thus a stay to the left cue trial could randomly become any of a stay to the right, shift to the left or shift to the right cue trial, or remain a stay to the left cue trial). This resulted in 40 randomly assigned new trials per condition per cell. The performance with which the variable of interest could be predicted from the activity of this random population was then calculated. This procedure was repeated 100 times, for each of the 10 data seeds, thus yielding a 1000 data point distribution of chance classification performance at each time stamp. The classification performance of a given classifier on real nonpermuted data was considered significant when it fell in the 5% upper tail of its corresponding chance performance distribution (nonparametric random permutation test, p < 0.05). Though chance levels were independently calculated for each classifier at each time stamp, these chance levels had a very small variability (mean = 65.8%, SD = 2.3%) and the results did not change if we considered an average chance level calculated over all possible decoders and time stamps.
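The logic of this test can be sketched as follows. The sketch is a simplification: labels are resampled with replacement at the level of whole population trials rather than per cell and per condition, a single hold-out split is used, and all function names, parameters, and data are hypothetical.

```python
import numpy as np


def holdout_accuracy(R, C, lam=1.0, n_train=70):
    """Ridge readout trained on the first n_train trials, tested on the rest."""
    R_train, C_train = R[:, :n_train], C[:n_train]
    W = np.linalg.solve(R_train @ R_train.T + lam * np.eye(R.shape[0]),
                        R_train @ C_train)
    return np.mean(np.sign(W @ R[:, n_train:]) == C[n_train:])


def chance_threshold(score_fn, R, C, n_perm=100, percentile=95, rng=None):
    """Upper limit of the chance distribution obtained with random labels."""
    rng = np.random.default_rng() if rng is None else rng
    null = np.empty(n_perm)
    for p in range(n_perm):
        # Reassign labels at random, sampling with replacement.
        random_labels = rng.choice(C, size=C.size, replace=True)
        null[p] = score_fn(R, random_labels)
    return np.percentile(null, percentile), null


# Synthetic population: 5 cells, 100 trials in random order.
rng = np.random.default_rng(2)
C = rng.permutation(np.repeat([1.0, -1.0], 50))
R = 2.0 * C + rng.normal(size=(5, 100))
observed = holdout_accuracy(R, C)
threshold, null = chance_threshold(holdout_accuracy, R, C, rng=rng)
print(observed, threshold, observed > threshold)
```

The observed performance is declared significant when it exceeds the 95th percentile of the chance distribution. With few test trials, this limit sits well above 50%.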
Classification of temporal coding population patterns.
This classification is based on the full cross-temporal dynamic analysis described above. For each train time (x-axis), the total time in milliseconds following cue onset during which the classification performance of a classifier tested on all other time points (y-axis) is above the 95% confidence interval is calculated (see Figs. 3E, 5E,F, 6E). When analyzing cue-related coding, the temporal dynamics of the neuronal populations of interest are defined as follows: (1) dynamic populations are populations for which the total time above the 95% confidence interval never exceeds, for any of the defined classifiers, twice the size of the smoothing window (i.e., neuronal populations that do not have a reliable coding of the information of interest throughout two independent adjacent 100 ms time windows); (2) stationary populations are populations for which the total time above the 95% confidence interval is >400 ms (i.e., sustained over >90% of the analysis time window, bearing in mind an average cue-related response latency of 100 ms); and (3) transiently stationary populations are populations for which the total time above the 95% confidence interval falls between these two thresholds (i.e., between 200 and 400 ms). When analyzing first-stream-related coding, the thresholds to define stationary and transient population dynamics are adjusted to take into account the fact that first- and second-stream onsets are separated by only 300 ms. In this case, stationary populations are thus defined as populations for which the total time above the 95% confidence interval is >260 ms (i.e., sustained over the entire first-stream to second-stream time interval, bearing in mind an average cue-related response latency of 40 ms). Neuronal populations with a total time above the 95% confidence interval <260 ms are characterized as transient.
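For cue-related coding, these criteria amount to the following decision rule, sketched under the assumption that the longest total time across classifiers is the quantity compared against both thresholds. The function name and the example values are hypothetical.

```python
def classify_dynamics(total_time_above_ci_ms, low=200, high=400):
    """Label a population from the total time (ms) during which each
    classifier, one per train time, stays above the 95% confidence limit."""
    longest = max(total_time_above_ci_ms)
    if longest <= low:
        return "dynamic"
    if longest > high:
        return "stationary"
    return "transiently stationary"


# Hypothetical total times, one value per train time (not study data).
print(classify_dynamics([60, 120, 180, 150]))
print(classify_dynamics([100, 250, 320, 280]))
print(classify_dynamics([150, 420, 480, 450]))
```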
Results
Neurons encode different variables dynamically. For example, a visuomotor neuron will first respond to the onset of a visual stimulus, and then again around saccade execution. In the present paper, we address a different question, namely, how stable is the temporal code with which a given functional neuronal population represents a specific variable (here, the location of a stable visual stream and the position, attention and color cue-related information during the cue-to-target interval) and how this code relates to the underlying individual neuronal responses. Specifically, the aim of this study is to compare the temporal dynamics with which attention, position, and color are encoded by prefrontal and parietal neuronal populations, during a cued target detection task. We thus focus on the time interval between cue and target presentation, during which the monkeys oriented their attention to one of two streams. By task design, cue position and cue color alone are completely uninformative about target location. The monkeys had to combine both pieces of information to correctly orient their attention toward the location at which the target had the highest probability to appear: both a red (Shift) cue on the left and a green (Stay) cue on the right instructed spatial attention on the right, while both a red (Shift) cue on the right and a green (Stay) cue on the left instructed attention on the left. Accordingly, their behavioral performance is higher when the target appears at the predicted location (80% of the target trials) than when it appears at the unpredicted location (20% of the target trials): reaction times are shorter on valid trials than on invalid trials (Monkey M: 457 vs 480 ms, p < 0.001; Monkey Z: 414 vs 422 ms, p < 0.01), and detection rates are higher (Monkey M: 79.6 vs 62.9%, p < 0.001; Monkey Z: 66.8 vs 62.4%, p < 0.001; see Ibos et al., 2013 for a detailed behavioral analysis).
Using a full cross-temporal classification analysis, we analyze how a classifier designed to readout the information of interest (respectively, the instructed position of attention, cue position or cue color) from the activity of a given neuronal population at a given time following cue presentation, succeeds in reading out the information encoded by the same neuronal population at other times, on novel neuronal population response activities. In other words, we dissociate the time windows used for training and testing the classifier, so as to identify how the neural code changes with time. This procedure is applied independently on prefrontal (FEF) and parietal (LIP) neuronal population responses.
Temporal dynamics of spatial attention signals
The attention-related temporal dynamics depends on the functional population
We first explore the dynamics of spatial attention encoding in LIP (Fig. 3A) and FEF (Fig. 3B) neuronal populations, irrespective of whether individual cells are selectively involved in spatial attention orientation or not. Neuronal activities are aligned on cue presentation (time 0 ms). Each plot represents, in a color code, the accuracy with which a classifier trained at a given time point (x-axis) correctly predicts the spatial position of attention as instructed by the cue, from the neuronal population response at all possible timings in the cue-to-target interval (y-axis). Importantly, this prediction procedure is performed on trials that have not been used to define the initial classifier, thus ensuring that classification rates are not biased by persistent temporal patterns that could be embedded in spike trains. Correct classification rates are color-coded from cyan (50% correct) to dark red (100%). Ninety-five percent confidence intervals in classification rates are defined using a nonparametric random permutation test (dark gray contours, see Materials and Methods).
In area LIP, maximum classification rates are observed along the diagonal (Fig. 3A), indicating that a classifier that optimally extracts spatial attention signals from the LIP population response at a given time cannot successfully read out this information at other times. This is a signature of a dynamic encoding of spatial attention by the LIP population. Specifically, significant information about the spatial allocation of attention can be extracted from 81 to 477 ms following cue onset, though maximum classification rates are observed between 202 and 321 ms. In agreement with this dynamic encoding of attention by LIP neurons, the total time following cue onset during which the classification performance of a given classifier is above the 95% confidence interval never exceeds 200 ms (Fig. 3E, red curve; see Materials and Methods for details). In contrast, the FEF appears to encode spatial attention with a relatively more stationary neuronal network. Indeed, for the entire FEF population, a classifier constructed to extract this information at a given time successfully classifies the allocation of spatial attention from neuronal population responses at other times (Fig. 3B). Specifically, FEF encodes spatial attention as early as 114 ms following cue onset and up to target presentation. Two functionally distinct epochs of stationary spatial attention encoding can be identified in this interval: an initial epoch running from 114 to 242 ms, and a late epoch running from 242 to 537 ms, possibly suggesting a transition in the FEF neuronal network at ∼250 ms, resulting from either the output of local computations within the FEF or distal influences from other cortical regions. This can be described as a transiently sustained encoding of attention by FEF neurons, with the total time following cue onset during which the classification performance of a given classifier is above the 95% confidence interval ranging between 200 and 400 ms (Fig. 3E, dark blue curve; see Materials and Methods for details).
To test this hypothesis, we further analyzed the temporal dynamics of spatial attention encoding in two complementary FEF subpopulations: (1) the cells that are individually involved in spatial attention processing as defined by a nonparametric statistical test (Fig. 3C, see Materials and Methods and Ibos et al., 2013 for details on how these cells are identified) and (2) the remaining non-attention-selective FEF cells (Fig. 3D; note that this cannot be performed for area LIP because only four attention-selective cells could be identified; Ibos et al., 2013). Markedly distinct temporal dynamics can be identified in these two FEF subpopulations. Indeed, the FEF attention-selective cells encode spatial attention signals as early as 71 ms following cue onset, and they do so in a neuronal network configuration that remains remarkably stationary in time from an average of 143 ms following cue onset up to target presentation (Fig. 3C). This corresponds to a sustained encoding of attention by FEF attention-selective neurons, with the total time following cue onset during which the classification performance of a given classifier is above the 95% confidence interval being >400 ms (Fig. 3E, intermediate blue curve; see Materials and Methods). In contrast, dynamic spatial attention signals progressively arise in the non-attention-selective FEF subpopulation between 298 and 432 ms following cue onset, up to target presentation (Fig. 3D). The total time following cue onset during which the classification performance of a given classifier is above the 95% confidence interval remains <200 ms, thus defining a dynamic encoding of spatial attention (Fig. 3E, light blue curve; see Materials and Methods). 
Because the temporal dynamics of spatial attention encoding in each of these two functional FEF subpopulations, as well as in the entire FEF population, appear unrelated to one another, these observations possibly suggest distal influences from other cortical regions onto specific FEF functional cell types, though this would need to be verified experimentally.
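The duration criterion used above to separate the coding regimes (<200 ms, 200 to 400 ms, >400 ms above the 95% confidence interval) can be written down as a short sketch. The function names, the toy accuracies, and the bin width are hypothetical, and two choices are assumptions of this sketch rather than statements about the study: a population is labeled from the longest duration reached by any of its classifiers, and durations of exactly 200 or 400 ms are assigned to the intermediate category.

```python
def time_above_limit(accuracy_row, limit_row, bin_ms):
    """Total time (ms) during which one classifier decodes above the chance limit."""
    return bin_ms * sum(a > lim for a, lim in zip(accuracy_row, limit_row))

def coding_regime(durations_ms):
    """Label a population from the longest above-chance duration of any classifier."""
    longest = max(durations_ms)
    if longest < 200:
        return "dynamic"
    if longest <= 400:          # boundary handling is an assumption of this sketch
        return "transiently sustained"
    return "sustained"

# Hypothetical accuracies of three classifiers, each tested at ten 50-ms bins.
limit = [0.6] * 10
acc = [
    [0.7, 0.7, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.7, 0.7, 0.7, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.7, 0.7, 0.7, 0.5, 0.5, 0.5, 0.5, 0.5],
]
durations = [time_above_limit(row, limit, 50) for row in acc]
print(durations, coding_regime(durations))
```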
Although this analysis describes a clear functional difference between LIP, FEF, and FEF functional subpopulations, it does not allow us to clearly identify the neuronal population in which attentional signals initially arise. In Figure 3F, we represent the confidence (p value; assessed by a nonparametric random permutation test) with which spatial attention signals can be read out from the different populations of interest as a function of time from cue onset (the dashed line representing the 95% confidence interval). This analysis calculates the average confidence with which a classifier, defined at a specific time from cue onset, reads out attention-related information in new trials at the same time (averaged over 10 ms, along the diagonal; Fig. 3A–D, gray-shaded cross-sections, F). It thus captures both dynamic and stationary spatial attention encoding patterns. Because statistical significance thresholds may vary from one subpopulation to the other, this representation better captures temporal coding differences than a mere plot of the decoding performance along the diagonal. Spatial attention signals arise first both in the attention-related FEF population (Fig. 3F, intermediate blue; significance reached at 75 ms) and in the LIP population (Fig. 3F, red; significance reached at 75 ms), shortly followed by the entire FEF population (Fig. 3F, dark blue; significance reached at 110 ms). The FEF nonattention population has a later onset, with classification performance reaching significance only transiently at 394 ms and more persistently at 430 ms (Fig. 3F, light blue). Thus, although the FEF and LIP spatial attention signals as assessed from the entire populations are different in nature (stationary vs dynamic), they reach significance in both populations at the same time. This possibly suggests a common input or a mutual interaction between the two cortical regions.
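How a significance onset can be read from such a p-value time course is sketched below, including the distinction between a transient and a more persistent crossing. The function names, the add-one correction in the p-value estimate, the `min_bins` persistence parameter, and the toy numbers are all assumptions of this sketch; they are not taken from the study.

```python
def permutation_p(observed, null):
    """One-sided p value: share of shuffled-label accuracies >= the observed one.

    The +1 in numerator and denominator is a common convention, assumed here.
    """
    return (1 + sum(v >= observed for v in null)) / (1 + len(null))

def onset(times_ms, p_values, alpha=0.05, min_bins=1):
    """First time at which p < alpha holds for at least min_bins consecutive bins."""
    run = 0
    for i, p in enumerate(p_values):
        run = run + 1 if p < alpha else 0
        if run == min_bins:
            return times_ms[i - min_bins + 1]
    return None

# Hypothetical diagonal p values sampled every 10 ms.
times = [380, 390, 400, 410, 420, 430, 440, 450]
p_diag = [0.20, 0.03, 0.30, 0.20, 0.10, 0.04, 0.02, 0.01]
first_crossing = onset(times, p_diag)               # any single significant bin
persistent = onset(times, p_diag, min_bins=3)       # three significant bins in a row
print(first_crossing, persistent)
```

With these toy values, the first isolated crossing is at 390 ms, whereas the first crossing that persists over three bins starts at 430 ms.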
Local network dynamics and functional tagging
Following first visual stream presentation, early in the trial, a preferential response to the onset of this event can be interpreted either as a coding of its spatial position or as a coding of the orientation of spatial attention toward this specific spatial location (or as a coding of both spatial position and spatial attention). Indeed, by design, the cue is always presented in the first visual stream of stimuli, and the second stream remains irrelevant up to cue interpretation. As a result, the monkey can be expected to orient its attention toward this early visual event and maintain it there, including when the second stream of visual stimuli is presented. We have, however, no behavioral marker supporting this hypothesis, and several alternatives could be at play. For example, attention could be stable on the first visual stream up to cue onset. Attention could be on the first visual stream up to cue onset, but momentarily disrupted by second visual stream onset. Attention could be on the first visual stream following its onset and then move onto the second visual stream at its onset. Attention could be divided between both streams. Last, attention could be unfocused, the prior information about cue position provided by the first stream being unnecessary, given the high visual salience of the cues (red and green against gray distracters). The question we are asking here is whether the specific configuration of the neuronal population when encoding spatial attention following cue presentation can serve as an unambiguous signature or tag of spatial attention processes at other times in the task, in particular following first visual stream presentation. As in the previous section, we use a full cross-temporal classification analysis to address this question. Specifically, we train a classifier at different times following cue onset (Fig. 4A–C; x-axis) and we measure the performance with which each of these classifiers is able to predict spatial attention signals following first and second-stream onsets (Fig. 4A–C; y-axis). Although no information about spatial attention can be extracted following first-stream presentation from the entire LIP neuronal population (Fig. 4A), precise spatial attention information can be read out from the entire FEF neuronal population (Fig. 4B) and the attention-selective FEF neuronal population (Fig. 4C). Specifically, a classifier defined to extract spatial attention from the FEF neuronal population between 180 and 265 ms following cue onset is able to reliably read out spatial attention allocation at 67 ms following first-stream onset and up to second-stream presentation (Fig. 4B). Following second-stream onset, readout performance reverses, departing from 50% chance classification, without, however, reaching significance. The same observations hold when the classifier is defined on the 458–600 ms postcue response activities, though overall readout performance is lower. In contrast, a classifier defined on the 265–458 ms postcue response activities cannot successfully extract any spatial attention information from the population response following stream onset. This supports the hypothesis that several functional populations coexist within the FEF, each encoding spatial attention in a stationary way over several hundred milliseconds: the earliest (and to a lesser extent, the latest) FEF neuronal subpopulation accounts both for early cue-related attention orientation signals and attention orientation signals to the first stream of stimuli; on the contrary, the intermediate FEF neuronal subpopulation does not account for attention orientation signals to the first stream of stimuli. In comparison, the FEF attention-selective neuronal population contributes to encoding spatial attention signals both following cue presentation and following first-stream onset (Fig. 4C). 
All neuronal populations (including LIP) show a drop in the available spatial attention signals following second visual stream onset, though this drop does not reach significance. This analysis brings about two major observations. First, we demonstrate spatial attention orientation signals early on in the task, despite their temporal co-occurrence with spatial position signals. Second, we demonstrate that second-stream onset disrupts these attentional signals, most probably correlating with the automatic attentional capture induced by this salient sensory event (as is the case for first-stream presentation).
Temporal dynamics of spatial position signals
In addition to encoding spatial attention, both LIP and FEF also encode spatial position. In the following, we will first analyze how the spatial position of a visual stimulus is encoded following first-stream presentation by both the entire FEF and LIP neuronal populations and by their respective cue position-selective neuronal populations (i.e., cells that are individually involved in the coding of the spatial position of the cue as defined by a nonparametric statistical test; see Materials and Methods and Ibos et al., 2013 for details on how these cells are identified). Then we will analyze how cue position information is encoded during the cue-to-target interval. Last, we will analyze how spatial position signals at a given time in the task can serve to extract spatial position information at other times. This is achieved with a cross-temporal classification analysis that runs from first visual stream onset to 600 ms following cue onset. Because the interval between second-stream onset and cue presentation is variable (300, 450, or 600 ms), the early cross-temporal analysis is aligned on first-stream onset, whereas the later analysis is aligned on cue onset, on both the training and testing axes.
Spatial position signals following visual stream onsets
As expected from the fact that FEF neurons have strong visual responses to stimuli presented in their RFs, the entire FEF (Fig. 5A, bottom left) and the cue-position selective FEF (Fig. 5C, bottom left) populations strongly encode the spatial position of the first visual stream. This encoding starts as early as 52 ms for the entire population and 67 ms for the cue-position selective subpopulation, and decoding performance remains high up to the presentation of the second stream of stimuli. At this point, decoding performance reverses and drops below the lower 95% confidence interval limit, indicating a reliable encoding of second-stream position (which, by task design, is located in the opposite visual field from the first stream). These two populations (entire FEF and cue-position selective FEF) thus encode spatial position within a stationary neuronal network that remains active up to cue presentation, the total time following first-stream onset during which the classification performance of a given classifier is above the 95% confidence interval being >260 ms (Fig. 5E, blue curves; see Materials and Methods). At this point, it is important to understand what the decoding performances reported in Figure 5A,C actually reflect. In theory, if the FEF contained two independent populations, one coding a visual stream to the left and the other a visual stream to the right, we would obtain a stable coding of the first visual stream as well as an independent stable coding of the second visual stream. In this case, a classifier trained to decide whether a stimulus has been presented to the left, for example, would continue to provide this decision even when a stimulus is presented to the right. This is, however, not what we observe. The reason for this is the following. 
In the present dataset, the responses of both FEF and LIP cells are modulated by the presentation of a contralateral stimulus (as expected from the fact that their receptive fields are described as contralateral, FEF: Bruce and Goldberg, 1985; LIP: Ben Hamed et al., 2001, at best including a small portion of the ipsilateral perifoveal visual field, LIP: Ben Hamed et al., 2001), but also by that of an ipsilateral, eccentric stimulus (the streams are presented at an eccentricity ranging between 10° and 15°). This ipsilateral coding of visual information in these higher order visual areas does not necessarily correspond to an enhanced neuronal response, but often to an inhibitory response (Gregoriou et al., 2009, their Fig. 1b) interfering with a sustained response to a contralateral stimulus (Ibos et al., 2013, their Fig. 2). It is not clear whether this ipsilateral representation in the FEF and LIP is task dependent (e.g., present only when the task involves coordinated processes across both hemifields). As a result, the neuronal populations encoding contralateral and ipsilateral stimuli are not independent. This means that, from the classifier's perspective, the same neurons will contribute to the decoding of an ipsilateral or contralateral visual stimulus irrespective of the order in which they are presented and irrespective of whether only one stimulus is presented or not. The decoder used in this section is not trained to decide whether a visual stimulus is present or not but rather to discriminate between stimuli presented to the right or to the left. 
When the decoder is trained to detect either (1) the presence of a visual stimulus, whether to the right or to the left (i.e., irrespective of its position) or (2) the presence of a visual stimulus to the left or to the right, then we obtain a sustained high classification accuracy from first-stream onset up to the end of the trial (specifically, a stable representation of a left stimulus is achieved at ∼300 ms when the first stream is presented to the left and at ∼500 ms when the second stream is presented to the left, data not shown). The classification procedure presented in Figure 5 requires the decoder to decide whether a visual stimulus is present on the right or on the left. When two stimuli are present, one on the right (respectively on the left) and the other on the left (respectively on the right), then the decoder's decision becomes a decision about which stimulus has the strongest representation. This is precisely what can be seen in Figure 5 (bottom left plots). For example, a classifier trained on activities recorded 200 ms following first-stream onset correctly identifies the location of the first stream following its presentation. On second-stream presentation, its classification drops below the lower 95% confidence interval (two-tailed nonparametric random permutation test), indicating that this classifier is deciding that a stimulus is now present opposite the first stream. In other words, this decoder's decision is driven more by the populations' response to the abrupt second-stream onset than by their response to the now stable first stream, possibly because stable visual stimuli have a lower visual salience, associated with a weaker cortical representation (Gottlieb et al., 1998).
A closer analysis of the temporal dynamics of these two FEF populations suggests that two successive coding patterns are at play. An initial pattern remains stationary between 52 and 155 ms (for the entire FEF population; 67 and 158 ms, respectively, for the position-selective subpopulation) and provides a reliable prediction of the spatial position of both the first and second streams early on following their onsets. This pattern most probably reflects the transient phasic ON response of the FEF neurons at the presentation of a visual stimulus within their RF. A second encoding pattern emerges at 155 ms (respectively, 158 ms) and remains stationary up to second-stream onset, possibly reflecting the tonic response of the FEF neurons to the presentation of a sustained visual stream within their receptive field. This dual encoding pattern is more pronounced in the dynamics of the cue-position FEF population than in the entire FEF population. Overall, the LIP neuronal population follows the same dynamical encoding pattern as observed for the FEF, with an early population encoding phase of both first and second-stream positions, and a later population encoding pattern specifically reflecting the spatial position of a persistent visual event (Fig. 5B, bottom left). However, the total time following first-stream onset during which the classification performance of a given classifier is above the 95% confidence interval never reaches 260 ms, indicating a dynamic coding of first-stream position (Fig. 5E, red curves; see Materials and Methods). LIP appears to encode this information less reliably than the FEF, possibly reflecting the fact that the representation of these visual elements is actively suppressed from the salience map at this time in the task (Gottlieb et al., 1998). Surprisingly, the encoding of spatial position by the position-selective LIP population (as defined by their selectivity to cue position; Fig. 5D, bottom left) is extremely poor, hardly reaching significance ∼100 ms following first-stream presentation. This suggests that, in LIP, the coding of the spatial position of visual streams and that of the cue is achieved by only partially overlapping neuronal population codes. The horizontal bands observed in Figure 5D are difficult to relate to the temporal structure of the neuronal population responses. One possibility is that they reflect a stable population activity pattern that can be weakly detected at many training time points (explaining the spread of decoding accuracy along the x-axis) but that is maximal in amplitude just after the cue offset (explaining the limited range along the y-axis). In other words, this would reflect the fact that neuronal firing rates maintain a fixed proportional relation but scale up just after cue offset and then down in unison.
Spatial position signals during the cue to target interval
How the entire FEF (Fig. 5A, top right) and the cue-position selective FEF (Fig. 5C, top right) populations encode the spatial position of the cue following cue presentation is time dependent. During the initial 192 ms following cue onset, the spatial position of the cue is encoded dynamically: the configuration of the neuronal network encoding this variable changes at each time step. From 192 to 410 ms following cue onset (respectively from 190 to 430 ms for the position-selective subpopulation), a transiently stationary encoding pattern of spatial position can be seen in both FEF populations: the configuration of the neuronal network encoding cue position remains stable over >200 ms. After 400 ms following cue onset, the entire FEF population appears to fall back into a dynamic encoding, whereas the position-selective subpopulation interrupts its encoding of cue position after 480 ms. In the parietal cortex, stationary spatial signals appear earlier. Indeed, in the entire LIP population, a stationary configuration of the neuronal population reliably encoding spatial position can be seen from 100 to 368 ms (Fig. 5B, top right). Later on in the cue-to-target interval, spatial position is encoded dynamically. The encoding of the spatial position of the cue in the position-selective LIP population remains stationary from as early as 28 ms up to 288 ms (Fig. 5D, top right). This particular decoding pattern suggests that the position-selective subpopulation in LIP encodes the cue position in a stationary manner with a network configuration centered on 190 ms following cue presentation. In contrast with what is seen for both FEF populations, no early dynamic encoding of cue position can be seen in either LIP population. If this signal corresponds to a partial dynamic coding of attention to first-stream position in anticipation of cue presentation, then such an attentional signal is absent from LIP. 
In both the entire LIP and cue position-selective LIP populations, the total time following cue onset during which the classification performance of a given classifier is above the 95% confidence interval never reaches 200 ms, indicating a dynamic coding of this information (Fig. 5F; see Materials and Methods). A similar pattern is observed in the corresponding FEF populations, though there the total classification time above the 95% confidence interval exceeds 200 ms, thus fitting our criterion for a transiently sustained encoding of cue position.
Within-task stability of spatial position signals
Up to now, we have described spatial position signals in both the FEF and LIP following either visual stream presentation or cue presentation. Here, we investigate whether the same local neuronal networks are recruited to encode the spatial position of both these events. In other words, we ask whether the position of the visual stream and that of the cue are encoded by the same neuronal networks. To do this, we quantify the classification performance with which spatial position signals can be read out following cue onset, using classifiers optimized to extract spatial position signals at all times following first-stream presentation, both by the entire FEF (Fig. 5A, top left) and LIP (Fig. 5B) populations and by their position-specific subpopulations (Fig. 5C, FEF; Fig. 5D, LIP). We also quantify the classification performance with which spatial position signals can be read out following first-stream presentation, using classifiers optimized to extract spatial position signals at all times following cue onset, both by the entire FEF and LIP populations and by their position-specific subpopulations (Fig. 5A–D, bottom right). Classifiers defined between 92 and 418 ms (between 135 and 427 ms for the position-selective subpopulation) following cue onset achieve a statistically significant performance at classifying visual stream position, both from the entire and from the position-selective FEF population (Fig. 5A,C, bottom right). For both these FEF populations, this spatial position encoding is stationary, though this feature is more striking for the position-specific FEF population. For this population, the reverse operation, which consists of defining classifiers on the response of this population to first-stream onset, also succeeds in extracting cue position information, and reveals a very similar stationary temporal encoding pattern (Fig. 5C, bottom right). 
The entire FEF population response following first-stream onset hardly allows cue position to be predicted during the cue-to-target interval (Fig. 5A, top left), in contrast with the fact that, as described above, the population response following cue onset reliably allows first-stream position to be predicted (Fig. 5A, bottom right). This suggests that the classifier captures additional components following first-stream onset that do not contribute to the encoding of cue position. In LIP, a common local neuronal network configuration encoding first-stream position from the cue position configuration network (LIP cue position-selective subpopulation: 69 ms, Fig. 5D, bottom right; all LIP: 57 ms, Fig. 5B, bottom right) and cue position from the first-stream position configuration network (LIP cue position-selective subpopulation: 159 ms, Fig. 5D, top left; all LIP: 128 ms, Fig. 5B, top left) can be identified in both the entire and the position-selective LIP populations. Overall, in strong contrast with what can be seen in the FEF, LIP does not appear to use the same neuronal configuration to encode cue position and first-stream position.
Temporal dynamics of color signals
In this last section, we use the same full cross-temporal classification analysis as described above to analyze how cue color is encoded by areas LIP and FEF (Fig. 6). Both areas encode this information dynamically, as reflected by the fact that classification rates above the 95% confidence interval are achieved along the diagonal, irrespective of whether the entire FEF and LIP neuronal populations are considered (Fig. 6A and B, respectively), or only the color-specific cells, as identified by a parametric statistical test (Fig. 6C,D; see Ibos et al., 2013 for details on the cell classification procedure). Notably, significant correct classification rates are observed in the entire LIP population very early following cue onset (54 ms; Fig. 6B), whereas only a transient encoding of cue color can be seen at 198 ms in the LIP color-selective population (Fig. 6D). In comparison, the temporal dynamics of color coding are very similar between the entire and the color-specific FEF populations, with information about this variable arising at 151 ms for the entire population and at 155 ms for the color-selective subpopulation. Accordingly, in all of the populations of interest, the total time following cue onset during which the classification performance of a given classifier is above the 95% confidence interval never reaches 200 ms, indicating a transient coding of color information (Fig. 6E; see Materials and Methods).
Effect of population size on temporal dynamics
The functional differences observed between the FEF and LIP populations and their respective subpopulations in the coding of spatial attention, position, or color could actually be due to a difference in population size. To test this, we performed full cross-temporal classification analyses to decode spatial attention from the entire FEF and LIP neural populations, with population sizes ranging between 20 and 131 in steps of 20 for the FEF, and between 20 and 87 in steps of 20 for LIP. For the FEF, an additional population size of 87 was tested so as to provide a direct comparison with the actual entire LIP population. For a given population size, the neurons were randomly drawn from the entire population and the full cross-temporal classification analysis was performed as previously. This procedure was repeated 10 times to produce an average full cross-temporal classification analysis per population size. We show that, although population size affects the time of significance onset of spatial attention encoding by the population, the core dynamics (stationary versus transient) are minimally affected by population size, both in the FEF (Fig. 7A, top row) and in LIP (Fig. 7A, bottom row). Specifically, when a classifier is trained on a window centered at 250 ms postcue (235–265 ms; Fig. 7B, left, red curves) or at 350 ms postcue (300–400 ms; Fig. 7B, right, red curves) on LIP populations of increasing sizes, significant decoding of spatial attention is consistently achieved only for small windows of testing time, centered around the training time, supporting a transient encoding of spatial attention. In contrast, for the FEF, significant decoding of spatial attention is consistently achieved over longer windows of testing time (Fig. 7B, blue curves), supporting a stationary encoding of spatial attention. This is more marked when the classifier is trained on a late time window (300–400 ms; Fig. 7B, right, blue curves) than on an earlier time window (235–265 ms; Fig. 7B, left, blue curves). As a result, the description of transient or stationary processes is minimally impacted by population size. In contrast, the amplitude of decoding performance as well as the latency of significance onset are affected by population size (see Astrand et al., 2014a for a description of the effect of population size on decoding performance).
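The subsampling procedure described above (random draws of neurons for each population size, repeated 10 times and averaged) can be sketched as follows. The sketch rests on several assumptions: `toy_accuracy` is a hypothetical stand-in that returns a single number, whereas the study averages full cross-temporal classification matrices; the per-neuron signal strengths are simulated; and the list of sizes stops at 120, since it is not stated here whether the full population of 131 was included as a final step.

```python
import math
import random

def toy_accuracy(signals):
    """Hypothetical stand-in for a decoding analysis: expected accuracy of an
    ideal linear readout of independent, unit-variance neurons."""
    d_prime = math.sqrt(sum(s * s for s in signals))
    return 0.5 * (1.0 + math.erf(d_prime / (2.0 * math.sqrt(2.0))))

def average_over_subsets(signals, size, n_repeats, rng):
    """Mean analysis result over random neuron subsets drawn without replacement."""
    results = []
    for _ in range(n_repeats):
        subset = rng.sample(range(len(signals)), size)
        results.append(toy_accuracy([signals[i] for i in subset]))
    return sum(results) / n_repeats

rng = random.Random(2)
fef_signals = [rng.uniform(0.0, 0.3) for _ in range(131)]   # 131 toy FEF cells
sizes = sorted(list(range(20, 131, 20)) + [87])             # 87 matches the LIP population
by_size = {n: average_over_subsets(fef_signals, n, 10, rng) for n in sizes}
for n, acc in by_size.items():
    print(n, round(acc, 3))
```

In this toy case only the amplitude of decoding grows with population size; assessing whether the temporal regime is preserved would require running the full cross-temporal analysis on each subset.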
Relationship between the population temporal dynamics and the underlying individual neuronal responses
The following section analyzes how neuronal selectivity and reliability impact the population temporal dynamics. For the sake of space, this analysis is only performed on spatial attention encoding by the different populations of interest. However, the same observations hold for the encoding of cue position and cue color. Response selectivity of an individual neuron to spatial attention is defined as the difference in its spiking rate when attention is oriented to the visual stream generating maximal responses as compared with the nonpreferred visual stream. Figure 8A represents the average selectivity of the FEF attention-selective cells (intermediate blue), the entire FEF population (dark blue), and the entire LIP population (red). The statistical significance of the rise in selectivity of attention-related information in each of the populations of interest is assessed as follows. Multiple Wilcoxon tests are performed every 10 ms, comparing the distribution of spike-rate differences across the population from −50 to +50 ms around this time point (obtained from a sampling, with replacement, of 40 trials from each of the spatial attention conditions, repeated 20 times) to the distribution of spike-rate differences during a baseline interval running from −50 to +50 ms around the cue. Only statistical differences that persisted with p < 0.05 for >100 ms (10 successive time points) were considered statistically significant. As can be expected from the very definition of the attention-selective cells, their average selectivity is higher than that of the other two populations. This average populational selectivity, however, only partially accounts for the differences in the dynamics of spatial attention coding. 
Indeed, both the entire FEF and LIP populations have similar average selectivities but notably, different temporal dynamics, the first being transiently sustained while the second is transient (it is to be noted here, that we do not discuss changes in average firing rates as seen in the previous section, because these are assumed to be constant during the cue to target interval; cue-related changes in response strength are thus fully captured by the neuronal selectivity measure). To further relate the temporal dynamics of each neuronal population to the response patterns of its individual neurons, we identified, for each training time (sampled every 10 ms), the two cells with the highest absolute contribution to the final readout of the classifier at each time step (i.e., highest |wi · ri|), for the entire FEF population (Fig. 8B1i), the FEF attention selective population (Fig. 8B2i) and the entire LIP population (Fig. 8B3i; note that we have considered the two top-contributing cells at each time step, for the sake of the readability of the figures, each cell being identified by a distinct color/shape code; however, the same qualitative observations still hold if one considers the top 5 or top 8 contributing cells, see below and Fig. 8C; also note that because individual cells can be among the 2 top-contributing cells at several time points during the cue to target time interval, the total number of cells fulfilling the top-contribution criterion is higher than this criterion, though upper bounded by the population size). This analysis reveals that these “top contributing” cells have a sustained contribution in time for both FEF populations, whereas this is not the case in LIP. Specifically, the top-contributing FEF cells to the population coding of attention do so for an average 227 ms (n = 9). An average contribution of 273 ms (n = 8) can be seen for the FEF attention-selective cells, and a contribution of only 185 ms for the LIP cells (n = 13). 
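The top-contributor criterion can be made concrete with a short sketch. It assumes a linear readout in which cell i contributes w_i · r_i at each time step, with hypothetical weights and rates; the function names are illustrative, and counting the total time a cell spends among the top contributors (rather than, say, its longest uninterrupted run) is an assumption of this sketch.

```python
def top_contributors(weights, rates, k):
    """For each time bin, indices of the k cells with the largest |w_i * r_i|."""
    tops = []
    for w_t, r_t in zip(weights, rates):
        contrib = [abs(w * r) for w, r in zip(w_t, r_t)]
        order = sorted(range(len(contrib)), key=lambda i: contrib[i], reverse=True)
        tops.append(order[:k])
    return tops

def contribution_time(tops, bin_ms):
    """Total time (ms) each cell spends among the top contributors."""
    durations = {}
    for cells in tops:
        for c in cells:
            durations[c] = durations.get(c, 0) + bin_ms
    return durations

# Hypothetical readout weights and rates for 4 cells over three 10-ms bins.
weights = [[1.0, -2.0, 0.5, 0.1]] * 3
rates = [
    [3.0, 1.0, 1.0, 1.0],
    [1.0, 3.0, 1.0, 1.0],
    [1.0, 1.0, 10.0, 1.0],
]
tops = top_contributors(weights, rates, k=2)
durations = contribution_time(tops, bin_ms=10)
print(tops, durations)
```

Here three distinct cells meet a criterion of two top-contributing cells, which illustrates why the number of cells fulfilling the criterion can exceed the criterion itself while remaining bounded by the population size.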
This observation still holds true when more top-contributing cells are included in the analysis. Specifically, Figure 8Ci represents the average contribution time of top-contributing cells as a function of the number of top-contributing cells criterion. The second data point along the x-axis corresponds to a criterion of two top-contributing cells, and thus summarizes the observations reported in Figure 8Bi. Because individual cells can be considered as top-contributing cells at several time points (Fig. 8Bi, for a number of top-contributing cells criterion of 2), the total number of cells fulfilling the top-contribution criterion (Fig. 8Ci, right, y-axis) is higher than this criterion (x-axis) but upper bounded by the size of the population of interest. When a two-way unbalanced ANOVA is applied, taking as dependent variable the duration of contribution of single neurons to the decoder and as factors the population of interest (LIP all, FEF all, or FEF attention) and the number of top-contributing cells criterion, a significant main effect of the population of interest can be noted (p < 0.000001). A post hoc analysis indicates that the durations of contribution of the FEF attention-selective and the entire FEF populations are significantly higher than the durations of contribution of the entire LIP population (p < 0.001). At the level of the entire neuronal populations, a strong correlation is observed between the duration of the individual cell contribution (i.e., |wi · ri|) to the overall decoding and the time during which the cell is selective for spatial attention position (FEF all: r² = 0.39, p < 0.001; FEF attention-selective: r² = 0.71, p < 0.001; LIP all: r² = 0.40, p < 0.001, where the interval of selectivity is defined, for each cell of a given population, as the time during which its selectivity is above its mean selectivity during a 200–50 ms baseline before cue presentation ±3 SD). 
A strong correlation is also observed between the duration of the individual cell contribution to the overall decoding and the time during which it is reliably selective for spatial attention position (FEF all: r² = 0.40, p < 0.001; FEF attention-selective: r² = 0.67, p < 0.001; LIP all: r² = 0.32, p < 0.01, where the interval of reliability is defined as the time during which the cell's reliability is below p < 0.05). To further relate this observation to the individual cell responses, we calculated, for each of the top-contributing cells (Fig. 8B1i, B2i, B3i; identified by a color/shape code), their spatial attention-related selectivity in time (Fig. 8B1ii, B2ii, B3ii), as well as the associated p value in time (nonparametric random permutation test, as an indicator of signal-to-noise ratio; Fig. 8B1iii, B2iii, B3iii). A visual inspection of these plots suggests that attention-related selectivity and the associated reliability are most sustained in the top-contributing cells of the FEF attention-selective population, less so in those of the entire FEF population, and least in those of the entire LIP population. Confirming this qualitative assessment, for the entire FEF population, the top-contributing cells to the population coding of attention have a selectivity that is, on average, sustained over 111 ms, associated with a reliability that is, on average, sustained over 97 ms. For the FEF attention-selective population, the top-contributing cells are, on average, selective (218 ms) and reliable (184 ms) over longer time periods. In comparison, the top-contributing LIP cells are, on average, selective (101 ms) and reliable (88 ms) over shorter time periods. 
This still holds true when more top-contributing cells are included in the analysis [Selectivity: Fig. 8Cii, two-way unbalanced ANOVA, taking as dependent variable the duration of selectivity of single neurons and as factors the population of interest (LIP all, FEF all, or FEF attention) and the number of top-contributing cells criterion, significant population of interest main factor, p < 0.000001; Reliability: Fig. 8Ciii, two-way unbalanced ANOVA, taking as dependent variable the duration of reliability of single neurons and as factors the population of interest (LIP all, FEF all, or FEF attention) and the number of top-contributing cells criterion, significant population of interest main factor, p < 0.000001]. A post hoc analysis indicates that the durations of significant selectivity (respectively, reliability) of the FEF attention-selective population are significantly higher than the durations of significant selectivity (respectively, reliability) of the entire FEF population (p < 0.001; respectively, p < 0.001) and of the entire LIP population (p < 0.001; respectively, p < 0.001). In addition, the durations of significant reliability of the entire FEF population are also significantly higher than the durations of significant reliability of the entire LIP population (p < 0.001). Overall, this analysis indicates a strong correlation between the overall population dynamics and the properties of the underlying cells: sustained temporal coding relies on cells with sustained selectivity and reliability, whereas dynamic temporal coding emerges from cells with more transient, short temporal selectivity and reliability periods. It is, however, important to note that only three of the top-contributing cells in the attention-selective population are also identified as top-contributing cells in the entire FEF analysis, indicating that the contribution of a given cell to the temporal decoding depends not only on its individual response characteristics, but also on how these responses compare with the overall population activity [i.e., to Σ (wi · ri)].
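The top-contribution criterion described above (the k cells with the highest |wi · ri| at each 10 ms time step) can be illustrated with a minimal Python sketch. The function names, the toy weights and rates, and the choice to measure contribution time as the total number of time steps at which a cell meets the criterion (rather than the longest continuous run) are assumptions made for illustration; they are not taken from the analysis code used in this study.

```python
from collections import Counter


def top_contributors(weights, rates, k=2):
    """Return, for each time step, the indices of the k cells with the largest |w_i * r_i|.

    weights[t][i]: hypothetical classifier weight of cell i at training time t
    rates[t][i]:   hypothetical activity of cell i at time t
    """
    tops = []
    for w_t, r_t in zip(weights, rates):
        contrib = [abs(w * r) for w, r in zip(w_t, r_t)]
        ranked = sorted(range(len(contrib)), key=lambda i: contrib[i], reverse=True)
        tops.append(ranked[:k])
    return tops


def contribution_durations(tops, step_ms=10):
    """Total time (ms) during which each cell meets the top-k criterion (assumed definition)."""
    counts = Counter(cell for step in tops for cell in step)
    return {cell: n * step_ms for cell, n in counts.items()}


# Toy example: 3 cells, 4 time steps of 10 ms each.
weights = [[1.0, 0.5, 0.1], [1.0, 0.5, 0.1], [0.2, 0.5, 1.0], [0.2, 0.5, 1.0]]
rates = [[2.0, 1.0, 1.0]] * 4

tops = top_contributors(weights, rates, k=2)
durations = contribution_durations(tops)
print(tops)
print(durations)
```

In this toy case, three distinct cells meet a criterion of two top-contributing cells, which illustrates why the number of cells fulfilling the criterion can exceed the criterion itself while remaining bounded by the population size.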
Together, we have provided theoretical and empirical evidence indicating that dynamic classification patterns arise when the underlying individual neurons are selective and/or have varying firing rate patterns to the variable of interest for only short time durations. In contrast, sustained temporal population coding patterns are obtained when the individual cells have more sustained patterns of selectivity, though the reliability of the selectivity of these responses may vary in time.
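The distinction between short and sustained periods of selectivity can be made concrete with a small Python sketch of a duration measure. It assumes that a cell is counted as selective in a time bin when its selectivity exceeds the baseline mean plus 3 SD (one reading of the threshold defined above), that the SD is the sample SD, and that durations are measured as the longest run of successive 10 ms bins. The persistence helper mirrors the criterion of p < 0.05 over 10 successive time points. All names and toy values are hypothetical.

```python
from statistics import mean, stdev


def selective_bins(selectivity, baseline, n_sd=3.0):
    """Flag time bins where selectivity exceeds baseline mean + n_sd * sample SD (assumed threshold)."""
    threshold = mean(baseline) + n_sd * stdev(baseline)
    return [s > threshold for s in selectivity]


def longest_run(flags):
    """Length, in bins, of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def selectivity_duration_ms(selectivity, baseline, step_ms=10):
    """Longest continuous period (ms) during which the cell is above threshold."""
    return longest_run(selective_bins(selectivity, baseline)) * step_ms


def is_persistent(p_values, alpha=0.05, min_bins=10):
    """True if p < alpha holds over at least min_bins successive time points."""
    return longest_run([p < alpha for p in p_values]) >= min_bins


# Toy example: one sustained cell and one transient cell sharing the same baseline.
baseline = [0.0, 1.0, 0.0, 1.0]
sustained = [0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 0.0]
transient = [0.0, 3.0, 0.0, 3.0, 0.0, 0.0, 0.0]
print(selectivity_duration_ms(sustained, baseline))
print(selectivity_duration_ms(transient, baseline))
```

Using the longest run rather than the total number of suprathreshold bins is a design choice: it separates a cell that is selective once for a long period from a cell that is selective repeatedly but briefly.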
Discussion
We describe two distinct population regimens for the encoding of sensory and cognitive information in the parietal and prefrontal cortex: a stationary mode, in which a stable neuronal configuration encodes the information of interest, irrespectively of the response pattern of its individual elements, and a dynamic mode, in which the information of interest is encoded by a neuronal configuration, the elements and weight coefficients of which rapidly change with time. In addition, we show that analyzing the temporal dynamics of a heterogeneous neuronal population does not suffice to properly capture the neuronal processes at play in a given cortical area, and that its functional subpopulations should also be interrogated. Last, identifying the neuronal configuration in which a neuronal population encodes attention or position can serve to reveal this same information at other time periods in the task. These findings are discussed below.
Attention is encoded in a stationary coding pattern in the prefrontal cortex
In the cued-detection task used here, cue position and cue color need to be combined for a correct prediction of target position. This information is thus expected to be transiently represented in these two areas. In contrast, the maintenance of spatial attention at the target's most probable location is expected to be represented in a stationary way. Accordingly, we describe a stationary encoding of spatial attention by the prefrontal attention-selective cells, the response pattern with which this population encodes spatial attention remaining constant throughout the cue-to-target interval. This is interesting in two respects: (1) attention-related information is reliably represented in each of the prefrontal attention-selective cells for durations shorter than the entire cue-to-target interval (Fig. 1B, top; Ibos et al., 2013) and (2) the time at which spatial attention is represented varies from one attention-selective cell to the other (Fig. 1B). This indicates that each of these cells contributes to the stationary population encoding of attention, including at times in the cue-to-target interval during which their individual response is not statistically significant. This stationary population encoding of attention is specific to the prefrontal cortex and cannot be found in the parietal cortex, either at the single-cell level (Figs. 1B, 8Bii) or at the population level in the entire LIP population (Fig. 3A).
Position is encoded in a stationary way but color is encoded dynamically
The initial dynamic coding of cue position by the cue position-selective prefrontal cells rapidly shifts into a transiently stationary population encoding pattern. This departs from the theoretical expectation that no sustained encoding of cue position is required to perform the task. It also contrasts with the fact that individual cells encode cue position transiently and independently of one another. Indeed, whereas this population encodes position in a stationary manner between 192 and 410 ms, the median cue position response latency in individual FEF cells is 121 ms (Ibos et al., 2013), indicating that one-half of these cells start firing before this time. A similar though earlier dynamic population encoding of cue position can be found in the parietal population (median LIP cue position response latency = 117 ms; Ibos et al., 2013). This transiently stationary encoding pattern of position is unexpected given the task structure. In comparison, the coding of cue color by the prefrontal cue-selective cells is fully dynamic, the response profile of the cells encoding this information changing from one instant in the task to the other. This information is encoded from 151 ms up to 400 ms. A similar dynamic population encoding of cue color is found in the parietal population, though this coding starts much earlier after cue presentation and continues up to 500 ms.
Stationary versus dynamic population coding
Overall, attention, position and color (respectively, position and color) are represented simultaneously in the prefrontal cortex (respectively, parietal cortex), though each of these variables is encoded by different population patterns. Stationary population coding involves a functional population in which the contribution of its individual coding elements is constant over time. In contrast, dynamic population coding involves individual neurons containing information on only short nonoverlapping time scales. As a result, information is available at any time on only a fraction of the population. Nonstationary population activity profiles have been reported previously in several cortical areas (Meyers et al., 2008; Crowe et al., 2010) including in the parietal cortex (Barak et al., 2010) and the prefrontal cortex (Meyers et al., 2012; Kadohisa et al., 2013; Stokes et al., 2013). In particular, Crowe et al. (2010) propose that in the parietal cortex, task-critical information is encoded dynamically while at the same time, task-irrelevant information is encoded in stationary neuronal codes. Although our observations in the parietal cortex possibly support this hypothesis (as we show a transiently stationary encoding of spatial position, and a dynamic coding of attention), they highlight the fact that temporal population codes might differ from one cortical region to another. Indeed, our results, as well as the recent report by Stokes et al. (2013) indicate a stationary coding of task-relevant information in the prefrontal cortex. Active mechanisms for sustaining working memory information in local neuronal populations are proposed to be at play through short-term plasticity mechanisms (Fujisawa et al., 2008; Mongillo et al., 2008; Erickson et al., 2010). The same mechanisms may also contribute to sustained spatial attention coding, as a distinctive property of prefrontal cortex relative to other cortical regions. 
Short-term plasticity could also be at the origin of dynamic population coding. Indeed, constant inputs to a neuronal population can result in time-dependent response patterns if the membrane potentials and synaptic weights of its elements (the hidden state of the neuronal population) are continuously changing under the influence of the input pattern of activity (Buonomano and Maass, 2009). Interestingly, simultaneous stationary and nonstationary temporal coding patterns within both the prefrontal and parietal cortex indicate that the putative short-term plasticity mechanisms at play selectively and differentially target specific functional subpopulations, possibly based on a principle of common driving input (Nikolić et al., 2007).
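The contrast between stationary and dynamic population coding can be illustrated with a noise-free toy example in Python. The sketch below does not reproduce the classifier used in this study: it assumes, for illustration only, that the readout trained at a given time is proportional to the population selectivity pattern at that time, and it uses the cosine similarity between patterns at training and testing times as a stand-in for cross-temporal decoding performance. All names and values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two population patterns (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cross_temporal_matrix(selectivity):
    """selectivity[t][i]: selectivity of cell i at time t. Rows: training time; columns: testing time."""
    return [[cosine(train, test) for test in selectivity] for train in selectivity]


def off_diagonal_mean(matrix):
    """Average generalization between different training and testing times."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


# Stationary code: the same pattern across cells at every time, with a changing gain.
stationary = [[1.0, 2.0, 0.5], [2.0, 4.0, 1.0], [0.5, 1.0, 0.25]]
# Dynamic code: a different cell carries the information at each time.
dynamic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(off_diagonal_mean(cross_temporal_matrix(stationary)))
print(off_diagonal_mean(cross_temporal_matrix(dynamic)))
```

Under these assumptions, a stationary code generalizes across time (off-diagonal values close to 1) even though the firing rates of its cells change, whereas a dynamic code is readable only near the time at which the readout was defined (off-diagonal values of 0).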
Information multiplexing
Single-cell recording studies usually target specific functional cell categories. In contrast, studies that are interested in how neuronal populations contribute to cognition do not operate an a priori selection of cells. Our observations call for a mixed approach. Indeed, the functional characterization of individual cells does not fully account for the information in a given area (e.g., attention is encoded dynamically in LIP even though we fail to identify a significant attention-selective cell population). Likewise, we unveil a late population attention-related signal in the nonattention FEF population (Fig. 3D) that we fail to identify at the single-cell level (Ibos et al., 2013). Conversely, the analysis of the FEF and LIP temporal population coding patterns for attention, position, or color, though instructive in itself, does not capture the entire functional processes at play. Temporal population coding in the entire populations and in their respective functional subpopulations brings about complementary observations. In particular, it reveals that different task-relevant and task-irrelevant information can be encoded in a given cortical area through different temporal coding patterns. How this is achieved at the neuronal level is still unclear. A parsimonious proposal would be that each functional subpopulation is under distinct modulatory influences: different driving inputs, different neuromodulatory sensitivities, and different synchronization influences.
Context dependence
A frequent assumption in neurophysiology is that cells encode information irrespective of time in the task and irrespective of the type of task. For example, a cell encoding the spatial position of a visual item is expected to encode it in a similar way whatever the context. A growing body of evidence indicates that cell selectivity is task dependent (Ben Hamed et al., 2002; Anton-Erxleben et al., 2009). Our data clearly demonstrate that population codes are also context dependent. For example, parietal cue position cells hardly contribute to the coding of visual stream position. In contrast, the entire parietal population encodes both visual stream and cue position. Thus, the population code representing visual stream position succeeds in capturing information about cue position while the inverse is not true, and individual cell selectivities fail to fully describe the functional contribution of a given area. Last, this highlights the fact that a given population encodes given information (here, position) in a context-dependent manner (here, neutral visual stimulus vs a task-relevant visual item). Importantly, these context-related influences are area dependent. For example, in contrast with LIP, both the entire FEF and cue-position populations encode cue position and visual stream using the same neuronal pattern.
Importantly, the population code with which the prefrontal attention-selective cells encode spatial attention orientation during the cue-to-target interval also identifies potential spatial attention signals during the pre-cue interval. These signals coexist with spatial position signals. The fact that they are best identified at this time in the task in a neuronal subpopulation that is attention-selective (i.e., by definition, nonselective to position) indicates that spatial attention population codes can serve to pinpoint attention-related processes at other times in the task and possibly in other tasks involving spatial attention (for review, see Astrand et al., 2014b for potential applications of this functional tagging). It is, however, for now unclear whether dynamic coding patterns are also deterministic across different task phases and different tasks.
Footnotes
This work was supported by the Agence Nationale de la Recherche Grant ANR-05-JCJC-0230-01 (S.B.H.), the Centre Nationale pour la Recherche Scientifique (CNRS; to E.A.), the Délégation Générale des Armées (E.A.), the Fondation pour la Recherche Médicale (FRM; to E.A. and G.I.), and the French Ministère de la Recherche (G.I.). We thank J.-L. Charieau and F. Hérant for technical support with animal care.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr Suliann Ben Hamed, Centre de Neuroscience Cognitive, CNRS UMR 5229, Université Claude Bernard Lyon I, 67 Bd Pinel, 69675 Bron cedex, France. benhamed{at}isc.cnrs.fr