Abstract
Although many studies have shown that the activity of individual neurons in a variety of visual areas is modulated by attention, a fundamental question remains unresolved: can attention alter the visual representations of individual neurons? One set of studies, primarily relying on the attentional modulations observed when a single stimulus is presented within the receptive field of a neuron, suggests that neuronal selectivities, such as orientation or direction tuning, are not fundamentally altered by attention (Salinas and Abbott, 1997; McAdams and Maunsell, 1999; Treue and Martinez Trujillo, 1999). Another set of studies, relying on modulations observed when multiple stimuli are presented within a receptive field, suggests that attention can alter the weighting of sensory inputs (Moran and Desimone, 1985; Luck et al., 1997; Reynolds et al., 1999; Chelazzi et al., 2001). In these studies, when preferred and nonpreferred stimuli are simultaneously presented, responses are much stronger when attention is directed to the preferred stimulus than when it is directed to the nonpreferred stimulus. In this study, we recorded neuronal responses from individual neurons in visual cortical area V4 to both single and paired stimuli with a variety of attentional allocations and stimulus combinations. For each neuron studied, we constructed a quantitative model of input summation and then tested various models of attention. In many neurons, we are able to explain neuronal responses across the entire range of stimuli and attentional allocations tested. Specifically, we are able to reconcile seemingly inconsistent observations of single and paired stimuli attentional modulation with a new model in which attention can facilitate or suppress specific inputs to a neuron but does not fundamentally alter the integration of these inputs.
Introduction
Subjects can better detect or discriminate visual information by attending to specific locations. To understand the mechanisms of this behavioral improvement at a cellular level, visual representations within cerebral cortex have been characterized by measuring neuronal responses to stimuli when the locus of attention is shifted. These studies have led to two seemingly contradictory models: one in which attention modulates responses in a primarily multiplicative manner regardless of the stimulus (gain), and another in which attention alters stimulus summation (competition). The two models have substantially different implications for the nature of visual representations because, in the first, receptive fields (RFs) are not altered by attentional shifts, whereas in the second, they are, by reducing the influence of unattended stimuli (Reynolds et al., 1999) relative to that of attended stimuli. Unfortunately, the gain model cannot easily explain how attentional modulation varies with the particular stimulus that is attended, whereas the competition model cannot explain the effects of attention on neuronal responses (Motter, 1993; McAdams and Maunsell, 1999) and behavior (Posner et al., 1980; Davis et al., 1983) when a single stimulus is present inside a receptive field. One possible solution is to postulate an “input gain” mechanism of attention, in which gain does not depend on the stimulus number, spacing, or content but can be directed to specific neurons providing input to the neuron under study (Maunsell and McAdams, 2001). In this model, changes in attention alter the set of inputs that are modulated but not the rules of input summation within neurons.
To evaluate such a model of attentional modulation, it is therefore essential to understand spatial summation within individual neurons. This can be illustrated by a simple example. If spatial summation is described by an averaging of the inputs to a cell, then attention should increase responses both when it is directed to the nonpreferred stimulus and when it is directed to the preferred stimulus. Alternatively, assume that spatial summation within the receptive field of a neuron is “winner-take-all,” in which the strongest input determines the response of the neuron. The attentional modulation when a nonpreferred stimulus (weak input) is paired with the preferred stimulus (strong input) would then depend on which stimulus is being attended. Directing attention to the preferred stimulus would increase responses, whereas directing attention to the nonpreferred stimulus might have no effect because, even after gain was applied to the nonpreferred inputs, their strength could still be less than that of the inputs associated with the preferred stimulus. If it was assumed that this neuron was averaging its inputs, then this difference in responses could be mistakenly interpreted as a receptive field change associated with attention rather than a change in the inputs to the cell.
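The example above can be made concrete with a small numerical sketch. The input strengths and the gain value are hypothetical numbers chosen only for illustration, and the two summation rules are idealized; the point is simply that the same input gain produces different attentional modulations under different summation rules.

```python
def average(a, b):
    """Averaging rule: the response is the mean of the two inputs."""
    return 0.5 * (a + b)


def winner_take_all(a, b):
    """Winner-take-all rule: the strongest input determines the response."""
    return max(a, b)


# Hypothetical input strengths (arbitrary units) and attentional gain.
PREFERRED, NONPREFERRED, GAIN = 40.0, 10.0, 1.5


def responses(rule):
    """Apply gain to the attended input only, then combine with `rule`."""
    return {
        "attend out": rule(PREFERRED, NONPREFERRED),
        "attend preferred": rule(GAIN * PREFERRED, NONPREFERRED),
        "attend nonpreferred": rule(PREFERRED, GAIN * NONPREFERRED),
    }


avg = responses(average)
wta = responses(winner_take_all)

for label in avg:
    print(f"{label:20s} averaging={avg[label]:5.1f}  winner-take-all={wta[label]:5.1f}")
```

With these numbers, averaging yields an increase for both attended stimuli, whereas winner-take-all yields an increase only when the preferred stimulus is attended, because the boosted nonpreferred input (15) is still weaker than the preferred input (40).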
Although this example emphasizes the importance of spatial summation in understanding the mechanisms of attention, no studies have simultaneously measured spatial summation and attentional modulation. Here we measure spatial summation and the attentional modulation of single neurons in visual cortical area V4 by analyzing the responses to single stimuli and pairs of stimuli when attention is directed to different positions inside and outside of the receptive field. First, we report that responses to paired stimuli usually depend on which stimulus is being attended. Second, we find that spatial summation in many neurons within V4 can be well described by a simple two-parameter model but is not well described by a simple averaging or winner-take-all model. We find that input gain models can better explain the attentional modulation of neuronal responses than “competition” models. Moreover, we find that attention acts primarily through the facilitation of attended stimuli rather than suppression of unattended stimuli. These results suggest that attention can affect the gain of particular inputs to a neuron but does not affect how inputs are combined to produce neuronal responses.
Materials and Methods
Two monkeys (Macaca mulatta) performed an orientation change detection task for a juice or water reward. Animals were treated in accordance with use and care guidelines established by the National Institutes of Health. Some of these data were also used to describe the effect of change probability on attentional modulation in a previous publication (Ghose and Maunsell, 2002).
Visual stimuli.
Stimuli were presented on a cathode ray tube display on a gray background [15.6 cd/m²; Commission Internationale de l'Eclairage (CIE), x = 0.33, y = 0.33]. Each gun of the display was gamma corrected (8 bits). The stimuli were Gabors of sinusoidally varying contrast (4 Hz, 100% peak contrast), truncated at a radius of twice the SD of the Gaussian envelope. Gabors were modulated around the mean luminance achromatic point in color space (CIE, x = 0.33, y = 0.33) that also defined the background. Single Gabors of varying chromatic modulation, size, spatial frequency, position, and orientation were used to characterize receptive fields and specify the stimulus parameters used in the attention task. All well-isolated, visually responsive cells were tested.
Task design.
Trials were presented in block mode (12 or 15 trials per block), in which the behaviorally relevant position was fixed within each block and the same set of stimuli was presented within the receptive field. Animals were required to fixate on a small dot (∼0.1°) throughout each trial (fixation window width, ±0.5–0.7°). Attention was spatially cued using instruction trials at the beginning of each block, in which only a single stimulus was presented. Subsequent trials within the block, which were used for all the data presented here, included stimuli at this cued position as well as other positions.
Stimuli were presented at two nonoverlapping positions within the receptive field (Fig. 1). In addition to these receptive field stimuli, stimuli were simultaneously presented at symmetric positions in the quadrant diagonally opposite from the receptive field. During each trial there were either two or four Gabors present. Trials with a single Gabor within the receptive field (A) were randomly interleaved with trials with two Gabors within the receptive field (B). Stimulus positions were fixed during data collection from each cell so that there were a total of four possible behaviorally relevant positions. Gabors of three different orientations were used: the preferred orientation, the null orientation, and an intermediate orientation.
The monkey's task was to release a lever as soon as a change occurred at the cued position while ignoring changes at other positions. Orientation changes only occurred when the counter-phasing Gabors reached zero contrast (every 125 ms), and changes at different positions were forced to occur at different times. Only one orientation change could occur at each position in each trial. Animals were rewarded with juice when they released a lever between 250 and 450 ms after a change at the cued location. Earlier releases, failures to release, and eye movements outside the fixation window immediately ended the trial without reward. Approximately 10% of trials were catch trials in which no change occurred at the cued position, and the monkey was rewarded for keeping the lever depressed and maintaining fixation. Each animal's performance, excluding fixation breaks, was >90% correct and did not depend on the time at which the behaviorally relevant change occurred.
Within the receptive field of the neuron under study, stimuli were positioned along a line of isoeccentricity (between 2 and 5°). Trials were assigned to three groups according to the cued position: attend in position 1, attend in position 2, and attend out (in which attention was directed to one of the positions in the opposite quadrant). Cells used in this study had at least eight blocks completed for each group. The attend in data comprised 12 trial types: three in which only a single stimulus at the attended position was presented in the RF (three orientations) and nine in which two stimuli were present within the receptive field (3 orientations at position 1 × 3 orientations at position 2). For the attend out group, there were 15 trial types: the same nine trials in which two stimuli were present and six trials in which single stimuli were presented at the two within-RF positions. In total, there were therefore 39 trial types (12 + 12 + 15). Analyses were based on cells for which responses to at least eight correctly completed trials of each type were recorded.
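The trial-type count can be checked by enumeration. The sketch below assumes only what is stated above: three orientations, two within-RF positions, single-stimulus trials at the attended position for the attend in groups, and single-stimulus trials at both positions for the attend out group. The labels and function names are illustrative.

```python
from itertools import product

ORIENTATIONS = ("preferred", "intermediate", "null")
CUES = ("attend 1", "attend 2", "attend out")


def trial_types(cue):
    """Return (single-stimulus types, paired-stimulus types) for one cue group."""
    # Nine paired types: 3 orientations at position 1 x 3 at position 2.
    paired = list(product(ORIENTATIONS, repeat=2))
    if cue == "attend out":
        # Singles at either within-RF position: 2 positions x 3 orientations.
        singles = [(pos, ori) for pos in (1, 2) for ori in ORIENTATIONS]
    else:
        # Singles only at the attended position: 3 orientations.
        pos = 1 if cue == "attend 1" else 2
        singles = [(pos, ori) for ori in ORIENTATIONS]
    return singles, paired


counts = {cue: sum(len(group) for group in trial_types(cue)) for cue in CUES}
total = sum(counts.values())
print(counts, total)
```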
Electrophysiological recording.
Recordings were made from individual neurons in area V4 on the surface of the prelunate gyrus in daily sessions using transdural electrodes (0.5–1.5 MΩ at 1 kHz) and conventional extracellular recording techniques. Action potentials were recorded with a resolution of 1 ms using a time base that was synchronized with the vertical retrace of the monitor. Eye position was monitored by scleral search coil. Eye position and lever releases were recorded with a resolution of 5 ms.
Once a single unit was isolated, its receptive field, optimal orientation, and optimal spatial frequency were estimated by presenting Gabors with manually chosen parameters. Spatial receptive fields were confirmed quantitatively using single Gabor stimuli at eight adjacent positions around a central point. Four of these positions were along a line described by the vector connecting the central point and the fixation point; the remaining four were along a line normal to this vector. Gabor size and the center point of the eight position array were chosen so that responses to the central four positions were approximately equal and that responses significantly above spontaneous activity were observed at all eight positions. Data in this paper describe responses when one or two Gabors were presented at two central positions defined by this receptive field mapping. Preferred, intermediate, and null orientations were defined according to the responses of a single Gabor at the central point.
Analysis of attention effects.
The behaviorally relevant orientation change occurred with random timing. Two different change probability functions were used (schedule A and schedule B). In both schedules, change probability was consistently positive in the interval from 500 to 2500 ms after stimulus presentation (Ghose and Maunsell, 2002). In both schedules, probability was the same in all positions. Only responses from this interval and before any orientation change within the RF are described here and were quantified by mean rate (spikes per second).
Gain models of attention were obtained for each neuron through regression analysis on pairs of responses in which the visual stimuli within the receptive field were identical, but the behaviorally relevant position of change differed. In such models, the effect of attention is independent of the particular stimulus used to evoke responses: responses to attended and unattended stimuli are related by a single multiplicative factor (gain) and offset. Because of the possibility that the effects of attention are dependent on stimulus number, separate regression analyses were done for the responses to paired stimuli and the responses to single stimuli at each position. Spontaneous responses were included in the regression analyses by measuring neuronal activity in the 500 ms before stimulus presentation. To test whether a linear model of attention could fit data from all stimuli and all neurons, regression analyses were also done on the population of responses from all neurons.
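As an illustration of the gain regression described above, the following sketch fits a slope (gain) and offset relating unattended to attended mean rates by ordinary least squares. The data values are made up, and unweighted least squares is an assumption, since the text does not state the exact regression procedure.

```python
def fit_gain(unattended, attended):
    """Least-squares fit of attended = gain * unattended + offset.

    Returns (gain, offset, r), where r is the correlation coefficient.
    """
    n = len(unattended)
    mean_x = sum(unattended) / n
    mean_y = sum(attended) / n
    sxx = sum((x - mean_x) ** 2 for x in unattended)
    syy = sum((y - mean_y) ** 2 for y in attended)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(unattended, attended))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return gain, offset, r


# Hypothetical mean rates (spikes/s): spontaneous activity plus three
# orientations, generated here with a known gain of 1.2 and offset of 1.0.
unattended = [2.0, 10.0, 20.0, 40.0]
attended = [1.2 * x + 1.0 for x in unattended]

gain, offset, r = fit_gain(unattended, attended)
print(gain, offset, r)
```

A pure gain model corresponds to a slope above 1 with an offset near 0; a nonzero offset indicates an additive component.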
To test for interactions between the effects of attention and stimuli, three-factor ANOVAs were done on the responses from each neuron. To allow for multiplicative effects on neuronal responses, spike rates were log-transformed before this analysis. Thus, if the effects of orientation and attention are separable in the same sense that stimulus parameters such as orientation and spatial frequency are for V1 neurons, the ANOVA would reveal no significant interaction between the factors of orientation and attention. Because the log transform required positive values, for those few trials in which no spikes were observed, a fractional response was defined to be half of the average response rate over all trials multiplied by the duration of the trial. For trials in which a single stimulus was presented within the receptive field, the ANOVA factors were as follows: position within the receptive field (two levels: position 1 and position 2), orientation (three levels: preferred, intermediate, and null), and the spatial locus of attention (two levels: attention directed within the receptive field and attention directed outside of the receptive field). For trials in which pairs of Gabors were presented within the receptive field, the ANOVA factors were as follows: orientation at position 1 (three levels), orientation at position 2 (three levels), and attention position (three levels: position 1, position 2, and outside of the receptive field). Significant effects were defined by p < 0.05.
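The log transform turns a multiplicative effect into an additive one, because log(g × r) = log(g) + log(r), which is why separable factors show no interaction in the ANOVA. A minimal sketch of the transform, including the substitution for zero-spike trials, is given below. It reflects one reading of the rule above (a fractional spike count equal to half the mean rate times the trial duration); the function name and data are illustrative.

```python
import math


def log_rates(spike_counts, durations_s):
    """Log-transformed spike rates with a fractional count for empty trials."""
    rates = [count / dur for count, dur in zip(spike_counts, durations_s)]
    mean_rate = sum(rates) / len(rates)  # average rate over all trials
    transformed = []
    for count, dur in zip(spike_counts, durations_s):
        if count == 0:
            # Assumed reading: half of the average rate, scaled by duration,
            # stands in for the missing spike count.
            count = 0.5 * mean_rate * dur
        transformed.append(math.log(count / dur))
    return transformed


# Hypothetical spike counts from three 2 s analysis windows.
logs = log_rates([0, 10, 20], [2.0, 2.0, 2.0])
print(logs)
```

This sketch would fail if every trial had zero spikes, since the mean rate would then be 0.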
Analysis of spatial summation.
We assessed spatial summation using variants of a generalized model that has been used to explain paired-stimulus responses in other visual areas (Britten and Heuer, 1999). In this model, the response to paired stimulation (R1,2) is related to the responses to individual stimuli (R1 and R2) according to the following equation:
$$R_{1,2} = \alpha \left( R_1^{n} + R_2^{n} \right)^{1/n} \qquad (1)$$
This model can characterize most of the common models of spatial summation including winner-take-all (large n, α = 1), averaging (n = 1, α = 0.5), and normalization (n = 0.5) (Britten and Heuer, 1999). To derive a model of spatial summation for each neuron, data from all three attentional conditions (attend out, attend position 1, and attend position 2) were used:
$$R_{1^*,2} = \alpha \left( R_{1^*}^{n} + R_2^{n} \right)^{1/n}, \qquad R_{1,2^*} = \alpha \left( R_1^{n} + R_{2^*}^{n} \right)^{1/n} \qquad (2)$$
where the asterisk indicates the receptive field position to which attention was directed. Six models were tested for each neuron: the three mentioned above, an unscaled power model (α = 1), a scaled linear model (n = 1), and a generalized scaled power model in which the parameters α and n were both free to vary. Because the scaled power model has the most free parameters, it necessarily provided the best fits and was primarily used to quantitatively test the effects of attention.
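The special cases listed above can be sketched in a few lines. The code assumes the generalized form R1,2 = α(R1^n + R2^n)^(1/n), which is the form implied by the stated special cases; the value n = 50 is an arbitrary stand-in for "large n", and the single-stimulus rates are hypothetical.

```python
def paired_response(r1, r2, alpha, n):
    """Predicted paired response from two single-stimulus rates (spikes/s)."""
    return alpha * (r1 ** n + r2 ** n) ** (1.0 / n)


# The six model variants named in the text. None marks a parameter that
# would be fit for each neuron; fixed values follow the text.
MODELS = {
    "winner-take-all": {"alpha": 1.0, "n": 50.0},  # 50 stands in for "large n"
    "averaging": {"alpha": 0.5, "n": 1.0},
    "normalization": {"alpha": None, "n": 0.5},
    "unscaled power": {"alpha": 1.0, "n": None},
    "scaled linear": {"alpha": None, "n": 1.0},
    "scaled power": {"alpha": None, "n": None},
}


def free_parameter_count(model):
    """Number of parameters left free in a given model variant."""
    return sum(value is None for value in MODELS[model].values())


r_pref, r_null = 40.0, 10.0  # hypothetical single-stimulus responses
averaging = paired_response(r_pref, r_null, **MODELS["averaging"])
winner = paired_response(r_pref, r_null, **MODELS["winner-take-all"])
print(averaging, winner)
```

With these inputs the averaging variant gives the mean of the two rates, and the large-n variant approaches the larger of the two.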
To test various models of attention, unattended single-stimulus responses were used to predict the paired responses to the three attentional conditions. This was done by introducing two additional parameters describing the attentional gain at each position (β1 and β2) so that
$$R_{1,2} = \alpha \left[ (\beta_1 R_1)^{n} + (\beta_2 R_2)^{n} \right]^{1/n}$$
This model is consistent with output gain models based on single-stimulus responses because, when R2 is 0, output responses are multiplicatively increased by a factor of αβ. However, in contrast to output gain, when multiple stimuli are within the receptive field, the model allows for gain to be selectively applied just to the inputs associated with the attended stimulus. Positive gain (β > 1) selectively applied to a particular stimulus would therefore increase the influence of that stimulus. Such a model can also incorporate suppression: in the case of a single stimulus, no inputs are suppressed, whereas in the case of multiple stimuli, negative gain (β < 1) might be applied to the inputs associated with unattended stimuli. Finally, if the spatial extent of this gain is large, this model could also explain how attention directed to locations immediately outside of the receptive field can alter receptive field profiles by increasing the gain of those inputs nearest the attended location (Connor et al., 1997).
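A short sketch may help show how gain on inputs behaves in this model. It assumes the form R = α[(β1 R1)^n + (β2 R2)^n]^(1/n), which reduces to αβR1 when R2 is 0 as stated above; all parameter values are hypothetical.

```python
def attended_pair_response(r1, r2, beta1, beta2, alpha, n):
    """Paired response with gains beta1 and beta2 applied to each input."""
    return alpha * ((beta1 * r1) ** n + (beta2 * r2) ** n) ** (1.0 / n)


ALPHA, N = 0.8, 2.0  # hypothetical summation parameters
R_PREF, R_NULL = 20.0, 5.0  # hypothetical unattended single-stimulus rates

# Single stimulus (R2 = 0): the response is scaled by alpha * beta.
single = attended_pair_response(R_PREF, 0.0, 1.5, 1.0, ALPHA, N)

# Paired stimuli: gain of 1.5 applied to the attended input only.
attend_out = attended_pair_response(R_PREF, R_NULL, 1.0, 1.0, ALPHA, N)
attend_pref = attended_pair_response(R_PREF, R_NULL, 1.5, 1.0, ALPHA, N)
attend_null = attended_pair_response(R_PREF, R_NULL, 1.0, 1.5, ALPHA, N)

print(single, attend_out, attend_pref, attend_null)
```

In this example the same gain produces a much larger change when it is applied to the preferred input than when it is applied to the null input, even though the summation parameters are unchanged.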
For all attention models, β values were set equal to 1.0 for the attend out paired responses. For the paired-stimulus responses in which attention was directed within the receptive field, β values were varied so as to test four different models of attention. The β values were independent for the two attend in positions (Equations 3, 4). For the output gain model, β1 = β2 for both attend in positions (positions 1 and 2), and the β values were free to vary whenever attention was directed within the receptive field. Two additional models (the spotlight and filter models), in which attentional effects were limited to a particular position within the receptive field, were also tested. For the spotlight model of attention, the effect of attention is limited to the attended position: β values were set to 1 when attention was not directed to the β position and free to vary when attention was directed to the β position. In this model, β values > 1 correspond with attention increasing the influence of the attended stimulus. For the filter model of attention, the effect of attention is limited to the unattended position: β values were set to 1 for the attended position and free to vary for the unattended position. In this model, β values < 1 correspond with attention decreasing the influence of the unattended stimulus. In all of these models, there were a total of two free parameters because β was free to vary between the two attend in positions. Finally, for the input gain model of attention, both the unattended and the attended β were free to vary. In this model, attention can act at both positions, and there were a total of four free parameters. For all models, β values were unconstrained: the models differ in the locus of the attentional effects but are free to include both suppressive (β < 1) and facilitatory (β > 1) effects.
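The constraints that define the four attention models can be summarized as a mapping from the cued position to the pair (β1, β2). The sketch below is one way to encode them; the function name, the cue labels, and the layout of the `params` dictionary are illustrative assumptions, not part of the original analysis.

```python
def betas(model, cue, params):
    """Return (beta1, beta2) for a cue of "out", 1, or 2.

    `params` maps each attend in cue (1 or 2) to that condition's free
    value(s): one gain for output gain, spotlight, and filter, and an
    (attended, unattended) pair for input gain.
    """
    if cue == "out":
        return 1.0, 1.0  # no attentional gain in the attend out condition
    if model == "output gain":
        gain = params[cue]
        return gain, gain  # same gain on both inputs
    if model == "spotlight":
        gain = params[cue]  # acts on the attended position only
        return (gain, 1.0) if cue == 1 else (1.0, gain)
    if model == "filter":
        gain = params[cue]  # acts on the unattended position only
        return (1.0, gain) if cue == 1 else (gain, 1.0)
    if model == "input gain":
        attended, unattended = params[cue]
        return (attended, unattended) if cue == 1 else (unattended, attended)
    raise ValueError(f"unknown model: {model}")


# Hypothetical parameter values for illustration.
spotlight = betas("spotlight", 2, {1: 1.4, 2: 1.6})
filt = betas("filter", 1, {1: 0.7, 2: 0.8})
both = betas("input gain", 2, {1: (1.4, 0.9), 2: (1.6, 0.8)})
print(spotlight, filt, both)
```

Counting the free values in `params` reproduces the parameter totals given above: two for the output gain, spotlight, and filter models, and four for the input gain model.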
For all spatial summation and attention models, optimal parameters were obtained by minimizing mean square error (MSE) weighted according to the variance of the experimental observations using the downhill simplex method. This weighted MSE was then normalized to the explainable variance (variance of the means of the observations − variance of a typical observation). Models with different numbers of free parameters were statistically compared using an F test based on the sum of residuals weighted according to the variance of the experimental observations.
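The model comparison step can be sketched as follows. The code assumes that weighting means dividing each squared residual by the variance of that observation, and it uses the standard F statistic for nested models; neither detail is spelled out in the text. Only the statistic is computed here, and a p value would come from the F distribution with the indicated degrees of freedom. All numbers are hypothetical.

```python
def weighted_sse(observed_means, observed_vars, predicted):
    """Sum of squared residuals, each divided by that observation's variance."""
    return sum(
        (obs - pred) ** 2 / var
        for obs, var, pred in zip(observed_means, observed_vars, predicted)
    )


def f_statistic(sse_simple, k_simple, sse_full, k_full, n_obs):
    """F statistic for a nested comparison (k = number of free parameters).

    Degrees of freedom are (k_full - k_simple, n_obs - k_full).
    """
    numerator = (sse_simple - sse_full) / (k_full - k_simple)
    denominator = sse_full / (n_obs - k_full)
    return numerator / denominator


# Hypothetical example: three observations and one model's predictions.
sse = weighted_sse([10.0, 20.0, 30.0], [4.0, 4.0, 9.0], [12.0, 20.0, 27.0])

# Hypothetical comparison of a one-parameter model with a two-parameter
# model over 27 paired responses.
f_value = f_statistic(sse_simple=10.0, k_simple=1, sse_full=5.0, k_full=2, n_obs=27)
print(sse, f_value)
```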
Results
Attentional modulation of responses
Datasets were acquired for 159 neurons from two animals. Figure 2 illustrates the responses from an example cell to the test conditions. This neuron exhibited strong orientation tuning to single Gabors (top row and right column). Consistent with a previous report (Pollen et al., 2002), orientation tuning was similar at the two positions. When a single Gabor was presented in the receptive field, average responses were consistently higher when attention was directed within the receptive field (solid lines) than when it was directed outside of the receptive field (dashed lines). The results were more complex when two Gabors were presented within the receptive field: the effect of attention on responses to a particular stimulus combination depended on which stimulus was being attended. For example, when a preferred orientation Gabor at position 2 was paired with a null orientation Gabor at position 1 (Fig. 2B, column 1, row 3), responses were strong when attention was directed to position 2 (solid thick lines) but weak when attention was directed to position 1 (solid thin lines). This is consistent with the results originally reported by Moran and Desimone (1985) using pairs of bars within V4 receptive fields: responses are stronger when attention is directed to a preferred stimulus than when it is directed to a nonpreferred stimulus. The dominance of the attended stimulus was seen in this cell when the positions of the preferred and nonpreferred stimuli were switched (column 3, row 1).
The dominance of the attended stimulus is readily apparent in response surfaces constructed from the responses of the neuron to different stimulus combinations (Fig. 3). The bars along the edges of each plot show the responses to single stimuli. Orientation tuning is evident in these responses, and single-stimulus responses increase when attention is directed to the stimulus (attend 1 or attend 2). The three square color plots show the responses to paired stimulation when attention was directed to position 1, position 2, and outside the RF, respectively. For such stimulus pairs, the attended position dominates the response (A, center and right). The paired response surface for attend out shows that responses to pairs of stimuli are not necessarily simply predicted from single-stimulus responses (Pollen et al., 2002). For example, a simple averaging model is inadequate because, with such a model, the strongest response would be limited to preferred stimuli in both positions (top left corner). Instead, the stimulus at position 1 dominates responses so that, when a nonpreferred stimulus was presented at position 1, the neurons responded weakly even when a preferred stimulus was present at position 2.
To test whether the effects of attention could be explained by a simple increase in responsiveness (gain), regression analysis was applied to pairs of responses in which the visual stimulation was identical but the attended position was different. For single stimuli, the linear model provides a good fit with regression coefficients ∼0.99: at both positions, the slope is significantly above 1, whereas the offset is not significantly different from 0 (Fig. 3B,C). For paired stimuli, the situation is more complex. When attention is directed to position 1, an output gain model provides a good fit for the data (r = 0.897). However, this gain (slope) is larger for paired stimuli than for single stimuli (D vs B). Furthermore, the paired-stimulus responses when attention was directed to position 2 were not proportionally related to the responses when attention was directed outside of the RF (E). This follows from the weak dominance of position 1 when attention was directed outside of the RF (A) and the dominance of position 2 when attention was directed to that position.
ANOVA was applied to determine the factors that significantly affect the response rate of this neuron. For single stimuli, there was a significant effect of attention (p ≪ 0.001), orientation (p ≪ 0.001), and position (p < 0.001), but there was no interaction between attention and other factors. Thus, consistent with the regression model and previous reports (McAdams and Maunsell, 1999; Ghose and Maunsell, 2002), attentional modulation did not depend on orientation or position within the receptive field. For paired stimuli, there were significant effects of attention (p ≪ 0.001), orientation at position 1 (p ≪ 0.001), and orientation at position 2 (p ≪ 0.001). Unlike the single-stimulus case, however, there were significant interactions between attention and the orientations. As expected from previous results, attention did not simply scale the responses to paired stimuli: its effects depended on which orientation was being attended. Moreover, there were significant interactions between orientation at position 1 and orientation at position 2.
To measure the average effects of attention and stimulus variation on the studied population, responses from each cell were normalized according to the response seen at the best orientation and best position in the attend out dataset. Responses were then sorted according to preferred, intermediate, and null orientations, and preferred and nonpreferred position. The effects of attention shown for the neuron illustrated in Figure 3 were similar in this population average (Fig. 4). On average, the null orientation response was 50% of the preferred orientation response, whereas nonpreferred position responses were 70% of the preferred position responses. As was seen in Figure 3 with paired stimulation, attention to a particular stimulus increases the influence of that stimulus on responses (A, middle and right). Thus, the strongest responses were not limited to the trials in which both stimuli were of preferred orientation (top left corner). Also similar to the example cell, the output gain model provides an excellent fit to the single-stimulus data with regression coefficients above 0.99 at both the preferred and nonpreferred positions. Slopes and intercepts for attentional effects do not differ significantly between the two positions. Consistent with a gain model, the intercepts are not significantly different from 0, whereas the slopes are above 1.0. Poorer regression coefficients were seen with paired stimulation: 0.93 for the preferred position and 0.91 for the nonpreferred position. Again, the intercepts are not significantly different from 0, whereas the slopes are significantly above 1.0.
To study whether interactions were present for those cells whose responses were modulated by attention, we performed ANOVA on individual cells. For single stimuli within the receptive field, the three-way ANOVA revealed that the responses of 104 of 159 cells were significantly affected by changes in orientation at the p < 0.05 level, whereas 86 of 159 cells had responses modulated by attention. Interestingly, for some neurons (43 of 159), orientation tuning varied between stimulus positions. However, very few neurons exhibited significant interactions between attentional modulation and orientation (12 of 159) or between attentional modulation and position (19 of 159). Thus, for the majority of neurons whose responses to single stimuli were modulated by attention, the effects did not depend on the particular stimulus.
Robust attentional effects were more common when two stimuli were presented in the receptive field (121 of 159). For paired-stimuli responses, most neurons (123 of 159) were affected by orientation in at least one of the two positions. A significant proportion of neurons were like the example neuron shown in Figure 3 and showed significant interactions between orientations at the two positions (62 of 159). Similarly, interactions between attention and receptive field stimulation were more common with paired-stimuli responses than with the single-stimulus responses: attention × orientation at position 1, 48 of 159; attention × orientation at position 2, 55 of 159; attention × orientation at position 1 × orientation at position 2, 28 of 159.
To study whether the effect of attention depends in a fundamental way on the number of stimuli, we performed regression analysis on all the responses from all cells to single and paired stimuli (Fig. 5). For this dataset, the regression model tests whether a single output gain model can explain the effects of attention regardless of stimulus configuration or the particular cell chosen. For both single-stimulus (A) and paired-stimulus (B) responses, the output gain model provided a good fit with correlation coefficients of 0.945 and 0.904, respectively. Moreover, the models yielded indistinguishable coefficients for the single- and paired-stimulus datasets: with a slope of 1.13 and 1.14 and an intercept of 2.02 and 2.00, respectively. Because these intercepts are significantly different from 0, the average effect of attention cannot be simply described as a multiplicative increase in responses: in addition to the multiplicative effect, attention has a small additive effect on response rates. These analyses show that the effects of attention are not absent with single stimuli. On the contrary, the average effect of attention, in terms of attentional modulation, is not statistically different between the single-stimulus (B) and paired-stimulus (D) datasets (Wilcoxon's test).
There is greater variance in the distribution of attentional modulations seen with paired stimuli (Levene's test, p ≪ 0.001) than with single stimuli. This suggests the possibility that attentional modulation is more stimulus dependent when paired stimuli are presented. For example, as shown in Figures 2–4, responses were larger when attention was directed to the preferred stimulus than when attention was directed to the nonpreferred stimulus. To directly examine this, we performed regression analysis on responses from cells with significant orientation tuning according to the single-stimulus ANOVA. Separate analyses were done according to when the preferred or null orientation was presented. As expected from previous regression analyses, when only single stimuli of preferred orientations are considered, attentional effects are well modeled by a linear model (r = 0.93). Virtually identical regression parameters specify the effect of attention on the nonpreferred orientation responses. This is consistent with the ANOVAs suggesting that attentional modulation is constant for different orientations (McAdams and Maunsell, 1999) of isolated stimuli. Moreover, the distribution of attentional modulations is similar for single stimuli of preferred (Fig. 6A) and nonpreferred (Fig. 6B) orientations.
This consistency is not seen when both a preferred and null orientation stimulus are presented within the receptive field (Fig. 6 C–E). In this case, when the pair member with preferred orientation is attended (C), the effects of attention are significantly larger than the effects seen for single stimuli (A, B). Conversely, when attention is directed to the null orientation stimulus of such a pair (D), the effects of attention are much smaller than those seen with single stimuli. Consistent with this, paired-stimulus responses are significantly larger when attention is directed to a preferred stimulus than when directed to a nonpreferred stimulus (E).
These results are primarily consistent with previous reports. Attentional modulation was not dependent on the presence of multiple stimuli within the receptive field: significant attentional modulation was observed when a single stimulus was present within the receptive field. Consistent with a previous report regarding orientation tuning and spatial attention (McAdams and Maunsell, 1999), these results suggest that attentional modulation is independent of orientation and can be approximated as a multiplicative gain on responses. Our results are also consistent with published reports of attentional modulation in the case of paired stimulation (Moran and Desimone, 1985; Luck et al., 1997): attention significantly modulates responses in more cells with such stimulation than with single stimuli and increases the relative influence of the attended stimulus on responses.
Spatial summation
To characterize spatial summation within V4 receptive fields, we evaluated for every cell how well six different models could predict responses to paired stimuli from the responses observed when the stimuli were presented separately (Eq. 1). Each model had zero to two free parameters. A total of 27 paired responses were compared with model predictions (3 orientations at position 1 × 3 orientations at position 2 × 3 attentional states). We measured model performance by computing the mean square error weighted according to the variance of each observation and then divided by the explainable variance. We restricted our analysis to 90 neurons whose signal-to-noise ratio (variance of mean responses to different stimuli divided by mean variance of responses to the same stimulus) was at least 2:1.
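The two selection and scoring quantities described above can be sketched as follows. The signal-to-noise ratio follows the definition given in the text. The normalized error is one plausible reading of "mean square error weighted according to the variance of each observation and then divided by the explainable variance": both the inverse-variance weighting and the use of the weighted variance of the observed means as the explainable variance are assumptions, as are the function names and example values.

```python
from statistics import mean, variance

def signal_to_noise(trials_by_stimulus):
    """Variance of the mean responses across stimuli divided by the
    mean across-trial variance within each stimulus."""
    means = [mean(trials) for trials in trials_by_stimulus]
    noise = mean([variance(trials) for trials in trials_by_stimulus])
    return variance(means) / noise

def normalized_error(observed, predicted, variances):
    """Inverse-variance-weighted mean square error, divided by the
    weighted variance of the observed means (assumed normalization)."""
    weights = [1.0 / v for v in variances]
    wsum = sum(weights)
    wmean = sum(w * o for w, o in zip(weights, observed)) / wsum
    mse = sum(w * (o - p) ** 2
              for w, o, p in zip(weights, observed, predicted)) / wsum
    explainable = sum(w * (o - wmean) ** 2
                      for w, o in zip(weights, observed)) / wsum
    return mse / explainable

# Hypothetical data: three stimuli, two trials each (spikes/s).
trials = [[9.0, 11.0], [19.0, 21.0], [29.0, 31.0]]
snr = signal_to_noise(trials)  # a cell would pass the 2:1 criterion if snr >= 2

observed = [10.0, 20.0, 30.0]
err_perfect = normalized_error(observed, observed, [1.0, 1.0, 1.0])
err_mean_only = normalized_error(observed, [20.0, 20.0, 20.0], [1.0, 1.0, 1.0])
```

With this normalization, a model that predicts only the mean response scores an error of 1, so errors above 1 indicate a fit worse than ignoring the stimulus identity altogether.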
Figure 7 illustrates the performance of these models. Three of the models, winner-take-all, averaging, and unscaled power, were extremely poor fits for most neurons (A–C). For most neurons, these models produced errors larger than the variance of the observations (error > 1). The remaining three models were considerably better: the median normalized error for each of these models was ∼0.6 (D–F). Thus, receptive field models that do not have the flexibility of a scaling term (α in Eq. 1) are unable to account for paired-stimulus responses for the majority of neurons. Among the three models containing such flexibility, the addition of the free parameter of power in the scaled power model (n in Eq. 1), which was fixed at 1 in the scaled linear model and 0.5 in the normalization model, significantly improved fits in a large fraction of neurons (F test; scaled linear, 29 of 90; normalization, 37 of 90). For the remaining neurons, two single-parameter models (scaled linear and normalization), in which the power term was locked and the scaling term was free to vary, performed as well as the generalized scaled power model in which both parameters were free to vary (G, H). However, because of the superiority of the scaled power model overall (p < 0.001, paired sign rank test) and for a large fraction of neurons, it was used as the spatial summation model for testing the different models of attentional modulation.
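The family of summation models compared above can be illustrated with a short sketch. Eq. 1 is not reproduced in this section, so the code assumes the generalized form R12 = α(R1^n + R2^n)^(1/n), in which α is the scaling term and n the power. The function names, the grid ranges, the unweighted least-squares criterion, and the example responses are all hypothetical and stand in for whatever fitting procedure was actually used.

```python
def scaled_power(r1, r2, alpha=1.0, n=1.0):
    """Assumed form of Eq. 1 for nonnegative responses and n > 0."""
    return alpha * (r1 ** n + r2 ** n) ** (1.0 / n)

def winner_take_all(r1, r2):
    """Paired response equals the larger single-stimulus response."""
    return max(r1, r2)

def averaging(r1, r2):
    """Special case of the assumed form with alpha = 0.5 and n = 1."""
    return scaled_power(r1, r2, alpha=0.5, n=1.0)

def fit_scaled_power(singles, paired):
    """Unweighted least-squares grid search over (alpha, n)."""
    alphas = [i * 0.05 for i in range(2, 31)]  # 0.10 ... 1.50
    powers = [i * 0.25 for i in range(1, 17)]  # 0.25 ... 4.00
    best = None
    for a in alphas:
        for n in powers:
            sse = sum((scaled_power(r1, r2, a, n) - obs) ** 2
                      for (r1, r2), obs in zip(singles, paired))
            if best is None or sse < best[0]:
                best = (sse, a, n)
    return best[1], best[2]

# Hypothetical single-stimulus responses (spikes/s) and paired responses
# generated from known parameters, to check that the search recovers them.
singles = [(10.0, 20.0), (5.0, 30.0), (15.0, 15.0), (0.0, 25.0)]
paired = [scaled_power(r1, r2, alpha=0.6, n=2.0) for r1, r2 in singles]
alpha_hat, n_hat = fit_scaled_power(singles, paired)
```

In this parameterization, the scaled linear model fixes n at 1 and leaves α free, whereas the scaled power model frees both, which is why the two can be compared as nested models.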
Spatial summation and attention models
Four models of attention were tested for each cell using the spatial summation parameters obtained in the scaled power models of Figure 7F. In all of these models, attentional effects were incorporated by modulating the attend out single-stimulus responses (R1 for position 1 and R2 for position 2) in specific ways (Eqs. 5). The simplest two models were the filter and spotlight models. In the filter model, the effect of attention is to modulate the unattended stimulus. This might be accomplished by either shrinking the spatial receptive field so that it no longer fully includes the unattended stimulus or by reducing the influence of the unattended stimulus throughout a constantly sized receptive field. The spotlight model (Posner, 1980; Posner et al., 1980; Crick, 1984; Eriksen and St. James, 1986), conversely, solely modulates the response of the attended stimulus. These models therefore describe two extreme possibilities for how spatial attention acts: in the filter model, it acts solely by decreasing the influence of unattended stimuli, whereas in the spotlight model, attention acts by increasing the influence of attended stimuli. Because attention occasionally decreases responses (Figs. 5, 6), we generalized these models to incorporate both increases and decreases in the influence of particular stimuli. In this case, the only difference between the models is the actual site of attentional modulation, and the two models have the same number of free attentional parameters. As shown in Figure 8A, for most neurons in which there is a difference between the models, the spotlight model is superior (lower errors). The superiority of the spotlight model is statistically significant at p < 0.001 (paired sign rank test).
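One way to picture the generalized filter and spotlight models is as a single coefficient applied to one of the two attend out single-stimulus responses before summation. The sketch below again assumes the form R12 = α(R1^n + R2^n)^(1/n) for the summation rule, which is not reproduced in this section. The function names, the coefficient name beta, and all parameter values are hypothetical.

```python
def summed(r1, r2, alpha, n):
    """Assumed scaled power summation of two nonnegative inputs."""
    return alpha * (r1 ** n + r2 ** n) ** (1.0 / n)

def spotlight_model(r_att, r_unatt, beta, alpha, n):
    """Attention scales only the input from the attended stimulus."""
    return summed(beta * r_att, r_unatt, alpha, n)

def filter_model(r_att, r_unatt, beta, alpha, n):
    """Attention scales only the input from the unattended stimulus."""
    return summed(r_att, beta * r_unatt, alpha, n)

# Hypothetical summation parameters and attend out responses (spikes/s).
alpha, n = 0.6, 2.0
preferred, null = 30.0, 5.0

# The same spotlight gain applied to whichever stimulus is attended.
attend_pref = spotlight_model(preferred, null, 1.4, alpha, n)
attend_null = spotlight_model(null, preferred, 1.4, alpha, n)
```

With these illustrative values, the paired response is larger when the preferred stimulus is attended than when the null stimulus is attended, even though the attentional coefficient is identical in the two cases and the summation parameters never change.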
In the generalized input gain model, attention can affect both the attended and unattended positions. In a large fraction of neurons, the additional flexibility of modulating both unattended and attended inputs results in a significantly better fit than either the filter (Fig. 8B) or spotlight (Fig. 8C) models (F test; filter, 26 of 90; spotlight, 20 of 90). A final model of attention, output gain, stipulates that attention acts at both the attended and unattended positions, but its magnitude is equivalent at the two positions. Again, for a large fraction of neurons, the input gain model provided a significant improvement in fit over output gain (F test; 37 of 90) (Fig. 8D). Indeed, the output gain model provides the poorest explanation of the data of all the attentional models tested.
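The filter, spotlight, and output gain models each have one attentional coefficient and can be viewed as restricted cases of the input gain model, which has two: the filter model leaves the attended input unmodulated, the spotlight model leaves the unattended input unmodulated, and the output gain model forces the two coefficients to be equal. A nested-model F statistic for such comparisons can be sketched as below. The function name and the example error sums, parameter counts, and number of observations are hypothetical, and the sketch uses plain residual sums of squares rather than the variance-weighted error described earlier.

```python
def nested_f_statistic(sse_reduced, sse_full, p_reduced, p_full, n_obs):
    """F statistic for whether a full model with p_full free parameters
    fits better than a nested reduced model with p_reduced parameters."""
    df_num = p_full - p_reduced
    df_den = n_obs - p_full
    if df_num <= 0 or df_den <= 0:
        raise ValueError("full model needs more parameters and spare observations")
    f = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f, df_num, df_den

# Hypothetical comparison: spotlight (1 coefficient) vs. input gain (2).
f_stat, df_num, df_den = nested_f_statistic(
    sse_reduced=12.0, sse_full=8.0, p_reduced=1, p_full=2, n_obs=18)
```

Converting the statistic to a p value requires the F distribution, which the Python standard library does not provide; a statistics package such as SciPy could supply it.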
Figure 9 shows the quantitative effects of attention on the inputs associated with attended and unattended stimuli by plotting the attentional modulation coefficients (β) of the best fitting input gain model for each neuron. Only neurons whose input gain model error was <0.5 were included (n = 35). The origin is defined by a lack of any attentional modulation (β = 1), with surrounding values reflecting suppression (β < 1) and facilitation (β > 1) by attention. If the spotlight model was a good description of attentional modulation, all points would lie near the horizontal axis (y = 1, no modulation of the unattended input). Alternatively, if the filter model was a good description, coefficients would lie along the vertical axis (x = 1, no modulation of the attended input). Finally, if attentional modulation was identical at the two sites (output gain), points would lie along the diagonal. Clearly none of these simpler models adequately characterizes the population. However, there are more points along the spotlight axis than the filter axis (A). The gray triangle indicates points in which the modulation to the attended stimulus is larger than that for the unattended stimulus. Gain is significantly larger for the attended position than the unattended position (p ≪ 0.001). A majority of neurons fall into the fourth quadrant, indicating a combination of facilitation for the attended stimulus and suppression for the unattended stimulus.
However, the amount of suppression at the unattended site is not as consistent across the population as the facilitation of the attended site. This relative importance of the attended position as opposed to the unattended position is consistent with the superiority of the spotlight model over the filter model that was shown in Figure 8A.
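The geometry of the coefficient plot can be made concrete with a small sketch that labels a pair of coefficients relative to the no-modulation origin at β = 1. It assumes that the attended coefficient is plotted on the horizontal axis, as the fourth-quadrant description implies. The function names and the tolerance are hypothetical; the example pair is the one reported in this section for the average normalized cell (1.425 attended, 0.901 unattended).

```python
def classify_modulation(beta_att, beta_unatt, tol=0.05):
    """Label the effect of attention on the attended and unattended inputs.
    Coefficients within tol of 1 are treated as no modulation."""
    def effect(beta):
        if abs(beta - 1.0) <= tol:
            return "none"
        return "facilitation" if beta > 1.0 else "suppression"
    return effect(beta_att), effect(beta_unatt)

def in_fourth_quadrant(beta_att, beta_unatt):
    """True when the attended input is facilitated and the unattended
    input is suppressed, relative to the origin at (1, 1)."""
    return beta_att > 1.0 and beta_unatt < 1.0

labels = classify_modulation(1.425, 0.901)
quadrant_four = in_fourth_quadrant(1.425, 0.901)
```

In these terms, a pure spotlight cell would be labeled with no effect on the unattended input, a pure filter cell with no effect on the attended input, and an output gain cell would have two equal coefficients.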
A major determinant of the goodness of fit of these attention models is the appropriateness of the underlying spatial summation models. The two measures are highly correlated (r = 0.88): if the spatial summation models provide a poor fit, so do the attention models based on them. Because of this dependency, it is important to measure how much the attentional weights depend on the particular spatial summation model used. We tested this dependency by deriving the attentional weights under the input gain model for the two models closest to the scaled power model: normalization and scaled linear receptive field models (Fig. 7G,H). As shown in Figure 9, B and C, most points lie near the diagonal, indicating that the attentional modulation parameters derived with the input gain model do not strongly depend on the particular RF model used. Moreover, there is no consistent tendency for attention modulation parameters derived from the different receptive field models to differ (paired rank sum test).
To derive the average effect of attention seen over our population, we applied the input gain model to the average normalized responses (Fig. 4). We slightly modified the attention models in this case so that the attentional weights to attended and unattended stimuli were defined to be consistent for the two attend in positions. For the average normalized cell derived from this dataset, the attentional modulation at the attended position was 1.425, whereas at the unattended position it was 0.901. These values are similar to the median values shown in Figure 9A.
Discussion
We have measured neuronal responses under varying stimulus and attentional conditions to formulate a quantitative model of the effects of spatial attention on visual representations in area V4. Consistent with previous observations, we find that attentional modulation can be observed when single stimuli are present within the receptive field and does not depend on the presence of competing or weak stimuli within the receptive field. Moreover, we find in the same cells in which the single-stimulus response modulations are observed that attentional modulations to paired stimuli strongly depend on the particular stimulus to which attention is directed. These seemingly contradictory findings are reconciled by considering the rules of spatial summation within neurons. Specifically, by allowing for nonlinearities in input summation, we can account for both single- and paired-stimuli responses with and without attention by postulating that attention can change the strength of inputs that a neuron receives but not the manner in which inputs are summed by the neuron. The model is parsimonious in that it assumes that the rules governing the summation of inputs within a particular neuron, which are fundamentally important in determining receptive field properties, are not dependent on behavioral state. Our paired-stimuli responses are consistent with a “biased competition” model of attention increasing the influence of a particular stimulus: we observed a strong difference in responses to paired stimulation when attention was directed to the preferred, as opposed to the nonpreferred, stimulus. For our data, the median ratio of these responses across all cells, regardless of whether they had significant attentional effects or not, was 1.55. This is considerably less than the value from a similar analysis, based on baseline-subtracted responses, done by Moran and Desimone (1985), which yielded a median modulation of 2.77. However, it is very similar to the figure Luck et al. (1997) reported based only on cells that were significantly affected by attention (1.63).
Despite these consistencies, there are many aspects of our data that a pure competition model cannot easily explain. Because competition models are inherently focused on explaining paired-stimulus responses, the effects of attention in other circumstances can be difficult to accommodate. For example, attentional effects can be seen in spontaneous activity (Luck et al., 1997), with low-contrast stimuli (Reynolds et al., 2000), and when attention is directed outside the receptive field (Connor et al., 1997). These observations led to the suggestion that the stimulus-evoked saturation of neuronal response could preclude the increase in firing rate normally seen with attention (Reynolds et al., 1999). Our data are primarily inconsistent with this suggestion: we found significant attentional modulation for single effective stimuli (Figs. 4B,C, 5A,B). Moreover, on average, the attentional modulation seen with single stimuli is the same as that seen with paired stimulation (Fig. 5B,D). This is consistent with previous reports of responses in area V4 (Motter, 1994; McAdams and Maunsell, 1999) and other visual areas, as well as behavioral improvements in reaction time when only a single stimulus is present. Conversely, initial studies found relatively little attentional modulation of single-stimulus responses in V4 (Moran and Desimone, 1985). In these studies, single-stimulus data were acquired separately from the paired-stimulus data and from essentially different neuronal populations. Because the two types of stimuli were not interleaved, the comparison is confounded by task difficulty: single-stimulus responses were only measured in relatively easy tasks. Because attentional modulation depends on task difficulty (Boudreau et al., 2006), single-stimulus responses collected under such a design are unlikely to exhibit much attentional modulation.
A pure competition model, in which significant attentional modulations only occur when stimuli are near one another, faces additional challenges. If this process can only take place at a particular cortical area, for example, V4, then it implies a fixed spatial scale for attention, which is inconsistent with psychophysical demonstrations of flexibility in the spatial extent of attention (Eriksen and St. James, 1986; LaBerge and Brown, 1986). For such flexibility to exist in the model of biased competition, it would be necessary to first identify the scale over which competition exists, then identify the neurons whose receptive field size matches that scale, and then modulate the synapses of those neurons selectively. Moreover, this must be done automatically and quickly because attentional modulation can be seen in the earliest stimulus responses (Ghose and Maunsell, 2002). Finally, the model assumes that the activity of neurons providing input to a V4 neuron cannot be modulated by attention, which is inconsistent with experiments demonstrating task effects in V1 neurons (Roelfsema et al., 1998; Sengpiel and Hubener, 1999; Huk and Heeger, 2000; Crist et al., 2001).
If the combination of inputs strongly varies with behavioral state, it might be challenging to create robust higher-level visual representations dependent on a particular combination of low-level features (Salinas and Abbott, 1997). For example, Pasupathy and Connor (1999) have shown that V4 neurons can selectively respond to particular line intersections. Assume a neuron that responds to the combination of orthogonal line elements such as a +. This might arise from summing inputs from orthogonally oriented neurons (− and |). Although the biased competition model mandates that such stimuli would compete and result in a response in between the | and − responses, that is clearly not the case for a cell that is truly selective for the combination of these features: a cell might respond vigorously to + and not at all to either | or −. Indeed, such selectivity for higher-order features might explain the relatively large proportion of neurons in our sample that are not well fit by standard spatial summation models. Biased competition also does not describe how attention might be directed to the + as a whole. The associated inputs (− and |) are actually superimposed and therefore should be maximally “competitive.” Given the wealth of spatial attention literature regarding the detection of letters, it is clear that detection performance for such a conjunction can be enhanced. However, if the stimuli are naturally competitive and attention can only mediate this competition, it is not clear how performance for detection of the conjunction could be enhanced without constructing separate + representations for different allocations of attention.
Two models have been proposed to describe spatial summation in V4. Reynolds et al. (1999) asserted that summation can be described by an averaging process but did not test this assumption against alternative models of summation. Conversely, Gawne and Martin (2002) compared the averaging model with a winner-take-all model and found the winner-take-all model to be superior. Our data indicate that simple averaging and the winner-take-all model are poor descriptions of input summation for most V4 neurons. Our data are far more consistent with a scaled linear model, although it is clear for some cells that a more generalized model with a power term not equal to 1 provides a better fit. One clear difference between our data and those of Gawne and Martin is the time course and magnitude of the responses: in our data, the average peak firing rate is ∼30 spikes/s shortly after stimulus onset, whereas in many of the examples shown by Gawne and Martin firing rates after stimulus presentation are in the range of hundreds of spikes per second. This difference is likely attributable to stimulus differences: whereas we used a 4 Hz counter-phasing Gabor that on average was equiluminant with the background, Gawne and Martin presented, with sudden onset, a black and white checkerboard pattern. Whatever the cause of the response difference, the fact that the Gawne and Martin data include epochs in which neurons were near saturation might tend to bias summation toward a winner-take-all model because responses near saturation cannot be increased much.
The generalized spatial summation model used here, which includes both a linear scaling and a power term, has been used to predict paired-stimulus responses of middle temporal area (MT) neurons (Britten and Heuer, 1999). In many respects, our findings are similar: in both studies, normalization, scaled linear, and scaled power models provided the best fits, and the typical values of the scaled power model in MT (slope, 0.745; exponent, 2.72) are similar to those that we found (slope, 0.578; exponent, 2.07). Some differences do exist, however: for our data, the winner-take-all, averaging, and unscaled power models were extremely poor fits for most neurons, whereas for the MT data, the mean error for these models was ∼0.4. Similarly, the best performing model, the scaled power model, performs less well for our V4 data than in MT, in which the mean normalized error was 0.25. Two factors are likely to contribute to these differences. First, it is likely that a substantial fraction of V4 neurons selectively respond to particular orientation combinations. This form of higher-order feature selectivity would not be captured by spatial summation models, just as for V1 neurons the response to an optimally oriented bar cannot be predicted by the averaging of responses to small spots making up the bar. Second, all of the Britten and Heuer fits incorporated an additional offset parameter to correct for errors in measurements of spontaneous activity. Although the addition of such a parameter has little effect in our data, based on at least eight presentations, it is unclear how much the MT fits, based on data from single presentations of many different pairings, were helped by the offset term.
Our results offer insight into a fundamental question of spatial attention: does attention act as a filter to eliminate (Broadbent, 1958; Treisman, 1960, 1969) signals associated with unattended stimuli or as a facilitator of signals (Eriksen and Yeh, 1985) associated with attended stimuli? In most neurophysiological measurements of attentional modulation, it is impossible to distinguish the two because a difference between attend in and attend out responses might arise from a facilitation within the receptive field, suppression outside of the receptive field, or some combination of these effects. Our data suggest that attentional modulation is such a combination of facilitation and suppression. However, the average suppression is relatively weak and local so that, for large attention shifts, such as shifting the locus of attention between visual quadrants, the facilitatory effect dominates. Even on local scales, suppression is not consistently visible: for many cells, facilitation to the attended stimulus was significant, whereas suppression to the unattended stimulus was not (Fig. 9A). Thus, although simple models in which attention solely increases the influence of attended stimuli (spotlight) or decreases the influence of unattended stimuli (filter) cannot fully account for our observations, the spotlight model of pure facilitation provides a better explanation of our observations than the filter model of pure suppression.
Although the exact relationship between the spatial characteristics of a task and the spatial scaling of attentional modulation remains unclear, it is clear that the spatial window of attention is pliable (Eriksen and St. James, 1986; LaBerge and Brown, 1986). We suggest that, just as the temporal characteristics of a task affect the timing of attention modulation (Ghose and Maunsell, 2002), so do the spatial characteristics affect its scale. Given the evidence that task difficulty can affect the magnitude of attentional modulation, we would further suggest that the overall magnitude of attentional modulation is primarily determined by task difficulty. Task difficulty is not separable from spatial and temporal constraints: a detection task that involves a small brief flash would undoubtedly be more difficult than one involving a large static stimulus. The spatial and temporal characteristics of a trained task would therefore play a critical role in determining the timing, magnitude, and distribution of the attentional modulation of visual signals. However, the present results suggest that such factors neither affect the fundamental mechanisms of such modulation nor alter the rules of spatial summation underlying visual receptive fields.
Footnotes
- This work was supported by National Eye Institute Grants EY05911 and EY14989 and the Human Frontier Science Program. J.H.R.M. is an Investigator with the Howard Hughes Medical Institute. We thank D. Murray and T. Williford for assistance with the animals and I. Harrison and B. Schneider for helpful comments on this manuscript.
- Correspondence should be addressed to Geoffrey M. Ghose at his present address: Department of Neuroscience, Center for Magnetic Resonance Research, University of Minnesota, 2021 6th Street SE, Minneapolis, MN 55455. geoff{at}cmrr.umn.edu