Abstract
The neural mechanism of bottom-up attention and its relationship to top-down attention are poorly understood. Visual stimuli that differ from others in their component features are salient and tend to draw attention in a bottom-up manner. “Popout” stimuli differ uniformly from surrounding items and are more easily detected than stimuli composed of a conjunction of surrounding features. We compared the responses of single area V4 neurons to popout and conjunction stimuli appearing within the classical receptive field (CRF) and found that their responses are modulated by popout. This selectivity was more robust when larger numbers of surrounding items and multiple features were included in the display, and it was absent when only a few items were presented immediately outside the CRF. In addition, the popout modulation of V4 activity was eliminated when top-down attention was directed to locations outside of the CRFs during saccade preparation, indicating that the salience of popout stimuli is not sufficient to drive selection by V4 neurons. These results demonstrate that neurons in feature-selective cortex are influenced by bottom-up attention, but that this influence is limited by top-down attention.
Introduction
Visual attention is classically divided into two categories based on the origin of signals causing heightened perception of selected stimuli, namely “top down” and “bottom up” (James, 1890; Kinchla, 1992). Top-down attention refers to the willful deployment of perceptual resources based on task-driven goals, whereas bottom-up attention is driven by the physical salience of external stimuli. Extensive neurophysiological work has established neural correlates of top-down attention throughout visual cortex (Motter, 1993; Kastner and Ungerleider, 2000; Reynolds and Chelazzi, 2004), yet the influence of bottom-up attention on visual cortical responses is more equivocal, particularly in early, feature-selective areas.
Psychophysical studies have established that visual targets composed of features that are dissimilar from surrounding distracters are more salient and more easily located during search (Egeth and Yantis, 1997; Wolfe and Horowitz, 2004). These “popout” stimuli are believed to draw attention in a bottom-up manner, with search being driven largely via “parallel” (Treisman and Sato, 1990) or “preattentive” (Wolfe, 1994) means. In comparison, targets made up of a unique conjunction of nontarget features are more difficult to locate and require longer search times for increasing numbers of distracters (Treisman and Gelade, 1980; Hegdé and Felleman, 2003). The robust difference in search efficiencies for popout and conjunction stimuli, both of which are defined by feature discontinuities, suggests a means by which to probe the mechanisms of bottom-up and top-down attention.
Major models of visual attention typically involve separate stages for the computation of differences in local features (“feature maps”) and global salience (“salience maps”) (Treisman and Sato, 1990; Wolfe, 1994; Itti and Koch, 2000). Recent neurophysiological studies have provided compelling evidence that global salience is computed within parietal (Balan and Gottlieb, 2006; Goldberg et al., 2006; Buschman and Miller, 2007) and/or within prefrontal (Moore et al., 2003; Thompson and Bichot, 2005; Buschman and Miller, 2007) cortex. Whether this modulation arises de novo in these presumed salience maps or converges there from feature-selective (feature map) areas is unknown. A recent study found that while there is ample color and orientation selectivity in V1, neurons there cannot distinguish between popout and conjunction defined along those two feature dimensions, but instead respond equally to all feature discontinuities (Hegdé and Felleman, 2003). Whether there is a similar absence of popout modulation in later stages of feature-selective areas is an open and crucial question for the above models.
We recorded from single neurons in macaque V4 and compared their responses to identical classical receptive field (CRF) stimuli presented within a popout or conjunction configuration. We found that V4 responses to popout stimuli were enhanced compared to responses to conjunction stimuli. This enhancement was more robust when larger numbers of surrounding items and multiple features were included in the display but was absent when only a few items were presented immediately outside the CRF. In addition, we found that the modulation of V4 activity by popout was eliminated when top-down attention was directed to locations outside of the CRF, indicating that the salience of popout stimuli is not sufficient to drive selection by V4 neurons.
Materials and Methods
Two male monkeys (Macaca mulatta) weighing 9 and 11 kg were used as subjects in these experiments. General experimental and surgical procedures have been described previously (Graziano et al., 1997). Each animal was surgically implanted with a head post, a scleral eye coil, and recording chambers. Surgery was conducted using aseptic techniques under general anesthesia (isoflurane), and analgesics were provided during postsurgical recovery. All surgical and experimental procedures were approved by the Stanford University Administrative Panel on Laboratory Animal Care and the consultant veterinarian and were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and Society for Neuroscience guidelines.
Electrophysiology
Recordings from single V4 neurons were made through a surgically implanted cylindrical titanium chamber (20 mm diameter) overlying the prelunate gyrus. Electrodes were lowered into the cortex through a stainless steel guide tube using a hydraulic microdrive (Narishige). Neuronal activity was recorded extracellularly with varnish-coated tungsten microelectrodes (FHC) of 0.2–1.0 MΩ impedance (measured at 1 kHz). Extracellular waveforms were digitized and classified as single neurons using online template matching (FHC). V4 neuron CRFs were mapped in a separate behavioral paradigm in which oriented bars were swept across the display in eight different directions during fixation. While the activity of the recorded cell was monitored via an audio amplifier, the edges of the CRF were plotted on a second monitor (Lamme, 1995). All V4 CRFs in this study were in the lower contralateral visual field with eccentricities between 2.5 and 5°.
Visual stimulation
All stimuli were presented on a colorimetrically calibrated CRT display (Mitsubishi 2070SB-BK, 29 cm vertical and 39 cm horizontal, 60 Hz) at a resolution of 1024 × 768 (26 pixels/degree). The display was controlled by a Pentium-based computer with an NVIDIA FX5200 video card (8 bits per gun). Judd chromaticities of the phosphors were measured with a Photo Research PR-650 spectra colorimeter and the output of each phosphor was linearized using an International Light IL1700 radiometer. The values were red (0.628, 0.342), green (0.294, 0.612), and blue (0.152, 0.081). The colors of the stimuli were specified in a color space based on opponent representation of cone responses (MacLeod and Boynton, 1979). Cone excitations were calculated using Smith–Pokorny (Smith and Pokorny, 1975) cone fundamentals based on human observers. This color space is similar to the one proposed by MacLeod and Boynton (1979) and used by Derrington et al. (1984) but specifies color in terms of contrasts with respect to a neutral gray. The colors of all stimuli were psychophysically equiluminant to one another. Psychophysical equiluminance was determined for each monkey using the minimum motion method of Logothetis and Charles (1990).
Stimulus arrays composed of colored, oriented bars were presented on a neutral gray background (10 cd/m²), with one bar centered within the CRF of the recorded V4 neuron(s) (Fig. 1). Bars were arranged in rows (3, 5, or 7) and columns (3, 5, or 7), resulting in 9-, 25-, and 49-item arrays. The position of each non-CRF bar was randomly offset by 5 pixels in both the x and y directions on each presentation to avoid any perceptually apparent structure in the array. In each stimulus array, bars outside of the CRF had their nearest edges positioned a minimum of 1.3 CRF diameters from the CRF center. For each neuron, five array types were constructed in which the CRF bar was held constant while the surrounding bars varied in color and orientation. This variation resulted in four array types in which the CRF bar differed from the surrounding ones in color and/or orientation. In three of the array types (popout), the surrounding bars differed from the CRF bar in the same way, i.e., color, orientation, or both. In the fourth type, the CRF bar was unique in the conjunction of the two stimulus features. For example, the CRF bar could be red–vertical and the surrounding bars could be red–horizontal, green–vertical, and green–horizontal. In the remaining array type, the surrounding items were identical to the CRF bar (homogenous). Finally, we also presented the CRF bar alone (singleton). For some neurons (76/137), display conditions were added in which the five array types were presented in the absence of a CRF bar to subtract out possible CRF stimulation by the surrounding array. The 48 items displayed during these empty CRF trials were identical to the surrounding items of the largest array size. For these neurons, activity measured during the presentation of the surrounding array by itself (popout or conjunction) was subtracted as a “baseline” from the response elicited by the surround plus the popout or conjunction CRF stimulus. Popout indices were computed for all neurons exhibiting above-baseline responses.
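The five array types described above can be illustrated with a minimal sketch. It is not the authors' stimulus code: the function name `surround_bars`, the restriction to two orientations (the CRF orientation and one alternative), and the roughly equal proportions of the three surround types in the conjunction array are all assumptions made for illustration, since the text does not specify them.

```python
import random

COLORS = ("red", "green")

def other(value, options):
    """Return the member of a two-item tuple that is not `value`."""
    return options[1] if value == options[0] else options[0]

def surround_bars(crf_bar, array_type, n_surround, orientations=(0, 90), rng=None):
    """Return (color, orientation) tuples for the non-CRF bars.

    Hypothetical helper: assumes two colors and two orientations, and
    (for the conjunction array) equal numbers of each surround type.
    """
    rng = rng or random.Random()
    color, ori = crf_bar
    alt_color = other(color, COLORS)
    alt_ori = other(ori, orientations)
    palettes = {
        "homogenous": [(color, ori)],
        "color_popout": [(alt_color, ori)],
        "orientation_popout": [(color, alt_ori)],
        "combined_popout": [(alt_color, alt_ori)],
        # The CRF bar shares one feature with some surround bars and is
        # unique only in its conjunction of color and orientation.
        "conjunction": [(color, alt_ori), (alt_color, ori), (alt_color, alt_ori)],
    }
    palette = palettes[array_type]
    # Cycle through the palette, then shuffle which position gets which bar.
    bars = [palette[i % len(palette)] for i in range(n_surround)]
    rng.shuffle(bars)
    return bars

# Example: 48 surround bars around a red, 90-degree CRF bar.
conj = surround_bars(("red", 90), "conjunction", 48, rng=random.Random(0))
```

In this sketch the CRF bar itself never appears in the surround of any array type except the homogenous one, which mirrors the definitions given in the text.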
Orientation and color of the CRF bar were 0, 45, 90, or 135°, and red or green, respectively. Orientation selectivity of the recorded neurons was determined online by comparing responses to briefly presented achromatic, oriented bars. Red/green selectivity was determined in a similar manner by presenting a bar of preferred orientation with either high L–M contrast (“red”) or low L–M contrast (“green”) in the CRF. The luminance of all bars was held constant at 20 cd/m².
Behavioral procedures
Monkeys were seated in a primate chair in a quiet room and positioned 57 cm in front of a CRT monitor. Each monkey was trained to fixate a central spot (0.1°, 50 cd/m²) on the display on which the visual stimuli were presented. Their gaze remained within a 2° diameter fixation window throughout all trials of the fixation task and, in the delayed saccade task, until the saccade cue. Eye position was monitored via a scleral search coil, digitized, and stored at 500 Hz. The spatial resolution of eye position measurement was <0.1°. Stimulus presentation, data acquisition, and behavioral monitoring were controlled by the CORTEX system.
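The display resolution quoted under Visual stimulation (26 pixels/degree) can be checked from the display geometry and the 57 cm viewing distance. The sketch below is an illustrative calculation only; it assumes that the stated 39 cm horizontal extent is the viewable width spanned by the 1024 horizontal pixels, and it evaluates the resolution at the screen center.

```python
import math

def pixels_per_degree(screen_width_cm, horizontal_pixels, viewing_distance_cm):
    """Pixels subtended by 1 degree of visual angle at the screen center."""
    pixel_pitch_cm = screen_width_cm / horizontal_pixels
    # Extent on the screen covered by 1 degree at this viewing distance.
    one_degree_cm = viewing_distance_cm * math.tan(math.radians(1.0))
    return one_degree_cm / pixel_pitch_cm

ppd = pixels_per_degree(39.0, 1024, 57.0)
print(round(ppd, 1))  # close to the 26 pixels/degree stated in the text
```

At 57 cm, 1 cm on the screen subtends very nearly 1°, which is why this viewing distance is convenient for converting between screen distance and visual angle.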
Fixation task.
For 137 of 284 of the recorded neurons, visual stimuli were presented during a fixation task. In this task, 300 ms following fixation, a series of five stimulus arrays plus the singleton were presented for 200 ms each, with 100 ms in between presentations (Fig. 1). The presentation order of the arrays and singleton was randomized on each trial. Monkeys were required to maintain fixation throughout the entire 2 s visual stimulus presentation to receive a juice reward, and only correctly completed trials were included in the analyses. The location of each 200 ms stimulus (array or singleton) was updated to reflect the monkey's current eye position within the fixation window. This readjustment effectively fixed the location of the stimulus with respect to the fovea across stimulus presentations and eliminated most of the positional variability caused by fixational eye movements (supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
We used a fixation task to study the impact of bottom-up (stimulus-driven) effects of salience on area V4 responses. In all recording sessions, the array stimuli were behaviorally irrelevant; in no condition was the monkey rewarded for responding in any way to the CRF stimulus or the distracters. Thus, during the presentation of each array, there was no task-driven basis for the monkey to attend to any stimulus other than the fixation spot. We avoided the use of a visual search task as there is evidence that such tasks may be suboptimal for assessing bottom-up attention (Prinzmetal and Taylor, 2006) and that the detection of preattentive features (Wolfe and Horowitz, 2004) and the perceptual effects of bottom-up attention can be masked by top-down attention (Joseph et al., 1997; Ipata et al., 2006).
Delayed saccade task.
For a subset of 147 neurons, one stimulus array or the singleton was presented either during fixation or immediately before a saccade to a visual target (0.1°) positioned 7° from fixation in the ipsilateral or contralateral field, and distant (4.5–12°) from the CRF (see Fig. 1c). In this task, monkeys fixated a central spot for 400–1400 ms, after which the saccade target appeared and the monkey continued to fixate. After 300–600 ms, the fixation spot disappeared and the monkey was rewarded for saccades to the target. On half the trials, appearance of the visual stimulus (array or singleton) occurred before the cue to saccade (fixation spot offset), either before (400–600 ms; n = 111) or after (200–400 ms; n = 36) target presentation. On the other half of trials, appearance of the visual stimulus occurred simultaneously with the cue to saccade. Thus, in the latter half of trials, neuronal responses were measured during the preparation of saccades to non-CRF targets. These two conditions were randomly interleaved throughout each experimental session. Single-feature popout (color and orientation) was tested in this paradigm for 113 of the 147 neurons.
Analyses
Comparisons of responses to visual stimuli were made based on means calculated from a 265 or 100 ms time window beginning 60 ms after the onset of each stimulus. Each neuron's activity was normalized by dividing by the largest average response to any of the six stimuli for that neuron. Normalized means were used to compute response differences or the popout index (PI). The PI was computed as $(\mathrm{response}_{\mathrm{popout}} - \mathrm{response}_{\mathrm{conjunction}})/(\mathrm{response}_{\mathrm{popout}} + \mathrm{response}_{\mathrm{conjunction}})$, where $\mathrm{response}_{\mathrm{popout}}$ was the response to either “color popout,” “orientation popout,” or “combined popout” arrays. Statistical comparisons of normalized responses were made using nonparametric statistics (e.g., Wilcoxon signed-rank test, Friedman test). For PIs, the distributions were generally normal, and thus comparisons and tests of main effects were performed with parametric statistics (e.g., ANOVA, t test). A criterion level of p < 0.05 was used in all statistical analyses. Popout modulation latencies were computed in two ways. First, we used a sliding window of 15 ms (1 ms steps) to compare responses to the two conditions for the population using a Wilcoxon signed-rank test and a criterion level of p < 0.05. Second, we used a standard Poisson spike train analysis (Legéndy and Salcman, 1985; Maunsell and Gibson, 1992; Sheinberg and Logothetis, 2001; Bisley et al., 2004). A “surprise” index [$\mathrm{SI} = -\log_{10}(P)$] was computed using summed spikes from a sliding time window of 100 ms, with 1 ms steps. If the surprise index surpassed a threshold of 2 (i.e., p ≤ 0.01), and the mean index value remained ≥2 for the following 100 ms, that latency was flagged. Neurons for which latencies could not be reliably estimated were removed from this analysis. The distributions of visual onset and popout modulation latencies for individual neurons were compared using a Wilcoxon signed-rank test and a criterion level of p < 0.05.
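The surprise-index latency criterion can be sketched as follows. This is a hedged illustration, not the published analysis code. Several details are assumptions that the text does not specify: that P is the Poisson probability of observing at least the measured spike count given a baseline rate, that the logarithm is base 10 (which is what makes a threshold of 2 correspond to p ≤ 0.01), and that the latency is assigned to the trailing edge of the first qualifying window. The names `poisson_sf` and `surprise_latency` are hypothetical.

```python
import math

def poisson_sf(n, mu):
    """P(X >= n) for X ~ Poisson(mu), floored to avoid log(0)."""
    if n <= 0:
        return 1.0
    term = math.exp(-mu)  # P(X = 0)
    cdf = 0.0
    for k in range(n):    # accumulate P(X = 0) ... P(X = n - 1)
        cdf += term
        term *= mu / (k + 1)
    return max(1.0 - cdf, 1e-300)

def surprise_latency(counts_per_ms, baseline_rate_hz,
                     window_ms=100, threshold=2.0, hold_ms=100):
    """Return the latency (ms) at which the surprise index first reaches
    `threshold` and its mean stays >= threshold for the next `hold_ms`
    windows, or None if the criterion is never met.

    counts_per_ms: summed spike counts in 1 ms bins from stimulus onset.
    baseline_rate_hz: expected summed rate used for the Poisson null.
    """
    mu = baseline_rate_hz * window_ms / 1000.0
    n_windows = len(counts_per_ms) - window_ms + 1
    si = []
    for start in range(n_windows):
        n = sum(counts_per_ms[start:start + window_ms])
        si.append(-math.log10(poisson_sf(n, mu)))
    for start in range(n_windows - hold_ms):
        following = si[start + 1:start + 1 + hold_ms]
        if si[start] >= threshold and sum(following) / hold_ms >= threshold:
            # Assumption: report the trailing edge of the window.
            return start + window_ms - 1
    return None

# Synthetic example: 10 spikes/s until 300 ms, then 200 spikes/s.
counts = [1 if t % 100 == 0 else 0 for t in range(300)]
counts += [1 if t % 5 == 0 else 0 for t in range(300, 700)]
latency = surprise_latency(counts, baseline_rate_hz=10.0)
```

With a 100 ms window the estimate necessarily lags the true change point by however long it takes for enough spikes to accumulate, so the choice of window alignment (leading edge, center, or trailing edge) shifts the reported latency and should be stated alongside any result.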
The Fano factor of each neuron's response, i.e., the variance of the spike count across trials divided by the mean spike count (Mitchell et al., 2007), was computed from spike counts during the 60–325 ms time window (i.e., for 265 ms) for each condition. For a subset of neurons (n = 81), color and orientation selectivity were measured by a standard selectivity index, computed as $(\mathrm{response}_{\max} - \mathrm{response}_{\min})/(\mathrm{response}_{\max} + \mathrm{response}_{\min})$, where $\mathrm{response}_{\max}$ and $\mathrm{response}_{\min}$ correspond to the most and least effective of two colors or four orientations, respectively.
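Both measures are simple to state in code. The following is a minimal sketch under one stated assumption: the variance is taken as the sample variance (n − 1 denominator), which the text does not specify. The function names are illustrative.

```python
from statistics import mean, variance

def fano_factor(spike_counts):
    """Across-trial spike-count variance divided by the mean count.

    Uses the sample variance (n - 1 denominator); this is an assumption.
    """
    return variance(spike_counts) / mean(spike_counts)

def selectivity_index(responses):
    """(max - min) / (max + min) over the mean responses to each tested
    feature value (e.g., two colors or four orientations)."""
    r_max, r_min = max(responses), min(responses)
    return (r_max - r_min) / (r_max + r_min)

ff = fano_factor([2, 4, 4, 6])       # spike counts on four trials
si = selectivity_index([10.0, 30.0])  # mean responses to two colors
```

A Poisson process has a Fano factor of 1, so values below 1 indicate responses that are more regular than Poisson and values above 1 indicate greater variability.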
Responses to stimuli in the delayed saccade task were measured in 100 ms time windows beginning 60 ms after stimulus onset in both the fixation and presaccadic conditions. Thus, in both conditions responses were measured up to 160 ms following stimulus onset. In the presaccadic condition, only trials in which the saccade occurred >130 ms after stimulus onset (and fixation spot offset) were included in the dataset. Therefore, no more than 30 ms of the postsaccadic response was included. Since V4 neurons have visual latencies >45 ms (Maunsell and Gibson, 1992), this cutoff ensured that the eye movement did not contaminate the visual response. Normalized means and PI were calculated from this time period as described above. Differences between popout and conjunction responses were calculated in both the fixation and presaccadic time periods in 50 ms sliding windows, with 10 ms steps. Comparison of the eye position during the 100 ms analysis period between the fixation and presaccadic conditions confirmed that the stimulus locations within the CRF were identical (supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
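The timing argument above reduces to a short calculation, sketched here for illustration. The function names and default parameters are hypothetical and simply restate the values given in the text (analysis window ending 160 ms after stimulus onset, saccade latency cutoff of 130 ms, V4 visual latency of at least 45 ms).

```python
def include_trial(saccade_latency_ms, min_latency_ms=130.0):
    """Keep only presaccadic trials whose saccade began after the cutoff."""
    return saccade_latency_ms > min_latency_ms

def postsaccadic_overlap_ms(saccade_latency_ms, window_end_ms=160.0):
    """Milliseconds of the analysis window that fall after saccade onset."""
    return max(0.0, window_end_ms - saccade_latency_ms)

V4_MIN_VISUAL_LATENCY_MS = 45.0

# Worst case is a saccade right at the cutoff: 160 - 130 = 30 ms of overlap,
# which is shorter than the minimum V4 visual latency, so any change in the
# retinal image caused by the saccade cannot yet have reached the response.
worst_case = postsaccadic_overlap_ms(130.0)
uncontaminated = worst_case < V4_MIN_VISUAL_LATENCY_MS
```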
Results
Popout modulation of V4 neurons
We studied the influence of bottom-up attention on a total of 284 neurons in macaque area V4 of two monkeys. We first measured the effects of varying surround feature discontinuities on the responses of 137 V4 neurons to a constant CRF stimulus (Fig. 1a). Arrays of colored, oriented bars were presented in sequence on a CRT display while a monkey maintained fixation on a central fixation spot (Fig. 1b). In each array, one of the bars appeared in the CRF of a single V4 neuron. The bar in the CRF differed in either color (color popout), orientation (orientation popout), or both (combined popout) from the surrounding bars, or it was made up of a unique conjunction of features present in the surrounding bars (“conjunction”). By presenting the stimulus arrays during the central fixation task, we could measure the stimulus-driven effects of varying context. In all recording sessions, the array stimuli were behaviorally irrelevant; in no condition was the monkey rewarded for responding in any way to the CRF stimulus or the surrounding items. Thus, there was no goal-driven basis for the animal to attend to the CRF stimulus or any other stimulus in the display.
Our stimulus arrays were based on those used previously to test for differences in popout and conjunction responses of area V1 neurons (Hegdé and Felleman, 2003). That study found that although popout stimuli were psychophysically more salient than conjunction stimuli, V1 responses did not distinguish between the two. We found that, unlike V1 neurons, V4 neurons reliably distinguished between popout and conjunction stimuli. In an initial population of 137 neurons recorded in two monkeys, the mean response to CRF stimuli in the combined popout array was significantly greater than the mean response to stimuli in the conjunction array (p < 10⁻⁸, Wilcoxon signed-rank test). The mean responses to conjunction and homogenous arrays were statistically indistinguishable (p > 0.25, Wilcoxon signed-rank test). To quantify the enhancement to popout stimuli (“popout modulation”) and facilitate comparison with previously reported effects of top-down attention (Luck et al., 1997), we computed a popout index [$\mathrm{PI} = (\mathrm{response}_{\mathrm{popout}} - \mathrm{response}_{\mathrm{conjunction}})/(\mathrm{response}_{\mathrm{popout}} + \mathrm{response}_{\mathrm{conjunction}})$] for each neuron (Fig. 2a). The mean PI for combined popout was 0.075 (p < 10⁻⁷, Wilcoxon signed-rank test), which corresponds to a ∼16% response enhancement. The selectivity of individual neurons for color or orientation was largely uncorrelated with popout modulation (supplemental Fig. 2, available at www.jneurosci.org as supplemental material). There was a significant positive correlation between orientation selectivity and orientation popout (p < 0.0045), but otherwise feature selectivity was not predictive of popout modulation.
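The correspondence between a PI of 0.075 and a ∼16% enhancement follows from inverting the index: if PI = (p − c)/(p + c), then p/c = (1 + PI)/(1 − PI). The sketch below illustrates this conversion; the function names are ours, and applying the conversion to a mean PI is an approximation, since the mean of per-neuron indices is not identical to the index of the mean responses.

```python
def popout_index(r_popout, r_conjunction):
    """PI = (popout - conjunction) / (popout + conjunction)."""
    return (r_popout - r_conjunction) / (r_popout + r_conjunction)

def percent_enhancement(pi):
    """Percent by which the popout response exceeds the conjunction
    response for a given PI, using p/c = (1 + PI) / (1 - PI)."""
    return 100.0 * ((1.0 + pi) / (1.0 - pi) - 1.0)

enhancement = percent_enhancement(0.075)
print(round(enhancement, 1))  # about 16, matching the value in the text
```

Because the PI is a ratio of a difference to a sum, it is unchanged when both responses are divided by the same per-neuron normalization factor.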
For a subpopulation of neurons (76 of 137), we were able to subtract a baseline level of activity measured with popout and conjunction arrays presented in the absence of the CRF stimulus from the popout and conjunction responses (Fig. 2a) (see Materials and Methods). This allowed us to subtract out any direct influence of the surround stimuli on V4 responses. We computed an additional PI from the surround-subtracted responses and found that the popout modulation was similar to the modulation observed in non-surround-subtracted responses (mean PI = 0.085, p < 0.0045, Wilcoxon signed-rank test). In addition, to control for any habituation or sensitization to the CRF stimulus due to the rapid presentation of array stimuli, we tested whether the popout modulation depended on when the popout and conjunction arrays appeared in the array sequence. We computed the PI for responses to each of the six flashes in the array sequence and found no effect of sequence position on the PI (p > 0.55, Friedman test).
Timing and onset of popout modulation
We examined the latency of popout modulation in our population of V4 neurons. First, using a statistical comparison (see Materials and Methods) between combined popout and conjunction responses for the full population, we found that popout modulation emerged within 105 ms of stimulus onset (Fig. 2b). Second, we used a Poisson surprise index (Fig. 2c) (see Materials and Methods) to measure both the visual onset and popout modulation latencies for individual neurons in the population. The median latency for visual onset was 64 ms, similar to previous reports (Maunsell and Gibson, 1992). In the same neurons, popout modulation emerged at a median of 85 ms from stimulus onset. For individual neurons, the popout modulation latency was a median of 24 ms later than the visual onset latency (p < 10⁻⁸, Wilcoxon signed-rank test). In addition, we used a support vector machine “classifier” to measure the latency at which the population response could distinguish between popout and conjunction arrays (see supplemental Fig. 3 and supplemental Methods, available at www.jneurosci.org as supplemental material). With this analysis, we found that the popout modulation emerged within ∼100 ms of stimulus onset. The results of these three disparate analyses were fairly similar.
Response variability
Popout modulation did not appear to alter the variability of V4 responses. A recent study found that top-down attention reduces visual response variability of V4 neurons (Mitchell et al., 2007). Likewise, response variability is decreased during the preparation of saccades to CRF stimuli (Moore and Chang, 2009). Using a similar measure of response variability, namely, the Fano factor (see Materials and Methods), we looked for a similar reduction in the variability of responses to popout versus conjunction arrays (Fig. 2d). Unlike the effects of top-down attention and saccade preparation, we found no significant difference in response variability between the two array types (p > 0.25, Wilcoxon signed-rank test). Thus, by this measure, popout modulation appeared to differ from the effects of top-down attention.
Effect of array size and number of features on popout modulation
Popout modulation was also dependent on the number of items in the array surrounding the CRF (Fig. 3). We varied the number of items in the stimulus arrays, as well as the popout type (i.e., color, orientation, and combined) in a subpopulation of 82 neurons. Popout modulation was present for the largest array size (49 items) in all three popout types (color, mean PI = 0.054, p < 0.002; orientation, mean PI = 0.053, p < 0.008; combined, mean PI = 0.081, p < 10⁻⁴; Wilcoxon signed-rank test). However, only the combined popout stimulus yielded significant modulation with 25 or fewer items (25 item: combined, mean PI = 0.039, p < 0.005; color, mean PI = 0.016, p > 0.1; orientation, mean PI = 0.012, p > 0.4; 9 item: combined, mean PI = 0.028, p < 0.035; color, mean PI = −0.007, p > 0.95; orientation, mean PI = 0.009, p > 0.4; Wilcoxon signed-rank test). Collapsing across popout types, there was a significant main effect of array size on the popout index (p < 0.007, ANOVA), and post hoc tests revealed that the modulation for the 49-item array was significantly greater than the modulation for the 9-item array (9 vs 49: p < 0.01, Tukey–Kramer) (Fig. 3b). This finding is consistent with the observation that differences in visual search times are sometimes found only for a sufficiently large number of distracters (Treisman and Gelade, 1980; Hegdé and Felleman, 2003). Popout modulation (i.e., the popout index) could depend on array size via an increase in response suppression for conjunction stimuli with larger arrays, a decrease in response suppression for popout stimuli with larger arrays, or both. Although there was a trend toward an increase in popout responses with larger arrays (popout, p > 0.06, Friedman test), a significant main effect of array size was observed only for conjunction responses (Fig. 3c). Increasing array sizes reduced the response to conjunction stimuli (conjunction, p < 0.025, Friedman test).
Importantly, the three array sizes differed only in the addition of stimuli to locations distant from the CRF, while items near the CRF remained the same (Fig. 3a). Thus, distracters that were clearly distant from the CRF contributed critically to the popout modulation. For example, the stimuli added to the 25-item array to make the 49-item array were presented in different quadrants from the CRF, and at a minimum of 1.3 CRF diameters from the CRF's nearest edge (see Materials and Methods). The observation of a distant non-CRF modulation, as well as the apparent dependence of that modulation on the extent of non-CRF visual stimulation, is consistent with previous reports (Schein and Desimone, 1990; Desimone et al., 1993; Pigarev et al., 2001).
Top-down limitations of popout modulation
A classic view of visual search is that elementary features such as color and orientation require minimal attentional resources to be detected, and that targets defined by those features (i.e., popout stimuli) can be selected in parallel or preattentively (Treisman, 1985; Braun and Sagi, 1990; Wolfe, 1994). This view seems to have motivated the assumption that the selection of popout stimuli by the visual system is accomplished largely via feedforward mechanisms and at early stages of visual processing (Knierim and van Essen, 1992). Our results thus far are consistent with this view. However, as mentioned, while sensitivity to feature discontinuities is clearly present in V1, popout modulation (as defined by differential responses to popout and conjunction stimuli) is absent (Hegdé and Felleman, 2003). Furthermore, psychophysical studies have shown that the localization of popout targets is impaired when top-down attention is engaged at fixation (Joseph et al., 1997; Einhäuser et al., 2008). This evidence runs counter to the notion that bottom-up selection has an unlimited capacity. Instead, bottom-up selection may rely on a limited neural resource that is shared with top-down attention. If this is true, then the popout modulation we observe in V4 should not persist when attention is engaged at non-CRF locations. Instead, we would expect it to be reduced or eliminated if top-down attention limits available attentional resources.
We tested the independence of bottom-up response modulation of V4 neurons from top-down mechanisms by comparing popout modulation during fixation and during the preparation of visually guided saccades. A wealth of evidence exists that links the preparation of saccades and the deployment of top-down spatial attention (Sheliga et al., 1994; Kowler et al., 1995; Kustov and Robinson, 1996; Schafer and Moore, 2007). Furthermore, recent neurophysiological studies demonstrate a causal role of saccade-related mechanisms, namely, the frontal eye fields (FEFs) and the superior colliculus (SC), in directing covert spatial attention (Moore and Fallah, 2001; Cavanaugh and Wurtz, 2004; Müller et al., 2005). Consistent with these studies is the fact that it is difficult or impossible to attend to nontarget locations during saccade preparation (Hoffman and Subramaniam, 1995; Deubel and Schneider, 1996). We took advantage of the apparent coupling of saccades and spatial attention by probing popout modulation during trials in which monkeys prepared saccades to targets distant from the CRF. We reasoned that if popout modulation in V4 is independent of top-down attention, then it should be unaffected by saccade preparation.
In a second population of 147 neurons, we measured popout modulation during fixation as well as during saccade preparation in interleaved trials using a delayed saccade task (Fig. 1c). In this task, the monkey was rewarded for making saccades to targets presented at locations distant from the CRF. On a given trial, the stimulus array was presented either before a cue to saccade (fixation condition) or during saccade preparation (presaccadic condition). For both fixation and presaccadic conditions, popout modulation was measured during 100 ms of the visual response to the stimulus array. (Saccadic reaction times necessitated a smaller analysis window; see Materials and Methods.) Thus, the visual stimulation in both conditions was physically identical (supplemental Fig. 1, available at www.jneurosci.org as supplemental material). The two conditions differed only in the degree to which the monkey was preparing a saccade to a non-CRF location. Confirming the identical visual stimulation in the fixation and presaccadic conditions was the fact that responses of neurons to a single bar stimulus were statistically indistinguishable between the fixation and presaccadic conditions (mean normalized responses: fixation = 0.813, presaccadic = 0.812; p > 0.7, Wilcoxon signed-rank test). This observation also rules out the occurrence of any substantial CRF shifts during saccade preparation, which have been observed previously in V4 in a paradigm in which saccades are made to targets near the CRF (Tolias et al., 2001).
For a subpopulation of highly popout-modulated neurons (n = 49, combined popout), the mean PI was 0.23 (p < 10⁻⁸, Wilcoxon signed-rank test) during the first 100 ms of the visual response in the fixation condition (Fig. 4a). However, these neurons failed to exhibit modulation in the presaccadic condition (mean PI = 0.003, p > 0.95, Wilcoxon signed-rank test). Furthermore, the mean presaccadic PI for these neurons was significantly less than that of the entire population during fixation (p < 0.007, Wilcoxon signed-rank test). For the full population of neurons there was significant modulation for combined popout during fixation (n = 147, mean PI = 0.046, p < 0.003, Wilcoxon signed-rank test) (Fig. 4b) despite the much smaller response time window used in the delayed saccade task (100 ms versus 265 ms). However, this modulation disappeared during saccade preparation (mean PI = −0.005, p > 0.7, Wilcoxon signed-rank test). Furthermore, there was a significant reduction in modulation from the fixation to the presaccadic condition (p < 0.02, Wilcoxon signed-rank test). The elimination of popout modulation during saccade preparation was observed for color popout as well. During fixation, there was significant modulation of responses by color popout (n = 113, mean PI = 0.057, p < 0.02, Wilcoxon signed-rank test), but not by orientation popout (n = 113, mean PI = 0.021, p > 0.4, Wilcoxon signed-rank test). Yet, as with combined popout, modulation by color popout disappeared during saccade preparation (mean PI = −0.031, p > 0.35, Wilcoxon signed-rank test). Furthermore, the reduction in color popout modulation from the fixation to the presaccadic condition was also significant (p < 0.03, Wilcoxon signed-rank test). Thus, the modulation of V4 neurons by popout feature discontinuities was eliminated during saccade preparation.
Discussion
The results of this study demonstrate that although area V4 neurons are selective for the type of feature discontinuities that define popout, this selectivity is limited by the deployment of top-down attention. V4 neurons exhibit enhanced responses to CRF stimuli in popout arrays compared to conjunction (and homogeneous) arrays. The popout modulation appeared within ∼100 ms of stimulus onset and ∼25 ms after the visual onset response. Unlike what has been reported for top-down modulation (Mitchell et al., 2007; Moore and Chang, 2009), we found that popout modulation was not accompanied by an increase in response reliability, a result that may be related to the observed differences in the psychophysical effects of bottom-up and top-down attention (Ling and Carrasco, 2006). The magnitude of the popout modulation increased with the number of array items and the number of features that define the popout stimulus. The increased popout modulation with large arrays of items (i.e., more surrounding items) observed in this study demonstrates the substantial spatial extent of non-CRF influences on V4 responses and dovetails with the increased perceptual benefit (e.g., shorter reaction times) in popout search with larger set sizes (Treisman and Gelade, 1980). Similarly, the maximal popout modulation for combined features, as opposed to single features, is consistent with visual search effects (Krummenacher et al., 2002; Weidner and Müller, 2009). In addition, consistent with evidence of an influence of top-down attention on the perception of popout and the detection of preattentive features (Joseph et al., 1997), we found that popout modulation was limited by top-down attentional demands. During the preparation of saccades to non-CRF targets, which confines top-down attention to those targets, popout modulation was eliminated. Thus, popout modulation in V4 is top-down limited.
The enhancement of V4 responses to popout stimuli suggests that the physical salience of the stimuli automatically draws attention to the CRF during fixation when top-down attentional resources are available. Since monkeys were not trained to attend to the CRF stimulus, the modulation is, by definition, stimulus driven rather than task driven. Additionally, in contrast to one of the signatures of top-down attention, in which the variability of visual responses to the attended stimuli is reduced (Mitchell et al., 2007; Moore and Chang, 2009), we found no changes in response variability. More importantly, the popout modulation was more robust when larger numbers of surrounding items, and multiple features, were included in the display, and it was absent when only a few items were presented outside the CRF. Thus, the difference in V4 responses to popout and conjunction stimuli parallels the increased perceptual benefit of popout with larger numbers of distracters during visual search (Treisman and Gelade, 1980; Hegdé and Felleman, 2003). The absence of popout modulation observed with small arrays containing only distracters near the CRF demonstrates that the popout modulation relied on an influence from well beyond the CRF and thus was not due to direct stimulation by near-CRF stimuli. Thus, unlike V1 neurons (Hegdé and Felleman, 2003), V4 neurons appear to be selective for popout.
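Response variability of the kind referred to above is commonly summarized with the Fano factor, the variance of the spike count across trials divided by its mean. The Python sketch below is a minimal illustration and not the authors' analysis: the spike counts are made-up values, and the counting window and any further steps of the actual analysis are not specified in this section.

```python
from statistics import mean, variance


def fano_factor(spike_counts):
    """Fano factor of per-trial spike counts: sample variance / mean.

    Needs at least two trials and a nonzero mean count.
    """
    if len(spike_counts) < 2:
        raise ValueError("need at least two trials")
    m = mean(spike_counts)
    if m == 0:
        raise ValueError("mean spike count is zero")
    return variance(spike_counts) / m


if __name__ == "__main__":
    # Hypothetical spike counts per trial in a fixed counting window.
    popout_trials = [9, 12, 10, 14, 11, 13, 8, 12]
    conjunction_trials = [7, 9, 8, 11, 9, 10, 6, 9]

    print("popout Fano factor:     ", fano_factor(popout_trials))
    print("conjunction Fano factor:", fano_factor(conjunction_trials))
```

Because the Fano factor normalizes variance by the mean count, it allows variability to be compared between conditions that differ in firing rate, which is the situation when popout responses are larger than conjunction responses.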
An important consideration is whether our behavioral task and method of stimulus presentation were optimal for measuring bottom-up effects in visual cortex. While many neurophysiological and imaging studies have used passive fixation tasks to probe bottom-up attention (Knierim and van Essen, 1992; Lamme, 1995; Nothdurft et al., 1999; Hegdé and Felleman, 2003; Beck and Kastner, 2005; Constantinidis and Steinmetz, 2005), other studies in higher-order cortical areas have used visual search tasks (Bichot et al., 2005; Thompson et al., 2005; Ogawa and Komatsu, 2006; Buschman and Miller, 2007). Although popout and conjunction stimuli are often used in the context of a visual search task, some have argued that this task may not be the optimal way to measure purely bottom-up effects. For example, one psychophysical study found that detection measured in tasks that merely required subjects to report the first perceived stimulus is more reliably influenced by stimulus salience than detection in visual search tasks (Prinzmetal and Taylor, 2006). This study and other psychophysical (Joseph et al., 1997) and neurophysiological (Ipata et al., 2006) studies suggest that the interaction of bottom-up and top-down attention during visual search can lead to a diminution of bottom-up attention effects.
Psychophysical tests of the independence of bottom-up from top-down attention have yielded conflicting evidence. Consistent with earlier assumptions that bottom-up attention has a preattentive basis, some psychophysical observations suggest that it is indeed independent of top-down attention. Braun and Sagi (1990) reported that subjects' detection of popout targets was not impaired during concurrent performance of a form recognition task. Likewise, Braun and Julesz (1998) argued that detection of popout targets carries little or no attentional cost. However, other studies appear to contradict these findings. Joseph et al. (1997) had subjects perform a rapid serial visual presentation (RSVP) task at fixation while concurrently performing a peripheral popout detection task. Subjects were severely impaired at detecting popout stimuli in the dual-task condition compared to the detection task alone. These authors suggested that the task difference may be responsible for the contrary result, presumably due to the greater attentional demands of the RSVP task. A similar result was found in a visual search task. While subjects free-viewed photographs, Einhäuser et al. (2008) incrementally changed the contrast of one side of the photograph. This change in contrast, and change in salience, resulted in a bias of the subjects' saccade endpoints toward the higher-contrast side. However, this bias was eliminated when subjects were given the task of locating a target embedded in the photograph. Furthermore, subjects were in fact biased toward the low-contrast side of the photograph if the target was consistently embedded there, thus reversing the bottom-up bias observed in free-viewing. Together, these psychophysical results indicate that, at least under some conditions, top-down attention can override the influence of stimulus-driven salience. This indicates that bottom-up attention is not independent of top-down attention and suggests that perhaps neither are its neural correlates.
If psychophysically defined bottom-up attention is indeed limited by top-down attention, then an important question is how this limitation can be understood in terms of what is known about the neural mechanisms of visual attention. Recent studies indicate that the mechanisms underlying saccade programming provide a source of attentional selection of visual representations (Moore, 2006). For example, several studies have shown that microstimulation in oculomotor structures such as the FEF and the SC results in an attentional benefit at spatial locations that correspond to the site of stimulation (Moore and Fallah, 2001; Cavanaugh and Wurtz, 2004; Müller et al., 2005). In addition, inactivation of the FEF (Wardak et al., 2006) and the SC (McPeek and Keller, 2004) results in deficits of attention and target selection, while microstimulation of the FEF has been shown to drive attention-like changes in visual cortex (Moore and Armstrong, 2003; Armstrong and Moore, 2007; Ekstrom et al., 2008). This causal evidence is complemented by recent studies showing correlates of attention in saccade-related structures including the FEF (Kastner et al., 1999; Buschman and Miller, 2007), the SC (Ignashchenkova et al., 2004), and the lateral intraparietal area (LIP) (Bisley and Goldberg, 2003). Collectively, these studies suggest a model in which top-down signals code the direction of intended eye movements and simultaneously alter visual cortical responses to spatially corresponding stimuli (Moore et al., 2003). Since neurons in at least the FEF and the LIP have been shown to be selective for popout (Buschman and Miller, 2007), it should follow that demands on top-down deployment would limit their capacity to maintain a bottom-up salience map. Indeed, it has been shown that task demands override popout modulation in the LIP (Ipata et al., 2006). Thus, in terms of potential sources of attentional modulation, it seems likely that bottom-up and top-down attention effects would rely on shared mechanisms, at least at the level of their suspected origins.
It may be convenient to equate bottom-up and top-down influences with feedforward and feedback projections within the visual system, respectively. However, this is probably incorrect. For example, although the FEF and the LIP are considered to be later stages of the visual cortical hierarchy than V4 (Felleman and Van Essen, 1991; but see Ungerleider et al., 2008), the visual responses of neurons in the former areas have been reported to have shorter latencies (Schmolesky et al., 1998; Bisley et al., 2004). Thus, the popout modulation we observe in V4 may well originate from these putatively higher cortical areas and propagate via feedback connections. Indeed, the role of feedback in shaping low-level visual cortical response properties has received much focus in recent years. For example, although it was originally believed that surround modulation of responses in V1 is primarily due to long-range, horizontal connections within the same area (Gilbert, 1998), more recent evidence suggests that this modulation propagates too quickly to be accounted for by horizontal connections (Bair et al., 2003). Instead, more rapid feedback inputs from neurons with larger CRFs at later cortical stages appear more likely to play a role (Hupé et al., 1998). The lack of popout modulation in V1 is consistent with the conclusion that the popout modulation we observe in V4 either originates from feedback connections or is generated within V4. That is, this quintessentially bottom-up property may not rely purely on feedforward mechanisms. Therefore, the neural circuit implications of the bottom-up/top-down dichotomy may be misleading. Instead, the terms “exogenous” and “endogenous,” which merely specify whether a stimulus is selected by virtue of its physical characteristics or by some internally defined goal, respectively, may be more accurate. 
On the other hand, because sensitivity to the popout feature discontinuity requires selectivity for the component features, it seems unlikely that V4 neurons completely inherit popout modulation from higher areas, areas which are known to exhibit very little feature selectivity (Sereno and Maunsell, 1998; Peng et al., 2008). Rather it seems that V4 neurons could correspond to the so-called feature map component of models of salience (Treisman and Sato, 1990; Wolfe, 1994; Itti and Koch, 2000), while parietal and frontal areas comprise the salience map (Bichot and Schall, 2002; Constantinidis and Steinmetz, 2005; Balan and Gottlieb, 2006; Ipata et al., 2006; Buschman and Miller, 2007). Thus, in a direct comparison, one might expect shorter popout (vs conjunction) modulation latencies in V4 compared to salience map areas when popout is defined by orientation and equiluminant color, given that the detection of the feature discontinuity is time limited by orientation and color selectivity. Color selectivity alone requires a minimum of ∼68 ms to appear in V1 (Cottaris and De Valois, 1998). It would seem then that the degree to which top-down mechanisms can influence bottom-up attention may depend on the features that define the latter.
A revised model of salience in visual cortex
Soltani and Koch (2008) recently modeled the computation of salience in visual cortex and reported results that are complementary to our experimental observations. This biophysically plausible spiking network model includes layers of orientation and color selective neural populations, with V1, V2, and V4 each represented by different layers. When individual features were processed by separate populations within each layer, popout modulation was absent or nearly absent in the V1 layer for single-feature and combined-feature popout arrays, respectively. However, popout modulation emerged strongly in the V4 layer of the model for both single and combined-feature popout arrays. These results closely mirror the experimental findings in V1 (Knierim and van Essen, 1992; Hegdé and Felleman, 2003) as well as our findings in V4. The model also showed that the difference in response to popout and conjunction arrays emerged early in the V4 layer relative to the V1 layer of the model, and that the model PIs for the single- and combined-feature popout arrays were qualitatively similar to those measured in our dataset. Finally, the inclusion of saccade preparation in the model during the presentation of the stimulus arrays resulted in the reduction of the PI relative to when the arrays were presented to the model in the absence of a saccade. This result is consistent with our findings and suggests that top-down attention influences the salience computation already at the level of feature maps.
Footnotes
This work was supported by National Institutes of Health Grant EY014924, the Pew Charitable Trusts, the McKnight Foundation, and National Research Service Award F31MH081500 to B.E.B. We thank D. S. Aldrich for technical assistance, S. X. Xian for equiluminant color protocols, R. J. Schafer for help with all analyses, and N. A. Steinmetz for the Fano factor analysis.
Correspondence should be addressed to Tirin Moore at the above address. E-mail: tirin{at}stanford.edu