Many visual tasks require deployment of attention to multiple objects or locations. We used functional magnetic resonance imaging and behavioral experiments to investigate the relative processing efficiency of two putative attentional mechanisms for performing such tasks: the “zoom lens” and “multiple spotlights.” Two key questions were investigated: (1) does splitting the spotlight into multiple foci incur an overhead cost that diminishes the efficacy of attention compared with the zoom lens, and (2) does splitting the spotlight provide a benefit relative to the zoom lens by conserving attention resources that otherwise would be directed to task-irrelevant stimuli? For both mechanisms, attending to multiple object locations decreased processing efficiency at a single location, resulting in both decreased behavioral performance and decreased blood oxygenation level-dependent (BOLD) signal attentional modulation. When the two mechanisms attended to multiple objects across the same spatial extent, the multiple spotlight mechanism, which ignores intervening stimuli, yielded better performance and higher BOLD signal. When the two mechanisms processed the same number of stimuli, splitting the spotlight neither impaired performance nor diminished BOLD signal in occipital cortex. The surprising efficiency of the multiple spotlight mechanism supports the emerging view that spatial attention is easily deployed in a diverse range of spatial configurations.
Most previous visual attention studies have investigated the deployment of attention to a single object or location; however, many tasks require the simultaneous processing of multiple stimuli. Such tasks include tracking of multiple objects (Pylyshyn and Storm, 1988; Scholl et al., 2001), perceptual comparisons (Pashler, 1998; Awh and Pashler, 2000; Muller et al., 2003a; McMains and Somers, 2004), and wide-field target detection (Castiello and Umilta, 1990). Three main mechanisms of multiple object selection have been proposed: the “zoom lens,” the “rapidly moving spotlight,” and “multiple spotlights.” Here, the relative processing efficiency of the zoom lens and multiple spotlight mechanisms is compared. To focus on these two mechanisms, stimulus presentation rates were set to exceed the speed limit of the moving spotlight (Weichselgartner and Sperling, 1987; Peterson and Juola, 2000).
The zoom lens mechanism selects multiple object locations by expanding a single attentional spotlight (Eriksen and Yeh, 1985; Eriksen and St. James, 1986). Because the total amount of available attention is limited (Eriksen and St. James, 1986), the zoom lens mechanism predicts a tradeoff between the size of the attended region and processing efficiency. A decrease in processing efficiency with an increase in the attended area has been observed in both behavioral (Eriksen and St. James, 1986; Castiello and Umilta, 1990) and functional magnetic resonance imaging (fMRI) studies (Muller et al., 2003b); however, there are two major shortcomings of the zoom lens mechanism. First, attention is needlessly deployed to spatially intervening, task-irrelevant locations. Second, distractor stimuli at intervening locations are selected inadvertently and may interfere with task performance.
The multiple spotlight mechanism simultaneously selects spatially distinct locations and ignores intervening regions (Awh and Pashler, 2000; Muller et al., 2003a; McMains and Somers, 2004), thus avoiding the problems caused by selecting distractor stimuli. Although the ability to divide attention into multiple spotlights has been well demonstrated in tasks requiring exclusion of an intervening distractor (Awh and Pashler, 2000; Muller et al., 2003a,b; McMains and Somers, 2004), it remains unclear whether this represents a general-purpose selection mechanism. We approach the question of the generality of multiple spotlight selection from a utilitarian stance. We suggest that multiple spotlight selection will likely be used whenever it is efficient to do so; therefore, we have performed an analysis of processing efficiency to infer the utility and, by extension, the generality of multiple spotlight selection. Processing efficiency was measured both behaviorally and with fMRI. The premise of our experiments is that decreased behavioral performance reflects decreased efficiency. Similarly, decreased amplitude of attentional modulation of the blood oxygenation level-dependent (BOLD) signal in retinotopic visual cortex reflects decreased neural processing of selected stimuli.
Similar to the zoom lens mechanism, multiple spotlight selection is predicted to exhibit a tradeoff between the size or number of the attended regions and processing efficiency. Processing of a stimulus at a particular location is expected to become less efficient as the number of attended object locations increases. In addition to testing this prediction, we also tested two hypotheses regarding how the processing efficiency of the multiple spotlight mechanism might differ from that of the zoom lens. We hypothesize that splitting the attentional spotlight might be attentionally demanding, incurring an “overhead cost” that impairs performance and diminishes brain activity modulation. We further hypothesize that multiple spotlight selection might obtain a “resource conservation” benefit relative to the zoom lens, because attention need not be deployed to the intervening, task-irrelevant regions of space. Our experiments reveal a resource conservation benefit, but not an overhead cost, for multiple spotlight selection.
Materials and Methods
Subjects. Ten healthy volunteers (three women) participated in the fMRI study, and 10 (four women) volunteers participated in the psychophysical study (six subjects overlapped). Informed consent was obtained from each subject in writing [Massachusetts General Hospital (Boston, MA) Institutional Review Board (IRB) assurance #FWA00003136; Boston University IRB file #1040E]. Two fMRI subjects were excluded because of failure to adequately maintain fixation.
Stimuli. Stimuli consisted of letters displayed for ≤173 ms in rapid serial visual presentation (RSVP). Five RSVP streams were displayed simultaneously. The letter height subtended a visual angle of 0.6° in the central RSVP stream and 1.1° in the peripheral streams. One “peripheral” stream was placed in each visual field quadrant centered 3.6° diagonally from central fixation. Both experiments used identical stimulus configurations.
Trials consisted of a 2 s target detection phase followed immediately by a 1.5 s response phase signaled by a series of the letters X and O (Fig. 1a). RSVP streams ran without interruption across trials (intertrial interval, 3.5 s). Subjects maintained fixation on the central RSVP stream while covertly monitoring zero, one, or two peripheral RSVP streams. On 50% of trials, one of two target letters (S or K) appeared in one of the attended streams. A yes-no target detection response was elicited. Unattended streams contained distracting target letters and provided no information about trial type. Trial blocks lasted 40 s, with a 3 s attentional cue at the beginning of each block.
Psychophysical task and analysis. Four different attentional conditions were tested (Fig. 1b), attending to the following: a single peripheral RSVP stream (SPOT); two adjacent streams (ZOOM2); two nonadjacent peripheral streams (MULTI2); or three adjacent streams (ZOOM3). These conditions were performed for two spatial configurations: those shown in Figure 1b and a 180° rotation of these conditions. Because the MULTI2 and ZOOM3 conditions are unchanged with this rotation, only six conditions were required. Trial blocks (40 s) were followed by rest periods (5 s). Blocks were counterbalanced across runs, and each block was performed 18 times. The RSVP stream rate was adjusted to keep performance in the SPOT conditions at ∼85%. The average adjusted percentage correct [(correct - incorrect)/100] for each condition was entered into an ANOVA with two factors: condition (SPOT, ZOOM2, ZOOM3, or MULTI2) and subject. Post hoc analysis was performed with Fisher's protected least significant difference (PLSD).
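The adjusted percentage correct can be illustrated with a short sketch. It assumes one reading of the bracketed formula, namely percentage correct minus percentage incorrect, so that guessing in a yes-no task scores zero; the function name and the trial counts are hypothetical and are not taken from the study.

```python
def adjusted_percent_correct(n_correct, n_incorrect):
    """Percent correct minus percent incorrect for one condition.

    ASSUMPTION: this is one interpretation of the bracketed formula in the
    text; the exact normalization used in the study is not confirmed.
    """
    n_trials = n_correct + n_incorrect
    if n_trials == 0:
        raise ValueError("at least one trial is required")
    return 100.0 * (n_correct - n_incorrect) / n_trials


# Hypothetical trial counts, for illustration only.
chance_score = adjusted_percent_correct(100, 100)   # guessing on a yes-no task
example_score = adjusted_percent_correct(183, 17)   # 91.5% correct, 8.5% incorrect
print(chance_score, example_score)
```

Under this reading, a subject responding at chance scores 0 and a subject with 91.5% raw accuracy scores 83.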
Eye movement controls. Subjects were required to maintain central fixation throughout all conditions. During training sessions, subject eye position was monitored (Viewpoint; Arrington Research, Scottsdale, AZ), and auditory feedback was given when fixation was broken. The ability to maintain fixation was a key requirement for the completion of subject training. Fixation maintenance during fMRI experiments was verified by post hoc examination of the retinotopic patterns of activation in differing attentional conditions. The retinotopic representations of the stimuli during central fixation were revealed by the comparison of passive viewing and blank fixation conditions. If subjects exhibited patterns of attentional modulation that deviated from the stimulus locations under central fixation, the subjects were deemed to have insufficiently held fixation. Two subjects were excluded on these grounds.
fMRI data acquisition and analysis. Each subject participated in two or more scan sessions with a 3-T Allegra magnetic resonance imager (Siemens Medical Systems, Erlangen, Germany) at the Martinos Center for Biomedical Imaging at Massachusetts General Hospital. Cortical hemispheric surfaces were unfolded and flattened (Dale et al., 1999; Fischl et al., 1999, 2001) with standard anatomical scanning parameters (McMains and Somers, 2004). Retinotopic visual field representations of polar angle and eccentricity were also mapped with standard techniques (Sereno et al., 1995; Engel et al., 1997; Wade et al., 2002) to identify five visual cortical regions (V1, V2, V3, V3A, and hV4).
fMRI experiments consisted of the six conditions used previously in the psychophysical study plus four baseline conditions: attention to the fovea (SPOT); attention to the lower left stream (AWAY); passive viewing (PASSIVE); and FIXATION (no RSVP streams). Each block was followed by a 10 s fixation period, and trial blocks were counterbalanced across runs. Subjects performed six to nine runs (echo time, 45 ms; repetition time, 2000 ms; 30 slices; in-plane resolution, 2.65 × 2.65 mm; 3.3 mm slices; scan duration, 8 min, 40 s) of the attentional scans (46,080-69,120 images per subject). Motion correction (Cox and Hyde, 1997) and intensity normalization were performed before signal averaging (FS-FAST; CorTech Labs, La Jolla, CA).
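The reported image counts can be checked with simple arithmetic. The sketch below assumes that 256 volumes per run were counted, a value that is not stated in the text (8 min, 40 s at a 2 s repetition time would nominally give 260 volumes), and multiplies by the 30 slices and the six to nine runs.

```python
# Sanity check of the reported number of images per subject.
TR_S = 2.0                       # repetition time in seconds (from the text)
SLICES_PER_VOLUME = 30           # from the text
SCAN_DURATION_S = 8 * 60 + 40    # 8 min, 40 s (from the text)

# ASSUMPTION: 256 volumes per run are counted; the text does not state this.
ASSUMED_VOLUMES_PER_RUN = 256


def images_per_subject(n_runs, volumes_per_run=ASSUMED_VOLUMES_PER_RUN):
    """Total slice images acquired over n_runs attentional scans."""
    return n_runs * volumes_per_run * SLICES_PER_VOLUME


nominal_volumes = int(SCAN_DURATION_S / TR_S)   # volumes implied by duration / TR
low = images_per_subject(6)                     # six runs
high = images_per_subject(9)                    # nine runs
print(nominal_volumes, low, high)
```

With the assumed 256 volumes per run, six and nine runs reproduce the reported range of 46,080 to 69,120 images.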
Region of interest (ROI) analysis was used to investigate BOLD signal changes at a retinotopic location as attentional configuration and selection mechanism changed. Experimental conditions were designed so that all critical comparisons could be made within a single retinotopic ROI, the one corresponding to the RSVP stream attended in the SPOT condition. This analysis was repeated for the 180° rotated SPOT condition, and the symmetric data sets were combined. ROIs were defined retinotopically from separate localizer and retinotopic mapping scans. Localizer scans functionally identified ROIs corresponding to the visual cortical retinotopic representations of each RSVP stream (alternating 16 s blocks of the central RSVP stream and the four peripheral streams).
The ROIs corresponding to the top left and bottom right RSVP streams are the critical ROIs (Figs. 1b, 2). Eccentricity and polar angle functional maps were used to subdivide these two ROIs on the basis of cortical area (V1, V2, V3, V3A, and hV4). Percentage signal change data, measured relative to the average activation level during FIXATION, were averaged by block condition (over many runs) to construct time course data for all voxels within a functionally defined ROI. Time points within blocks were averaged, excluding the first 6 s for cue processing and shifting by 4 s for hemodynamic delay, resulting in a single average signal change per condition, region, and subject. These percentage signal change values, relative to PASSIVE, were entered into an ANOVA with three factors: attentional condition (SPOT, ZOOM2, ZOOM3, or MULTI2), visual area (V1, V2, V3, V3A, or hV4), and subject. Post hoc analysis was performed with Fisher's PLSD.
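The block-averaging step can be sketched as follows. This is an illustration under stated assumptions, not the FS-FAST implementation: the averaging window is taken to be the 40 s block minus its first 6 s, delayed by 4 s, and the function names, the ROI time course, and the onset times are hypothetical.

```python
from statistics import mean

TR_S = 2.0          # repetition time (from the text)
BLOCK_S = 40.0      # block duration (from the text)
CUE_SKIP_S = 6.0    # initial period excluded for cue processing (from the text)
HRF_SHIFT_S = 4.0   # shift for hemodynamic delay (from the text)


def block_window(onset_s):
    """Return (start, stop) volume indices averaged for one block.

    ASSUMPTION: the window is the block without its first 6 s, shifted
    later by 4 s; the text does not spell out the exact boundaries.
    """
    start = int(round((onset_s + CUE_SKIP_S + HRF_SHIFT_S) / TR_S))
    stop = int(round((onset_s + BLOCK_S + HRF_SHIFT_S) / TR_S))
    return start, stop


def percent_signal_change(roi_ts, onsets_s, fixation_level):
    """Mean percent signal change (relative to FIXATION) over a condition's blocks."""
    psc = [100.0 * (v - fixation_level) / fixation_level for v in roi_ts]
    block_means = [mean(psc[a:b]) for a, b in map(block_window, onsets_s)]
    return mean(block_means)


def modulation_vs_passive(condition_psc, passive_psc):
    """Attentional modulation: condition minus PASSIVE, both relative to FIXATION."""
    return condition_psc - passive_psc


# Synthetic ROI time course: baseline 1000 with a 1% response in the window.
ts = [1000.0] * 60
for i in range(*block_window(0.0)):
    ts[i] = 1010.0
print(block_window(0.0), percent_signal_change(ts, [0.0], 1000.0))
```

Because all blocks have the same length, averaging within each block and then across blocks gives the same value as averaging the block-averaged time course.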
Results
The goal of the experiments was to directly compare the relative processing efficiency of the zoom lens and multiple spotlight mechanisms for attending to multiple object locations. To perform this comparison, it was first necessary to demonstrate that subjects used both forms of selection within the same experimental paradigm.
The processing efficiency of zoom lens selection was analyzed by comparing data from the SPOT, ZOOM2, and ZOOM3 conditions, each of which selects a single region but differs in the number of attended RSVP streams. The behavioral data reveal the predicted results: as the zoom lens expands to encompass more RSVP streams, overall performance declines (adjusted percentage correct: SPOT, 83; ZOOM2, 76; and ZOOM3, 70). All comparisons were significantly different: SPOT versus ZOOM2, p < 0.01; SPOT versus ZOOM3, p < 0.001; and ZOOM2 versus ZOOM3, p < 0.05. Similarly, BOLD signal activation within the ROI representing the SPOT location declined as the zoom lens expanded (ROI encompassed all visual cortical areas; see Table 1 for visual area breakdown). fMRI attentional modulation is reported as the percentage signal change of the BOLD signal in an attentional condition relative to PASSIVE. For the zoom lens conditions, the attentional modulation was as follows (Figs. 2, 3): SPOT, 0.35; ZOOM2, 0.29; and ZOOM3, 0.18. Again, all comparisons were significant: SPOT versus ZOOM2, p < 0.05; SPOT versus ZOOM3, p < 0.0001; and ZOOM2 versus ZOOM3, p < 0.0001. These results not only mirror previous behavioral findings (Eriksen and St. James, 1986), but also closely replicate the fMRI zoom lens results reported by Muller et al. (2003b). Behavioral performance during fMRI was similar to that of the full psychophysical experiments (adjusted percentage correct: SPOT, 87; ZOOM2, 75; and ZOOM3, 74) but reflects fewer trials.
Here, as in our previous multiple spotlights study (McMains and Somers, 2004), attention was directed to two distinct target regions separated by a distractor region. This MULTI2 condition yielded attentional enhancement within the ROIs representing the two attended streams (percentage signal change vs PASSIVE; MULTI2, 0.30), whereas the AWAY condition did not (AWAY, -0.003). The MULTI2 activation in these ROIs was significantly greater than the AWAY (t = 4.60; p < 0.01) or PASSIVE (t = 3.59; p < 0.01) conditions. In the fovea, the intervening distractor region, MULTI2 activation did not differ significantly from AWAY (percentage signal change vs PASSIVE: MULTI2, -0.26; AWAY, 0.01; t = 1.69; p = 0.13). Thus, this pattern of attentional modulation captures the key properties of multiple spotlight selection (McMains and Somers, 2004).
Processing efficiency of multiple spotlight selection was analyzed by comparing data from the SPOT and MULTI2 conditions. The SPOT condition served as the single spotlight condition. As expected, decreased behavioral performance (adjusted percentage correct: SPOT, 83 vs MULTI2, 75; p < 0.001) and decreased BOLD signal amplitude within the ROI representing the SPOT RSVP stream (percentage signal change vs PASSIVE: SPOT, 0.35 vs MULTI2, 0.30; p < 0.05) were observed with the addition of a second spotlight. As with the zoom lens mechanism, processing efficiency decreased as the number of attended RSVP streams increased. The behavioral data collected during fMRI scanning revealed less of a difference, but they were also based on far fewer trials (adjusted percentage correct: SPOT, 87; MULTI2, 85).
BOLD signal by cortical area and condition
ROIs were subdivided into separate retinotopic ROIs for each visual cortical area. Table 1 shows the average percentage signal change versus PASSIVE for each area and condition. ANOVAs were performed for each visual area and included a condition in which subjects attended to a separate stream (AWAY) to test for significant attentional enhancement for each condition for each visual area. All comparisons were significant (p < 0.05) except ZOOM3 versus AWAY in V1 (p = 0.79) (Table 1). The data for each individual area were entered into an ANOVA with three factors: attentional condition (SPOT, ZOOM2, ZOOM3, or MULTI2), visual area (V1, V2, V3, V3A, or hV4), and subject. This revealed a main effect of attentional condition (F(3,84) = 22.60; p < 0.0001), area (F(4,84) = 59.06; p < 0.0001), and subject (F(7,84) = 126.65; p < 0.0001). Post hoc analyses investigating the main effect of area (collapsed across subjects and conditions) revealed increasing attentional enhancement as one ascends the visual hierarchy (all p < 0.001; except V2 vs V3 and V3A vs hV4, p > 0.05). There was no significant interaction between attentional condition and area (F(12,84) = 1.06; p = 0.40). The general pattern of activation among conditions was similar for all of the visual areas (Table 1); therefore, data were collapsed across visual areas.
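The degrees of freedom of the reported F statistics can be reproduced with a short calculation. The sketch assumes one value per condition, area, and subject cell, eight subjects remaining after the two exclusions, and an error term equal to the three-way interaction; the last point is an inference from the reported values, not something the text states.

```python
# Degrees of freedom for a condition x area x subject layout, one value per cell.
N_CONDITIONS = 4        # SPOT, ZOOM2, ZOOM3, MULTI2
N_AREAS = 5             # V1, V2, V3, V3A, hV4
N_SUBJECTS = 10 - 2     # ten fMRI subjects minus two excluded for fixation failures

df_condition = N_CONDITIONS - 1
df_area = N_AREAS - 1
df_subject = N_SUBJECTS - 1
df_condition_x_area = df_condition * df_area

# ASSUMPTION: the error term is the condition x area x subject interaction,
# i.e. what remains once all main effects and two-way interactions are modeled.
df_error = df_condition * df_area * df_subject

# Consistency check: all terms must add up to the total degrees of freedom.
df_total = N_CONDITIONS * N_AREAS * N_SUBJECTS - 1
df_two_way_with_subject = df_condition * df_subject + df_area * df_subject
assert df_total == (df_condition + df_area + df_subject
                    + df_condition_x_area + df_two_way_with_subject + df_error)

print(df_condition, df_area, df_subject, df_condition_x_area, df_error)
```

Under these assumptions the calculation yields 3, 4, 7, and 12 numerator degrees of freedom and 84 error degrees of freedom, matching F(3,84), F(4,84), F(7,84), and F(12,84).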
Comparing the zoom lens with multiple spotlights
To test the overhead cost and resource conservation hypotheses, two comparisons were made between the multiple spotlight and zoom lens mechanisms. The first analysis compared the MULTI2 and ZOOM3 conditions, which deploy attention over the same spatial extent (distance between the farthest attended targets) but differ in their mechanisms. If multiple spotlight selection were a cumbersome and infrequent form of selection, then one would expect decreased efficiency compared with the zoom lens that would potentially result from an active suppression of the intervening region. This would support the overhead cost hypothesis. Alternatively, multiple spotlight selection might increase processing efficiency compared with the zoom lens because attention is distributed over a smaller area of visual space (the resource conservation hypothesis). Our experiments revealed significantly greater behavioral performance (p < 0.05) in the MULTI2 condition than in the ZOOM3 condition (Fig. 3). Similarly, relative to PASSIVE, the MULTI2 condition produced greater attentional modulation of BOLD signal in the two peripheral ROIs than did the ZOOM3 condition (p < 0.0001). This processing efficiency advantage for dividing the spotlight versus zooming over the same spatial extent supports the resource conservation hypothesis. Evidence of an overhead cost for dividing the spotlight was not observed but could have been masked by the resource conservation benefits.
To increase our chances of revealing an overhead cost for multiple spotlight selection, a second analysis of the zoom lens and multiple spotlight selection mechanisms was performed in which the number of attended object locations was held constant. This analysis compared the MULTI2 and ZOOM2 conditions. Both conditions direct attention to two RSVP streams but use different mechanisms. No resource conservation benefit was expected because the same number of targets was attended in both conditions. Again, no significant overhead cost associated with dividing the spotlight was observed, in terms of either behavioral performance (p = 0.73) or attentional enhancement of BOLD signal within the ROI representing the peripheral RSVP stream (p = 0.76). Similarly, no significant BOLD signal activation differences were observed for any visual cortical area (all p > 0.4).
Discussion
Our primary goal in these experiments was to directly compare two mechanisms of attentional selection that permit the selection of multiple spatial locations: the zoom lens and the multiple spotlight. The key theoretical difference between these mechanisms is that the multiple spotlight mechanism selects multiple, spatially distinct regions, whereas the zoom lens selects a single contiguous region of space. Our analysis focused on the relative processing efficiency of these two selection strategies. Previous research has quantified attentional influences on processing efficiency in terms of reaction times (Posner et al., 1980), event-related potential magnitudes (Mangun and Buck, 1998), and BOLD signal amplitudes (Muller et al., 2003b). Here, processing efficiency was analyzed in terms of behavioral accuracy and BOLD signal amplitude in retinotopic visual cortex.
We observed a decrease in behavioral performance and BOLD signal amplitude as the number of attended streams increased, independent of the attentional mechanism used. The zoom lens results verify previous findings of decreased behavioral (Eriksen and St. James, 1986) and fMRI (Muller et al., 2003b) processing as the size of the attended region increased. The multiple spotlight behavioral results confirm previous findings of decreased behavioral processing as the number of spotlights increased (Castiello and Umilta, 1990; McMains and Somers, 2004). The multiple spotlight fMRI results demonstrate decreased brain activation when attention is directed to more than one spotlight. Previous work suggested (McMains and Somers, 2004), but did not directly demonstrate, this result within a single ROI.
Two key hypotheses about the relative processing efficiency of these mechanisms, the overhead cost hypothesis and the resource conservation hypothesis, were tested in our experiments. Direct comparison of the two mechanisms failed to reveal any significant cost associated with dividing the spotlight. When attention is allocated over the same spatial extent, there is a significant benefit for dividing attention. Any potential cost associated with splitting the spotlight is outweighed by the benefit of spreading attention over less visual space. Even when the two mechanisms selected the same number of RSVP streams, no loss of processing efficiency was observed for splitting the spotlight. No difference in task difficulty was observed in this comparison of the mechanisms. In terms of the expenditure of attentional resources, the rate-limiting factor is the number of object locations attended rather than the spatial extent or the mechanism used. Our experiments provide both behavioral and fMRI evidence supporting the resource conservation hypothesis for multiple spotlight selection but fail to support the overhead cost hypothesis. Given the clear benefits of multiple spotlight attention, we argue from a utilitarian position that multiple spotlight selection is a practical and prevalent form of attentional selection.
Several caveats deserve mention. The current experiments confound attention to objects (letter streams) with attention to spatial locations, so we cannot determine whether the primary factor in determining processing efficiency in retinotopic cortex is the number of attended objects or the number or area of attended locations. Because the present study investigated only the early visual areas, or what are commonly thought of as sites of attention (Kastner et al., 1999; Somers et al., 1999; Culham et al., 2001), we cannot rule out processing differences in the frontoparietal control circuitry involved in directing attention. Further research is required to answer this question. As is the case when interpreting any negative result, further research might reveal some overhead cost; however, any such effect is likely to be relatively small. Also, we note that fMRI lacks the temporal resolution to determine whether the two selection mechanisms differ in how they influence the time course and computations of stimulus processing in early visual cortex. Such questions will need to be addressed with other techniques.
The present behavioral and fMRI results for the zoom lens conditions replicate previous reports (Eriksen and St. James, 1986; Muller et al., 2003b), thus confirming that subjects used a zoom lens strategy in this condition; however, we must note the failure to observe the simple unitary spread of attentional modulation that the zoom lens model suggests. The zoom lens model predicts a relatively uniform spread of attention across both attended stimuli and the intervening regions of visual space that do not contain stimuli; however, this prediction could never be tested fully by purely behavioral studies, which required the use of a probe stimulus to measure attentional spread. The present results (Fig. 2c), much like those of the one previous fMRI study of zoom lens attention [Muller et al. (2003b), their Fig. 3], reveal a landscape of peaks and valleys of attentional modulation across the cortical representation of the visual field. It is unclear how to reconcile these observations with the predictions of the zoom lens model. One suggestion is that the amplitude of the attentional modulation of the BOLD signal depends on the presence of a stimulus; however, previous studies have indicated that a stimulus need not be visible or even present to support strong attentional modulation of the BOLD signal (Kastner et al., 1999; Ress et al., 2000; Culham et al., 2001). Alternatively, the BOLD signal maps may accurately reflect the distribution of spatial attention. This would imply that the zoom lens model is inadequate to explain the complexity of spatial attention even in this relatively simple condition. Additional investigation of this issue is necessary, but our results clearly support the overall view that spatial attention can be easily deployed in a diverse range of spatial configurations.
In summary, we have observed that it is relatively efficient to deploy spatial attention in complex configurations. In terms of both behavior and BOLD signal amplitude in occipital cortex, we observed that splitting the attentional spotlight into multiple foci produced no decrease in efficacy; moreover, splitting the spotlight provided a significant advantage when distractor stimuli separated two stimuli of interest. The classic spotlight and zoom lens models suggest a simple, unitary form of spatial attention. The present data instead support the emerging view (Castiello and Umilta, 1990; Awh and Pashler, 2000; Muller et al., 2003a; Gobell et al., 2004; McMains and Somers, 2004) that the deployment of spatial attention is highly flexible in that it can adapt to task demands to select stimuli and filter out distractors in a diverse range of spatial configurations.
This material is based on work supported by National Science Foundation Grant BCS-0236737. This work was supported in part by National Center for Research Resources Grant 5P41RR14075A05 and the Mental Illness and Neuroscience Discovery Institute. We thank Jascha Swisher for editorial assistance.
Correspondence should be addressed to David C. Somers, Department of Psychology, Boston University, 64 Cummington Street, Boston, MA 02215.
Copyright © 2005 Society for Neuroscience 0270-6474/05/259444-05$15.00/0