We investigated the interaction between object- and space-based attention by measuring activity in early visual cortex. After central cueing, when subjects directed attention to a spatially defined part of an object, activity in early visual areas was enhanced at corresponding retinotopic representations but also at representations of other locations covered by the object. Contrary to the assumption of automatic attentional “spreading” within an object, however, activity was greater for representations of cued than of uncued locations on the same object. These findings support an interaction of object-based spatial selection with object-independent spatial mechanisms in directing attention. When the target stimulus did not appear at the expected location, we found higher activation in areas representing other locations on the same object than in areas representing equidistant locations on other objects. Objects, hence, also guide spatial search, and this may account for the behaviorally observed delay in processing parts of an unattended object.
When attention is directed to a certain location in the visual field, neuronal activity is enhanced in corresponding locations of retinotopically organized early visual cortices. This is a putative substrate of improved behavioral performance for stimuli appearing at the attended location (Tootell et al., 1998; Brefczynski and DeYoe, 1999; Kastner et al., 1999; Martinez et al., 1999; Somers et al., 1999; Müller et al., 2003). However, attention is often directed at certain objects rather than locations (e.g., when one searches for a well known friend in a crowd of people). Support for object-based mechanisms in attentional selection comes from behavioral experiments in which subjects are faster and more accurate at reporting two properties from a single object than from separate objects, although the spatial distance between the properties is the same or even smaller (Duncan, 1984; Baylis and Driver, 1993; Egly et al., 1994; Lavie and Driver, 1996).
Yet it remains unclear how spatial- and object-based attention interact and from which internal representations they select (Vecera and Farah, 1994; Cave and Bichot, 1999; Awh et al., 2001; Scholl, 2001; Shomstein and Yantis, 2002; Yantis and Serences, 2003). One view holds that object-based attention selects from internal object representations that are not space related at all (Duncan, 1984; Vecera and Farah, 1994; O'Craven et al., 1999). We refer to this concept as “space-independent object-based attention.” According to this model, attending an object should not have any impact on early retinotopic visual representations of space. A second theory proposes that attention to objects modulates early sensory, spatially coded representations by selecting the locations covered by an object; in other words, attention “spreads” automatically within the boundaries of an object (Vecera and Farah, 1994; Kramer et al., 1997; Weber et al., 1997; Davis et al., 2000). We refer to this theory as “object-based spatial selection.” This model anticipates enhanced activity in retinotopic areas coding all of the locations covered by an object. A third model assumes that objects merely affect the strategy with which attention is directed to a specific region in a visual scene, giving priority to locations within an already attended object (Moore et al., 1998; Avrahami, 1999; Shomstein and Yantis, 2002). We refer to this model as “object-based search strategy.” This theory predicts that activity “travels” from retinotopic areas representing the primarily attended spatial part of an object to areas representing other parts of the object, but only if the relevant information is not detected at the primarily attended location.
Here, we tested predictions derived from these theoretical accounts by measuring neural activity, using functional magnetic resonance imaging (fMRI) with blood oxygenation level-dependent (BOLD) contrast, in early visual areas representing defined locations on objects. These locations could either be cued or were equidistant to a cued location while pertaining to either the same object or a different object (Fig. 1a). This task is known to show behavioral effects of both space- and object-based attention (Egly et al., 1994). Here, the crucial test was how early visual activity (V1–V4) at retinotopic representations of uncued locations would behave in response to the spatial cue and during search for an occasional target stimulus, depending on whether or not these locations were on the same object as the cued location (Fig. 1b,c).
Materials and Methods
Five healthy, right-handed students (four females and one male; age, 21–30 years) with normal color vision and visual acuity were paid for their participation as subjects in the study conducted in conformity with the Declaration of Helsinki.
The experimental design is shown in Figure 1. Stimuli were two wrench-like objects (height, 8.5°; width, 2°), oriented either vertically or horizontally, presented on a dark blue background and centered 3.75° to the left and right or above and below fixation. Subjects fixated the middle of five squares (each 0.15° in diameter) throughout each trial. In every trial, the top left central square brightened after 9.5–12 sec, indicating that a top left location would most likely contain the following target stimulus. Depending on the orientation of the wrenches, either the uncued bottom left or the uncued top right end belonged to the cued wrench or to the other wrench (Fig. 1b).
After a cueing period of 6 or 9 sec, a broad (width, 0.5°) or narrow (width, 0.25°) slit was presented for 120 msec at an end of a wrench. In 75% of the trials, the cue was valid and the slit appeared at the precued location. In 25% of the trials, however, the cue was invalid and the slit appeared either at the uncued end of the same wrench or the other wrench, but always equidistant to the cued location (Fig. 1c). Subjects had to report slit width by pressing one of two buttons in each trial within 2 sec. Speed and accuracy were stressed. Every 33 trials the wrenches changed orientation, and this trial was discarded because of strong sensory signal change. Trials with different cueing duration, target type, and offset times were randomized. Altogether, 384 trials in three experimental sessions were completed by each subject. Subjects trained the task extensively, until performance was stable. During training, fixation was controlled with a digital infrared eyetracker (Ober 2; Permobil Meditech, Timra, Sweden), and only subjects who performed saccades in <1% of trials were included.
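The trial structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original presentation script: the function name `make_trials`, the even split of invalid trials between the same-object and different-object locations, and the independent random draw of cueing duration and slit width are all hypothetical. The discarded orientation-change trials are not modeled.

```python
import random


def make_trials(n_trials=384, p_valid=0.75, seed=0):
    """Build a shuffled trial list with the cue validity proportions from the text."""
    rng = random.Random(seed)
    n_valid = round(n_trials * p_valid)
    n_invalid = n_trials - n_valid
    # Assumption: invalid trials are split evenly between the two uncued locations.
    n_same = n_invalid // 2
    n_diff = n_invalid - n_same
    conditions = (
        ["valid"] * n_valid
        + ["invalid_same_object"] * n_same
        + ["invalid_different_object"] * n_diff
    )
    trials = []
    for cond in conditions:
        trials.append({
            "validity": cond,
            # Assumption: both factors are drawn independently and uniformly.
            "cue_duration_s": rng.choice([6, 9]),
            "slit_width_deg": rng.choice([0.25, 0.5]),
        })
    rng.shuffle(trials)
    return trials


trials = make_trials()
n_valid = sum(t["validity"] == "valid" for t in trials)
print(len(trials), n_valid)
```

With the default arguments, 288 of the 384 trials are valid and 96 are invalid, which illustrates why the number of invalid trials available per region of interest is small.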
Stimulus presentation was controlled by a personal computer using the Experimental Run Time System software package (Berisoft, Frankfurt, Germany) that was triggered by the scanner. For better temporal resolution of BOLD responses, the offset between trigger and stimulus presentation varied within a range of 0–2500 msec in 500 msec steps.
fMRI data were acquired at 1.5 T (MAGNETOM Vision; Siemens, Erlangen, Germany). The subjects' heads were stabilized with a vacuum pillow in a standard head coil. Stimuli were projected on a back-projection screen that subjects fixated via a mirror attached to the head coil. They used a fiberoptic two-button response box for report. From each subject, we obtained BOLD contrast (T2*-weighted) echoplanar image volumes [repetition time (TR), 3000 msec; echo time (TE), 51 msec; flip angle, 90°; 26 axial slices; voxel size, 3.3 × 3.3 × 3.3 mm] and T1-weighted three-dimensional structural scans [magnetization prepared–rapid gradient–echo sequence: TR, 10 msec; TE, 4 msec; flip angle, 12°; inversion time (TI), 100 msec; 265 × 256 matrix; 170 sagittal slices; voxel size, 1 mm³; high-quality fast low angle shot sequences: TR, 38 msec; TE, 5 msec; flip angle, 30°; TI, 100 msec; 265 × 256 matrix; 180 sagittal slices; voxel size, 1 mm³].
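The benefit of varying the trigger-to-stimulus offset can be illustrated with a short calculation. The sketch below pools the acquisition times of successive volumes, expressed relative to stimulus onset, over all offsets. It assumes, for illustration only, that the trigger coincides with the start of a volume and ignores slice timing; the function name `effective_sample_times` is hypothetical.

```python
def effective_sample_times(tr_ms=3000, offsets_ms=range(0, 2501, 500), n_volumes=4):
    """Peristimulus times (msec) at which volumes are acquired, pooled over offsets."""
    times = set()
    for offset in offsets_ms:
        for k in range(n_volumes):
            # The stimulus appears `offset` msec after the trigger at volume 0,
            # so volume k is acquired at k * TR - offset relative to the stimulus.
            times.add(k * tr_ms - offset)
    return sorted(times)


times = effective_sample_times()
steps = {b - a for a, b in zip(times, times[1:])}
print(steps)
```

Under these assumptions, the pooled samples are spaced 500 msec apart although each individual trial is sampled only once per 3000 msec TR.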
Behavioral data. Mean reaction times for correct answers and errors (as a percentage) were entered in separate one-way repeated-measure ANOVAs with the factor “validity” (valid, invalid/same object, invalid/different object).
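For readers unfamiliar with the test, a one-way repeated-measures ANOVA can be sketched as follows. This is a minimal textbook implementation in plain Python, not the software used in the study; the function name `rm_anova_oneway` is hypothetical, and the reaction times in the usage example are invented purely for illustration.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: list of per-subject lists, one value per condition.
    Returns (F, df_effect, df_error).
    """
    n = len(data)      # number of subjects
    k = len(data[0])   # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Between-subject variance is removed from the error term.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Invented reaction times (msec) for five subjects; columns are
# valid, invalid/same object, invalid/different object.
example = [
    [520, 560, 590],
    [480, 530, 540],
    [610, 640, 700],
    [550, 570, 600],
    [500, 560, 580],
]
f_value, df1, df2 = rm_anova_oneway(example)
print(df1, df2)
```

With five subjects and three levels of validity, the degrees of freedom are 2 and 8, matching the F(2,8) statistics reported for the behavioral data.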
fMRI: preprocessing. Brainvoyager 2000 software (BrainInnovation, Maastricht, The Netherlands) was used for all fMRI analyses. The first four volumes of each functional run were discarded, and the remaining were corrected for slice scan time differences within a volume, coregistered with the three-dimensional structural data sets, and transformed into Talairach space (Talairach and Tournoux, 1988). Volume-time courses were motion corrected and temporally high-pass filtered (three cycles per run).
Cortical surface reconstruction. The cortical surface of each subject was reconstructed from the high-quality three-dimensional data set. The white matter was segmented using a grow-region function, a sphere was covered smoothly around the segmented region, and the reconstructed white matter was expanded into the gray matter. After separation of the hemispheres, the sulci were smoothed using a cortical inflation procedure. Finally, the surfaces were cut along the Calcarine sulcus and unfolded into the flattened format.
Retinotopic mapping and regions of interest. BOLD responses during cueing and target period were measured in regions of interest (ROI) in retinotopic visual areas, similar to previous studies (Ress et al., 2000; Saenz et al., 2002; Müller et al., 2003). Briefly, ROI were mapped separately by 8 Hz checkerboard stimulation at the corresponding locations and subdivided according to retinotopic boundaries (ventral and dorsal visual areas V1, V2, V3, and ventral V4) that were, again separately, mapped by checkerboard stimulation along the horizontal and vertical meridians (Sereno et al., 1995). The ROI were then marked on the reconstructed and flattened cortical surfaces for each of the five subjects.
fMRI: attention task. The BOLD response to the cue was averaged across voxels of a region of interest and trials with the same orientation of objects. The 2 sec preceding the cue served as a baseline. From the averaged data, the peak signals were extracted. Because behavioral data showed no systematic differences between performance for vertical and horizontal objects (F(1,4) = 4.08), the peak values for the bottom left and top right locations were collapsed and resorted with respect to uncued/same object and uncued/different object conditions. Also, data from trials with different delays between cue and target (6 or 9 sec) were collapsed. With the extracted peak values, a repeated-measures ANOVA with location (cued, uncued/same object, uncued/different object) and area (V1, V2, V3, V4) as within-subject factors was calculated.
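The peak extraction step can be sketched as follows. The sketch assumes an event-related average that has already been computed across voxels and trials, and it expresses the peak as percent change from the mean of the 2 sec before the cue. The percent-change scaling and the function name `peak_percent_change` are assumptions made for illustration; the text does not specify the units used.

```python
def peak_percent_change(times_s, signal, baseline_window=(-2.0, 0.0)):
    """Peak of an event-related average, as percent change from the pre-event baseline.

    times_s: sample times in seconds relative to the event (cue onset = 0).
    signal:  averaged signal values at those times.
    """
    lo, hi = baseline_window
    base_vals = [y for t, y in zip(times_s, signal) if lo <= t < hi]
    if not base_vals:
        raise ValueError("no samples fall inside the baseline window")
    baseline = sum(base_vals) / len(base_vals)
    # Only samples at or after the event contribute to the peak.
    post = [
        (y - baseline) / baseline * 100.0
        for t, y in zip(times_s, signal)
        if t >= hi
    ]
    return max(post)


# Invented time course, for illustration only.
times_s = [-2, -1, 0, 1, 2, 3]
signal = [100, 100, 100, 101, 102, 101]
print(peak_percent_change(times_s, signal))
```

One peak value per subject, location, and visual area would then enter the repeated-measures ANOVA described above.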
To avoid confounding from the cueing period, the event-related averages of the BOLD responses to the target were analyzed relative to the 2 sec preceding target presentation as a baseline. The averages took into account validity of a trial, and, to exclude stimulus-driven activation by the target, only trials without the target at their corresponding location were analyzed for each given region of interest. For example, for the top right location, only trials in which the target appeared at the cued top left end or at the uncued bottom left end were included. Because this procedure further reduced the already small number of invalid trials that could be analyzed for a given region of interest, the data were collapsed across visual areas. The analyses of interest were restricted to the ROI representing the uncued locations. Peak values extracted from the averaged curves were entered in a repeated-measures ANOVA with validity (valid, invalid) and object type (same, different object) as factors.
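The trial selection rule for the target analysis can be made explicit with a small sketch. The dictionary keys and the function name `select_trials_for_roi` are hypothetical; the logic simply drops every trial in which the target appeared at the location represented by the region of interest and groups the remaining trials by cue validity.

```python
from collections import defaultdict


def select_trials_for_roi(trials, roi_location):
    """Group trials by validity, dropping those whose target fell on the ROI's location."""
    groups = defaultdict(list)
    for trial in trials:
        if trial["target_location"] == roi_location:
            # A target at this location would drive the ROI directly.
            continue
        is_valid = trial["target_location"] == trial["cued_location"]
        groups["valid" if is_valid else "invalid"].append(trial)
    return dict(groups)


# Illustrative trials; the cue always points to the top left location.
trials = [
    {"cued_location": "top_left", "target_location": "top_left"},
    {"cued_location": "top_left", "target_location": "bottom_left"},
    {"cued_location": "top_left", "target_location": "top_right"},
]
selected = select_trials_for_roi(trials, "top_right")
print({name: len(group) for name, group in selected.items()})
```

For the top right region of interest, this keeps the trial with the target at the cued top left end and the trial with the target at the bottom left end, as in the example given above.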
The behavioral results are shown in Figure 2b. The subjects' responses were fastest when the target slit appeared at the cued location (i.e., when the cue was valid) (F(2,8) = 22.71; p < 0.01 for validity). With invalid cues, when the slit appeared at an uncued location subjects were faster when it belonged to the cued object than when it did not (p < 0.03, for pairwise comparison). Accuracy was ∼88% and did not differ between conditions (F(2,8) = 1.15 for validity).
In all visual areas assessed (Fig. 2a,c), the BOLD signal was enhanced in response to the cue. We found no systematic differences across areas (F(3,12) = 2.46 for area; F(6,24) = 1.71 for area × validity). The signal enhancement was strongest in visual subregions representing the cued top left location (F(2,8) = 5.09; p < 0.04 for validity). To a lesser extent, attention also modulated subregions representing the uncued locations but, crucially, the neural response at these locations was stronger in same object trials than in different object trials (p < 0.02, for pairwise comparison).
During target processing, when the cue had been valid, BOLD responses for uncued locations did not depend on whether they belonged to the cued object or the other object (Fig. 2c). In other words, valid trials were not associated with a redistribution of attention relative to the preceding cueing period. With invalid cues and for locations on the uncued object, the same level of activity was observed as with valid cues. However, when in invalid trials the uncued location belonged to the cued object, the activity level rose higher than in all other conditions (F(1,4) = 7.8; p < 0.05, for validity × object type; p < 0.03, for pairwise comparison, invalid/same object vs invalid/different object). Hence, when the target did not appear at the expected location and needed to be searched for at other locations, there was a strong and preferential activation for locations pertaining to the same object as the initially cued location.
The present study confirms behavioral results of previous studies (Egly et al., 1994; Watson and Kramer, 1999; Abrams and Law, 2000; Davis et al., 2000) in showing both object- and space-based attention effects in the same task. Space-based attention can be inferred from the fact that subjects performed better when a cue validly indicated the relevant spatial part of an object than in invalid trials. Object-based attention, in contrast, is shown by faster reaction times for stimuli at unexpected locations on the already attended object as opposed to stimuli at equidistant locations on another object. As described previously, these behavioral results can be explained by various models that differ in their assumptions on the interaction between object- and space-based attentional selection. The imaging data presented here may help to constrain the conceivable explanations.
Retinotopic activity in areas V1, V2, V3, and V4 that represented locations outside a spatially cued region was enhanced when the locations were bound to the cued location by a common object. This activation occurred long after the object stimuli had been introduced and also before an actual target stimulus had to be identified. Activity in early visual cortex is known to correlate with behavioral performance in detection tasks (Ress et al., 2000; Müller et al., 2003). Therefore, our finding of object-based modulation of the BOLD response to cueing could provide a neural basis for the faster behavioral reaction to stimuli appearing at uncued locations within, as opposed to outside, an attended object. This result demonstrates that object-based attentional mechanisms do not necessarily operate on space-independent representations (Duncan, 1984; Vecera and Farah, 1994). It also argues against views in which object-based mechanisms only operate once a search has to be initiated. Instead, the data are consistent with models that feature object-based spatial selection (Vecera and Farah, 1994; Kramer et al., 1997; Weber et al., 1997; Davis et al., 2000) and propose that the deployment of attention in space is guided by the presence of objects, as detected by preattentive segmentation of the visual scene. Our observation complements previous demonstrations of a role of feature-based attentional mechanisms in determining early visual activity. It has been demonstrated that attending a stimulus feature increases the neural response of cortical visual areas coding for spatially distant, ignored stimuli if they share the same feature (Saenz et al., 2002). Here, we found that “sharing the same object” has a similar effect on activity in early retinotopic visual areas.
Our results obtained during the cueing phase are in accordance with the object-based spatial selection theory. In the simplest case of this model, attention should automatically and evenly spread to all locations within the same object. One would then expect that activity within the respective retinotopic object boundaries should increase to the same extent. Yet the BOLD signal enhancement corresponding to the cued location of the object was considerably stronger than for the uncued part. This suggests that the distribution of attention across spatially distinct parts of an object is under a form of top-down attentional control that involves object-independent spatial mechanisms instead of purely automatic “grasping” of objects or shapes as wholes, as suggested by He and Nakayama (1995) and O'Craven et al. (1999). In accordance with our finding, it has been shown that features of a common object part are preferentially processed (Vecera et al., 2000, 2001). Other evidence suggests that attention is not evenly distributed in space but rather forms a gradient (LaBerge, 1983; Downing and Pinker, 1985; Shulman et al., 1986; LaBerge and Brown, 1989), that the attention focus can be adjusted in size (Eriksen and St. James, 1986; Müller et al., 2003), and that attention can adopt the form of simple shapes such as a ring (Egly and Homa, 1984; Juola et al., 1991). A combination of these assumptions can explain the present results that attentional selection of retinotopic representations follows the contour of an object and is organized as a gradient such that representations of the most relevant parts of (or locations on) an object are activated most strongly.
The results we obtained during target selection are in line with assumptions of object-based search models (Moore et al., 1998; Avrahami, 1999; Shomstein and Yantis, 2002). If the target appeared at the expected location, there was little activity increase at the other locations. However, if the target did not appear at the expected location, a substantial signal increase was observed in representations of locations that belonged to the same object but not of those belonging to the other object. This analysis only included trials in which the target did not appear at the location under investigation and was, hence, not subject to influences from actual visual target processing. In the absence of stimulus-driven processes, this activation must be taken as evidence for realignment of attention. This realignment seems to follow object-based mechanisms that bias toward unattended locations on the same object as the originally cued location. Thus, reacting to a target on an uncued object involves additional attentional shifting once cued and uncued locations on the cued object have been searched without success. The behavioral delay in reacting to targets on an unattended object may, therefore, reflect the combination of weak activation of early visual representations by cueing and disadvantageous search strategies during target selection.
In conclusion, the present results point to the interaction of object- and space-based attentional modulation of early visual cortex activity at different processing steps. Object identification and attentional orienting in space are commonly believed to be driven by later, primarily nonretinotopic areas of the ventral stream [such as the lateral occipital complex (LOC)] (Malach et al., 1995; Grill-Spector et al., 1998, 2001) and the dorsal stream (involving the dorsal parietal cortex) (Corbetta et al., 2000), respectively. Accordingly, feedback signals from these areas are the most likely source for activity modulations in early visual cortex (Hopfinger et al., 2000; Murray et al., 2002). Consistent with this notion, a whole-brain analysis of the BOLD response during cueing in the present study revealed strong activation in the LOC as well as in parietal and frontal regions. It remains a challenge for future studies to clarify whether the effects observed here are precisely modulated by the stochastic validity parameter. Although our study demonstrates that differences between uncued locations depend on what object they belong to, we cannot resolve whether the neural activity gradient between cued and uncued locations within objects reflects the incidence with which targets appeared at the respective locations. It could be that the frequency of invalid cues determines the extent to which attention is allowed to “drift away” from a cued locus, thereby regulating the extent to which it spreads within a given object or beyond. What could be addressed as a “cue-induced static attentional gain map” of the visual field is then followed by attentional biasing of the search process. Such a dynamically adjusted interaction between object- and space-based attentional selection mechanisms could ensure high flexibility and, thus, good performance across widely different functional contexts.
This work was supported by the Volkswagen Foundation.
Correspondence should be addressed to Dr. Notger G. Müller, Department of Neurology, Johann Wolfgang Goethe-University, Schleusenweg 2–16, 60528 Frankfurt/Main, Germany.
Copyright © 2003 Society for Neuroscience 0270-6474/03/239812-05$15.00/0