Humans can use advance information to direct spatial attention before stimulus presentation and respond more accurately to stimuli at the attended location compared with unattended locations. Likewise, spatially directed attention is associated with anticipatory activity in the portion of visual cortex representing the attended location. It is unknown, however, whether and how anticipatory signals predict the locus of spatial attention and perception. Here, we show that prestimulus, preparatory activity is highly correlated across regions representing attended and unattended locations. Comparing activity representing attended versus unattended locations, rather than measuring activity for only one location, dramatically improves the accuracy with which preparatory signals predict the locus of attention, largely by removing this positive correlation common across locations. In V3A, moreover, only the difference in activity between attended and unattended locations predicts whether upcoming visual stimuli will be accurately perceived. These results suggest that the locus of attention is coded in visual cortex by an asymmetry of anticipatory activity between attended and unattended locations and that this asymmetry predicts the accuracy of perception. This coding strategy may bias activity in downstream brain regions to represent the stimulus at the attended location.
The locus of human attention is influenced by both stimulus-driven and goal-directed factors. Neural theories of attention contend that these influences are combined into a cortical map, in which activity in a particular portion of cortex represents the attentional priority of a particular portion of space (Koch and Ullman, 1985; Wolfe, 1994; Itti and Koch, 2001; Serences and Yantis, 2006). Studies of goal-directed, top-down attention usually report average activity modulations for different locations within this priority map, rather than considering the trial-by-trial relationship between activity for attended and unattended locations (Kastner et al., 1999; Ress et al., 2000; Serences et al., 2004). Several pieces of evidence, however, suggest that trial-by-trial activity for attended and unattended locations is tightly linked. First, spatially nonspecific processes such as feature-based attention simultaneously modulate activity for all locations (Treue and Trujillo, 1999; Saenz et al., 2002; Serences and Boynton, 2007). These nonspecific processes may induce trial-by-trial activity fluctuations that could not be accounted for by simply comparing mean activity on attended trials versus mean activity on unattended trials. Second, it is well established that attending to a visual stimulus concurrently enhances activity for the attended stimulus while depressing activity for unattended stimuli (Tootell et al., 1998; Somers et al., 1999; Smith et al., 2000; Muller and Kleinschmidt, 2004; Silver et al., 2007). Here, we test whether simultaneously considering the neural activity for multiple locations, on a trial-by-trial basis, predicts the top-down locus of spatial attention better than activity for the attended location alone.
Psychological and physiological models of visual attention generally imply that behavioral enhancements and neural increases for attended locations are necessarily coupled with behavioral decrements and neural decreases for unattended locations. These models have variously described the spatial distribution of attention as a gradient (Downing and Pinker, 1985; Shulman et al., 1986; Mangun and Hillyard, 1988), one or multiple spotlights (Posner, 1980; Brefczynski and DeYoe, 1999; Awh and Pashler, 2000; McMains and Somers, 2004), a zoom lens (Eriksen and Yeh, 1985; Muller et al., 2003), or a Mexican hat (Muller and Kleinschmidt, 2004; Muller et al., 2005), among other descriptions. Although each of these models implies a relative, rather than absolute, advantage for attended locations, physiological studies have not explicitly considered the difference in activity for attended versus unattended locations on a trial-by-trial basis.
The build-up of evoked stimulus representations depends critically on the nature of top-down signals. Selective enhancement of the attended location (Tsotsos et al., 1995) would suggest that evoked signals depend only on the preparatory activity at the corresponding location. Top-down modulation of multiple locations, however, would be consistent with a biasing signal that grants an attended location a competitive advantage over unattended locations, such that objects that appear at the attended location become represented at the expense of unattended stimuli (Desimone and Duncan, 1995; Reynolds and Chelazzi, 2004).
We examined the encoding of preparatory, top-down signals. We demonstrate that the encoding of these top-down signals is critical to the study of spatial attention.
Materials and Methods
Six right-handed subjects (three female; aged 26–30) with no history of neurological illness and with normal or corrected-to-normal vision were recruited. Informed consent was obtained as per the guidelines of the human studies committee at Washington University School of Medicine. Subject 2 was author CS.
Eye position was monitored to ensure subjects always fixated a central crosshair. Each trial began with a 500 ms auditory “preparatory” cue, the spoken word “left” or “right,” indicating one of two locations: 5° eccentricity, 45° of radial angle to the left or right of the vertical meridian (see Fig. 1). After a stimulus-onset asynchrony (SOA) of 6.192 s (25%), 8.256 s (25%), or 10.32 s (50%) targets appeared for 100 ms centered at both locations, concurrent with an auditory report cue (left or right). Targets were 3.5 cycle-per-degree Gabor patches with a Gaussian envelope SD of 0.3°. On valid trials (75%), the report cue matched the preparatory cue. Subjects indicated the orientation (left tilt, vertical, right tilt) of the report-cued Gabor with a button press. There was a random intertrial interval (ITI) of 16.512 s (33%), 18.576 s (33%), or 20.64 s (33%). Each 6.2 min functional magnetic resonance imaging scan consisted of 13 trials; each subject performed between 700 and 900 trials over 8–12 scanning sessions. Subject 5 performed only 524 trials. Scans were intermixed with an equal number of scans with high-contrast Gabor patches used as targets (only performance predictability data reported here); subjects were always aware of block type.
Practice sessions and target parameters.
Before test scans, each subject performed ∼600 practice trials over two in-scanner sessions, to determine stimulus parameters using the lowest possible target contrast that would plateau performance at 70%. Timing was 2 s SOA and 2–4 s ITI. Across the six subjects, contrast at plateau performance ranged from 5 to 12% and the difference in orientation between targets varied from 6 to 45°. Occasional small adjustments were also made during test sessions.
Images were acquired with a Siemens (Erlangen, Germany) Allegra 3T scanner. Structural images used a sagittal magnetization-prepared rapid acquisition gradient echo T1-weighted sequence [repetition time (TR), 1810 ms; echo time (TE), 3.93 ms; flip angle, 12°; time for inversion, 1200 ms; voxel size, 1 × 1 × 1.25 mm]. Blood oxygenation level-dependent (BOLD) contrast images were acquired with an asymmetric spin-echo echo-planar sequence (TR, 2.064 s; TE, 25 ms; flip angle, 90°; 32 contiguous 4 mm axial slices; 4 × 4 mm in-plane resolution). BOLD images were motion corrected within and between runs, and timing differences across slices were corrected. Images were resampled into 3 mm isotropic voxels and warped into a standardized atlas space (Talairach and Tournoux, 1988).
Stimuli were presented with a Power Macintosh G4 computer (Apple, Cupertino, CA) using Matlab software (Mathworks, Natick, MA) with the psychophysics toolbox (Brainard, 1997; Pelli, 1997). Images were projected to the head of the bore of the scanner via an LCD projector (Sharp LCD C20X) and viewed with a mirror attached to the head coil. A magnet-compatible fiber-optic key-press device recorded subject responses. Eye position was measured in five of six subjects (not subject 2, author CS) with an ISCAN (Burlington, MA) ETL-200 system.
The BOLD data at each voxel, for each subject, were subjected to a general linear model using in-house software. Constant and linear terms over each BOLD run accounted for baseline and linear drift, and sine waves modeled low-frequency noise (<0.009 Hz). Separate δ function regressors coded each of the 11–13 time points (22.704–26.832 s, depending on SOA) after the preparatory cue of each of the 24 different event types [3 SOAs × (left vs right cue) × (valid vs invalid) × (correct vs incorrect)]. A “residuals” dataset was created by summing the modeled responses (but not the constant, linear drift, or sine wave terms) with the residuals unaccounted for by the linear model. Therefore, this dataset contains the original time series minus the constant, linear drift, and sine-wave terms.
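The construction of the residuals dataset can be sketched as follows. This is a minimal illustration under stated assumptions, not the in-house software: the function name `residuals_dataset`, the single sine/cosine pair standing in for the low-frequency terms, the run length, and the toy cue onsets are all hypothetical.

```python
import numpy as np

def residuals_dataset(y, X_events, X_nuisance):
    """Fit the full linear model, then return (modeled responses + residuals,
    fitted nuisance component). Names and layout are illustrative only."""
    X = np.hstack([X_events, X_nuisance])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n_ev = X_events.shape[1]
    modeled = X_events @ beta[:n_ev]        # event-related part of the fit
    nuisance = X_nuisance @ beta[n_ev:]     # constant, drift, low-frequency part
    residual = y - X @ beta                 # variance the model leaves unexplained
    return modeled + residual, nuisance

# Toy run: 180 frames at TR = 2.064 s (run length chosen for illustration).
tr, n_frames = 2.064, 180
t = np.arange(n_frames) * tr
rng = np.random.default_rng(0)

# Nuisance terms: constant, linear drift, one sine/cosine pair below 0.009 Hz.
f_low = 0.005
X_nuisance = np.column_stack([
    np.ones(n_frames),
    np.linspace(-1, 1, n_frames),
    np.sin(2 * np.pi * f_low * t),
    np.cos(2 * np.pi * f_low * t),
])

# Delta-function regressors: one column per post-cue time point of one event type.
onsets = [10, 50, 90, 130]    # hypothetical cue onsets, in frames
n_points = 13
X_events = np.zeros((n_frames, n_points))
for onset in onsets:
    for k in range(n_points):
        X_events[onset + k, k] = 1.0

# Synthetic voxel time series: baseline + drift + event response + noise.
y = (100 + 2 * X_nuisance[:, 1] + X_events @ np.hanning(n_points)
     + rng.normal(0, 0.1, n_frames))
cleaned, nuisance = residuals_dataset(y, X_events, X_nuisance)
```

In this sketch `cleaned` equals the original series minus the fitted constant, drift, and sinusoidal terms, which is the property the text describes.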
Derivation of regions of interest.
In all analyses, only trials with the longest SOA [five magnetic resonance (MR) frames] were used, to avoid contaminating preparatory signals with stimulus-evoked activity. To create regions of interest (ROIs) outside of retinotopic cortex, we performed a voxelwise ANOVA over the first six trial time points using the residuals dataset, separately in each subject. The ANOVA effects of interest were cue direction, target contrast, validity, performance, and time. An in-house clustering algorithm defined ROIs based on the resulting map of the main effect of cue direction. ROIs were initially defined as 8 mm spheres centered on map peaks with z-scores >3; spheres within 12 mm of each other were consolidated into a single ROI. An ROI was retained for subsequent analyses if present in at least six of 12 subject hemispheres. If a subject lacked a particular ROI, the z threshold was lowered to 2; if the subject still lacked the ROI, we did not include that ROI in that subject. Each ROI was masked with the contrast [contralateral, low contrast, correct valid trials minus ipsilateral, low contrast correct valid trials] (z > 0) to ensure that each voxel had a contralateral preference (except Fov, which had an ipsilateral preference). This procedure yielded ROIs in posterior superior frontal gyrus (SFG), frontal eye fields (FEF; located at the conjunction of the superior frontal sulcus and the precentral sulcus), posterior middle frontal gyrus (MFG), posterior inferior frontal sulcus (IFS), anterior intraparietal sulcus (aIPS), posterior intraparietal sulcus (pIPS), precuneus, and the posterior occipital pole (Fov; located at the foveal confluence). Because of across-subject variability, we collapsed the MFG and IFS regions into a single region, MFG/IFS, and the aIPS and pIPS regions into a single region, IPS.
We defined ROIs in visual cortex based on two different methods. First, we defined three separate regions in each hemisphere as the portions of retinotopic visual cortex [V7, V3A, and a third region encompassing V1v, V2v, VP, and V4 (V1–V4)] that varied with the direction of the preparatory cue. These regions were created by taking a conjunction of voxels in the appropriate retinotopic area with voxels showing a preference for contralateral cues. To determine the retinotopic areas, subjects passively viewed contrast reversing checkerboard stimuli extending along the horizontal and vertical meridians. A contrast of responses to the horizontal and vertical meridians was used to hand-draw borders of early visual areas, on a flattened representation of the subject's own anatomy using Caret software (Van Essen et al., 2001). To determine voxels with a contralateral cue preference we subjected each voxel to a contrast (t test) of (contralaterally cued, low contrast, correct, valid trials vs ipsilaterally cued, low contrast, correct, valid trials). Voxels preferring contralateral cues with z > 2 survived this test.
The second set of occipital ROIs was defined with independent localizer scans as the portions of early visual cortex (V1–V4) representing different locations in the visual field. This procedure also created six total ROIs (three in each hemisphere), representing a total of five locations in the visual field: one ROI for each of the two target locations (in the upper visual field), one ROI for each of the two target locations mirrored across the horizontal meridian (in the lower visual field), and one ROI in each hemisphere representing a single central field location. These ROIs were created by taking a conjunction of voxels in the appropriate retinotopic area (retinotopy scans described above) with voxels representing one of these five locations. To localize voxels representing these locations, in a separate set of scans, subjects passively viewed high contrast (∼50%) Gabor patches flickering at 4 Hz in 12 s blocks. In each block, a Gabor randomly appeared at one of the five locations. We constructed contrasts (t tests) of each passive stimulus with its mirror stimulus across the vertical meridian. The central location (1° width) was contrasted with the summed responses to all other locations. Subdivisions of early visual cortex were made by taking the conjunction of the voxels with a stimulus preference during the localizer scans (z > 2) and the earliest of the retinotopic regions (V1v, V2v, VP, and V4 for upper field locations; V1d, V2d, V3, and V3a for lower field locations).
Accuracy data were subjected to a three-way ANOVA (target location, cue validity, SOA) with subject used as a repeated measure.
In each ROI, data were averaged across voxels using the residuals dataset. For subtraction analyses, trial-by-trial data were next subtracted between ROIs in opposite hemispheres. Preparatory time courses (the first six time points) were extracted for each trial. Using half of the trial data, a discriminant was calculated as the time point-by-time point difference between average preparatory activity on leftward versus rightward cued trials. This discriminant was cross-correlated with the remaining trial data to calculate trial-by-trial magnitudes. The degree of separation between magnitudes for leftward versus rightward trials was quantified with a receiver-operator characteristic (ROC) curve. To obtain the ROC curve, the conditional probabilities P(α > crit|Rleft) and P(α > crit|Rright) were evaluated as a function of crit, where α is the derived magnitude, Rleft indicates the subset of trials with leftward cues, and Rright is the subset of trials with rightward cues. We repeated this procedure 1000 times for each ROI and took the average area under the ROC curve as the attention discriminability index (ADI). For individual time point attention discriminability indices (see Fig. 7), the entire dataset was used. Trial-by-trial magnitudes were simply the BOLD magnitude at a particular time point. ROC curves were constructed as above.
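The split-half procedure above can be sketched in a few lines. This is an illustrative reading of the description, not the original analysis code: the cross-correlation is taken to be a zero-lag dot product, the area under the ROC curve is computed from pairwise comparisons of magnitudes (equivalent to sweeping the criterion), and the function names `roc_area` and `adi`, the toy time course, and the trial counts are assumptions.

```python
import numpy as np

def roc_area(pos, neg):
    """Area under the ROC curve: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

def adi(left, right, n_splits=1000, seed=0):
    """left, right: (n_trials, 6) preparatory time courses for each cue direction."""
    rng = np.random.default_rng(seed)
    areas = []
    for _ in range(n_splits):
        li, ri = rng.permutation(len(left)), rng.permutation(len(right))
        hl, hr = len(left) // 2, len(right) // 2
        # Discriminant: time point-by-time point difference of the training means.
        disc = left[li[:hl]].mean(axis=0) - right[ri[:hr]].mean(axis=0)
        # Trial-by-trial magnitudes on the held-out half (zero-lag dot product).
        mag_left = left[li[hl:]] @ disc
        mag_right = right[ri[hr:]] @ disc
        areas.append(roc_area(mag_left, mag_right))
    return float(np.mean(areas))

# Toy data: a hypothetical six-point preparatory response with opposite sign per cue.
rng = np.random.default_rng(42)
shape = np.array([0.0, 0.2, 0.5, 0.8, 0.8, 0.6])
left_trials = rng.normal(0, 1, (100, 6)) + shape
right_trials = rng.normal(0, 1, (100, 6)) - shape
example_adi = adi(left_trials, right_trials, n_splits=200)
```

Because the discriminant is always built from independent trials, an index near 0.5 indicates no usable cue information and an index near 1.0 indicates nearly perfect separation.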
Group-wise statistical tests.
For tests comparing ADIs across conditions, we used nonparametric tests because of the low subject number and because the distribution of ROC values is unlikely to be normal. When comparing more than two groups, we used a two-tailed Friedman's test (nonparametric version of the two-way ANOVA). To compare two groups, we used the two-tailed Wilcoxon signed-rank tests (nonparametric version of a paired t test). Both of these tests account for across-subject variability by assigning ranks to each group separately for each subject. Note that with six subjects, each subject must have one group higher than the other for the test to be significant (p < 0.05) so these tests are very conservative for this study. Because FEF was defined in both hemispheres in only four subjects, the Wilcoxon test could not detect group differences in this region, and so for FEF we used the more powerful Friedman's test.
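The conservativeness noted above follows directly from the exact null distribution of the signed-rank statistic, which can be enumerated for small samples. The sketch below is a generic exact test written for illustration (the function name `exact_signed_rank_p` is hypothetical, and it assumes no zero differences and no tied magnitudes); it is not the software used for the reported tests.

```python
from itertools import product

def exact_signed_rank_p(diffs):
    """Two-tailed exact Wilcoxon signed-rank p value by full enumeration.
    Assumes no zero differences and no ties among |diffs|."""
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    # Count sign assignments at least as extreme as the observed one.
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            count += 1
    return count / 2 ** n

p_six_unanimous = exact_signed_rank_p([1, 2, 3, 4, 5, 6])    # all six subjects agree
p_four_unanimous = exact_signed_rank_p([1, 2, 3, 4])         # only four subjects
p_one_dissent = exact_signed_rank_p([-1, 2, 3, 4, 5, 6])     # smallest difference reversed
```

With six subjects the smallest attainable two-tailed p value is 2/64 ≈ 0.031, and a single reversal at the smallest rank already gives 4/64 ≈ 0.062. With four subjects the minimum is 2/16 = 0.125, so the test cannot reach p < 0.05.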
Individual-subject nonparametric tests.
In each of the individual-subject nonparametric tests comparing ADIs across regions or conditions, 10,000 bootstrapped datasets were created. Each bootstrapped dataset had the same number of entries as the original dataset, where each data entry was randomly selected from the residuals dataset, with replacement. In each bootstrapped dataset, ADIs were calculated (as above) and compared across the two ROIs or conditions. The p value was calculated as the percentage of times that one condition yielded a higher ADI than the other condition. To compare subtracted ROIs to individual ROIs, ADIs were calculated for the left, right, and left minus right ROI on each bootstrap. With each iteration, the subtracted ADI was compared with the average of the left hemisphere ADI and the right hemisphere ADI. To compare shuffled versus nonshuffled datasets, in each bootstrap, a dataset was created in which the order of the trials from the right hemisphere was shuffled before subtraction. Trial-type was always preserved between subtracted trials. ADIs were compared between the shuffled and nonshuffled datasets for each bootstrap. To compare the increase in the ADI caused by removal of positive correlation between homologous versus nonhomologous regions, either 14 (data-derived regions) or 12 (localizer-derived regions) ADIs were calculated for each bootstrap: shuffled and nonshuffled subtractions between homologous regions, and shuffled and nonshuffled subtractions for each of the nonhomologous pairs including the left (L) or right (R) hemisphere ROI being tested. With each iteration, the homologous shuffled/nonshuffled difference was compared with the average nonhomologous shuffled/nonshuffled difference. So, for FEF, we compared the shuffling effect of (L FEF − R FEF) to the average shuffling effect of (L FEF − R IPS), (L FEF − R V3A), (L FEF − R V1–V4), (L IPS − R FEF), (L V3A − R FEF), and (L V1–V4 − R FEF).
Trial-by-trial magnitudes were created in each ROI by cross-correlating the six-time point trial preparatory activity with the average response across all subjects and ROIs to contralateral cues (0.072920053; 0.136640788; 0.402087447; 0.569316815; 0.529799933; 0.45774233). For each subject, we computed a matrix of correlation coefficients across all ROIs. This correlation matrix was computed separately for trials with leftward and rightward preparatory cues and the coefficients were averaged across conditions and across subjects.
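For a single subject, this computation might look like the sketch below. The template values are those given in the text; everything else (the function name `roi_correlations`, the array layout, the zero-lag dot product used for the cross-correlation, and the synthetic data) is an assumption made for illustration. Averaging across subjects is not shown.

```python
import numpy as np

# Average response to contralateral cues across subjects and ROIs (values from the text).
TEMPLATE = np.array([0.072920053, 0.136640788, 0.402087447,
                     0.569316815, 0.529799933, 0.45774233])

def roi_correlations(trials, cues):
    """trials: (n_trials, n_rois, 6) preparatory time courses.
    cues: NumPy array of 'L'/'R' labels, one per trial.
    Returns the ROI-by-ROI correlation matrix averaged over cue directions."""
    magnitudes = trials @ TEMPLATE              # (n_trials, n_rois)
    mats = [np.corrcoef(magnitudes[cues == c], rowvar=False) for c in ("L", "R")]
    return np.mean(mats, axis=0)

# Synthetic example: four ROIs sharing a common trial-to-trial fluctuation.
rng = np.random.default_rng(0)
common = rng.normal(size=(400, 1, 6))           # shared across ROIs
private = 0.5 * rng.normal(size=(400, 4, 6))    # independent per ROI
toy_trials = common + private
toy_cues = np.array(["L", "R"] * 200)
corr = roi_correlations(toy_trials, toy_cues)
```

Computing the matrix separately for each cue direction keeps the cue-related mean difference from inflating or masking the trial-to-trial correlation.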
For each ROI dataset (individual or derived from a subtraction), a discriminant was calculated as the difference between the average preparatory activity after leftward and rightward cues, using all trials. This discriminant was cross-correlated with the preparatory activity from each trial to derive trial-by-trial magnitudes. We used the ROC analysis, as above, to quantify the degree of separation between correct and incorrect trials, separately for valid trials with leftward and rightward cues. Because magnitudes should have been higher for trials with leftward cues and lower for trials with rightward cues, we assumed that correct magnitudes were greater than incorrect magnitudes for leftward trials, and incorrect magnitudes were greater than correct magnitudes for rightward trials. To determine whether performance predictability for a given ROI was greater than chance at the group level, we averaged the values for leftward and rightward trials within each subject to get six total values. We then created averages from 10,000 bootstrapped datasets in which six random performance predictability values were selected with replacement. A region was considered to predict performance better than chance if over 97.5% of these bootstrapped averages were >0.50.
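The group-level criterion can be sketched as a simple bootstrap of the six per-subject values. The function name `predicts_above_chance` and the example values are hypothetical and serve only to show the mechanics; they are not data from this study.

```python
import numpy as np

def predicts_above_chance(values, n_boot=10_000, seed=0):
    """Resample the per-subject values with replacement and report whether
    more than 97.5% of the bootstrapped means exceed 0.50."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    fraction = float(np.mean(boot_means > 0.5))
    return fraction > 0.975, fraction

# Hypothetical per-subject performance predictability values.
consistent = predicts_above_chance([0.53, 0.55, 0.54, 0.52, 0.56, 0.54])
mixed = predicts_above_chance([0.45, 0.55, 0.48, 0.52, 0.47, 0.53])
```

When every subject lies above 0.50 the criterion is met trivially; when subjects scatter symmetrically around 0.50, roughly half of the bootstrapped means fall below chance and the criterion fails.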
BOLD data were collected from six subjects as they performed between 700 and 900 trials of a difficult task requiring directed spatial attention (Fig. 1). At the beginning of each trial, subjects were given an auditory preparatory cue indicating that they should covertly attend to a specific location in the upper left or upper right portion of the visual field. After a random delay (6–10 s), stimuli appeared at both of these locations, concurrent with an auditory report cue. Subjects had to judge the orientation of the stimulus at the location that had been indicated by the report cue, which was predicted by the preparatory cue on 75% of trials (valid trials). Because the targets were near detection threshold, the use of the report cue was necessary to ensure that subject errors were not caused by uncertainty regarding the location of the stimulus (Shiu and Pashler, 1994; Dosher and Lu, 2000; Carrasco, 2006). Activity during the prestimulus delay, when all relevant locations were devoid of visual stimuli, represented purely endogenous modulations.
Behavioral analyses confirmed that subjects covertly attended to the cued location throughout the interval between the auditory preparatory cue and the onset of the visual targets. Task accuracy was significantly affected by the validity of the preparatory cue (F(1,5) = 27.4; p = 0.009), but not by the target location [F(1,5) = 1.9; not significant (ns)] nor by the duration of the cue-target interval (F(2,4) = 0.5; ns). Furthermore, there were no significant interactions involving cue validity, indicating that spatial attention was equally beneficial for targets at each of the two locations and across all cue-target intervals. Accuracy on validly cued trials averaged 69.5% whereas accuracy on invalid trials averaged 59.7%. Analyses of BOLD data are restricted to the preparatory, pretarget activity of trials with the longest cue-target interval: 10 s, comparable with other studies of preparatory activity (Kastner et al., 1999; Hopfinger et al., 2000; Serences et al., 2004; Giesbrecht et al., 2006).
Top-down attention modulates the entire visual field
Covert attention to a single location, before visual stimulus onset, modulated neural activity representing all locations. Within visual cortex, focused attention to a peripheral location in the upper left or upper right visual field created a sharp peak of activity for the attended location (V1v, V2v, VP, and V4), coupled with suppression of nearby cortex representing the central visual field. Activity in more distant cortex representing the lower visual field in the same hemifield as the attended location was somewhat enhanced. This pattern of activity is illustrated in Figure 2, and time courses for selected portions of the visual field are presented in Figure 3 and supplemental Figure 1 (available at www.jneurosci.org as supplemental material). In addition, activity in the following regions modulated with the locus of attention, before visual stimulus onset: the FEFs, the posterior MFG/IFS, the IPS, precuneus, V7, and V3A. Figure 3 displays all of the regions modulating with cue direction from one subject, and supplemental Table 1 (available at www.jneurosci.org as supplemental material) lists average Talairach coordinates. To avoid a large number of multiple comparisons, we largely focus analysis on FEF, IPS, V3A, V1–V4, and subdivisions of visual cortex representing specific portions of the visual field. [Throughout the text, V1–V4 refers to the portion of early visual cortex varying with locus of attention. Subdivisions within visual cortex (the target regions, lower field regions, and central field regions) were defined with independent localizer scans. V1–V4 and the target region are partially overlapping, depending on the subject. Furthermore, V3A refers to the portion of V3A varying with the locus of attention, which, presumably, is mostly the upper field representation of V3A.] 
FEF and IPS are the core regions of the “dorsal attention network” (Kastner and Ungerleider, 2000; Corbetta and Shulman, 2002), the putative source of top-down signals to visual cortex (Moore and Armstrong, 2003). V3A has widespread anatomical connections to both extrastriate visual cortex and the dorsal attention network (Felleman and Van Essen, 1991; Schall et al., 1995) and is strongly modulated by the locus of attention (Tootell et al., 1998; Nakamura and Colby, 2000).
Advantages of encoding attention across a map
The widespread top-down modulation of both attended and unattended locations (Fig. 2) suggests that the locus of attention may be encoded as the relative activity difference between multiple locations. The principal advantage of such an encoding scheme is the removal of signals common to both locations, manifesting in positive correlation. Taking the relative activity difference between parts of cortex representing different target locations could improve encoding by (1) constructively combining signals with opposite preferences and (2) removing any positively correlated activity unrelated to the locus of attention. Although it is trivial that combining two independent sources of information improves encoding of the locus of attention, the removal of positive correlation represents a unique contribution of considering multiple sources of information at once. Figures 3 and 4 illustrate these points with the portions of visual cortex (and, separately, FEF) representing the two target locations. Figure 3 shows that the signal related to the locus of attention goes in opposite directions for portions of cortex in opposite hemispheres. That is, activity from the left hemisphere is higher for rightward cues whereas activity from the right hemisphere is higher for leftward cues. Figure 4 illustrates that BOLD activity in these regions with opposite cue preferences is nevertheless positively correlated across trials. This positive correlation undermines the selectivity of each region, and its removal could dramatically improve the ability to decode the locus of attention.
Our data suggest that comparing activity for mirror locations in opposite (left/right) hemifields will optimally remove positively correlated noise. For simplicity, we only consider sampling activity for two locations. Activity should be compared between portions of cortex with opposite (attend left vs attend right) preferences, because this will constructively combine their preferences. Theoretical work indicates that when taking the difference in activity between portions of cortex with opposite preferences, information content increases as the degree of positive correlation between those portions increases (Chen et al., 2006). We examined the correlation structure, therefore, across six subdivisions of visual cortex representing different portions of the visual field. As shown in Tables 1 and 2, each subdivision displayed significantly higher correlation with the portion of cortex representing the mirror location in the opposite hemifield compared with any other opposite-hemisphere subdivision (Wilcoxon, p = 0.03, for all comparisons). Additionally, as displayed in Table 1, activity between portions of homologous regions representing the two target locations (e.g., left FEF and right FEF) was, in general, more highly correlated than activity between nonhomologous regions representing the two target locations (e.g., left FEF and right IPS). This effect was significant for V3A and V1–V4 (Wilcoxon, p = 0.03), but not FEF or IPS. FEF, however, had a higher correlation with the opposite hemisphere homolog than with any other region in three of the four subjects in which this region was defined.
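The dependence of information content on correlation can be made explicit with a simple equal-variance Gaussian sketch. The symbols are illustrative assumptions rather than quantities estimated in this study: δ is the cue-related difference in mean activity within each region (with opposite sign in the two regions, L and R), σ is the trial-to-trial SD of each region, and ρ is the across-trial correlation between the two regions.

```latex
\begin{align}
\operatorname{Var}(L - R) &= \sigma^2 + \sigma^2 - 2\rho\sigma^2 = 2\sigma^2(1-\rho), \\
d'_{\text{single}} &= \frac{\delta}{\sigma}, \qquad
d'_{L-R} = \frac{2\delta}{\sigma\sqrt{2(1-\rho)}}
         = \frac{\sqrt{2}\,\delta}{\sigma\sqrt{1-\rho}} .
\end{align}
```

Under these assumptions, ρ = 0 yields only the √2 gain expected from combining two independent signals, whereas discriminability of the difference grows without bound as ρ approaches 1.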
Quantification of relative encoding scheme for the locus of attention
The relative activity between portions of cortex representing different locations better predicted the locus of attention compared with activity for only a single location. To facilitate comparisons between these encoding schemes, we quantified how well each scheme could discriminate between trials with leftward versus rightward covert attention. The ADI was derived from ROC curves and ranged between 0.5 (chance discrimination) and 1.0 (perfect discrimination). The white bars in Figure 5A present regional ADIs when considering activity only in the portion of the priority map representing one of the two potentially attended locations. The gray bars in Figure 5A present the ADIs when considering the difference in activity between the portions of priority maps representing these two locations. Supplemental Table 2 lists ADIs for all regions. In each region tested (FEF, IPS, V3A, V1–V4), comparing activity between portions of cortex representing the two potentially attended locations gave a significantly higher ADI relative to activity at just a single location (Friedman's test, FEF, p = 0.04; Wilcoxon tests, IPS, p = 0.031; V3A, p = 0.031; V1–V4, p = 0.031; portion of visual cortex representing targets, p = 0.062; the marginal significance in visual cortex was caused by a single subject; the remaining five subjects were all significant at p < 0.001) (see Materials and Methods). Comparing activity between portions of visual cortex representing two lower field locations (p = 0.094) or portions of cortex representing the central visual field (p = 1) did not improve discriminability relative to individual regions.
Critically, the difference in activity between portions of cortex representing different locations improved prediction of the locus of attention because of the removal of positively correlated activity. For each set of regions representing the two potential target locations, we created 500 new datasets by shuffling the trials such that activity representing the right location was randomly paired with activity representing the left location from any other trial, so long as the same preparatory cue was given. Subtracting activity between these randomly paired activities can only improve the ADI by combining signals with opposite signs, because the trial-to-trial correlation is effectively removed. The black bars in Figure 5A represent average ADIs for the shuffled datasets (data for all regions are listed in the third column of supplemental Table 2, available at www.jneurosci.org as supplemental material). Shuffling trials caused a significant reduction in the ADI compared with unshuffled trials in all regions tested, indicating that removal of a positively correlated signal contributed to the improvement in decoding (Friedman's tests, FEF, p = 0.05; all remaining regions, p = 0.01). Figure 5A represents graphically the contributions of (1) combining signals with opposite preferences and (2) removing positively correlated activity.
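The logic of the shuffling control can be illustrated with a small simulation. Everything here is synthetic and hypothetical (the signal size, noise levels, trial count, and the use of a single magnitude per trial rather than a six-point time course); the sketch only shows why pairing trials at random within a cue condition isolates the contribution of positive correlation.

```python
import numpy as np

def roc_area(pos, neg):
    """Area under the ROC curve: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

rng = np.random.default_rng(1)
n = 2000                                        # trials per cue direction (hypothetical)
signal, common_sd, private_sd = 0.5, 1.0, 0.5   # hypothetical effect and noise sizes

def simulate(cue_sign):
    """cue_sign = +1 for leftward cues, -1 for rightward cues.
    Returns per-trial magnitudes for the left and right hemisphere ROIs."""
    common = rng.normal(0, common_sd, n)        # fluctuation shared by both hemispheres
    rh = cue_sign * signal + common + rng.normal(0, private_sd, n)   # prefers left cues
    lh = -cue_sign * signal + common + rng.normal(0, private_sd, n)  # prefers right cues
    return lh, rh

lh_leftcue, rh_leftcue = simulate(+1)
lh_rightcue, rh_rightcue = simulate(-1)

# One region alone.
single = roc_area(rh_leftcue, rh_rightcue)
# Trial-by-trial subtraction: shared fluctuation cancels.
subtracted = roc_area(rh_leftcue - lh_leftcue, rh_rightcue - lh_rightcue)
# Shuffled subtraction: right-hemisphere trials re-paired within the same cue condition.
shuffled = roc_area(rng.permutation(rh_leftcue) - lh_leftcue,
                    rng.permutation(rh_rightcue) - lh_rightcue)
```

In this toy setting the shuffled subtraction still beats a single region, because the two opposite-signed signals combine, but it falls well short of the unshuffled subtraction, in which the shared fluctuation is removed trial by trial.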
Additional analyses confirmed that the comparison yielding the most information about the current locus of attention was between activity for the attended location and the mirror location in the opposite hemifield. As illustrated in Figure 5, the difference in ADI between shuffled and nonshuffled datasets indicates the degree of improvement in discriminability accounted for by removing positive correlation. Within visual cortex, improvement caused by removal of positive correlation was significantly greater when comparing activity in the portion of cortex representing the attended location to its mirror location versus comparing it to any other location (p = 0.06; significant at p < 0.05 in five of six individual subjects). Furthermore, improvement caused by removal of positive correlation was significantly greater for homologous (L and R V3A) versus nonhomologous (L V3A and R V1–V4) comparisons for V1–V4 (Wilcoxon, p = 0.03) and marginally significant in V3A (p = 0.06). In individual subjects, this effect was significant (>95% of bootstraps) in FEF (three of four subjects), V3A (four of six subjects), and V1–V4 (five of six subjects). This effect was not significant for IPS (group, p = 0.31; two of six individual subjects). Figure 5B compares the subtraction of homologous versus nonhomologous opposite hemisphere regions using V3A as an example.
Spatial attention signals predict perception
If the relative activity between attended and unattended locations best captures the locus of attention, then this relative activity difference may predict how well subjects perform the upcoming task. Indeed, we found this to be the case in one region, V3A. Using ROC analyses, we quantified how well each region (FEF, IPS, V3A, early visual cortex) predicted performance, testing both single location and multiple location encoding schemes. Across the regions tested, none significantly predicted performance when only considering activity for the attended location. When considering the relative difference in activity between portions of each region representing the two potential target locations, however, V3A significantly predicted performance above chance (p = 0.007, two-tailed). The relationship of V3A activity to performance was exactly as expected: relatively more activity in left V3A (representing the right target location) predicted accurate performance for rightward targets, whereas relatively more activity in right V3A predicted accurate performance for leftward targets. This effect is displayed separately for each subject in Figure 6. The average performance predictability for the left minus right V3A region was 0.54 (compared with 0.50 for the individual left and right V3A regions). Because the performance predictability was relatively low, we verified this result in a second, independent dataset, collected for a different study. In this second dataset, methods were exactly the same, with the exception that targets were higher contrast (∼50%), and performance was limited by small orientation differences between targets rather than contrast. In this independent dataset, the spatial attention signal in V3A again significantly predicted performance above chance (performance predictability, 0.54; p = 0.045, two-tailed).
These results demonstrate that the strong spatial attention signal distributed across left and right V3A is relevant to behavior and bolster the notion that spatial attention is encoded in the relative activity levels for different locations.
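The ROC analysis described above can be sketched as follows. This is one plausible reading rather than the exact procedure: performance predictability is taken to be the ROC area separating correct from incorrect trials, and the asymmetry is signed toward the target (left minus right V3A for rightward targets, right minus left for leftward targets). The function names and the six trials are invented for illustration and are not data from the study.

```python
def roc_area(pos, neg):
    """Probability that a random value from pos exceeds one from neg (ties count half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def asymmetry_toward_target(left, right, target_side):
    """Multiple-location scheme: activity contralateral minus ipsilateral to the target."""
    return left - right if target_side == "right" else right - left

def attended_activity(left, right, target_side):
    """Single-location scheme: activity contralateral to the target only."""
    return left if target_side == "right" else right

def performance_predictability(trials, measure):
    """ROC area for separating correct from incorrect trials using `measure`."""
    correct = [measure(l, r, side) for l, r, side, ok in trials if ok]
    incorrect = [measure(l, r, side) for l, r, side, ok in trials if not ok]
    return roc_area(correct, incorrect)

# Made-up trials: (left V3A, right V3A, target side, response correct?),
# in arbitrary units. Large shared offsets vary from trial to trial.
TRIALS = [
    (20, 16, "right", True),
    (3, 7, "left", True),
    (10, 9, "right", True),
    (15, 18, "right", False),
    (11, 6, "left", False),
    (12, 10, "right", False),
]

asym_auc = performance_predictability(TRIALS, asymmetry_toward_target)
single_auc = performance_predictability(TRIALS, attended_activity)
print(f"asymmetry: {asym_auc:.3f}  attended location only: {single_auc:.3f}")
```

A value of 0.5 corresponds to chance. The toy trials were constructed so that shared offsets obscure the single-location measure while the asymmetry remains informative, which is the qualitative pattern reported for V3A.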
Sustained versus transient signals
Relative encoding of the locus of attention implies that attending to a particular location is not necessarily associated with sustained activity increases in the portion of cortex representing that location. In fact, the ability of individual portions of cortex to indicate the locus of attention was independent of whether the absolute signal modulation was sustained or near resting baseline. Figure 7A displays the net BOLD modulations of FEF and the portion of visual cortex representing the target location during the prestimulus interval. Figure 7B shows time point-by-time point ADI values for each region. Although FEF displayed sustained BOLD modulations throughout the preparatory period, activity in visual cortex peaked ∼6–8 s postcue and then returned nearly to baseline by stimulus presentation. Nevertheless, BOLD activity in each region best indicated the locus of attention at the end of the preparatory period.
We have provided evidence that the top-down locus of attention is encoded as the difference in activity between portions of cortex representing attended versus unattended locations. First, this relative activity measure improved prediction of the locus of attention compared with measurement of activity for only a single location, mostly by removing a signal that was highly correlated across regions representing different locations (Gold and Shadlen, 2001; Chen et al., 2006; Fox et al., 2006). Second, only the relative activity between attended and unattended locations predicted whether subjects would correctly perceive objects that subsequently appeared at the attended location. Third, the amount of information embedded across cortical maps concerning the locus of attention was independent of whether the magnitude of activity for the attended location was above resting baseline. A critical difference from most previous studies of preparatory attention (Kastner et al., 1999; Corbetta et al., 2000; Hopfinger et al., 2000; Serences et al., 2004; Giesbrecht et al., 2006) is that we took into account the trial-to-trial variation of preparatory signals at attended and unattended locations, rather than just considering the mean signal over trials at each location.
Spatial attention as moment-to-moment asymmetries across cortical maps
Previous studies of visual attention have implied that one can infer whether a particular portion of space is being attended by the level of activity in the portion of visual cortex representing that location. Preparatory attention has been associated with BOLD increases for attended locations (Kastner et al., 1999; Hopfinger et al., 2000; Muller et al., 2003; Serences et al., 2004) and BOLD decreases for unattended locations (Silver et al., 2007), relative to a resting baseline condition. Similarly, stimulus-evoked activity has been associated with increases and decreases relative to a passive viewing condition (Tootell et al., 1998; Somers et al., 1999; Smith et al., 2000; Muller and Kleinschmidt, 2004). Because these studies measure average (across trials) activity associated with attended and unattended locations, the implication is that cortex representing a distinct location displays one particular level of activity when that location is attended and another particular, lower level of activity when that location is unattended.
Our results are more consistent with an alternative model: activity relative to resting baseline (and perhaps passive viewing of stimuli) is determined by multiple ongoing processes. These processes, most of which are unrelated to the focus of attention, induce large trial-by-trial fluctuations in activity at all locations in visual cortex. The locus of attention is determined by the moment-to-moment asymmetries across the map of all locations, and does not depend on an increase or decrease in activity relative to rest (or passive viewing) per se. The advantage of this relative encoding is that the locus of attention is robustly coded across different task demands and different states of arousal.
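The benefit of relative encoding can be made explicit with a simple additive formalization. This is an illustrative sketch under assumed notation, not a model fitted in this study: $a_L$ and $a_R$ denote trial-by-trial activity for two locations, $\mu_L$ and $\mu_R$ their attention-dependent means, $g$ a fluctuation shared across locations, and $\epsilon_L$, $\epsilon_R$ independent noise terms.

```latex
\begin{aligned}
a_L &= \mu_L + g + \epsilon_L, \qquad a_R = \mu_R + g + \epsilon_R,\\
\operatorname{Var}(a_L) &= \sigma_g^2 + \sigma_\epsilon^2,\\
\operatorname{Var}(a_L - a_R) &= \operatorname{Var}(a_L) + \operatorname{Var}(a_R) - 2\operatorname{Cov}(a_L, a_R)
 = 2\sigma_\epsilon^2 .
\end{aligned}
```

Under these assumptions, the shared term $g$ cancels in the difference, so the asymmetry $a_L - a_R$ has mean $\mu_L - \mu_R$ and a variance that does not depend on $\sigma_g^2$. When the shared fluctuation is large relative to the independent noise, the difference therefore discriminates the locus of attention better than either location alone, regardless of where the absolute activity sits relative to baseline.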
The positive trial-to-trial correlation in preparatory activity across locations in the current study may have been partly attributable to task-relevant but nonspatial processes such as feature-based attention (Treue and Trujillo, 1999; Saenz et al., 2002; Serences and Boynton, 2007) or noise suppression, which has been shown to affect activity both at the attended location (Serences et al., 2004) and at distant unattended locations (Ruff and Driver, 2006). Task-independent processes such as arousal or spontaneous BOLD fluctuations (Biswal et al., 1995; Fox et al., 2005; Mantini et al., 2007; Vincent et al., 2007) may have also contributed to the positive correlation. Interestingly, correlations were highest between regions representing mirror locations in opposite hemifields. A speculative hypothesis is that representations of homotopic locations are highly connected in part so that noise becomes shared across these locations. This noise is then easily accounted for by relative decoding. In support of this hypothesis, anatomical studies reveal a higher density of callosal connections between cells representing homotopic versus heterotopic locations (Spatz and Tigges, 1972; Wagor et al., 1975; Segraves and Rosenquist, 1982; Dougherty et al., 2005).
Importantly, sustained attention in the current study did not depend on sustained activity in cortex representing the attended location but only on an asymmetry in activity between regions representing attended versus unattended locations. Although previous studies have reported positive sustained activity in visual cortex during covert attention (Kastner et al., 1999; Hopfinger et al., 2000; Serences et al., 2004; Silver et al., 2007), we observed visual cortex activity near baseline by the end of the preparatory period. This return to baseline did not indicate reduced spatial attention over the course of the delay, as indicated by two observations. First, behavioral data showed that performance did not depend on the duration of the cue-target interval. Second, activity in visual cortex best predicted the locus of attention at the end of the preparatory period. Thus, even as absolute activity in visual cortex decreased toward the resting baseline, activity across visual cortex became more strongly correlated with the locus of attention.
Several lines of evidence have suggested that attention is influenced by activity for multiple locations. Bisley and Goldberg (2003) reported that attention for a particular location could be known only by considering the firing rates of LIP cells representing two task-relevant locations, not just by the activity of LIP cells representing the target location alone. The premotor theory of visual attention suggests that covert attention is closely related to overt eye movements (Rizzolatti et al., 1987), which have been shown to depend on the distribution of activity across entire neural maps (Sparks et al., 1976; Lee et al., 1988). Furthermore, frontal eye field lesions cause conjugate eye deviation (Pedersen and Troost, 1981; Tijssen et al., 1991; Singer et al., 2006), consistent with eye movements being driven by the difference in activity between portions of FEF representing different portions of space. Additional evidence comes from the syndrome of unilateral neglect, characterized by inattention to the contralesional hemifield and hyperattention to the ipsilesional hemifield. This pattern suggests to some that the intact control regions operate through competitive mutual inhibition, and that the activity difference between regions eventually determines the locus of attention (Kinsbourne, 1987). In support, the severity of attentional impairment in neglect is associated with disrupted connectivity across regions in opposite hemispheres (He et al., 2007), whereas recovery is associated with a rebalancing of activity across these regions (Corbetta et al., 2005).
Functional significance of preparatory signals for spatial attention
Although there have been many studies of preparatory activity in visual cortex (Kastner et al., 1999; Corbetta et al., 2000; Hopfinger et al., 2000; Ress et al., 2000; Muller et al., 2003; Sapir et al., 2005; Giesbrecht et al., 2006), only a few have linked preparatory activity to perception (Ress et al., 2000; Sapir et al., 2005; Giesbrecht et al., 2006), and this study is the first to link purely preparatory signals in visual cortex to perception of upcoming stimuli as a function of the locus of attention. Ress et al. (2000) demonstrated a strong relationship between performance and stimulus-independent activity in the portion of visual cortex corresponding to the attended location. Although the relationship between BOLD activity and performance was stronger in that study than in the present one, Ress et al. (2000) recorded BOLD signals during a phase of the task in which the subjects were attending to and analyzing the stimulus, making the perceptual decision, and making a manual response. Feedback signals for motor responses (Astafiev et al., 2004) and end-of-trial signals (Jack et al., 2006) might have contributed to the predictive activity. In the present study, we link perception to signals that only reflect prestimulus preparation and do not reflect the magnitude of activity during the perceptual decision. Notably, many other studies of preparatory activity have not reported whether there is a link between prestimulus activity in visual cortex and perception (Kastner et al., 1999; Corbetta et al., 2000; Hopfinger et al., 2000; Muller et al., 2003). It is not surprising, therefore, that we found only a modest relationship between prestimulus activity and perception, given the paucity of such relationships in the literature.
Interestingly, the strength of the preparatory activity bias in V3A toward the attended location, relative to the unattended location, predicted perception. Variability in the strength of this preparatory attentional bias was probably caused by both variable use of the preparatory cue (Sapir et al., 2005) and variable efficiency of the spatial orienting mechanism. We speculate that a preparatory activity bias was especially critical in V3A because it may have ultimately provided the task-relevant stimulus representation, as studies have shown that V3A is sensitive to stimuli at low contrast (Tootell et al., 1997) and is highly tuned for orientation (Zeki, 1978; Fang et al., 2005; Larsson et al., 2006).
This study highlights the importance of assessing the locus of spatial attention as the asymmetry of activity between cortex representing attended versus unattended locations. Comparing activity for multiple locations better captures the actual top-down modulation across cortical maps and is robust to nonspecific, task-dependent modulations affecting activity for all locations. This study supports the hypothesis that top-down signals bias perception by granting attended locations a competitive advantage over unattended locations (Desimone and Duncan, 1995).
This work was supported by the J. S. McDonnell Foundation, National Institute of Neurological Disorders and Stroke Grants F30 NS057926-01 and R01 NS48013, National Institute of Mental Health Grant R01 MH71920-06, and Marie Curie Chair European Union Grant MEXC-CT-2004-006783. We thank Drs. Giovanni d'Avossa, Ayelet Sapir, and Mark McAvoy for guidance with statistical analyses.
Correspondence should be addressed to Maurizio Corbetta, Department of Radiology, Washington University School of Medicine, 4525 Scott Avenue, Campus Box 8225, St. Louis, MO 63110.