Abstract
We present a functional MRI experiment investigating the neural basis of feature-based attention in humans using the Stroop task. Cortical areas specifically involved in color processing and word reading were first identified in individual participants using independent tests. These areas were then probed during the Stroop task (in which participants must selectively attend to the font color of a word while ignoring the word itself). We found that activation in functionally defined color areas increased during the task relative to a neutral color-naming task while activation in functionally defined word areas decreased. These results are consistent with a biased competition model of feature-based attention in which the processing of attended features is enhanced and the processing of ignored features is suppressed.
Introduction
Spatial attention enhances neural processing associated with attended locations and suppresses processing of unattended locations (Moran and Desimone, 1985). Similarly, attending to an object enhances neural processing associated with that object and suppresses processing associated with other objects (Gazzaley et al., 2005). In this study, we used functional MRI to investigate whether feature-based attention works the same way: does attending to a specific feature lead to enhancement of neural processing for the attended feature and suppression of processing for an ignored feature?
The earliest neural studies of feature-based attention in humans found that selectively attending to color, shape, or motion (relative to dividing attention across features) led to increased activation in extrastriate visual areas that were plausibly associated with the processing of those features (Corbetta et al., 1990, 1991). Subsequent fMRI studies similarly found activation in cortical areas plausibly associated with color and motion processing when participants attended to color and motion (Chawla et al., 1999; Liu et al., 2003) [see also O'Craven et al. (1997), Shulman et al. (1997), Giesbrecht et al. (2003), and Sàenz et al. (2003)]. In most of these studies, attention to one feature was contrasted with attention to another feature. It is therefore difficult to determine whether the activation was due to enhanced processing of the attended feature, suppressed processing of the unattended feature, or both.
The most popular task for studying feature-based attention in humans is the Stroop paradigm (Stroop, 1935). Color words are presented in conflicting colored fonts, e.g., the word RED printed in green font, and participants must name the font color while ignoring the word itself. The task involves attending to a relevant feature (the font color) while ignoring a salient, but irrelevant feature (the word). Most imaging studies of the Stroop task have investigated the anatomical source of attention and have found activation in anterior cingulate cortex, dorsolateral prefrontal cortex, and/or posterior parietal cortex (Pardo et al., 1990; Bench et al., 1993; Carter et al., 1995, 2000; Banich et al., 2000a,b; MacDonald et al., 2000; Milham et al., 2001; van Veen and Carter, 2005; Liu et al., 2006; Coderre et al., 2008).
To determine the effects of feature-based attention on the processing of attended features and unattended features, it is necessary to examine the posterior brain areas that are the targets of attention. Stroop studies that have addressed posterior brain areas have yielded conflicting results, ranging from no evidence for enhancement or suppression (Pardo et al., 1990), to evidence for suppression but not enhancement (Bench et al., 1993), to evidence for both (Carter et al., 1995). Banich et al. (2000a,b, 2001) found no activation in areas associated with color processing, but reported evidence that word areas may actually be enhanced rather than suppressed. Using a variant of the Stroop task with the names and faces of politicians and actors as stimuli, Egner and Hirsch (2005) found evidence for the neural enhancement of task-relevant information, but not for the neural suppression of task-irrelevant information.
A potential explanation for these mixed results is that most of these studies performed no independent test to identify, in individual participants, the posterior areas involved in processing the attended and ignored features. Because these areas are relatively small and their exact location varies across individuals, it is hard to know whether the posterior areas that were activated and/or deactivated were actually involved in processing the attended or ignored features.
In this study, we functionally and independently identified cortical areas involved in processing colors and words in individual participants and then probed those areas during the Stroop task. We confirm the hypothesis that attention both enhances the processing of attended features and suppresses the processing of unattended features in humans.
Materials and Methods
Participants.
A total of 14 University of Michigan students participated after providing informed consent. All were right-handed native English speakers with normal vision. They were paid $30/h plus a small bonus based on their accuracy and reaction time (RT) in the scanner (mean = $14.57; range = $5–28). They had no history of neurological problems or learning disability and were not taking medications.
Instructions.
Participants memorized a finger–color mapping and then practiced the color-naming task using colored bars instead of Stroop stimuli for 5 min before entering the fMRI scanner. The instructions were: “When it says ‘Identify Color’, the task is to press the key corresponding to each color with the appropriate finger. The stimuli may be colored bars or words written in colored font.” If a participant asked whether to emphasize speed or accuracy, they were told not to sacrifice accuracy for speed. Since these individuals participated in a second experiment during the same session in the scanner, they also briefly practiced an event-related Stroop experiment that required them to respond to the word or font color of Stroop stimuli on different trials.
Regions of interest.
The stimuli used for identifying color and word areas are shown in Figure 1. The color contrast compared colored versus gray patterns. The word contrast compared words versus consonant strings. The four runs devoted to area identification alternated with the runs used for hypothesis testing. Area identification runs consisted of 8 min of passively viewing 20 s blocks of each stimulus category in pseudorandom order, with stimuli being presented at a rate of one per second. The first and last area identification runs were for identifying word areas and the middle two were for color areas.
We used PickAtlas (http://www.fmri.wfubmc.edu/cms/software#PickAtlas) (Maldjian et al., 2003) to create two anatomical masks based on previous experiments on color and word processing (Beauregard et al., 1997; Büchel et al., 1998; Chao and Martin, 1999; Bartels and Zeki, 2000; Cohen et al., 2000; Polk and Farah, 2002): (1) a bilateral lingual gyrus and fusiform gyrus mask that excluded the anterior half of the fusiform gyrus (for use in finding the color area) and (2) a left fusiform mask (for use in finding the word area). We then normalized each participant's data and found the top five voxels for the color contrast within anatomical mask 1 for each participant. These were the individual color areas. We also found the top five voxels for the word contrast within anatomical mask 2 for each participant. These were the individual word areas.
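The selection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: voxels are ranked by their t value for the localizer contrast (the text says only "top five"), the statistical maps are plain dictionaries keyed by voxel coordinates, and the names `top_k_voxels` and `mean_t` are hypothetical rather than part of PickAtlas or any other package mentioned here.

```python
import heapq


def top_k_voxels(localizer_t, mask, k=5):
    """Return the k in-mask voxels with the largest localizer t values.

    localizer_t: dict mapping (x, y, z) -> t value for the localizer contrast
    mask:        set of (x, y, z) coordinates inside the anatomical mask
    Assumption: "top" means largest t value; the paper does not say more.
    """
    in_mask = ((t, xyz) for xyz, t in localizer_t.items() if xyz in mask)
    return [xyz for _, xyz in heapq.nlargest(k, in_mask)]


def mean_t(contrast_t, voxels):
    """Average a second, independent contrast over the selected voxels."""
    return sum(contrast_t[v] for v in voxels) / len(voxels)


# Toy maps with made-up values (NOT data from the study).
color_vs_gray = {(0, 0, 0): 1.0, (1, 0, 0): 5.0, (2, 0, 0): 3.0, (3, 0, 0): 9.0}
stroop_vs_neutral = {(0, 0, 0): 0.2, (1, 0, 0): 2.0, (2, 0, 0): 1.0, (3, 0, 0): -1.0}
mask_1 = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}  # voxel (3, 0, 0) lies outside the mask

color_roi = top_k_voxels(color_vs_gray, mask_1, k=2)
roi_effect = mean_t(stroop_vs_neutral, color_roi)
```

Because the voxels are chosen from the area identification runs alone, the Stroop versus neutral values later averaged within them are independent of the selection.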
Hypothesis testing.
The stimuli for the hypothesis testing conditions are shown in Figure 2. Four hypothesis testing runs lasted 5 min each and consisted of five 20 s blocks each of Stroop (top of figure) and neutral (bottom of figure) trials in pseudorandom order with 10 s of fixation between blocks. In both the Stroop and neutral conditions, participants indicated the color of the stimulus by pressing a finger of their right hand corresponding to the four colors used in the experiment: red = index, yellow = middle, green = ring, and blue = pinky. Trials were self-paced.
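The block structure of a hypothesis testing run can be laid out as a short Python sketch. It assumes that a 10 s fixation period follows every block and that block order is an unconstrained shuffle; the text specifies only "pseudorandom order" and fixation "between blocks", so both choices, like the function name `make_run`, are illustrative.

```python
import random

BLOCK_S = 20               # block length stated in the text
FIXATION_S = 10            # fixation length stated in the text
BLOCKS_PER_CONDITION = 5   # five Stroop and five neutral blocks per run


def make_run(seed):
    """Return (schedule, run_length_s) for one hypothesis testing run."""
    rng = random.Random(seed)
    order = ["stroop"] * BLOCKS_PER_CONDITION + ["neutral"] * BLOCKS_PER_CONDITION
    rng.shuffle(order)  # assumption: unconstrained shuffle of the ten blocks
    schedule, onset = [], 0
    for condition in order:
        schedule.append((onset, onset + BLOCK_S, condition))
        onset += BLOCK_S + FIXATION_S  # assumption: fixation follows every block
    return schedule, onset


schedule, run_length_s = make_run(seed=0)
print(run_length_s)  # 300
```

Under these assumptions ten 20 s blocks plus ten 10 s fixation periods give 300 s, which matches the stated run length of 5 min.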
We used manual as opposed to verbal responses to avoid motion artifacts associated with speech and we used neutral rather than congruent control stimuli to avoid facilitation effects. We chose four frequency-matched abstract control stimuli (TAX, DEAL, FAITH, and SYMBOL) instead of animal names used in some previous experiments (Carter et al., 1995) to reduce differences in concreteness between the conditions (Whatmough et al., 2004).
fMRI data acquisition.
Stimulus presentation was controlled by E-Prime software on a PC. MRI experiments were performed at 3T (GE Signa). Foam padding was used to restrict head motion comfortably. We used fMRI with BOLD (blood oxygenation-level dependent) contrast (Ogawa et al., 1990, 1992; Bandettini et al., 1992; Kwong et al., 1992). Each imaging protocol began with a 10–15 min acquisition of standard images used for determining regional anatomy. We first performed a three-plane localizer followed by a T1 sagittal localizer (nineteen 5 mm slices with 1 mm skip, TR = 300 ms, TE = 3.4 ms, FOV = 24 cm, 1 NEX, 256 × 160 matrix). We then acquired 30 T1-weighted oblique-axial structural images approximately parallel to the AC–PC line (5 mm slices with 0 mm skip, TR = 300 ms, TE = 3.5 ms, FOV = 24 cm, 1 NEX, 256 × 256 matrix). Functional acquisitions consisted of thirty contiguous 5 mm oblique-axial slices obtained every 2 s and prescribed at the same locations as the 30 oblique-axial structural scans (0 mm skip, TR = 2000 ms, TE = 30 ms, FOV = 24 cm, 1 NEX). Images were reconstructed into a 64 × 64 display matrix for an effective spatial resolution of 3.75 mm × 3.75 mm × 5 mm. After the functional scans, we acquired a T1-weighted 3-dimensional spoiled gradient echo image (sixty 2.5 mm axial slices, TR = 38 ms, TE = 3 ms, FOV = 24 cm, 1 NEX, 256 × 160 matrix).
The raw k-space data were reconstructed into 2D images using a 2D FFT, with appropriate corrections for N/2 ghost removal. Correction for head movement was performed using Automated Image Registration (AIR release 3.1) (Woods et al., 1998), which uses a rigid body (6 parameter) model. Voxelwise analysis was performed in VoxBo (voxbo.org) using the modified general linear model approach (Worsley and Friston, 1995).
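The stated functional resolution follows directly from the field of view and the reconstruction matrix, as the small Python check below illustrates. The helper names are hypothetical, and the volume count assumes that a full 5 min run was sampled at the functional TR with no discarded volumes, which the text does not state.

```python
def voxel_size_mm(fov_cm, matrix, slice_mm):
    """In-plane voxel size is the field of view divided by the matrix size."""
    in_plane = fov_cm * 10.0 / matrix  # convert cm to mm
    return (in_plane, in_plane, slice_mm)


def volumes_per_run(run_s, tr_s):
    """Number of whole volumes that fit in a run (assumes none are discarded)."""
    return int(run_s // tr_s)


print(voxel_size_mm(24, 64, 5))   # (3.75, 3.75, 5)
print(volumes_per_run(300, 2))    # 150
print(30 * 5)                     # 150 mm of coverage from 30 contiguous 5 mm slices
```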
Results
Behavioral results
Behavioral data are shown in Figure 3. A typical Stroop effect was observed in RTs: mean responses were slower in the Stroop condition (654 ms) than in the neutral condition (606 ms; t(13) = 5.84, p = 0.0001; one-tailed). Mean accuracy was high in both the neutral (97.3%) and Stroop (96.8%) conditions, with no significant difference in accuracy between the two (t(13) = 0.82, p = 0.21). The 48 ms RT effect is modest, but Stroop effects are often larger in designs in which the incongruent stimuli are rare events (Carter et al., 2000) or in which participants emphasize speed over accuracy (van Veen et al., 2008).
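The t(13) statistics above have the form of a paired comparison across the 14 participants. A minimal Python sketch of that computation follows; the function name `paired_t` is hypothetical, the RT values are invented for illustration and are not the study's data, and the p value is omitted because it would require a t distribution that the standard library does not provide.

```python
import math
import statistics


def paired_t(cond_a, cond_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(cond_a) != len(cond_b):
        raise ValueError("samples must be matched participant by participant")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical per-participant mean RTs in ms (NOT the study's data).
stroop_rt = [640, 702, 615, 668]
neutral_rt = [601, 650, 590, 610]
t_value, df = paired_t(stroop_rt, neutral_rt)
```

With 14 participants the same call would return 13 degrees of freedom, as reported in the text.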
fMRI whole-brain results
We normalized the statistical parametric maps for the Stroop versus neutral contrast in each individual participant, smoothed them with a 5 × 5 × 5 mm kernel, and submitted them to a second-level whole-brain random effects analysis. Results are presented in Figure 4 and Table 1 (thresholded at p < 0.001, uncorrected). Consistent with many other Stroop studies, we observed activation in left (and to some extent right) dorsolateral prefrontal cortex [Brodmann areas (BA) 46 and 9] as well as in right (and some left) posterior parietal cortex (BA 7). In addition, we found activation in left and right ventrolateral prefrontal cortex (BA 47) and in superior and more medial frontal cortex (BA 8 and 6). Although anterior cingulate activation is regularly observed in Stroop studies, we found only minimal evidence [two voxels exceeded a t value of 3.5 at Talairach coordinates (2, 18, 24)].
fMRI region-of-interest results
To test our hypotheses, we found the average t value for the Stroop versus neutral contrast in each participant's functionally defined ROIs and performed second-level analyses on the average t values to test for enhancement in the color areas and suppression in the word areas. Figure 5 shows the average t values for the Stroop versus neutral contrast in color and word areas in individual participants and averaged across participants (right of figure). We used nonparametric tests because the relatively small sample size undermines the distributional assumptions of a parametric test like the t test. Nevertheless, we confirmed that the same pattern of results is obtained using t tests. Consistent with the enhancement hypothesis, the Stroop condition activated color areas relative to the neutral condition (W+ = 78, W− = 27, n = 14, p = 0.059 by Wilcoxon one-sample signed-rank test; p = 0.029 by sign test). Consistent with the suppression hypothesis, it deactivated word areas relative to the neutral condition (W+ = 24, W− = 81, n = 14, p = 0.039 by Wilcoxon signed-rank test). Furthermore, when participants who exhibited the least color area activation in the color-area identification runs were excluded from the analysis (under the assumption that their color area may not have been accurately identified), the evidence for enhancement in the color area was even stronger (W+ = 55, W− = 0, n = 10, p = 0.001). The exclusion threshold was an average t value of 1.96 for the color versus gray contrast in the area identification runs, which is the standard two-tailed threshold for significance at p < 0.05 for a t test with a very large number of degrees of freedom. Four of the 14 participants failed to pass this threshold. More generally, as Table 2 illustrates, participants with strong color and word areas tended to show the best evidence for enhancement and suppression, while participants with weaker color and word areas tended to show less or no evidence for the hypotheses.
We would not necessarily predict strong correlations between Stroop activations and the color/word activations (and, indeed, those correlations are not particularly large: r = 0.28 between Stroop activation and color activation in the color area, n.s.; r = −0.23 between the Stroop activation and word activation in the word area, n.s.). Rather, what we would predict (and what is observed) is that participants who do not show strong color/word activation would often fail to show the predicted neural enhancement/suppression effects (under the assumption that we failed to identify an actual color/word processing area in that participant). These results further confirm the importance of obtaining accurate, functionally defined ROIs for feature-processing areas in each individual participant to test the enhancement and suppression hypotheses accurately.
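The signed-rank and sign-test statistics reported for the ROI analysis can be computed with the Python sketch below. It assumes an exact, one-tailed test against zero with no tied absolute values, which is consistent with the reported values but is not spelled out in the text, and the function names are hypothetical. Note that W+ and W− must sum to n(n + 1)/2, which is 105 for n = 14 and 55 for n = 10, as in the values reported.

```python
import math


def signed_rank(values):
    """Return (W+, W-, n) for a one-sample Wilcoxon signed-rank test vs. zero."""
    nonzero = [v for v in values if v != 0]  # zeros carry no sign information
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their average rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, nonzero) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, nonzero) if v < 0)
    return w_plus, w_minus, len(nonzero)


def exact_upper_tail(w_plus, n):
    """P(W+ >= w_plus) under the null; exact only when there are no ties."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)  # counts[s] = subsets of ranks 1..n summing to s
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return sum(counts[math.ceil(w_plus):]) / 2 ** n


def sign_test_upper_tail(n_positive, n):
    """P(at least n_positive of n values are positive) under the null."""
    return sum(math.comb(n, k) for k in range(n_positive, n + 1)) / 2 ** n


# Ten positive values give W+ = 55 and W- = 0, the pattern reported for n = 10.
w_plus, w_minus, n = signed_rank([0.4, 1.1, 2.3, 0.9, 1.7, 3.2, 0.6, 2.8, 1.4, 2.0])
p_value = exact_upper_tail(w_plus, n)  # 1/1024, roughly 0.001
```

For suppression in the word areas the relevant tail is the lower one, which by symmetry equals the upper tail evaluated at W−.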
Discussion
We used fMRI to investigate how feature-based attention modulates activation in posterior brain areas involved in processing attended and ignored features. Our results demonstrate both neural enhancement of the task-relevant feature (color) and neural suppression of the task-irrelevant feature (word). Our findings are consistent with previous reports of combined enhancement and suppression for spatial attention (Moran and Desimone, 1985; Tootell et al., 1998) and object-based attention (O'Craven et al., 1999; Gazzaley et al., 2005).
A critical aspect of this experiment was that the cortical areas involved in processing the features under study were independently identified in each individual participant, thus ensuring that the posterior activations and deactivations observed during the selective attention task were associated with the processing of the relevant features. In fact, participants who exhibited the strongest color and word areas also exhibited the strongest evidence for the enhancement and suppression hypotheses.
Relationship to previous work
Most studies of feature-based attention in animals have investigated the preferential processing of whole objects that match a target feature (Motter, 1994; Treue and Martínez Trujillo, 1999; Bichot et al., 2005), rather than attention to different features of a single object. One exception is a recent study by Mirabella et al. (2007) in which the activity of V4 neurons was recorded while macaques attended either to the color or orientation of a colored bar. These authors found that the activity of 28% of the neurons was significantly influenced by the attended feature. Some of the neurons responded more when the animal attended to color and others responded more when the animal attended to orientation.
As mentioned in the introduction, previous imaging studies of the Stroop task have yielded conflicting results regarding enhancement and suppression. Pardo et al. (1990) found no evidence for enhancement or suppression of putative color and word areas. Bench et al. (1993) found no evidence for enhancement, but did find a negative correlation between anterior cingulate activation and activation in superior temporal cortices that they interpreted as reflecting inhibition of word processing. Carter et al. (1995) found activations in lingual gyrus and deactivations in left extrastriate cortex, consistent with both enhancement of color areas and suppression of word areas. Banich et al. (2000a,b, 2001) found no activation in areas associated with color processing, but reported evidence that left posterior areas previously associated with word processing may actually be enhanced rather than suppressed during the Stroop task [these results are captured by a recent computational model of the Stroop task (Herd et al., 2007)]. We would argue that some of the variability in these results is due to the fact that previous studies did not independently identify color and word areas in individual participants.
Using a variant of the Stroop task with the names and faces of politicians and actors as stimuli, Egner and Hirsch (2005) showed that transient control mechanisms sensitive to high conflict modulate face processing when faces are task-relevant targets (similar to the current finding of enhancement of color areas in the Stroop task). They even found effects of attention on a trial-by-trial basis (an issue that we could not address using the present block design). In contrast to the present study, Egner and Hirsch (2005) failed to report significant suppression of the face area when the face was being ignored, whereas we did observe significant suppression of word areas during the Stroop task. Perhaps the most important difference between the studies is that Egner and Hirsch (2005) investigated attention to different objects (a printed word vs a face), whereas the current study investigated attention to different features of the same object (the text and color of a single printed word). One could therefore argue that the two studies are investigating different types of attention (object-based vs feature-based).
Theoretical implications
Most recent neurally inspired models of visual attention have been based on the idea of biased competition (Desimone and Duncan, 1995; Desimone, 1998; Kastner and Ungerleider, 2000). This model is most commonly applied in the domain of object-based attention where it is assumed that different visual objects compete for neural representation by suppressing each other. This competition can be biased by top-down attentional signals that enhance the processing of a target object and thereby increase its chances of winning the competition and influencing downstream processing. The present results are consistent with this model and support the idea that it can also apply to feature-based attention by biasing the competition between feature processing areas. According to this view, feature-based attention works by enhancing the processing of attended features, thereby biasing the competition between features in favor of the attended feature. Unattended features would be suppressed either directly by a top-down signal or indirectly by lateral inhibition from the attended feature. [See Treue and Martínez Trujillo (1999) for a related proposal in which attention enhances the neural response to objects that match a target feature even if the object is outside the neuron's spatial receptive field.]
One problem with this account is that in studies of object-based attention, attention to one feature of an object has been shown to enhance the processing of other features of that object, even if those features are irrelevant to the task. For example, O'Craven et al. (1999) showed participants a face superimposed on a house. One of the two stimuli was moving and participants had to attend either to the face, to the house, or to the movement. When attention was focused on motion, the face-sensitive area (the so-called fusiform face area or FFA) in ventral visual cortex was more active when the face was the moving stimulus than when the house was. Similarly, the place-sensitive area was more active when the house was moving than when the face was. They interpreted these findings as evidence for object-based attention. In contrast, we find that when participants attend to one feature of a printed word (the font color) then processing of other features of that word (its orthography) is suppressed.
One possible way to reconcile these findings is based on the distinction between integral and separable dimensions (Garner and Felfoldy, 1970; Garner, 1974). Separable dimensions can be processed relatively independently and attention can be focused on each of the dimensions separately. In contrast, integral dimensions are invariably processed together as a unitary whole and attention to one dimension virtually guarantees attention to both. For example, behavioral studies have demonstrated that shape and color can be processed relatively independently (they are separable), whereas hue and saturation cannot (they are integral).
The integral-separable distinction offers a hypothesis about when attention to a feature will enhance neural processing associated with other irrelevant features and when it will suppress that processing: for integral features, attention to one feature will enhance the processing of the other feature. For separable features, attention to one feature will suppress the processing of the other feature. Assuming that the font color and orthography of a printed word are separable dimensions, whereas the motion of a face is not separable from the face itself (i.e., they are integral), this hypothesis could explain both our results and those of O'Craven et al. (1999).
Another potential explanation is that the Stroop task demands suppression of the irrelevant feature more than the tasks used by O'Craven et al. (1999). Word reading is a much more prepotent response than is color naming and the response associated with the word always conflicts with the response associated with the font color in the Stroop task. If word processing were not suppressed, it could therefore significantly interfere with performance on the color-naming task. In the O'Craven et al. (1999) experiment, none of the tasks were significantly more practiced than any of the others and interference would presumably not be as severe, so perhaps suppression was not as important. If this hypothesis is correct, it suggests that attention can be very finely controlled depending on the degree of conflict (see, for example, Botvinick et al., 2001).
Footnotes
- This work was supported by National Institutes of Health Grant R01-MH60655-01A1 and by the National Science Foundation. We thank Rob Park and Heather Scharr for help with the experiments. We thank Keith Newnham and Eve Gochis at the fMRI center, and Tor Wager for useful scripts. Patricia Reuter-Lorenz, Rick Lewis, Eric Lormand, Daniel Weissman, anonymous reviewers, and others provided helpful feedback on earlier drafts.
- Correspondence should be addressed to Dr. Thad A. Polk, Department of Psychology, University of Michigan, 530 Church Street, Ann Arbor, MI 48109. tpolk@umich.edu