The human posterior parietal cortex (PPC) is widely believed to subserve visually guided spatial behavior, including the control of visual attention, eye movements, and reaching. To explore the generality of this function, we measured human brain activity using functional magnetic resonance imaging during spatial and nonspatial shifts of auditory attention. Both spatial and nonspatial shifts of auditory attention evoked transient activity in the medial superior parietal cortex. These results reveal that the PPC is not exclusively devoted to visuospatial behavior; similar regions within a dorsomedial subcompartment provide a domain-independent reconfiguration signal for the control of spatial and nonspatial attention in both visual and nonvisual modalities.
- attentional control
- auditory attention
- posterior parietal cortex (PPC)
- superior parietal lobule (SPL)
- functional magnetic resonance imaging (fMRI)
The deployment of visual attention to locations, features, and objects grants selected information access to awareness and memory, improves the speed and accuracy of detection and identification (Posner et al., 1980), and modulates neural activity in the corresponding regions of the visual cortex (Motter, 1994; Yantis et al., 2002; Liu et al., 2003). For example, visual attention can select retinotopically precise spatial locations (DeYoe et al., 1996; Tootell et al., 1998; McMains and Somers, 2004), one of two visual objects that occupy the same spatial location (O'Craven et al., 1999; Serences et al., 2004), or features such as color and motion that are rendered in the same visual stimuli (Liu et al., 2003). Of course, selective attention also operates in modalities other than vision. Indeed, the earliest behavioral studies of selective attention were conducted in the auditory modality (Cherry, 1953), and neurophysiological evidence has documented the effects of attention in nonvisual modalities (Petkov et al., 2004).
Regions of the prefrontal cortex (PFC) and posterior parietal cortex (PPC) have long been implicated in the voluntary control of oculomotor behavior and of covert shifts of visuospatial attention (Colby et al., 1996; Kastner and Ungerleider, 2000; Vandenberghe et al., 2001; Corbetta and Shulman, 2002; Yantis et al., 2002; Bisley and Goldberg, 2003). The PPC, in particular, is widely thought to play a special role in coordinating visually guided spatial behavior, including attending, looking, and reaching (Culham and Kanwisher, 2001; Andersen and Buneo, 2002; Astafiev et al., 2003), as well as in coding the spatial and nonspatial attributes of auditory stimuli (Grunewald et al., 1999; Cohen et al., 2004; Gifford and Cohen, 2005; Mullette-Gillman et al., 2005). Little is known, however, about the neural substrate of spatial and nonspatial attentional control in nonvisual modalities; the extent to which the frontoparietal network of areas is involved in goal-directed attentional reorienting within the auditory modality remains an open question.
We measured cortical activity in human observers during spatial shifts of auditory attention (the “spatial experiment”) and during nonspatial shifts of auditory attention (the “nonspatial experiment”) using rapid event-related functional magnetic resonance imaging (fMRI). In the spatial experiment, attention was shifted between two voices presented to the left and right ears, respectively. In the nonspatial experiment, the two voices (one female, one male) were presented binaurally and simultaneously, so that both voices appeared to originate from the same location in the center of the head. Spatial location therefore could not be used as a basis for selection.
The redeployments of attention evoked by this procedure were strictly voluntary and endogenous (targets and distractors were matched on sensory dimensions and differed only in the attentional instruction associated with each target), permitting us to measure neural events that were time locked to the voluntary act of shifting or maintaining attention isolated from sensory responses that would have been evoked by a discrete attention cue.
Materials and Methods
Subjects. Twelve neurologically healthy young adults (eight women; age, 20–31 years; mean age, 23.8 years) for the spatial experiment and 13 neurologically healthy young adults (seven women; age, 20–34 years; mean age, 24.6 years) for the nonspatial experiment were recruited from the Johns Hopkins University community. Informed consent was obtained from each subject in accordance with the human subjects research protocol approved by the Institutional Review Board at the Johns Hopkins University.
Stimuli and procedure. Auditory stimuli consisted of 16 letters and two digits recorded digitally using the Computerized Speech Lab software (Kay Elemetrics Corporation, Lincoln Park, NJ). A female and a male talker, both native English speakers, produced the 16 letters and two digits (“2,” “4,” “A,” “C,” “F,” “G,” “H,” “J,” “K,” “M,” “N,” “P,” “R,” “T,” “U,” “V,” “X,” “Y”). Each stimulus was uttered and recorded 20 times. The utterance whose duration was closest to 240 ms was selected and edited to be precisely 240 ms. The edited versions were clearly identifiable and not distorted. An additional 10 ms of silence was appended to each utterance, yielding a total duration of 250 ms. The auditory stimuli were presented through a custom headphone set. Sound was carried through hollow tubes that terminated in the headphone set. The tubes were inserted into a pair of earplugs, which were placed in the subject's ear canals, and the assembly was covered with a sound-attenuating shield.
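The duration arithmetic above can be sketched as follows: 240 ms of speech plus 10 ms of appended silence gives 250 ms in total. This is a minimal illustration and not the editing procedure actually used. The 44.1-kHz sampling rate is a hypothetical value, because the recording rate is not reported. Utterances are assumed to be plain lists of samples. The hypothetical helper `fit_stimulus` simply truncates or zero-pads to 240 ms, which stands in for the manual editing described.

```python
SAMPLE_RATE_HZ = 44_100  # hypothetical; the recording rate is not reported


def n_samples(ms: int, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples spanning `ms` milliseconds at `rate_hz`."""
    return rate_hz * ms // 1000


def fit_stimulus(samples: list[float], speech_ms: int = 240, silence_ms: int = 10) -> list[float]:
    """Trim (or zero-pad) an utterance to exactly `speech_ms`, then append silence.

    Truncation/padding is an assumed stand-in for the manual editing step.
    """
    target = n_samples(speech_ms)
    speech = samples[:target] + [0.0] * max(0, target - len(samples))
    return speech + [0.0] * n_samples(silence_ms)
```

At 44.1 kHz this yields 10,584 speech samples plus 441 silent samples, that is, 11,025 samples or exactly 250 ms.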
Subjects were presented with a small fixation cross, which they were instructed to fixate throughout each experimental run. At the beginning of each run, participants were verbally instructed to begin by attending to either the male or the female auditory stream. In the spatial experiment, each auditory stream was presented monaurally to one ear (female stream in the left ear and male stream in the right ear). In the nonspatial experiment, both streams were presented binaurally and were therefore spatially superimposed “in the center of the head.” In both experiments, the two streams were synchronized so that they overlapped perfectly in time. Subjects pressed a button whenever they detected the digit “/two/” or “/four/” in the attended stream. For half of the subjects, the digit /two/ instructed them to shift attention from the currently attended stream to the unattended stream (e.g., male to female), whereas the digit /four/ instructed them to maintain attention on the currently attended stream. This mapping was reversed for the remaining subjects.
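The shift/hold instruction and its counterbalancing can be summarized as a small state update. In this sketch, the function name `interpret_target` and the string labels are hypothetical. The `shift_digit` parameter selects which digit signals a shift for a given subject (“2” for half of the subjects, “4” for the rest), and the other digit signals hold.

```python
FEMALE, MALE = "female", "male"


def interpret_target(digit: str, attended: str, shift_digit: str = "2") -> tuple[str, str]:
    """Return (event label, stream attended after the target) for one detected target.

    Hypothetical helper: `shift_digit` encodes the subject's counterbalancing group.
    """
    if digit not in ("2", "4"):
        raise ValueError("targets are the digits 2 and 4")
    other = MALE if attended == FEMALE else FEMALE
    if digit == shift_digit:
        # Shift target: attention moves to the other stream.
        return f"shift {attended} to {other}", other
    # Hold target: attention stays on the current stream.
    return f"hold {attended}", attended
```

For example, a subject in the “2 = shift” group who is attending to the female stream and hears “2” would switch to the male stream. The same digit would mean “hold” for a subject in the other group.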
The order of distractor letters and targets was random, with a few constraints:
(1) no more than two hold events occurred in succession;
(2) targets never appeared in the unattended stream;
(3) the same item was never presented simultaneously in both streams; and
(4) distractor stimuli did not appear twice in immediate succession.
Targets within the attended stream were separated by a temporal interval that was pseudorandomly jittered between 3 and 5 s, with an average intertarget interval of 4 s (Dale and Buckner, 1997). Such temporal jittering allows the extraction of individual event-related blood oxygenation level-dependent (BOLD) signal time courses after the target events (Burock et al., 1998). Each subject performed 2 practice runs (before scanning) and 10 experimental runs. Each run was 2 min, 28 s in duration and included eight occurrences of each of the four target types: attend female, attend male, switch attention from female to male, and switch attention from male to female. Subjects were instructed to hold attention on the currently attended stream even if they thought they had missed a target. Only detected target events were included in the analysis.
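A run satisfying the stated target-sequence constraints can be drawn by rejection sampling, as in the sketch below. This is an assumption-laden illustration, not the authors' randomization code. It models only the target sequence (constraint 1 and the eight-per-type count), not the distractor constraints (2) to (4). It assumes each run starts with attention on the female stream. It also draws intertarget intervals uniformly from a hypothetical 250-ms grid between 3 and 5 s, whose mean is 4 s. The function name `make_run` is hypothetical.

```python
import random
from collections import Counter


def make_run(rng: random.Random, per_type: int = 8, max_holds: int = 2) -> list[tuple[float, str]]:
    """Hypothetical sketch: draw one run's target sequence with onset times (s)."""
    n = 4 * per_type
    while True:
        kinds = ["shift"] * (n // 2) + ["hold"] * (n // 2)
        rng.shuffle(kinds)
        # Constraint (1): reject sequences with more than `max_holds` holds in a row.
        streak, ok = 0, True
        for kind in kinds:
            streak = streak + 1 if kind == "hold" else 0
            if streak > max_holds:
                ok = False
                break
        if not ok:
            continue
        # Label each target by the stream attended when it occurs (assumed start: female).
        attended, labels = "female", []
        for kind in kinds:
            if kind == "shift":
                other = "male" if attended == "female" else "female"
                labels.append(f"shift {attended} to {other}")
                attended = other
            else:
                labels.append(f"hold {attended}")
        counts = Counter(labels)
        # Keep only runs with exactly `per_type` targets of each of the four types.
        if len(counts) == 4 and all(c == per_type for c in counts.values()):
            break
    # Intertarget intervals of 3.00, 3.25, ..., 5.00 s (symmetric grid, mean 4 s).
    # Note: this sketch does not force the last target to fall within the 148-s run.
    steps = [3.0 + 0.25 * i for i in range(9)]
    onsets, t = [], 0.0
    for label in labels:
        t += rng.choice(steps)
        onsets.append((t, label))
    return onsets
```

Because shifts always alternate direction, 16 shifts starting from the female stream split automatically into eight of each direction. The rejection step therefore only has to balance the two hold types.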
Monitoring eye position. To check whether switch-related activity could be attributed to systematic eye movements away from the central fixation region, or to subjects closing their eyes at certain points during the task, eye position was monitored with a custom-made video camera for five subjects while they performed the task in the scanner. The camera output was digitally recorded and later analyzed for eye movements. A calibration procedure established that eye-position deviations of ≥2.5° could reliably be detected, as could blinks and eyelid closure. All subjects kept their eyes open and directed to the fixation point throughout the task, and no significant changes in fixation occurred during the experimental runs.
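The fixation criterion can be expressed as a simple per-sample check. The sketch assumes, hypothetically, that the video frames have already been converted to gaze positions in degrees of visual angle by the calibration procedure. It also assumes that `None` marks frames in which the eye was closed. The actual offline analysis is not described in detail, and `flag_deviations` is a hypothetical name.

```python
import math


def flag_deviations(gaze_deg, fixation=(0.0, 0.0), criterion_deg=2.5):
    """Return indices of samples that violate fixation.

    gaze_deg: sequence of (x, y) gaze positions in degrees of visual angle,
    with None for frames in which the eye was closed (assumed representation).
    A sample is flagged if the eye is closed or if gaze lies >= criterion_deg
    from the fixation point.
    """
    flagged = []
    for i, sample in enumerate(gaze_deg):
        if sample is None:
            flagged.append(i)  # blink or eyelid closure
            continue
        dx, dy = sample[0] - fixation[0], sample[1] - fixation[1]
        if math.hypot(dx, dy) >= criterion_deg:
            flagged.append(i)
    return flagged
```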
fMRI data acquisition. All subjects (12 in the spatial experiment and 13 in the nonspatial experiment) were scanned on a 3T Philips Gyroscan ACS-NT MRI scanner. Whole-brain functional scans were acquired with a SENSE head coil (MRI Devices, Waukesha, WI) and a T2*-weighted echoplanar imaging (EPI) sequence [echo time (TE), 30 ms; repetition time (TR), 1480 ms; flip angle, 65°]. Twenty-seven transverse slices were acquired with SENSE level 2 (field of view, 240 mm; matrix, 64 × 64; slice thickness, 3 mm with 1-mm gap). These parameters provided whole-brain coverage with 100 volume acquisitions per run. High-resolution anatomical images were acquired with a T1-weighted 200-slice MPRAGE sequence with SENSE level 2 (TR, 8.1 ms; TE, 3.8 ms; flip angle, 8°; prepulse T1 delay, 852.1 ms; 1-mm isotropic voxels).
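The acquisition parameters can be cross-checked with a few lines of arithmetic. One hundred volumes at a TR of 1480 ms give 148 s, which matches the stated run duration of 2 min, 28 s. A 240-mm field of view sampled on a 64 × 64 matrix gives 3.75-mm in-plane voxels. The slab-extent line assumes that the 1-mm gap occurs only between adjacent slices.

```python
TR_MS = 1480                       # repetition time
N_VOLUMES = 100                    # volumes per run
FOV_MM, MATRIX = 240, 64           # field of view and in-plane matrix
N_SLICES, THICK_MM, GAP_MM = 27, 3, 1

run_s = TR_MS * N_VOLUMES // 1000          # 148 s per run
minutes, seconds = divmod(run_s, 60)       # 2 min, 28 s
in_plane_mm = FOV_MM / MATRIX              # 3.75 mm in-plane voxel size
# Slab extent: 27 slices plus the 26 gaps between them (assumption: no trailing gap).
slab_mm = N_SLICES * THICK_MM + (N_SLICES - 1) * GAP_MM   # 107 mm

print(f"run: {minutes} min {seconds} s; voxel: {in_plane_mm} mm; slab: {slab_mm} mm")
```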
fMRI data analysis. Data from three subjects in the nonspatial experiment were excluded because of excessive head motion (two participants) or scanner malfunction (one participant). Neuroimaging data were analyzed using Brain Voyager software (Brain Innovation, Maastricht, The Netherlands). Functional data were first corrected for motion and slice timing. They were then high-pass filtered to remove components occurring three or fewer times over the course of the run. Spatial smoothing was not applied. To correct for between-scan motion, each subject's EPI volumes were coregistered to the 10th run acquired for that subject (the last functional run performed before the anatomical scan). After this interscan motion correction, functional data were registered with the anatomical volume.
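The high-pass step can be approximated by regressing each voxel's time series on sine and cosine terms completing one to three cycles per run and subtracting the fitted drift. This is a sketch of a generic Fourier-basis filter under that assumption, written with NumPy. It is not Brain Voyager's implementation, and the function name `highpass_fourier` is hypothetical.

```python
import numpy as np


def highpass_fourier(y, max_cycles=3):
    """Remove sine/cosine components completing 1..max_cycles cycles per run.

    y: 1-D array holding one voxel's time series. The run mean is preserved.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n)
    cols = [np.ones(n)]  # constant term, kept in the output
    for k in range(1, max_cycles + 1):
        cols += [np.sin(2 * np.pi * k * t / n), np.cos(2 * np.pi * k * t / n)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    drift = X[:, 1:] @ beta[1:]  # fitted low-frequency components only
    return y - drift
```

On a 100-volume run, sinusoids with an integer number of cycles per run are mutually orthogonal. A component completing, say, 10 cycles per run therefore passes through unchanged, whereas 1- to 3-cycle drifts are removed exactly.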
Using a general linear model, the hemodynamic response function was estimated for each event type by deconvolution. The volume acquired at target onset and the following 11 volumes (0–16.3 s) were used to estimate scaled fit coefficients (β weights) at each modeled time point for each event type (B. Ward, http://afni.nimh.nih.gov/pub/dist/doc/manual/3dDeconvolve.pdf).
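The deconvolution can be illustrated with a finite impulse response (FIR) design matrix. The matrix contains one indicator regressor per event type and lag, so each β weight estimates the response at that lag without assuming a hemodynamic shape. The NumPy sketch below assumes that target onsets have been rounded to the nearest TR, because targets occur on a 250-ms grid that is not aligned with the 1.48-s TR. It uses 12 lags (0–16.3 s). The helper names are hypothetical, and this is not the Brain Voyager or 3dDeconvolve implementation.

```python
import numpy as np


def fir_design(onsets_by_type, n_scans, n_lags=12):
    """Build an FIR design matrix (hypothetical helper).

    onsets_by_type: one list of onset indices (in TRs) per event type.
    Column j * n_lags + k is 1 at scan onset + k for every onset of type j,
    so its weight estimates the type-j response at lag k (k = 0..n_lags-1).
    The final column of ones models the baseline.
    """
    n_types = len(onsets_by_type)
    X = np.zeros((n_scans, n_types * n_lags + 1))
    for j, onsets in enumerate(onsets_by_type):
        for onset in onsets:
            for k in range(n_lags):
                if onset + k < n_scans:  # drop lags that fall after the run ends
                    X[onset + k, j * n_lags + k] = 1.0
    X[:, -1] = 1.0
    return X


def deconvolve(y, X, n_types, n_lags=12):
    """Ordinary least-squares betas reshaped to (event type, lag); baseline dropped."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta[:-1].reshape(n_types, n_lags)
```

Because the onsets are jittered, overlapping responses to successive targets remain separable. This is the reason the design uses variable intertarget intervals.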
To test for shift-related transient responses, shift events were contrasted with hold events. For each event type, the six regressors corresponding to time points 2–7 (∼3–10 s after target onset) were included in the contrast. Contrast weights were positive for the shift events (shift to female and shift to male) and negative for the hold events (hold female and hold male).
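Given the (event type × lag) β weights from such a deconvolution, the shift-versus-hold contrast reduces to a weighted sum. In this sketch the event-type order and the ±1 weighting are assumptions, because the scaling of the contrast weights is not reported. Lags 2–7 receive weight, and all other lags receive zero. The names `EVENT_TYPES` and `shift_vs_hold_contrast` are hypothetical.

```python
import numpy as np

# Assumed order of event types in the beta array.
EVENT_TYPES = ["shift to female", "shift to male", "hold female", "hold male"]


def shift_vs_hold_contrast(n_lags=12, lags=range(2, 8)):
    """Contrast over (event type, lag) betas: +1 for shift, -1 for hold, at lags 2-7."""
    c = np.zeros((len(EVENT_TYPES), n_lags))
    for j, name in enumerate(EVENT_TYPES):
        sign = 1.0 if name.startswith("shift") else -1.0
        c[j, list(lags)] = sign
    return c
```

With `betas` an array of shape (4, 12), the contrast value is `np.sum(c * betas)`. It is positive when shift events evoke a larger response than hold events over roughly 3–10 s after the target.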
For each contrast, the single-voxel threshold in the group data was set to t(12) = 4.0 (spatial) and t(10) = 4.0 (nonspatial) (p < 0.002, uncorrected). A spatial cluster-extent threshold was used to correct for multiple comparisons using AlphaSim with 2000 Monte Carlo simulations taking into account the entire EPI matrix (Ward, http://afni.nimh.nih.gov/pub/dist/doc/manual/AlphaSim.pdf). This procedure yielded a minimum cluster size of 0.081 ml (three voxels in the original acquisition space) with a map-wise false-positive probability of p < 0.01.
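The logic of the cluster-extent correction can be illustrated with a toy Monte Carlo simulation. The simulation thresholds pure-noise volumes voxelwise, records the largest connected cluster in each, and then chooses the smallest cluster size that is reached by chance in fewer than 1% of simulations. This is not AlphaSim. The sketch assumes spatially independent Gaussian noise on a small hypothetical 16 × 16 × 8 grid instead of the full EPI matrix and its estimated smoothness. It also uses a one-sided z threshold and face (6-) connectivity, and it runs fewer iterations to keep runtime short. Both function names are hypothetical.

```python
import random
from statistics import NormalDist


def max_cluster_size(mask, shape):
    """Largest 6-connected cluster of True voxels in a flattened 3-D mask (x-major)."""
    nx, ny, nz = shape
    seen = [False] * len(mask)
    best = 0
    for start, on in enumerate(mask):
        if not on or seen[start]:
            continue
        seen[start] = True
        stack, size = [start], 0
        while stack:  # depth-first flood fill of one cluster
            idx = stack.pop()
            size += 1
            x, rem = divmod(idx, ny * nz)
            y, z = divmod(rem, nz)
            for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                u, v, w = x + dx, y + dy, z + dz
                if 0 <= u < nx and 0 <= v < ny and 0 <= w < nz:
                    j = (u * ny + v) * nz + w
                    if mask[j] and not seen[j]:
                        seen[j] = True
                        stack.append(j)
        best = max(best, size)
    return best


def min_cluster_size(shape=(16, 16, 8), p_voxel=0.002, alpha=0.01, n_iter=500, seed=0):
    """Smallest k with P(largest null cluster >= k) < alpha, estimated by simulation."""
    rng = random.Random(seed)
    z_thr = NormalDist().inv_cdf(1 - p_voxel)  # one-sided voxelwise threshold
    n = shape[0] * shape[1] * shape[2]
    maxima = []
    for _ in range(n_iter):
        mask = [rng.gauss(0.0, 1.0) > z_thr for _ in range(n)]
        maxima.append(max_cluster_size(mask, shape))
    k = 1
    while sum(m >= k for m in maxima) / n_iter >= alpha:
        k += 1
    return k
```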
Event-related averages of the BOLD signal were extracted from each cluster revealed by the contrast. For each subject, the time course was extracted within a temporal window extending from 6 s (four TRs) before target onset to 16 s after target onset. Time courses were then averaged across subjects for each of the four event types. The baseline (0% signal change) was defined as the mean activity during the 6 s preceding each target. Note that negative deflections should not necessarily be interpreted as “deactivations”; they reflect relative differences in activity after a given event.
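The event-related averaging and baseline definition can be sketched for one subject and one event type as follows. Onsets are assumed to be expressed in TR units. The window spans four TRs before onset (≈6 s) to 11 TRs after it (≈16 s), and the baseline is the mean of the four pre-onset volumes. The function name is hypothetical, and epochs that run past either end of the series are simply skipped in this sketch.

```python
import numpy as np


def event_related_percent_change(y, onsets, pre=4, post=11):
    """Average percent-signal-change time course around onsets (TR units).

    Each epoch spans `pre` scans before onset through `post` scans after it.
    Its baseline is the mean of the `pre` scans preceding onset.
    """
    y = np.asarray(y, dtype=float)
    epochs = []
    for t0 in onsets:
        if t0 - pre < 0 or t0 + post >= y.size:
            continue  # incomplete window at the edge of the run
        seg = y[t0 - pre : t0 + post + 1]
        base = seg[:pre].mean()
        epochs.append(100.0 * (seg - base) / base)
    return np.mean(epochs, axis=0)
```

Because the pre-target window also contains responses to earlier shift and hold events, this baseline is relative rather than neutral. This matches the caveat in the text.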
Mean response times (RTs) and probability of correct responses for the two experiments are summarized in Table 1. It is important to note that subjects were instructed to be as accurate as possible, but speed was not emphasized. Overall accuracy in the spatial experiment was 92%. A two-way, repeated-measures ANOVA was conducted for the RT and accuracy data, respectively, with target type (hold and shift) and target location (left and right) as within-subject factors. No significant main effects or interaction were observed for either RTs or error rates (F < 1).
Overall accuracy in the nonspatial experiment was 88%. A two-way, repeated-measures ANOVA was conducted for the RT and accuracy data, respectively, with target type (hold and shift) and target gender (male and female) as within-subject factors. The ANOVA revealed a significant main effect of target type on RTs (F(1,9) = 6.47; p < 0.05): subjects were slower to respond to shift targets (M = 778 ms) than to hold targets (M = 707 ms). In addition, the ANOVA revealed a significant main effect of target gender on accuracy (F(1,9) = 11.56; p < 0.01) and a significant interaction between target type and gender (F(1,9) = 13.70; p < 0.01).
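In a 2 × 2 within-subject design, each effect has one numerator degree of freedom. Its F(1, n − 1) therefore equals the squared paired t statistic computed on per-subject difference scores. The sketch below uses that equivalence. The input layout, in which `cells[i][a][b]` holds subject i's mean RT or accuracy in each condition, and the function names are assumptions for illustration. P values would additionally require the F distribution, which is omitted here.

```python
from statistics import fmean, variance


def _f_from_differences(d):
    """F(1, n-1) for a one-df within-subject effect: n * mean(d)**2 / var(d)."""
    return len(d) * fmean(d) ** 2 / variance(d)


def rm_anova_2x2(cells):
    """F values for a 2 x 2 repeated-measures ANOVA.

    cells[i][a][b] is subject i's mean at level a of factor A and level b of factor B.
    """
    d_a = [((c[0][0] + c[0][1]) - (c[1][0] + c[1][1])) / 2 for c in cells]
    d_b = [((c[0][0] + c[1][0]) - (c[0][1] + c[1][1])) / 2 for c in cells]
    d_ab = [(c[0][0] - c[0][1]) - (c[1][0] - c[1][1]) for c in cells]
    df = (1, len(cells) - 1)
    return {name: (_f_from_differences(d), df)
            for name, d in (("A", d_a), ("B", d_b), ("AxB", d_ab))}
```

Factor A could be target type (hold, shift), and factor B could be target location (left, right) or target gender (male, female).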
Only correctly detected target events were included in the neuroimaging analysis. The slightly lower accuracy in the nonspatial experiment suggests that it was the more challenging of the two. This is corroborated by the fact that subjects generally required more training on this task. In the nonspatial experiment, subjects missed more male targets (i.e., hold male and shift from male to female) than female targets (i.e., hold female and shift from female to male), a 7% difference. This minor behavioral difference, however, was not accompanied by a corresponding difference in the pattern of brain activity (see below).
We analyzed the fMRI data in this event-related design by using multiple regression to estimate weights for regressors representing each of the four target types, and then contrasting the weights associated with the shift and hold targets. In the spatial experiment, cortical regions exhibiting a significantly greater BOLD response after shift than after hold targets included the PPC [right precuneus/superior parietal lobule (SPL)], left SPL, right cingulate, and superior PFC [right precentral sulcus (PCS)/middle frontal gyrus (MFG)] (Fig. 1a, Table 2). The latter region is thought to contain the human homolog of the frontal eye field (Petit et al., 1997; Courtney et al., 1998; Astafiev et al., 2003). The same contrast applied to the nonspatial paradigm revealed an area in the right precuneus/SPL (Fig. 2a), as well as the other areas listed in Table 2, including the bilateral PCS/MFG. These areas are among those implicated previously in the control of spatial and nonspatial visual attention using very similar tasks (Yantis et al., 2002; Liu et al., 2003; Serences et al., 2004). The group mean event-related BOLD time courses for the four target types extracted from the SPL are shown in Figure 1, b and c, for spatial attention shifts and in Figure 2c for nonspatial attention shifts.
The event-related time courses revealed an increase in BOLD signal after shift events compared with hold events. We observed no difference in activity for the attend-female and attend-male events at the onset of the target event (i.e., at time 0) before that target could have evoked an event-related response. This indicates that these shift-related regions do not continuously maintain a spatial- or gender-specific attentive state throughout the task, but instead issue a transient attentional reconfiguration signal when a shift of attention is required. Furthermore, the BOLD activation magnitude did not differ significantly after “shift from male to female” versus “shift from female to male” targets (F < 1). This suggests that the initiation of spatial and nonspatial auditory attention shifts does not have a clear anatomical separation that is specific to the location or gender of the sound.
The present study examined cortical activity changes measured with fMRI during shifts of attention within the auditory modality. Two types of auditory attention shifts, spatial and nonspatial, were examined in response to auditory target events. Because the sensory input and the response demands of the task were nearly identical for all target events, we were able to examine cortical activity that uniquely reflected the initiation of a voluntary shift of attention for spatial and nonspatial control of auditory attentional selection. Transient increases in activity, time locked to the initiation of an auditory attentional shift, were observed in the SPL and some regions of the PFC. These dorsal frontoparietal regions have been reported previously to subserve goal-directed attentional control in vision (Corbetta and Shulman, 2002; Serences and Yantis, 2006). Several previous studies from our laboratory have also revealed similar transient increases in activation in these areas after different types of attentional shifts. These include shifts of visual attention between spatial locations (Yantis et al., 2002), between features (Liu et al., 2003), and between objects (Serences et al., 2004), as well as shifts between sensory modalities (Shomstein and Yantis, 2004).
We have emphasized the common outcomes of the two experiments: both required shifts of attention, and both evoked transient activity in the medial SPL, mirroring results observed in several previous reports. These findings support the claim that this region may be the source of a domain-independent reconfiguration signal. The nonspatial and spatial experiments differed in two respects. First, the nonspatial experiment, but not the spatial experiment, yielded significantly slower behavioral responses to shift than to hold targets (Table 1). Second, the nonspatial experiment evoked shift-related cortical activity in several regions that were not activated by the spatial experiment, and vice versa (Table 2). One interpretation of these differences is that the nonspatial experiment required the overlapping voices to be segmented, whereas in the spatial experiment the spatial separation of the two voices provided segmentation “for free.” On this view, additional perceptual mechanisms were recruited to perform the segmentation and to implement the shift of attention between these streams. This interpretation is speculative at this point, and targeted investigations will be required to fully unravel the cortical mechanisms of perceptual organization and attention in spatial and nonspatial tasks.
An alternative account of the present results is that they do not reflect the control of auditory attention at all. There is some evidence that spoken words evoke activity in an extrastriate visual region called the visual word form area (VWFA) (Cohen and Dehaene, 2004). It is conceivable that when spoken letters are perceived they are automatically transformed into a visual representation and that selective attention operates on this representation (and not an auditory one). Although the existence of the VWFA and its functional properties are controversial (Price and Devlin, 2003, 2004), and we know of no direct evidence that rapidly spoken letters are automatically represented visually, this alternative remains a logical possibility.
Several points can be made against this alternative. First, we have shown previously that shifts of attention between visually presented and spoken letters modulate cortical activity in the superior temporal gyrus (along Heschl's gyrus), the site of the early auditory cortex, when attention is directed to auditory stimuli (Shomstein and Yantis, 2004). In contrast, activity in the extrastriate cortex was selectively modulated when attention was directed to visual stimuli. Thus, directing attention to spoken letters enhances activity in auditory, not visual, cortical regions. Second, a visual representation of male and female voices that are spatially superimposed (as in the nonspatial experiment) offers no obvious basis for visually mediated selection. For visual attention mechanisms to operate on these items, they would somehow have to be marked, in a visual manner, according to whether each letter was spoken by the male or the female voice.
Finally, in the spatial experiment, there is an obvious candidate for representing the two voices: because they are presented in different spatial locations, they could, on this visual account, be represented spatially in the extrastriate cortex. A great deal of evidence exists that attention to left versus right visual space evokes robust and systematic modulation of the corresponding locations in the visual cortex. However, when we contrasted hold attention left versus hold attention right in the spatial experiment, we observed no differential modulation of the spatially organized visual cortex. This contrast also failed to reveal spatially specific modulations of the left and right auditory cortices. The absence of attentional modulation of the auditory cortex is not surprising given the substantial binaural cortical representation in the auditory system (unlike the strictly contralateral organization within the early visual cortex). However, the absence of attentional modulation of the visual cortex constitutes evidence against the claim that attention was operating on a visuospatial representation, rather than an auditory one as we have claimed.
Our findings support two conclusions about the role of the PPC in goal-directed attentional control. First, transient BOLD increases in dorsomedial PPC are time locked to the initiation of auditory attention shifts. Several previous studies have observed such transient SPL activity time locked to shifts of visual attention (Yantis et al., 2002; Liu et al., 2003; Serences et al., 2004) or to shifts between vision and audition (Shomstein and Yantis, 2004). The present study provides the first evidence for the involvement of the PPC in the control of attention in a purely nonvisual modality. Second, we observed transient activity increases in the PPC after signals to shift both spatial and nonspatial auditory attention. This finding corroborates previous evidence of nonspatial attentional control in the PPC in the visual domain (Coull and Nobre, 1998; Liu et al., 2003; Serences et al., 2004; Shomstein and Yantis, 2004), and extends it for the first time to a purely nonvisual modality.
The present data do not permit any conclusion about whether the very same subregions of the PPC initiate spatial and nonspatial shifts of attention in all sensory modalities. The evidence for PPC involvement in multiple modalities comes from separate experiments involving different groups of subjects, and no direct comparison has yet been conducted. The data suggest the hypothesis that a single region may be responsible for shifts of attention in multiple modalities. Additional experiments will be required to test this hypothesis against an alternative in which separate, specialized subcompartments of the PPC each control attention shifts in different sensory domains.
The present results do show, however, that nonspatial shifts of auditory attention are initiated by transient increases in cortical activity in the dorsomedial PPC, suggesting that the attentional control functions of the PPC are not devoted exclusively to the visual and spatial domains.
This work was supported by National Institutes of Health Grant R01-DA13165 (S.Y.). We thank J. T. Serences, T. Liu, C. E. Connor, A. Shelton, B. Rapp, J. B. Sala, T. Brawner, and K. Kahl for comments, advice, and assistance.
Correspondence should be addressed to Dr. Sarah Shomstein, Department of Psychology, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213. E-mail:.
Copyright © 2006 Society for Neuroscience 0270-6474/06/260435-05$15.00/0
a Because the baseline for these plots is the mean of the signal in the 6 s preceding the target event, and because this interval contained both hold and shift events during the course of a run, it cannot be interpreted as a “neutral” baseline state. Therefore, the changes in activity after shift and hold events can only be understood in relative terms (i.e., shift events evoke a greater response in these areas than hold events).