Abstract
EEG studies from subdural electrodes have demonstrated a face-specific event-related potential (face-N200) recorded from human ventral occipitotemporal cortex. The insensitivity of face-N200 to task manipulations has supported the proposal that face-N200 reflects an initial obligatory response to faces. This result stands in striking contrast to results of neuroimaging studies that have demonstrated strong task sensitivity of the fusiform hemodynamic response evoked by faces, and thus has created a paradox in the face perception literature. We recorded field potentials directly from the cortical surface of 16 patients while they selectively attended to faces or houses. Here we report that face-specific gamma activity recorded at face-N200 sites is strongly modulated by selective attention, while face-N200 is not. Our results reconcile prior electrophysiological and hemodynamic studies of face perception, and suggest that attentional modulation of the face response follows an initial phase that is largely insensitive to attention.
Introduction
The face-N200 is a face-specific negative potential recorded from the surface of ventral occipitotemporal cortex (VOTC) primarily within the fusiform and adjacent inferior temporal gyri (Allison et al., 1994, 1999; McCarthy, 2001) that occurs at ∼180–200 ms. Face-N200 is largely insensitive to face repetition and to higher cognitive task manipulations, and this insensitivity has led to the proposal that face-N200 reflects an initial and obligatory neural response to faces (McCarthy et al., 1999; Puce et al., 1999). In contrast to face-N200's task insensitivity, neuroimaging studies of face processing using both PET (Haxby et al., 1994) and fMRI (Wojciulik et al., 1998; Vuilleumier et al., 2001; Furey et al., 2006) have revealed that the hemodynamic response along VOTC, primarily at fusiform gyrus, is significantly modulated by selective attention and other cognitive task demands. To date, the sensitivity of the VOTC face-N200 to selective attention has not been directly tested. However, scalp-recorded electroencephalography (EEG) (Cauquil et al., 2000; Eimer, 2000; Carmel and Bentin, 2002) and magnetoencephalography (MEG) (Furey et al., 2006) studies have found little modulation of early EEG/MEG face-specific responses (the N170 and M170, respectively) [but see Jacques and Rossion (2007) and Mohamed et al. (2009)]. The discrepancy between the results of electrophysiological and hemodynamic experiments has created a paradox in the face perception literature.
A resolution to this paradox might be found in induced EEG oscillations. High-frequency oscillations in the gamma band (30–100 Hz) have been closely associated with hemodynamic activity (Mukamel et al., 2005; Niessing et al., 2005; Lachaux et al., 2007; Koch et al., 2009; Sirotin and Das, 2009; Ojemann et al., 2010) and are thought to support cognitive and sensorimotor processes such as visual feature integration (Singer and Gray, 1995; Rodriguez et al., 1999; Tallon-Baudry and Bertrand, 1999) and selective attention (Steinmetz et al., 2000; Fries et al., 2001; Lachaux and Ossandón, 2008). However, although attentional modulation of gamma oscillations is well established, it is not clear whether category-specific oscillations (Fisch et al., 2009) are further sensitive to attentional manipulations within that category.
We recorded local field potentials (LFPs) directly from the cortical surface of 16 patients being evaluated for epilepsy surgery (Spencer et al., 1982). The sensitivity of the face-induced gamma response to selective attention was investigated by showing patients a display that contained both faces and houses and directing them, in different blocks, to covertly attend to one or the other. The evoked face-N200 was not sensitive to selective attention, whereas a subsequent slow evoked potential and the induced gamma response were significantly larger when patients attended to faces than to houses.
Materials and Methods
EEG acquisition.
Recordings were conducted at Yale–New Haven Hospital and obtained from 16 patients (ages 16–42 years, 12 female, 4 male) with medically intractable epilepsy who were being evaluated for possible surgery by the Yale Epilepsy Surgery Program (Spencer et al., 1982). In these patients, strips or grids of stainless steel electrodes (2.2 mm surface diameter) were placed subdurally on the cortical surface. The placement of the strips was determined by the clinical needs of each patient, and thus electrode locations varied across patients. The studies reported here were among several sensory and cognitive event-related potential (ERP) experiments in which each subject participated, typically 4–8 d following implantation of electrodes. At the time of participation, medication levels to control seizures and postoperative pain varied across patients. The EEG experiments were not conducted immediately before or after seizures, nor were any of our sites of interest revealed to be at epileptogenic cortex. The EEG protocol was approved by the Institutional Review Board of the Yale University School of Medicine. All participants provided informed consent.
LFPs were recorded simultaneously from up to 128 electrode sites and amplified with reference to a mastoid electrode using an SA Instruments EEG amplifier system with a 0.1–100 Hz bandpass. We recorded from 1648 sites, 598 of which were located on the cortical surface of the right hemisphere (including 140 on the VOTC surface), 748 of which were located on the cortical surface of the left hemisphere (including 166 on the VOTC surface), and 102 of which were located along the medial wall (e.g., precuneus, cingulate gyrus). Penetrating depth electrodes, often targeted to medial temporal structures, accounted for 200 of the recording sites. The continuous EEG signal was acquired and digitized with 14-bit resolution using a Microstar 4200 A/D data acquisition board. The digitized signal was sampled at 250 Hz and written to disk using a custom PC-based acquisition system. A digital code unique to each experimental condition was recorded in a separate channel at the onset of each stimulus presentation.
Stimuli and procedure.
Stimulus presentation was controlled using the CIGAL software package (Voyvodic, 1999) and displayed on a CRT monitor (640 × 480 pixels) positioned on a table over the patient's bed. The viewing distance was adjusted for patient comfort.
Screening task.
All patients participated in a screening task designed to identify face-specific N200 electrode sites. In this task patients viewed sequentially presented exemplar images from different object categories such as faces, tools, nouns, and scrambled faces (Fig. 1c,f). Data collection for the attention task occurred over the span of 3 years, during which time there was a change in the stimuli used for the screening task. As a result, 10 of the patients participated in version 1 of the screening task, while 6 patients participated in version 2.
In version 1, patients viewed sequentially presented black and white images randomly selected from one of seven categories: Faces, Nouns, Number Strings, Scrambled Faces, Single Letters, Scrambled Letters, and Single Numbers (Fig. 1c). The scrambled faces were constructed by performing a two-dimensional (2D) Fourier transform of each face stimulus, permuting the phase spectrum, and then performing an inverse transform. These “phase-scrambled” faces thus had the same spatial frequencies and overall luminance as the face stimuli, but were unrecognizable as faces, thus allowing us to differentiate between electrode locations selective for the low-level features of a face as opposed to those selective to the high-level percept of the intact face. The same process was used to create the scrambled letter strings. Participants were instructed to press the spacebar as quickly as possible when a target (a grayscale image of a flower) appeared. The target stimulus would appear on ∼12% of all trials. Each image was displayed for 500 ms with a jittered interstimulus interval (ISI) that varied randomly between 1700 and 1900 ms. Patients viewed a total of 48 exemplar images from each category. Version 2 was similar to version 1 except the stimuli were presented in color and were taken from one of six categories: Animals, Faces, Fruits, Letter Strings, Phase-Scrambled Faces, and Tools (Fig. 1f). The ISI varied between 1800 and 2200 ms. Participants were instructed to press the spacebar when the target, a large circle, appeared. The target stimulus would appear on ∼15% of all trials. Patients viewed a total of 60 exemplar images from each category.
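The phase-scrambling procedure described above can be illustrated with a minimal sketch. It is not the authors' stimulus code: the function name `phase_scramble` is hypothetical, and instead of permuting the original phase spectrum it substitutes the phase of a white-noise image, a common variant that guarantees a real-valued result. The amplitude spectrum and mean luminance of the input are preserved.

```python
import numpy as np

def phase_scramble(image, rng):
    """Return a phase-scrambled copy of a 2D grayscale image.

    Assumption: the phase is taken from a white-noise image rather than
    permuted; the amplitude spectrum of the input is kept unchanged.
    """
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    # The FFT of a real noise image is conjugate-symmetric, so its phase
    # keeps the inverse transform real-valued.
    random_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    # Keep the original DC phase so that mean luminance is preserved.
    random_phase[0, 0] = np.angle(spectrum[0, 0])
    scrambled = np.fft.ifft2(amplitude * np.exp(1j * random_phase))
    return scrambled.real  # imaginary part is numerical round-off only
```

In practice the output would still need to be rescaled or clipped to the display range, a step this sketch omits.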
Attention task.
Each stimulus in the attention task comprised a central fixation cross flanked horizontally and vertically by a pair of house pictures and a pair of face pictures. Stimuli were visually degraded to increase task difficulty (Fig. 2a) and measured 525 × 405 pixels when the face pair was on the vertical axis and 440 × 350 pixels when the face pair was on the horizontal axis. The 100 stimulus images were divided such that 25% contained matching house and matching face pictures, 25% contained only matching house pictures, 25% contained only matching face pictures, and 25% contained no matching pictures. Across all stimuli, the face and house pairs appeared in the horizontal and vertical locations on an equal number of trials. Stimulus images were displayed for 200 ms with a stimulus onset asynchrony that varied between 2500 and 2800 ms. Participants were instructed to maintain central fixation throughout all trials. In separate runs participants were instructed to covertly attend only to the face stimuli and detect trials containing a matching face pair, or to covertly attend only to the house stimuli and detect trials containing a matching house pair. Participants indicated the presence of a match by pressing the spacebar on a standard keyboard.
ERP analysis.
ERP analyses were performed using custom MATLAB software. Preprocessing of the EEG data consisted solely of the removal of line noise by a 60 Hz notch filter. No other preprocessing or artifact rejection was applied before analysis. Baseline normalized ERPs were created by signal averaging the EEG signal across trials for each experimental condition and subtracting from each time point the average of a 100 ms prestimulus epoch. A moving average window with a span of seven time points was used to smooth the ERP waveforms.
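As an illustration of the averaging, baseline subtraction, and smoothing steps just described, the following sketch is written in Python rather than the custom MATLAB software actually used. The function name, argument names, and the zero-padded edge handling of the moving average are assumptions.

```python
import numpy as np

def baseline_normalized_erp(trials, fs, prestim_ms, baseline_ms=100, smooth_pts=7):
    """Average trials, subtract the prestimulus baseline, and smooth.

    trials     : array (n_trials, n_samples) for one site and condition
    fs         : sampling rate in Hz (250 Hz in the recordings above)
    prestim_ms : time between epoch start and stimulus onset
    """
    erp = trials.mean(axis=0)                      # signal averaging
    onset = int(round(prestim_ms * fs / 1000))     # onset sample index
    n_base = int(round(baseline_ms * fs / 1000))   # 100 ms -> 25 samples
    erp = erp - erp[onset - n_base:onset].mean()   # baseline normalization
    kernel = np.ones(smooth_pts) / smooth_pts      # 7-point moving average
    # mode="same" zero-pads, so the first and last few samples are biased.
    return np.convolve(erp, kernel, mode="same")
```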
The screening task data were used to identify electrodes demonstrating category specificity to faces (for details, see supplemental material, available at www.jneurosci.org). Twenty-two face-specific sites (13 right hemisphere, 9 left hemisphere) were identified (Fig. 1a,d). Specifically, right hemisphere sites were located along the ventral fusiform (seven) and inferior temporal gyri (six). Left hemisphere sites were also located along the ventral fusiform (one) and inferior temporal gyri (eight). At these sites, faces evoked a sharp negative potential (−86 μV) that peaked 196 ms (face-N200) after onset of the face (Fig. 1b,e). In total, the 22 face-specific sites were all located along VOTC (Fig. 2b) and represented 1.3% of all sampled electrodes. We further anatomically localized 11 electrode sites along the parahippocampal gyrus (PHG) (see Fig. 4a), given this region's sensitivity to house stimuli (Epstein and Kanwisher, 1998).
To test the effect of directed attention on the ERP at these face-specific locations, we estimated the area under the curve (AUC) of the ERP to each attention condition during ten 75 ms epochs from stimulus onset to 750 ms after stimulus onset. These estimates were then analyzed using repeated-measures ANOVA and post hoc t tests.
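A minimal sketch of the epoch-wise AUC estimate follows. The text does not state how the area was computed, so the rectangular sum used here is an assumption, as are the function and argument names. Because 75 ms is not a whole number of 4 ms samples, epochs are selected by time rather than by a fixed sample count.

```python
import numpy as np

def epoch_auc(erp, times_ms, epoch_ms=75, n_epochs=10):
    """Area under the ERP in consecutive poststimulus epochs.

    erp      : baseline-normalized ERP in microvolts, shape (n_samples,)
    times_ms : time of each sample relative to stimulus onset, in ms
    Returns an array of n_epochs areas in microvolt-milliseconds.
    """
    dt = times_ms[1] - times_ms[0]  # sample period (4 ms at 250 Hz)
    aucs = np.empty(n_epochs)
    for k in range(n_epochs):
        # Samples falling in the half-open interval [k*75, (k+1)*75) ms.
        mask = (times_ms >= k * epoch_ms) & (times_ms < (k + 1) * epoch_ms)
        aucs[k] = erp[mask].sum() * dt  # rectangular-sum approximation
    return aucs
```

The resulting sites × conditions × epochs values would then be submitted to the repeated-measures ANOVA.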
ERSP analysis.
Before event-related spectral perturbation (ERSP) analysis, we removed the mean signal-averaged ERP from the raw EEG signal for each trial. This ensured that any significant spectral differences between categories did not merely reflect the frequency composition of the phase-locked ERP. As a result of this approach, the frequency-domain analysis will be insensitive to spectral changes that undergo phase resetting (i.e., phase-locked “evoked” EEG responses). However, these spectra are captured in the time-domain analysis (i.e., ERP), resulting in a full characterization of the data. Furthermore, the evoked responses in these data were below 30 Hz and, therefore, did not contribute to any of the reported gamma band effects (see supplemental Fig. 1, available at www.jneurosci.org as supplemental material). Autospectra were estimated using multitaper fast Fourier transform with five independent Slepian sequences using the Chronux function library (Mitra and Bokil, 2007) (http://www.chronux.org) for MATLAB. Gamma band (30–100 Hz) power was estimated using a 200 ms moving window (40 ms steps) over the duration of each trial. A wider (500 ms) moving window was used for estimating lower frequency spectra (<30 Hz). ERSP estimates were created by calculating the ratio of log power (dB) between the poststimulus and prestimulus (−1000 ms to stimulus onset) epochs and then averaging across all trials and all 22 sites of interest. However, only those estimates in which the moving window did not extend into the poststimulus epoch were used in the prestimulus baseline for poststimulus spectral changes. Statistical analyses were performed on the average spectral estimates from six discrete frequency bands: delta (0–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz), low gamma (30–60 Hz), and high gamma (60–100 Hz).
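The logic of the induced-power estimate (ERP removal, sliding-window spectra, and a dB change relative to prestimulus windows) can be sketched as follows. This is not the Chronux multitaper routine: a single Hann taper stands in for the five Slepian sequences, and the function name, argument names, and per-trial baseline are assumptions made for illustration.

```python
import numpy as np

def induced_band_power_db(trials, fs, onset, band=(30.0, 100.0),
                          win_ms=200, step_ms=40):
    """Induced band power in dB relative to a prestimulus baseline.

    trials : array (n_trials, n_samples) from one site and condition
    onset  : sample index of stimulus onset
    Returns (window start samples, trial-averaged dB change per window).
    """
    # Remove the phase-locked (evoked) response from every trial.
    induced = trials - trials.mean(axis=0)
    win = int(round(win_ms * fs / 1000))
    step = int(round(step_ms * fs / 1000))
    taper = np.hanning(win)  # single taper; stand-in for multitaper
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    starts = np.arange(0, trials.shape[1] - win + 1, step)
    power = np.empty((trials.shape[0], starts.size))
    for i, s in enumerate(starts):
        seg = induced[:, s:s + win] * taper
        spec = np.abs(np.fft.rfft(seg, axis=1)) ** 2
        power[:, i] = spec[:, in_band].mean(axis=1)
    # Baseline uses only windows that end at or before stimulus onset.
    is_base = starts + win <= onset
    baseline = power[:, is_base].mean(axis=1, keepdims=True)
    ersp_db = 10.0 * np.log10(power / baseline)
    return starts, ersp_db.mean(axis=0)
```

A 200 ms window at 250 Hz gives 5 Hz frequency resolution, so the 30–100 Hz band spans 15 frequency bins in this sketch.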
Electrode localization.
To facilitate visualization of the electrode locations across participants, the approximate location of each face-specific recording site was represented on the ventral surface of a standard brain. The locations of individual electrodes were derived from T1-weighted MR images obtained on the day following implantation in which the susceptibility artifact of each electrode was visualized. Using 2D axial MR images of each patient's brain, we measured the distance from the cerebral aqueduct to the edges of the brain (e.g., the posterior edge and right edge) and the distance from the cerebral aqueduct to the recording site. The ratio of the former to the latter was then used to estimate the electrode's location on the ventral surface of a standard brain. This approach allows for a convenient graphical representation of the overall distribution of electrodes on the brain's ventral surface (e.g., Fig. 2b). However, as the exact gyral and sulcal boundaries of the ventral brain varied considerably among our subjects, this summary view does not precisely reflect the exact position of any individual electrode.
Data from the screening task were used to identify face-specific electrodes. Guided by previously published criteria (Allison et al., 1999), face-specific sites were defined as those with a peak negativity occurring between 160 and 240 ms after stimulus onset (N200) that was at least −50 μV and at least twice as large to faces as to any other stimulus category. A similar selection criterion (i.e., a category response twice as large as to all other tested categories) has previously been used in both single cell (e.g., Perrett et al., 1982; Baylis et al., 1985; Leonard et al., 1985) and human LFP (e.g., Puce et al., 1997, 1999; Allison et al., 1999; McCarthy et al., 1999) investigations of face specificity. A computer algorithm was used to identify potential face-specific sites for further visual inspection by the authors. Twenty-two sites were so identified.
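The selection criteria above could be automated along the following lines. This is only a sketch of such a screening step, not the algorithm the authors used: the function name is hypothetical, and measuring every category's response as its most negative value within the same 160–240 ms window is an assumption.

```python
import numpy as np

def is_face_specific(erps, times_ms, face_key="faces",
                     window_ms=(160, 240), min_peak_uv=-50.0, ratio=2.0):
    """Apply the face-N200 criteria to one electrode site.

    erps     : dict mapping category name -> baseline-normalized ERP (µV)
    times_ms : time of each sample relative to stimulus onset, in ms
    """
    in_win = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
    face_peak = erps[face_key][in_win].min()  # most negative value in window
    if face_peak > min_peak_uv:               # weaker than -50 µV
        return False
    for name, erp in erps.items():
        if name == face_key:
            continue
        # Positive deflections do not count as a competing negativity.
        other_peak = min(erp[in_win].min(), 0.0)
        if face_peak > ratio * other_peak:    # not at least twice as negative
            return False
    return True
```

Sites flagged by such a rule would still require visual inspection, as described above.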
Results
Attention modulation of the ERP
Attention to faces and attention to houses evoked a sharp negative potential that peaked (−28 and −25 μV, respectively) 180 ms after onset of the stimulus (Fig. 2c). A 2 (attention) × 10 (epoch) repeated-measures ANOVA of the AUC estimates (see Materials and Methods) showed main effects of attention (F(1,21) = 9.09, p = 0.007), time (F(9,21) = 7.17, p < 0.001), and a significant attention by time interaction (F(9,21) = 7.33, p < 0.001). To investigate the timing of the attention effect relative to the face-N200, we used paired-samples t tests of the simple effect of attention at each level of time. There was no significant effect of attention at each of the first three 75 ms epochs (comprising 0–225 ms after stimulus, all p values >0.05). Importantly, this indicates no significant effect of attention during the epoch including the face-N200 (epoch 3: 150–225 ms). In each of the subsequent seven epochs, which spanned 225–750 ms after stimulus, attention to faces evoked a significantly more negative ERP than attention to houses (all p values <0.05) (Table 1).
To achieve a more precise estimate of the timing of the attention modulation effect, we used paired-samples t tests to compare the ERP response across attention conditions at each sampled time point (i.e., every 4 ms). We did not correct these time point by time point contrasts for multiple comparisons, as a more stringent alpha threshold would have biased the tests in favor of our hypothesis that attentional effects would not be evident before the face-N200. Significant differences due to attention appeared 240 ms after stimulus onset, at which time attended faces evoked a slow negative shift that was sustained until ∼1300 ms into the trial (Fig. 2d).
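A sketch of the time point by time point comparison is given below. The function names are hypothetical. The default critical value of 2.080 is the two-tailed t threshold at an uncorrected alpha of 0.05 with 21 degrees of freedom (22 sites), hard-coded here to avoid a statistics library.

```python
import numpy as np

def pointwise_paired_t(cond_a, cond_b):
    """Paired-samples t statistic at every time point.

    cond_a, cond_b : arrays (n_sites, n_samples) of ERPs per site
    """
    diff = cond_a - cond_b
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))

def first_significant_time(t_values, times_ms, t_crit=2.080):
    """Earliest poststimulus time at which |t| exceeds the threshold."""
    sig = np.flatnonzero((np.abs(t_values) > t_crit) & (times_ms >= 0))
    return None if sig.size == 0 else float(times_ms[sig[0]])
```

Because the tests are uncorrected, isolated early crossings can occur by chance, so a sustained run of significant time points is more informative than a single one.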
Attention modulation of the gamma ERSP
The broadband gamma response data were analyzed using a 2 (attention: face or house) × 6 (epoch: 0–250 ms, 250–500 ms, 500–750 ms, 750–1000 ms, 1000–1250 ms, 1250–1500 ms) × 2 (frequency: low gamma or high gamma) repeated-measures ANOVA. The “frequency” factor was included to test whether the broadband gamma response (30–100 Hz) had functionally distinct low (30–60 Hz) and high (60–100 Hz) subbands. The test revealed significant main effects of attention (F(1,21) = 22.98, p < 0.001) and time (F(5,21) = 15.32, p < 0.001), but no main effect of frequency (F(1,21) = 1.06). There were significant two-way interactions of attention by time (F(5,105) = 4.85, p < 0.001) and of time by frequency (F(5,105) = 8.26, p < 0.001). The attention by frequency (F(1,21) = 0.89) and three-way attention by time by frequency (F(5,105) = 1.37) interactions were not significant. Importantly, the absence of a significant three-way interaction indicates that the gamma subbands did not meaningfully differ as a function of attention over time. Therefore, the subsequent post hoc t tests were performed on the full broadband gamma response. To investigate the timing of the attention effect, we used paired-samples t tests of the simple effect of attention at each level of time. Attention to faces resulted in a significantly larger gamma response than attention to houses in each of the four 250 ms epochs between 250 and 1250 ms after stimulus onset (i.e., epochs 2–5), p values <0.05, but not in epoch 1 or epoch 6, p values >0.05 (Table 1, Fig. 3).
The magnitude of induced gamma was considerably smaller at PHG than at face-specific sites (Fig. 4). Inspection of the effect of attention at each of the six epochs revealed a small increase in the gamma response during attention to houses during epoch 3 (500–750 ms) and epoch 4 (750–1000 ms) (Fig. 4c). AUC estimates from the PHG sites were analyzed using a 2 (attention: face or house) × 6 (epoch: 0–250 ms, 250–500 ms, 500–750 ms, 750–1000 ms, 1000–1250 ms, 1250–1500 ms) × 2 (frequency: low gamma or high gamma) repeated-measures ANOVA. The ANOVA revealed no significant effects (all p values >0.05).
Attention modulation of the subgamma ERSPs
Although our hypothesis only concerned the gamma frequency band, a 2 (attention: face or house) × 4 (time: 0–400 ms, 400–800 ms, 800–1200 ms, 1200–1600 ms) repeated-measures ANOVA was used to analyze each of the four lower frequency bands. The results for the delta, theta, and alpha bands were qualitatively identical. There was a significant main effect of time (Fdelta(3,21) = 23.61, Ftheta(3,21) = 44.79, Falpha(3,21) = 50.90, all p values <0.001), but no significant effect of attention or an attention by time interaction (all p values >0.05).
The ANOVA for the beta band revealed significant main effects of attention (F(1,21) = 5.00, p = 0.036) and time (F(3,21) = 4.42, p = 0.007) as well as a significant attention by time interaction (F(3,21) = 3.50; p = 0.021). Paired-samples t tests of the simple effect of attention at each level of time showed that attention to houses resulted in a significantly greater loss of beta power than attention to faces during the 400–800 ms epoch (t(21) = 3.15; p = 0.005) and the 800–1200 ms epoch (t(21) = 2.25; p = 0.035).
Discussion
We have demonstrated for the first time that the electrophysiological responses evoked by faces along VOTC are influenced by selective attention, but that the effects of attention are not reflected in the face-specific N200. Rather, the effects of selective attention were manifest as a slow sustained negative ERP and increased gamma power. These findings thus demonstrate a heretofore unseen temporal evolution of response properties at face-N200 sites and resolve the paradox created by incongruent findings between electrophysiological and hemodynamic face perception studies.
Attention modulation of the ERP
Directing attention to faces did not affect the face-specific N200 ERP (Allison et al., 1994, 1999), but did significantly affect a longer latency slow negativity beginning at ∼240 ms. The invariance to selective attention of the early component is consistent with prior studies that concluded that face-N200s recorded directly from human VOTC are obligatory and largely immune to top-down modulation (Puce et al., 1999). The subsequent differentiation is similar to the “selection negativity” observed when attention is allocated to a particular stimulus feature (cf. Anllo-Vento et al., 1998). However, a similar slow negativity at face-specific intracranial sites, dubbed the N700, has been previously reported (Allison et al., 1999) and is evident in the results of the current screening task (Fig. 1b,e). In these cases the face stimuli were neither the attended stimulus nor explicitly privileged in any way (see supplemental material, available at www.jneurosci.org). The relationship, if any, of the attention modulated negativity reported here, the N700, and the selection negativity will require future elucidation.
Noninvasive scalp EEG has revealed a face-specific N170 ERP that shares some features of the VOTC face-N200 (Bentin et al., 1996). Consistent with our present findings, several studies have failed to find attentional modulation of the N170 (Cauquil et al., 2000; Eimer, 2000; Carmel and Bentin, 2002) [but see Jacques and Rossion (2007) and Mohamed et al. (2009)]. However, we note that the neural source of the scalp-recorded N170 ERP is unknown. We have previously reported a second and separate locus of face-specific ERPs from the posterior lateral temporal region (Allison et al., 1999). It is likely that N170 reflects lateral rather than ventral occipitotemporal lobe sources based on biophysical plausibility and functional dissociations between the scalp N170 and the ventral N200 (McCarthy, 2001; Itier and Taylor, 2004).
An MEG investigation of attention modulation of face processing similarly found that the early magnetic evoked response (M170) was unaffected by attention, whereas a slow longer latency response was sensitive to directed attention (Furey et al., 2006). Furey et al. (2006) argue that the hemodynamic response noted in previous imaging experiments likely reflects the long-latency low-frequency evoked response seen in their study. While this is an interesting speculation, we believe it more likely that the attentionally modulated hemodynamic response along the fusiform reflects the induced gamma response observed here (see below). Gamma power is positively correlated with hemodynamic activity in both nonhuman animals (Niessing et al., 2005; Sirotin and Das, 2009) and humans (Mukamel et al., 2005; Lachaux et al., 2007; Koch et al., 2009; Ojemann et al., 2010), whereas low-frequency responses, such as those described by Furey et al. (2006), have been reported to be negatively correlated with hemodynamic activity (Mukamel et al., 2005; Niessing et al., 2005).
Attention modulation of the VOTC ERSP
Of particular interest, we found that event-related gamma power was significantly larger when attention was directed to faces than when directed to houses. This effect offers a potential resolution to the inconsistent results given by electrophysiological and hemodynamic studies of the automaticity of the ventral face response. In contrast to the face-N200's invariance to cognitive task demands and its relative insensitivity to lower-level manipulations such as face repetition (Allison et al., 1999; McCarthy et al., 1999), fMRI and PET studies have consistently found that the hemodynamic response of the fusiform gyrus to face perception is highly susceptible to attention modulation (Haxby et al., 1994; Wojciulik et al., 1998; Vuilleumier et al., 2001; Furey et al., 2006). Attentionally modulated face-specific gamma oscillations, as reported here, are consistent with these findings given the rapidly growing body of studies demonstrating a tight coupling of neural oscillations in the gamma frequency range and hemodynamic activity (Mukamel et al., 2005; Niessing et al., 2005; Lachaux et al., 2007; Koch et al., 2009; Sirotin and Das, 2009; Ojemann et al., 2010). A subset of these investigations has found that this coupling is weaker or nonexistent in the low-gamma range (<60 Hz) (e.g., Niessing et al., 2005; Sirotin and Das, 2009). However, we did not find that low- and high-gamma ERSPs differed significantly in their sensitivity to attention modulation.
Attention modulation of PHG ERSPs
Unlike the face-specific sites described above, the electrodes in PHG did not respond with increased gamma power when faces were attended, demonstrating that the gamma increase to attended faces did not generalize to all of VOTC. Indeed, there was a small, though statistically nonsignificant, trend for increased gamma to attended houses (Fig. 4b,c), consistent with fMRI studies that found the PHG responds preferentially to the perception of “scenes,” often operationalized as images of houses (Epstein and Kanwisher, 1998). An equivalent comparison of face and house attentional effects was not possible because, to date, scene-specific electrophysiological responses akin to the face-specific N200 (e.g., Allison et al., 1994) have not been reported, and so our analysis was limited to anatomically, rather than functionally, localized electrode sites.
In summary, we have shown that the initial face-specific N200 response in the VOTC is uninfluenced by selective attention, while a subsequent sustained evoked response and an induced gamma response were strongly influenced by selective attention. Thus, the functional properties of the VOTC regions giving rise to face-N200 suggest that an initial obligatory response to faces is followed by elaborative processing that is sensitive to attention. This conclusion is broadly consistent with a study demonstrating that the information carried by single face-selective neurons in the macaque temporal lobe varied as a function of latency (Sugase et al., 1999). Our findings demonstrate that subdural electrophysiological recordings have sufficient temporal resolution to differentiate early and late phases of processing at the same neural locus. These temporal dynamics may be indistinguishable when using hemodynamic imaging due to its low temporal resolution. The current approach has thus revealed a novel functional property of category-specific visual cortex.
Footnotes
This research was supported by National Institutes of Health Grants MH05286 and NS41328. We thank Drs. Truett Allison, Kenneth A. Vives, and Dennis D. Spencer as well as Joseph Jasiorkowski for their help in acquiring the intracranial EEG data reported here. We also thank William Walker and for assistance in localizing the electrodes.
Correspondence should be addressed to Gregory McCarthy, Department of Psychology, Yale University, P.O. Box 208205, New Haven, CT 06520-8205. gregory.mccarthy@yale.edu