Abstract
Goal-directed visual attention is a fundamental cognitive process that enables animals to selectively focus on specific regions of the visual field while filtering out irrelevant information. However, given the domain specificity of social behaviors, it remains unclear whether attention to faces versus nonfaces recruits different neurocognitive processes. In this study, we simultaneously recorded activity from temporal and frontal nodes of the attention network while macaques performed a goal-directed visual search task. V4 and inferotemporal (IT) visual category-selective units, selected during cue presentation, discriminated fixations on targets and distractors during the search but were differentially engaged by face and house targets. V4 and IT category-selective units also encoded fixation transitions and search dynamics. Compared with distractors, fixations on targets reduced spike–LFP coherence within the temporal cortex. Importantly, target-induced desynchronization between the temporal and prefrontal cortices was only evident for face targets, suggesting that attention to faces differentially engaged the prefrontal cortex. We further revealed bidirectional theta influence between the temporal and prefrontal cortices using Granger causality, which was again disproportionate for faces. Finally, we showed that the search became more efficient with increasing target-induced desynchronization. Together, our results suggest domain specificity for attending to faces and an intricate interplay between visual attention and social processing neural networks.
- attention
- face
- functional connectivity
- Granger causality
- house
- inferotemporal cortex
- LPFC
- macaques
- OFC
- spike–LFP coherence
- V4
- visual search
Significance Statement
Visual attention is a cornerstone of visual perception. This study explores the neurocognitive mechanisms underlying goal-directed visual attention, specifically in the context of social versus nonsocial stimuli. By simultaneously recording neural activity from temporal and frontal nodes of the attention network in macaques, we elucidated how attentional processes differed when directed toward social or nonsocial targets. Our findings revealed distinct neural signatures for social versus nonsocial stimuli, suggesting domain specificity in the allocation of attentional resources. Moreover, we demonstrated an intricate interplay between visual attention and social processing neural networks. These insights advance our understanding of the neural basis of social attention in primates.
Introduction
Goal-directed visual attention, a crucial aspect of cognitive processing, reflects the brain's remarkable ability to prioritize visual information relevant to a specific task or objective. In understanding how animals navigate and interact with their environment, researchers have elucidated neural networks, pathways, and dynamics that govern how the brain selects, processes, and integrates information to achieve desired goals (Kastner and Ungerleider, 2000; Corbetta and Shulman, 2002; Maunsell and Cook, 2002; Petersen and Posner, 2012; Moore and Zirnsak, 2017; Fiebelkorn and Kastner, 2020). Goal-directed attention involves a complex interplay of various brain regions and circuits, including the prefrontal and temporal cortices (Chelazzi et al., 1993; Bichot et al., 2005; Buschman and Miller, 2007; Burrows and Moore, 2009; Zhou and Desimone, 2011; Zhou et al., 2016). These regions work in concert, contributing to the orchestration of attentional resources and forming intricate networks that adaptively modulate sensory processing based on the goals and intentions of the animal (Buschman and Kastner, 2015). On the other hand, evidence argues for domain specificity in social processing (Brothers, 1990; Stanley and Adolphs, 2013). For example, there is a dedicated face processing system in the macaque inferotemporal (IT) cortex (Tsao et al., 2006; Hesse and Tsao, 2020). While faces may inherently attract preferential attention (Wang and Adolphs, 2017), few studies have systematically investigated the detailed behavioral and neurophysiological mechanisms underlying face versus nonface attention. In particular, it remains unclear whether attention to faces recruits different neural substrates and differentially engages the attention neural network.
In this study, we set out to investigate whether visual attention is stimulus specific and address the debate on whether the processing of faces, a highly social stimulus, in the context of attention is fundamentally different from the processing of nonface stimuli. We employed a free-gaze, goal-directed visual search task using natural face and object stimuli matched for low-level visual properties. Monkeys were trained to detect search targets that belonged to the same category as the cue (but different from the cue), beyond merely matching targets and cues. This allowed for a detailed analysis of attention to faces versus nonfaces. We simultaneously recorded from a large number of units from the key brain areas engaged in visual attention and face processing, including V4, IT cortex (TE and TEO), lateral prefrontal cortex (LPFC), and orbitofrontal cortex (OFC). We hypothesized that attention to faces recruits distinct neural processes compared with attention to houses. Furthermore, prior studies have shown that attention to search targets reduces spike–local field potential (LFP) coherence in lower frequency bands, a phenomenon termed target-induced desynchronization (Fries et al., 2001; Yan and Zhou, 2019). Here, we explored whether target-induced desynchronization between the temporal and prefrontal cortices was disproportionate for face targets. Together, by comprehensively analyzing face-specific visual attention through behavior, neuronal firing rate, and functional connectivity, our study contributes to the broader debate on the domain specificity of social processing in the brain (i.e., the notion of the “social brain”).
Materials and Methods
Subjects and recording sites
Two male rhesus macaques, weighing 12 and 15 kg, were used in the study. All experiments were performed with the approval of the Institutional Animal Care and Use Committee of Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (No. SIAT-IRB-160223-NS-ZHH-A0187-003).
The monkeys were implanted under aseptic conditions with a post to fix the head and recording chambers over areas V4, inferotemporal (IT) cortex (including both TE and TEO), lateral prefrontal cortex (LPFC), and orbitofrontal cortex (OFC; see Fig. 1C for details). The localization of the chambers was based on MRI scans obtained before surgery. Notably, given the presence of face-selective regions in the IT cortex (Tsao et al., 2006), we recorded neural activity from a wide range of the IT cortex to include units with diverse visual category selectivity. The recording sites were determined based on stereotaxic coordinates relative to Ear Bar Zero (EBZ) in the atlas of the rhesus monkey brain and the known regions of face patches (Tsao et al., 2008). Recordings spanned from the central IT cortex, encompassing the area between the anterior middle temporal sulcus (AMTS) and the posterior middle temporal sulcus (PMTS). TE recordings were concentrated from +7 to +8 mm rostral to EBZ, near the posterior edge of the AMTS, while TEO recordings were concentrated from +2 to +5 mm rostral to EBZ, near the anterior edge of the PMTS (Fig. 1C). Most face-preferring units were located from +7 to +8 mm rostral to EBZ, likely within the ML face patch in TE. In contrast, most house-preferring units were found from +2 to +5 mm rostral to EBZ, specifically within the TEO area.
Tasks and stimuli
Monkeys were trained to perform a free-gaze visual search task. A central fixation was presented for 400 ms, followed by a cue lasting 500–1,300 ms. After a delay of 500 ms, the search array appeared. The search array contained 11 items, including two targets, randomly selected from a total of 20 predefined locations (Fig. 2F). Monkeys were required to find one of the two targets within 4,000 ms and maintain fixation on the target for 800 ms to receive a juice reward. A new trial started once the monkeys received the juice reward, and they did not continue searching for the second target. No constraints were placed on their search behavior to allow animals to perform the search naturally. Before the onset of the search array, monkeys were required to maintain a central fixation. The two target stimuli belonged to the same category as the cue stimulus, though they were distinct images. We utilized four categories of stimuli—face, house, flower, and hand—each comprising 40 images. Only faces and houses were used as search targets, and the search cue/target was randomly selected from either the face or house stimuli with equal probability. The remaining nine stimuli in the search array were drawn from the other three categories. Each stimulus subtended an area of ∼2° × 2°, with the hue and saturation in the HSV color space, aspect ratio, luminance, and spatial frequency of these images matched across categories. The 20 locations, covering the visual field of eccentricities from 5° to 11°, included 18 locations located symmetrically in the left and right visual field, with 9 on each side, and 2 locations on the vertical middle line (Fig. 2F). There was no stimulus at the center of the screen.
A visually guided saccade task was employed to map the peripheral receptive fields (RFs) of recorded units, following an established procedure (Bichot et al., 2019). After a 400 ms central fixation, a stimulus randomly appeared in 1 of the 20 locations (Fig. 2F), and monkeys were required to make a saccade to the stimulus within 500 ms and maintain fixation on it for 300 ms to receive a reward. We compared the responses before (−150 to 0 ms) and after (50–200 ms for V4, IT, and LPFC; 100–200 ms for OFC) the peripheral stimulus onset. If there was a significant increase in response to the peripheral stimulus (Wilcoxon rank-sum test, p < 0.05), the unit was considered to have a peripheral RF.
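The RF criterion above can be sketched in code. This is an illustrative Python translation, not the authors' implementation (the analyses were run in MATLAB); the function name and inputs are assumptions.

```python
# Sketch of the peripheral-RF criterion: compare per-trial firing rates
# before (-150 to 0 ms) and after (e.g., 50-200 ms) peripheral stimulus
# onset with a Wilcoxon rank-sum test, requiring a significant increase.
import numpy as np
from scipy.stats import ranksums

def has_peripheral_rf(baseline_rates, evoked_rates, alpha=0.05):
    """baseline_rates, evoked_rates: per-trial firing rates (spikes/s)."""
    stat, p = ranksums(evoked_rates, baseline_rates)
    # the unit must respond *more* to the peripheral stimulus, not just differently
    return bool(p < alpha and np.mean(evoked_rates) > np.mean(baseline_rates))
```

The same pre/post comparison (with the appropriate windows) applies to the visual-response screening used throughout the paper.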
Behavioral experiments were conducted using the MonkeyLogic software (University of Chicago), which presented the stimuli, monitored eye movements, and triggered the delivery of the reward.
Electrophysiology
Single-unit and multi-unit spikes were recorded from V4, IT, LPFC, and OFC using 24- or 32-contact electrodes (V-Probe or S-Probe, Plexon) in a 128-channel Cerebus System (Blackrock Microsystems). In most sessions, we recorded activities in two of the areas simultaneously. Neural signals were filtered between 250 Hz and 5 kHz, amplified, and digitized at 30 kHz to obtain spike data. The recording locations in V4, IT, LPFC, and OFC were verified with MRI. Eye movements were recorded using an infrared eye-tracking system [iView X Hi-Speed, SensoMotoric Instruments (SMI)] at a sampling rate of 500 Hz.
Data analysis: spike rate
Measurements of neural activity were obtained from spike density functions, which were generated by convolving the times of action potentials with a function that projects activity forward in time (growth, 1 ms; decay, 20 ms) and approximates an EPSP (Thompson et al., 1996). The spike rate of each unit was normalized by the mean baseline firing rate during the fixation period preceding the cue.
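As a minimal sketch (assuming the common EPSP-kernel form from Thompson et al., 1996; the exact kernel normalization used by the authors is not stated), the spike density function can be computed as:

```python
# Convolve a 1 ms-resolution spike train with a causal EPSP-like kernel
# (growth tau 1 ms, decay tau 20 ms), so each spike spreads forward in time.
import numpy as np

def spike_density(spike_times_ms, t_max_ms, tau_g=1.0, tau_d=20.0):
    """Return a spike density function in spikes/s, sampled at 1 kHz."""
    t = np.arange(0, 10 * tau_d)                      # kernel support, ms
    kernel = (1 - np.exp(-t / tau_g)) * np.exp(-t / tau_d)
    kernel /= kernel.sum() / 1000.0                   # unit area -> spikes/s
    train = np.zeros(int(t_max_ms))
    train[np.asarray(spike_times_ms, dtype=int)] += 1
    return np.convolve(train, kernel)[: len(train)]   # causal convolution
```

Because the kernel is causal, a spike only influences the density at and after its occurrence, matching the "projects activity forward in time" description.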
Data analysis: receptive field
The visual response to the cue and the search array in the free-gaze visual search task was assessed by comparing the firing rate during the poststimulus period (50–200 ms after cue/array onset) to the corresponding baseline (−150 to 0 ms relative to cue/array onset) using a Wilcoxon rank-sum test. It is worth noting that, at both cue and search array onsets, monkeys were fixating on the center of the screen before making saccades to initiate the search. Therefore, when the cue was displayed, there was only one stimulus in the center (in the foveal RF), whereas when the search array was displayed, there was no stimulus in the center (Fig. 2F), with all stimuli appearing in the periphery and activating the peripheral RFs (i.e., no simultaneous presence of stimuli in both the fovea and periphery). Based on these responses, we classified the units into three categories of RFs:
Units with a focal foveal RF: These units responded solely to the cue in the foveal region (p < 0.05) but not to the search array that included items in the periphery (p > 0.05).
Units with a broad foveal RF: These units responded to both the cue and the search array.
Units with a peripheral RF: These units only responded to the search array (p < 0.05) but not to the cue (p > 0.05). It is worth noting that when the search array was displayed, there was no stimulus at the center of the screen; instead, all stimuli appeared in the periphery (Fig. 2F). The RFs of these units were additionally mapped based on their activities in the visually guided saccade task (see above).
Units not classified into one of the three categories were excluded from further analysis. In this study, our focus was on units with a foveal RF [categories (1) and (2)], comparing fixations on targets versus distractors.
Data analysis: category selectivity
We determined the category selectivity of each unit by comparing the response to face cues versus house cues in a time window of 50–200 ms after cue onset (Wilcoxon rank-sum test, p < 0.05). We further imposed a second criterion using a selectivity index similar to indices employed in previous IT studies (Freiwald et al., 2009; Freiwald and Tsao, 2010). For each unit with a foveal RF, the response to face stimuli (Rface) or house stimuli (Rhouse) was calculated using the visual search task by subtracting the mean baseline activity (−150 to 0 ms relative to the onset of the cue) from the mean response to the face or house cue (50–200 ms after the onset of the cue). The selectivity index (SI) was then defined as (Rface − Rhouse) / (Rface + Rhouse). SI was set to 1 when Rface > 0 and Rhouse < 0 and to −1 when Rface < 0 and Rhouse > 0. Face-preferring units were required to have an Rface at least 130% of Rhouse (i.e., the corresponding SI was greater than 0.13). Similarly, house-preferring units were required to have an Rhouse at least 130% of Rface (i.e., the corresponding SI was smaller than −0.13). Units were labeled as noncategory-selective if the response to face cues versus house cues was not significantly different (p > 0.05). The remaining units that did not fit into any of the aforementioned types were classified as undefined units (i.e., there was a significant difference but did not meet the second criterion).
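The SI definition and its clipping rules translate directly into code. This is an illustrative sketch (not the authors' code); the behavior when Rface + Rhouse = 0 is not specified in the text and is left undefined here.

```python
# Selectivity index: SI = (Rface - Rhouse) / (Rface + Rhouse),
# clipped to +1 or -1 when the two baseline-subtracted responses
# have opposite signs, as specified in the text.
def selectivity_index(r_face, r_house):
    if r_face > 0 and r_house < 0:
        return 1.0
    if r_face < 0 and r_house > 0:
        return -1.0
    return (r_face - r_house) / (r_face + r_house)
```

Note that the 130% criterion maps onto SI as stated: if Rface = 1.3 × Rhouse (both positive), SI = 0.3/2.3 ≈ 0.1304, just above the 0.13 threshold.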
Data analysis: attentional effect
We calculated the attentional effect as the difference in firing rate between the same stimuli when they served as targets versus distractors. It is worth noting that fixations could occur on both targets and distractors (i.e., distractors were not always in the periphery), and we only analyzed units with a foveal RF to facilitate a comparison between fixations on targets and distractors. Using the same stimuli to calculate the attentional effect by comparing their firing rates when they served as targets versus distractors could help to isolate the influence of attention on neural activity.
The latency of the attentional effect at the population level was determined based on the mean response of each unit using a sliding window method. If a significant difference (Wilcoxon signed-rank test, p < 0.05) was found successively for 35 ms between the target and distractor responses, the first time point of the 35 ms window was defined as the starting point of the attentional effect. To test whether a latency difference at the population level was significant, we used a two-sided permutation test with 1,000 runs, as described in our previous study (Zhou and Desimone, 2011). The latency of the attentional effect for each unit was defined as the first of 12 successive 20 ms bins, each with a significantly greater response for targets than distractors (one-tailed Wilcoxon signed-rank test: p < 0.05).
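The population-level rule (first time point of a 35 ms run of consecutive significant differences) can be sketched as follows; this is an illustrative reimplementation operating on a precomputed vector of per-millisecond p-values, not the authors' code.

```python
# Find the attentional-effect latency: the first time point of a run of
# `run_ms` consecutive milliseconds with p < alpha (target vs distractor).
import numpy as np

def effect_latency(p_values_ms, run_ms=35, alpha=0.05):
    """p_values_ms: one p-value per 1 ms time point. Returns index or None."""
    sig = np.asarray(p_values_ms) < alpha
    count = 0
    for i, s in enumerate(sig):
        count = count + 1 if s else 0
        if count == run_ms:
            return i - run_ms + 1   # first time point of the significant run
    return None
```

The per-unit latency uses the same run-detection logic with 20 ms bins and a 12-bin run instead of millisecond samples.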
Data analysis: spike–LFP coherence
We implemented the spike–LFP coherence analysis using the Chronux toolbox (www.chronux.org) in MATLAB. We used a single Hanning taper across frequencies, but we derived similar results using multitaper methods for higher frequencies (>25 Hz; Fries et al., 2008). Coherence between two signals, x and y, was calculated using the following formula:
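The equation itself is missing from this section. A standard form, consistent with the Chronux convention, is:

```latex
C_{xy}(f) = \frac{S_{xy}(f)}{\sqrt{S_{xx}(f)\,S_{yy}(f)}}
```

where $S_{xy}(f)$ is the cross-spectrum of $x$ and $y$, and $S_{xx}(f)$ and $S_{yy}(f)$ are their auto-spectra; the coherence magnitude $|C_{xy}(f)|$ ranges from 0 to 1.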
Data analysis: Granger causality
We utilized the open-source MATLAB toolbox “SpikeFieldGrangerCausality” (Gong et al., 2019) for frequency-domain Granger causality analysis between spikes and LFP. Causality was calculated during the same period as in coherence analysis between spikes and LFP across various brain areas. The power spectrum for the spiking process was estimated with the multitaper method and then frequency-domain Granger causality from LFP xi to spike dNj was calculated as follows:
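The expression is missing here. In Geweke's standard frequency-domain form (the notation of Gong et al., 2019 may differ), the causality from LFP $x_i$ to spike train $dN_j$ is:

```latex
F_{x_i \to dN_j}(f) = \ln \frac{S_{jj}(f)}{S_{jj}(f) - \left(\Sigma_{ii} - \Sigma_{ij}^{2}/\Sigma_{jj}\right)\left|H_{ji}(f)\right|^{2}}
```

where $S_{jj}(f)$ is the spike power spectrum (estimated with the multitaper method as noted above), $H$ is the transfer function of the joint model, and $\Sigma$ is the innovation covariance matrix.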
We utilized the open-source MATLAB toolbox “gcpp” (Kim et al., 2011) to assess Granger causality between spikes of multiple neurons during the same period as described above. A potential causal relationship from neuron(s) j to neuron i is assessed using the log-likelihood ratio, which is given by the following:
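The ratio itself is missing here. In the generic point-process form (the exact notation of Kim et al., 2011 may differ), it is:

```latex
\Gamma_{j \to i} = \log \frac{\hat{L}_{i}\!\left(\text{full model, including neuron } j\right)}{\hat{L}_{i}\!\left(\text{reduced model, excluding neuron } j\right)}
```

where $\hat{L}_{i}$ is the maximized point-process likelihood of neuron $i$'s spike train; $\Gamma_{j \to i} > 0$ indicates that the spiking history of neuron(s) $j$ improves prediction of neuron $i$'s spiking.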
Results
Behavior and eye movement
Two monkeys performed a free-gaze visual search task (Fig. 1A) with simultaneous recordings from multiple brain areas (see Fig. 1C for detailed characterization of the recording sites). Monkeys were required to fixate on one of the two search targets that belonged to the same category as the cue. Search items were drawn from four categories of natural object images: faces, houses, flowers, and hands, with 40 images per category (Fig. 2A). Importantly, stimuli from different categories were matched in terms of hue and saturation in the HSV color space (Fig. 2B), aspect ratio (Fig. 2C), luminance (Fig. 2D), and spatial frequency (Fig. 2E). This ensured that the findings presented below between faces and houses could not be attributed to low-level features. Both monkeys performed this task well (accuracy: 91.78% ± 0.19% for Monkey S and 85.23% ± 0.41% for Monkey E). Monkeys had a similar accuracy in finding faces (89.19% ± 0.31%) versus houses (89.31% ± 0.33%), although the reaction time (RT), from the onset of the search array to the onset of the last fixation, was faster when searching for faces [Fig. 1B; faces: 396.56 ± 1.23 ms, houses: 420.17 ± 1.23 ms; two-tailed Wilcoxon rank-sum test: p = 1.35 × 10−166; Kolmogorov–Smirnov (KS) test: KS = 0.081, p = 7.18 × 10−228].
Monkeys had precise fixations on search items (0.74° to search item center; each search item subtended a visual angle of ∼2°; Fig. 3A–C). We excluded from all further analyses the last fixations on targets, during which monkeys fixated for 800 ms to receive the juice reward. This was because our primary focus was on the target detection process, while the last fixations in rewarded trials likely involved additional cognitive processes such as reward expectation, task set, and decision-making related to the reward. These additional processes could introduce confounding variables that complicate the interpretation of the attentional effects. Since the maximum duration for fixations on distractors was <400 ms (Fig. 3D) and we excluded the last fixations on targets (though target fixations broken close to 800 ms were still included), no fixation exceeded 800 ms. It is also important to note that only fixations from correct trials, where monkeys successfully completed the task and received a reward, were included.
We found that fixations on targets were longer than fixations on distractors [Fig. 3D; two-way ANOVA (target vs distractor × face vs house); main effect of target vs distractor: F(1,79622) = 9,908.87, p < 10−50] and fixations on faces were longer than fixations on houses (Fig. 3D; main effect of face vs house: F(1,79622) = 43.78, p = 3.70 × 10−11; interaction: F(1,79622) = 47.51, p = 5.52 × 10−12). Consistent with a longer RT for houses, there were more fixations when monkeys searched for houses compared with faces (Fig. 3E; p = 1.75 × 10−135).
In the majority of the trials, monkeys completed the search by fixating on the target when they first spotted it (Fig. 3A,F). However, there were trials where monkeys fixated on multiple targets (Fig. 3B,C,F), with house targets eliciting more multi-target trials (Fig. 3F; p = 1.56 × 10−8). Interestingly, in trials with multiple fixations on targets, a substantial percentage of fixations immediately returned to the same target (Fig. 3B,C for a no-return trial; Fig. 3G) and eventually returned to the same target (Fig. 3H); and notably, face targets had both more immediate (Fig. 3G; p = 1.011 × 10−8) and eventual (Fig. 3H; p = 2.82 × 10−6) returns. We next explored the neural mechanisms underlying these search behaviors.
Visual category-selective units signal attention to search targets
We recorded a total of 5,070 units from area V4, 5,051 units from the IT cortex (including both TE and TEO), 1,470 units from the OFC, and 2,997 units from the LPFC that had a significant visually evoked response (i.e., the response to the cue or search array was significantly greater than the response to the baseline; Wilcoxon rank-sum test: p < 0.05; see Fig. 1C for detailed recording locations). Among these units, 1,624 units from V4, 1,419 units from the IT cortex, 888 units from the OFC, and 32 units from the LPFC had a focal foveal receptive field (RF; see Materials and Methods), while 781 units from V4, 268 units from the IT cortex, no units from the OFC, and 514 units from the LPFC had a localized peripheral RF. In this study, we specifically focused on the units with a foveal RF to facilitate a comparison between fixations on targets and distractors. Furthermore, given the presence of face-selective regions in the IT cortex (Tsao et al., 2006), we recorded from a wide range of the IT cortex to include units with diverse visual category selectivity (see below; see Materials and Methods).
We identified visual category-selective units, those distinguishing faces from houses, based on their responses during the cue period (see Materials and Methods). Our results revealed that 266 units from area V4 (14.01%), 518 units from the IT cortex (34.28%; 10.25% of TEO and 67.02% of TE), 4 units from the LPFC (11.43%), and 640 units from the OFC (71.11%) were face-preferring units (Fig. 4A; note that both single-unit and multi-unit spikes were included, but we replicated all results with single units only; see below). Similarly, 304 units from area V4 (16.02%), 340 units from the IT cortex (22.50%; 39.68% of TEO and 5.93% of TE), 13 units from the LPFC (37.14%), and 19 units from the OFC (2.11%) were house-preferring units (Fig. 4A). Importantly, both face-preferring and house-preferring units in V4 (Fig. 4B) and IT (Fig. 4D) signaled target detection: face-preferring units exhibited a greater response for fixations on face targets compared with fixations on the same faces when they served as distractors (Fig. 4B,D). Similarly, house-preferring units exhibited an increased response for fixations on houses when they were targets compared with when they were distractors (Fig. 4B,D). Therefore, these results suggest that units in V4 and IT not only signal the categorical membership of the stimuli but also simultaneously convey attention associated with target detection (note that the attentional effect was greater for V4 units when the preceding fixation was on the central dot compared with a stimulus, while the attentional effect for IT units remained similar regardless of whether the preceding fixation was on the central dot or a stimulus). However, this pattern was not observed for category-selective units in the OFC and LPFC (Fig. 4G–J), except for the weak but significant effects in OFC face-preferring units (Fig. 4G). Notably, face targets elicited a different attentional effect in both V4 (Fig. 4C) and IT (Fig. 4E): the response between targets and distractors diverged after ∼100 ms, and the difference in normalized firing rate was greater for faces than for houses. These results suggest that face and house targets differentially engage these brain areas. Additionally, a separate latency analysis for individual units (see Materials and Methods) suggested that attentional enhancement for targets was earlier for houses than faces in both V4 (Fig. 4F; p = 4.40 × 10−3; KS = 0.24, p = 0.010; compare Fig. 4C) and IT (Fig. 4F; p = 3.72 × 10−7; KS = 0.26, p = 1.50 × 10−5; compare the blue vs red top bars indicating a significant difference between targets and distractors shown in Fig. 4D).
To control for the different levels of response between face-preferring and house-preferring units (Fig. 4B,D), we further normalized the difference by the sum of the target and distractor responses, resulting in similar findings (Fig. 5A,B). Additionally, we obtained similar results by normalizing the firing rate by the maximum response of each unit across conditions (Fig. 5C–F). Furthermore, we obtained qualitatively the same results when controlling for fixation durations between targets and distractors (Fig. 5G–J). Finally, similar results were obtained using only single units (Fig. 5K–N).
Visual category-selective units signal fixation transitions
Visual search involves complex fixation transitions between search items. We next examined the neural mechanisms underlying these dynamics. First, we observed that category-selective units from the IT cortex (Fig. 6B) signaled fixation transitions from search targets. Specifically, IT category-selective units exhibited different responses depending on whether the next fixation was on a target or a distractor, with distractors eliciting a greater response (Fig. 6A,B). However, category-selective units did not signal fixation transitions from search distractors (Fig. 6C,D; only weakly in the IT cortex). In other words, these units did not differentiate the content of the next fixation following search distractors. Remarkably, we found that category-selective units in both V4 and IT could even differentiate whether fixations would return to the same search target or not after one fixation (i.e., immediate return; Fig. 6E,F) or after a few fixations (Fig. 6G,H), particularly for face targets in V4 units (Fig. 6E,G; but not in IT units). Together, our results reveal that visual category-selective units can signal fixation transitions and predict search dynamics.
Disproportionate target-induced desynchronization for face versus house targets
Does visual search of faces and houses engage the same neural substrates and functional network? To answer this question, we analyzed the coherence between spikes and LFPs recorded simultaneously across brain areas (see Materials and Methods). We included spikes from V4 and IT (OFC and LPFC were excluded due to having fewer than 20 face-preferring and house-preferring units) and LFPs from all four brain areas. We found that V4 spikes desynchronized with V4 LFPs in the theta frequency band for fixations on targets compared with fixations on distractors, and this was the case for both face targets and house targets (Fig. 7A; see figure legend for detailed statistics). Such target-induced desynchronization in the theta frequency band was also observed for V4 spike–IT LFP coherence (Fig. 7B), IT spike–V4 LFP coherence (Fig. 7E), and IT spike–IT LFP coherence (Fig. 7F), although house targets elicited a stronger desynchronization between V4 spike and V4 LFP (Fig. 7A; two-tailed Wilcoxon rank-sum test between faces and houses: p = 9.51 × 10−10) and between V4 spike and IT LFP (Fig. 7B; p = 0.018) whereas face targets elicited a stronger desynchronization between IT spike and IT LFP (Fig. 7F; p = 9.64 × 10−9). Importantly, compared with houses, faces elicited significantly greater target-induced desynchronization between V4 spike and OFC LFP (Fig. 7C; two-tailed Wilcoxon rank-sum test between faces and houses: p = 7.68 × 10−8), between V4 spike and LPFC LFP (Fig. 7D; p = 1.13 × 10−4), and between IT spike and LPFC LFP (Fig. 7H; p = 0.0020; note that these results all remained significant with Bonferroni's correction for eight comparisons), suggesting that face targets disproportionately engaged the prefrontal cortex compared with house targets.
It is worth noting that we obtained similar results using only single units. We also obtained similar results when controlling for fixation durations between face targets and house targets as well as between fixations on targets and distractors [i.e., no significant difference between the four categories of fixations (face targets, face distractors, house targets, and house distractors); one-way ANOVA: F(3,11636) = 2.46, p = 0.061] for both all units and single units only. Our findings remained robust across various fixation time windows, alignment at saccade onset, different frequency estimations (e.g., choice of multitapers), and units with different types of receptive fields (i.e., focal vs broad). Interestingly, we examined the spike–LFP coherence using the nonpreferred stimuli (i.e., houses for face-preferring units and faces for house-preferring units) and found a greater V4-OFC desynchronization for houses in face-preferring units than faces in house-preferring units [for both all units (Fig. 8A–H) and single units only (Fig. 8I–P)], suggesting that target-induced desynchronization with the prefrontal cortex was specific for cell types (i.e., face-preferring units) rather than stimulus types (i.e., faces).
Directionality of theta influence across brain areas
To investigate the direction of interactions between brain areas, we performed a Granger causality analysis based on spikes and LFPs in the theta frequency band. We first analyzed the influence of LFPs on spikes (Fig. 9A–D). We found that attention modulated interactions between brain areas when comparing fixations on targets versus distractors, with targets inducing a decrease in Granger causality (i.e., desynchronization). Importantly, such modulation was disproportionate for face versus house targets (Fig. 9C,D). Specifically, the influence of V4 LFP on V4 spike (two-tailed Wilcoxon rank-sum test between faces and houses: p = 4.01 × 10−5), the influence of OFC LFP on V4 spike (Fig. 9A; p = 5.88 × 10−11), and the influence of LPFC LFP on IT spike (Fig. 9B; p = 0.0018) were more strongly modulated by face targets, whereas the influence of IT LFP on V4 spike (p = 0.0050) and the influence of IT LFP on IT spike (p = 0.0011) were more strongly modulated by house targets.
We also analyzed the influence of spikes on LFPs (Fig. 9C,E). Again, we found that attention modulated interactions between brain areas, and such modulation was disproportionate for face and house targets (Fig. 9C,E). Notably, the influence of V4 spike on V4 LFP (p = 4.62 × 10−14), the influence of V4 spike on IT LFP (p = 0.015), the influence of V4 spike on LPFC LFP (p = 0.0077), the influence of IT spike on V4 LFP (p = 0.0055), the influence of IT spike on OFC LFP (p = 0.047), and the influence of IT spike on LPFC LFP (p = 0.0074) were all more strongly modulated by face targets.
Lastly, we calculated spike–spike Granger causality (see Materials and Methods) between V4 and IT (note that the OFC and LPFC were excluded from this analysis because they had fewer than 20 face-preferring and house-preferring units) and found that the influence of V4 spike on IT spike was more strongly modulated by house targets (p = 0.0034). However, we did not observe a significant difference between face and house targets for the influence of IT spike on V4 spike (p = 0.53).
Together, our results reveal bidirectional influences between LFPs and spikes in the temporal and prefrontal cortices. In particular, attention modulation of these influences is more pronounced for face targets compared with house targets, with face targets showing a disproportionate engagement of the prefrontal cortex (Fig. 9C–E for a summary).
Relationship between behavior, firing rate, and spike–LFP coherence
Finally, we investigated the relationship between search efficiency (indexed by RT and the number of fixations), attentional effect (indexed by the difference in firing rate for targets vs distractors), and target-induced desynchronization (indexed by the reduction in spike–LFP coherence for targets compared with distractors). Specifically, we found that RT correlated with the target-induced reduction in IT spike–V4 LFP coherence (Fig. 10A), especially for house targets (Pearson’s correlation; face: r = −0.11, p = 0.016; house: r = −0.57, p = 2.33 × 10⁻⁸; note that only house targets remained significant with Bonferroni’s correction for eight comparisons). On the other hand, RT correlated with the target-induced reduction in V4 spike–V4 LFP coherence only for face targets and not house targets (Fig. 10B; face: r = −0.24, p = 1.18 × 10⁻⁴; house: r = −0.013, p = 0.83). Furthermore, RT correlated with the target-induced reduction in IT spike–IT LFP coherence for both face and house targets (Fig. 10C; face: r = −0.12, p = 0.0051; house: r = −0.20, p = 2.27 × 10⁻⁴). Similarly, the number of fixations per trial correlated with the target-induced reduction in IT spike–V4 LFP coherence for house targets (Fig. 10D; face: r = −0.11, p = 0.019; house: r = −0.58, p = 9.83 × 10⁻⁹), V4 spike–V4 LFP coherence for face targets (Fig. 10E; face: r = −0.24, p = 8.22 × 10⁻⁵; house: r = 0.012, p = 0.84), and IT spike–IT LFP coherence for both face and house targets (Fig. 10F; face: r = −0.13, p = 0.005; house: r = −0.18, p = 0.0011). However, we did not observe significant correlations between search efficiency and attentional effects (all ps > 0.05), nor between attentional effects and target-induced desynchronization (all ps > 0.05). Furthermore, we did not find such correlations with prefrontal areas (Fig. 11; no significant negative correlations: all ps > 0.05).
Together, our results suggest that target-induced desynchronization within the temporal cortex can explain search efficiency: search becomes more efficient (reduced RT and number of fixations) with increasing desynchronization. Notably, faces and nonfaces exhibit different patterns of correlation between neural responses and search behavior, further indicating that different neural processes are involved in searching for faces.
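The brain–behavior correlations above follow a simple recipe: correlate a per-session desynchronization index with a behavioral measure, then apply a Bonferroni correction across the family of coherence pairs tested. A small sketch with synthetic numbers; the session count, the negative slope, and the eight-comparison family size are illustrative assumptions mirroring the analysis structure, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical per-session values: search RT (s) and target-induced
# reduction in spike-LFP coherence. A built-in negative relationship
# stands in for "more desynchronization -> faster search."
n_sessions = 40
desync = rng.uniform(0.0, 0.2, n_sessions)
rt = 1.5 - 3.0 * desync + 0.1 * rng.standard_normal(n_sessions)

r, p = stats.pearsonr(rt, desync)

n_comparisons = 8                      # e.g., eight coherence pairs tested
p_bonf = min(p * n_comparisons, 1.0)   # Bonferroni-adjusted p value
print(r, p_bonf)
```

Because Bonferroni simply scales each raw p value by the number of tests, only effects that survive this inflation (like the house-target IT spike–V4 LFP correlation reported above) are treated as robust.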
Discussion
In this study, we demonstrated that visual category-selective units in V4 and IT, selected during cue presentation, distinguished between fixations on targets and distractors during visual search. We also elucidated the neural mechanisms underlying fixation transitions and search dynamics, particularly noting differences between face and house targets. Furthermore, we observed reduced spike–LFP coherence in V4 and IT with increased attention to search targets, and such target-induced desynchronization between the temporal and prefrontal cortices was only evident for face targets. Finally, we revealed directional theta influences between the temporal and prefrontal cortices under attention modulation, which were again disproportionate for face targets. Together, our results suggest domain specificity in searching for faces, as well as an intricate interplay between visual attention and face processing. Notably, beyond local neural computations, demonstrated as attentional effects on neurons within each brain area, we also characterized information flow across brain areas using functional connectivity and linked these neural mechanisms to behavior.
Our study represents one of the first examinations of feature attention in foveal units during visual search, complementing the common view of distributed feature attentional effects (Bichot et al., 2005; Maunsell and Treue, 2006) and providing a more comprehensive understanding of the distribution of attentional effects across the entire visual field. Notably, foveal units facilitate the examination of target-induced attentional effects and the comparison between face and house targets because they exhibit stronger category selectivity and more focused receptive fields than peripheral units in V4 and IT (Zhang et al., 2021). Foveal attentional enhancements may aid in maintaining eye fixation on the current target, potentially reducing the likelihood of making saccades toward peripheral items during search (Zhang et al., 2021). Interestingly, we also demonstrated that category-selective units predicted fixation transitions and search dynamics. Importantly, all these aspects differed between face and house targets. While our results could not be simply attributed to low-level visual features (Fig. 2), future studies are needed to elucidate the underlying differences between face and nonface stimuli. This includes investigating whether our findings could be explained by factors such as familiarity, motivation, value, and attractiveness.
We observed desynchronization for attended stimuli (fixations on search targets) compared with the same stimuli when unattended (fixations on distractors) in the theta frequency band in both V4 and IT, consistent with prior studies showing desynchronization for attended stimuli in V4 in a similar frequency band (Fries et al., 2001; Yan and Zhou, 2019). Desynchronization has been observed for both feature-based attention (distinguishing between target and distractor in the peripheral receptive field) and saccade selection [directing attention into (attention in) or out of (attention out) the peripheral receptive field], and this was the case for V4 spike–V4 LFP coherence, V4 spike–frontal eye field (FEF) LFP coherence, and FEF spike–V4 LFP coherence (Yan and Zhou, 2019). Importantly, we showed that even when the search task remained the same and accuracy remained similar, processing of face targets disproportionately engaged the prefrontal cortex compared with house targets. Our results thus suggest a context modulation of the attention neural network based on stimulus category, contrasting with the invariant V4 response to different reward/motivation contexts (Ghosh and Maunsell, 2022). This result may also explain the advantages of searching for faces, as well as the disproportionate deficits in attention to faces observed in individuals with autism spectrum disorder during visual search (Wang et al., 2014).
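Spike–LFP coherence of the kind discussed here can be illustrated with Welch-based magnitude-squared coherence between a binary spike train and an LFP trace, averaged over the theta band. The sketch below is a toy construction: the 1 kHz sampling rate, 6 Hz theta component, firing rate, and modulation depth are all assumptions for the demo, not the study's recording or estimation parameters.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
fs = 1000                                  # sampling rate (Hz), assumed
t = np.arange(0, 20, 1 / fs)               # 20 s of simulated data
theta = np.sin(2 * np.pi * 6 * t)          # 6 Hz theta component
lfp = theta + 0.5 * rng.standard_normal(t.size)

# Binary spike trains (~20 Hz): one phase-locked to theta, one not.
rate_locked = 20 * (1 + theta) / fs        # spiking probability per sample
spikes_locked = (rng.random(t.size) < rate_locked).astype(float)
spikes_free = (rng.random(t.size) < 20 / fs).astype(float)

def theta_coherence(spk, lfp, fs):
    """Mean magnitude-squared coherence in the 4-8 Hz theta band."""
    f, coh = signal.coherence(spk, lfp, fs=fs, nperseg=2048)
    band = (f >= 4) & (f <= 8)
    return coh[band].mean()

c_locked = theta_coherence(spikes_locked, lfp, fs)
c_free = theta_coherence(spikes_free, lfp, fs)
print(c_locked, c_free)
```

In this framing, the target-induced desynchronization reported above corresponds to fixations on targets shifting coherence toward the unlocked regime relative to fixations on distractors.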
The neural responses observed when fixating on or saccading to search targets versus distractors are commonly interpreted as indicative of attentional processes (Bichot et al., 2005; Zhou and Desimone, 2011). However, similar to many other studies, we cannot entirely rule out the influence of reward expectation on these neural responses. It is plausible that both template matching and reward expectation contribute to the observed attentional effects. This dual role underscores the complexity of interpreting neural responses in search tasks and suggests that attentional and reward-related processes are intricately linked. Further research is needed to disentangle these contributions and clarify the mechanisms underlying these neural responses. Furthermore, while the reward was the same for face and nonface trials, faces were detected faster than houses (Fig. 1B), indicating that detecting faces requires less mental effort. However, reaction time, which indicates stimulus difficulty and mental effort, correlated only with spike–LFP coherence within the temporal cortex (i.e., V4 and IT; Figs. 10, 11), but not with attentional effects in firing rate. Therefore, the differences in neural response could not be simply explained by differences in behavior. In particular, behavioral differences could not explain the differential engagement of the prefrontal cortex for face targets (Fig. 7) or the differential attentional effect for faces (Fig. 4).
Simultaneous neural recordings across brain areas have delineated the roles of different nodes within the attention neural network. For example, it has been shown that synchrony between LPFC and parietal areas is stronger in lower frequencies during top-down attention and in higher frequencies during bottom-up attention (Buschman and Miller, 2007). Paired neuron recordings in FEF and IT reveal that spatial selection precedes object identification during visual search (Monosov et al., 2010). Top-down attention originates in the prefrontal and parietal cortices and influences the sensory temporal cortex both through direct descending projections (e.g., from FEF to V4) and through a backward cascade (e.g., from the prefrontal cortex, particularly LPFC, to IT to V4; Buschman and Kastner, 2015). Our present study not only supports the backward cascade but also demonstrates a differential backward cascade based on stimulus type. The disproportionate engagement of the prefrontal cortex may result from different interactions between excitatory pyramidal neurons and inhibitory interneurons, which are central to the mechanisms supporting normalization and the generation of synchronous oscillations (Buschman and Kastner, 2015). Future studies will be needed to investigate whether attention to faces versus nonfaces differentially recruits other nodes of the attention network, such as the FEF (Bichot and Schall, 2002). It also remains an interesting question to explore whether the backward progression of attentional effects in the ventral stream (Buffalo et al., 2010) is similar for faces versus nonfaces.
One critical question regarding the notion of the “social brain” is whether any neural networks exhibit specialization for processing social information. Earlier accounts proposed that social processing in primates is subserved by a dedicated brain system (Brothers, 1990), supported by neurons selective for faces in the IT cortex, OFC, and amygdala (Tsao et al., 2006). More recent views tie subsets of the social processing structures together into functional networks that serve particular components of social cognition (Stanley and Adolphs, 2013). Consistent with our previous findings that single neurons in the human prefrontal cortex (Wang et al., 2019) and medial temporal lobe (Wang et al., 2018) encode visual search targets and demonstrate category-selective responses to faces (Wang et al., 2018), our present study provides a network view integrating visual attention and category selectivity during visual search. In particular, our finding that processing social information differentially engages the prefrontal cortex supports domain-specific neural processing of social stimuli (Brothers, 1990; Stanley and Adolphs, 2013; Wang and Adolphs, 2017). Together, our comprehensive analyses of behavior, neuronal firing rate, and functional connectivity, by detailing how faces (as social stimuli) and nonface stimuli are processed differently during attentional selection, advance our understanding of the neural mechanisms underlying visual social attention and provide strong support for the domain specificity of social processing and the notion of the “social brain.”
Footnotes
This work was supported by the National Science Foundation (BCS-1945230), National Institutes of Health (R01MH129426), and Air Force Office of Scientific Research (FA9550-21-1-0088). The funders had no role in study design, data collection, and analysis, decision to publish, or preparation of the manuscript.
*H.Z. and S.W. are the co-senior authors.
The authors declare no competing financial interests.
Correspondence should be addressed to Jie Zhang at zjie@wustl.edu or Huihui Zhou at zhouhh@pcl.ac.cn.