Research Articles, Behavioral/Cognitive

Distinct Yet Proximal Face- and Body-Selective Brain Regions Enable Clutter-Tolerant Representations of the Face, Body, and Whole Person

Libi Kliger and Galit Yovel
Journal of Neuroscience 12 June 2024, 44 (24) e1871232024; https://doi.org/10.1523/JNEUROSCI.1871-23.2024
Libi Kliger: 1The School of Psychological Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
Galit Yovel: 1The School of Psychological Sciences and 2Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel

Abstract

Faces and bodies are processed in separate but adjacent regions in the primate visual cortex. Yet, the functional significance of dividing the whole person into areas dedicated to its face and body components, and of their neighboring locations, remains unknown. Here we hypothesized that this separation and proximity, together with a normalization mechanism, generate clutter-tolerant representations of the face, body, and whole person when presented in complex multi-category scenes. To test this hypothesis, we conducted an fMRI study in which we presented images of a person within a multi-category scene to human male and female participants and assessed the contribution of each component to the response to the scene. Our results revealed a clutter-tolerant representation of the whole person in areas selective for both faces and bodies, typically located at the border between the two category-selective regions. Regions exclusively selective for faces or bodies demonstrated clutter-tolerant representations of their preferred category, corroborating earlier findings. Thus, the adjacent locations of face- and body-selective areas enable hardwired machinery for decluttering the whole person, without the need for a dedicated population of person-selective neurons. This distinct yet proximal functional organization of category-selective brain regions enhances the representation of the socially significant whole person, along with its face and body components, within multi-category scenes.

  • category selectivity
  • clutter
  • face perception
  • high-level visual cortex
  • normalization
  • person perception

Significance Statement

It is well established that faces and bodies are processed by dedicated brain areas that reside in nearby locations in primates' high-level visual cortex. However, the functional significance of the division of the whole person into its face and body components, their neighboring locations, and the absence of a distinct neuronal population selective for the meaningful whole person has remained puzzling. Here we propose a unified solution to these fundamental open questions. We show that, consistent with predictions of a normalization mechanism, this functional organization enables hardwired machinery for decluttering the face, the body, and the whole person. This generates enhanced processing of the socially meaningful whole person and its significant face and body components in multi-category scenes.

Introduction

Intact processing of faces and bodies is critical for effective social interactions. The functional separation and the anatomical proximity of face- and body-selective brain areas in the human and monkey high-level visual cortex are well established (Pinsk et al., 2005, 2009; Schwarzlose et al., 2005; Weiner and Grill-Spector, 2013; Harry et al., 2016; Premereur et al., 2016; Foster et al., 2021; Zafirova et al., 2022). Yet, the functional significance of this anatomical organization has remained unclear (for recent reviews, see Hu et al., 2020; Taubert et al., 2022). Why are faces and bodies processed by dedicated, distinct mechanisms, despite their natural co-occurrence in the whole person? Why, despite the significance of the whole person for social perception, has no distinct population of person-selective neurons or brain areas been reported so far? Why do face- and body-selective regions reside in nearby locations? Here we propose a unified account for these questions. In particular, we test the hypothesis that the division of the whole person into distinct but proximally located face- and body-selective areas supports the generation of clutter-tolerant representations of the face alone, the body alone, or the whole person when presented in multi-category scenes (Fig. 1). This mechanism eliminates the need for an additional population of person-selective neurons.

Figure 1.

Predicted response of single neurons to a multi-category scene in face- and body-selective cortex. a, The functional organization of face- and body-selective areas in the VTC in a representative subject in MNI space. Colors indicate category-selective voxels: voxels selective only to faces (red), only to bodies (blue), or to both faces and bodies at the border between them (purple). b, A multi-category scene composed of a face, a body, a chair, and a room, and the normalization equation representing the response of a single neuron to that scene. c, A mathematical equivalent of (b), which shows the predicted contribution (weight, β) of each category to the response to the multi-category scene. Accordingly, the contribution of each category is determined by the sum of the responses of the surrounding neurons (i.e., the normalization pool) to that category relative to the sum of their responses to all categories. The equation is shown for βFace. See Appendix for detailed mathematical derivations and for the complete formulas for each of the β's. d, A pictorial illustration of the predicted representation of the multi-category scene in voxels that contain a homogeneous population of face-selective neurons. The normalization equation predicts that the response of a neuron to the scene will be biased toward the response to the face. e, Same as (d) for body-selective voxels. f, A pictorial illustration of the predicted response to the multi-category scene in voxels that contain adjacent populations of face-selective and body-selective neurons. The normalization equation predicts a response biased toward the face and the body, essentially filtering out nonperson stimuli. Predictions are made for the neuronal response but can also be estimated with fMRI (see Appendix).

A central challenge for the visual system is to generate a veridical representation of objects in multi-category scenes. The effect of clutter is reflected in a reduced neural response to an object when it is presented with other objects relative to when it is presented alone (Miller et al., 1993; Rolls and Tovee, 1995; Zoccolan et al., 2005; MacEvoy and Epstein, 2009, 2011; Bao and Tsao, 2018). Interestingly, this reduced response was not found in category-selective areas, where the response to the preferred category remained unaffected when it was presented with other categories (Reddy and Kanwisher, 2007; Bao and Tsao, 2018). A normalization mechanism was suggested to account for these findings (Reynolds and Heeger, 2009; Heeger, 2011; Carandini and Heeger, 2012; Bao and Tsao, 2018). This mechanism posits that the response of a neuron is normalized by the response of its neighboring neurons (i.e., the normalization pool; Fig. 1b,c). Therefore, when a neuron is surrounded by neurons that are selective for its nonpreferred categories, its response to the simultaneous presentation of the two categories is reduced relative to the response to each object alone (Zoccolan et al., 2005). However, when the surrounding neurons are selective for the same category (i.e., a homogeneous normalization pool), as typically found in category-selective regions, the response to a preferred and a nonpreferred stimulus presented together is similar to the response to the preferred stimulus presented alone (i.e., a max response). This essentially generates a clutter-tolerant representation of the preferred category (Fig. 1d,e; Reddy and Kanwisher, 2007; Bao and Tsao, 2018; Kliger and Yovel, 2020). These findings offer a mechanistic account for the advantage of clustering neurons that are selective for significant categories, such as faces or bodies.
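The intuition behind this account can be illustrated with a toy simulation. This is only an illustrative sketch, not the authors' model code: the drive values, pool size, and semi-saturation constant (sigma) are arbitrary choices, and the pool is summarized by its mean response.

```python
import numpy as np

def norm_response(drive, pool_drives, sigma=0.1):
    # Normalization: a neuron's summed input (drive) is divided by the
    # mean activity of its normalization pool plus a constant (sigma).
    return drive / (sigma + np.mean(pool_drives))

face, chair = 1.0, 0.1  # drives a face-selective neuron receives from each stimulus

# Homogeneous pool: all 50 neighbors are also face selective, so adding
# a chair barely changes the pool, and the response stays near the
# face-alone level (a max-like, clutter-tolerant response).
alone = norm_response(face, [face] * 50)
scene = norm_response(face + chair, [face + chair] * 50)

# Mixed pool: 25 face-selective and 25 chair-selective neighbors. The
# chair now strongly drives half of the pool, so the response to the
# pair drops well below the face-alone response (a clutter cost).
mixed_alone = norm_response(face, [face] * 25 + [chair] * 25)
mixed_scene = norm_response(face + chair, [face + chair] * 50)

print(round(alone, 3), round(scene, 3))              # 0.909 0.917 (nearly equal)
print(round(mixed_alone, 3), round(mixed_scene, 3))  # 1.538 0.917 (clutter cost)
```

The same mechanism, applied to a pool containing both face- and body-selective neurons, is what predicts the whole-person bias tested below.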

Here we propose that the proximity of clusters of face- and body-selective neurons, together with the same normalization mechanism, further enables a hardwired clutter-tolerant representation of the whole person (Fig. 1f). This is enabled by the presence of both face-selective and body-selective neurons in the normalization pool (see derivations of the normalization equations in the Appendix), as typically found at the border between face- and body-selective areas (Kliger and Yovel, 2020). To test this prediction, we presented the whole person in a multi-category scene and assessed whether the representation of the multi-category scene is biased toward the whole person in areas that are selective for both the face and the body (Fig. 1f). Such findings would indicate that the adjacent locations of face- and body-selective clusters of neurons generate a clutter-tolerant representation of the face alone, the body alone, or the whole person when presented in multi-category scenes.

Materials and Methods

Participants

Fifteen healthy volunteers (three women, ages 21–31, one left-handed) with normal or corrected-to-normal vision participated in both experiments. Participants were paid $15/h. All participants provided written informed consent to take part in the study, which was approved by the ethics committees of the Sheba Medical Center and Tel Aviv University and performed in accordance with relevant guidelines and regulations. The sample size for each experiment (N = 15) chosen for this study was similar to the sample size of other fMRI studies that examined the representation of multiple objects in the high-level visual cortex (10–15 subjects per experiment; MacEvoy and Epstein, 2009, 2011; Reddy et al., 2009; Baeck et al., 2013; Song et al., 2013; Kaiser et al., 2014; Baldassano et al., 2016; Kaiser and Peelen, 2018; Kliger and Yovel, 2020).

Stimuli

Main experiment

The stimulus set consisted of grayscale images of a multi-category scene as well as its isolated parts: face, body, chair, and room (Fig. 2a). The face and body stimuli were created from seven grayscale images, downloaded from the internet, of a whole person standing in a straight frontal posture with the background removed (Kliger and Yovel, 2020). Each image of a person was cut into two parts at the neck, resulting in a face stimulus and a headless body stimulus for each identity. The chair stimuli included seven images of chairs downloaded from the internet and scaled to a size that fits the common proportions between a standing person and a chair. The face, body, and chair stimuli were presented on a gray background. The room stimuli included seven empty rooms created using the website https://roomstyler.com/3dplanner and converted to grayscale. The contrast and luminance of the rooms were scaled to match those of a chair, including its gray background. The multi-category scene stimuli included seven images of a person inside a room, with the person standing behind a chair (preserving real-life proportions and composition between the person and the chair) at the center of the room. The single-category stimuli were presented at the exact same locations on the screen as within the multi-category scene. A fixation point was presented in the upper central part of all stimuli, at a location corresponding to the lower part of the neck of the standing person. The multi-category scene images subtended 13.6 × 13.6° of visual angle.

Figure 2.

Example of the stimuli used in the experiment. a, Stimuli of the main experiment, including isolated faces, bodies, chairs, and rooms and multi-category scenes composed of all of the isolated categories. The isolated categories were placed in the same location in the visual field as they appeared in the multi-category scene, and subjects were instructed to maintain fixation on the blue fixation dot throughout the experiment. These stimuli were used to estimate the contribution of each of the isolated categories to the response to the multi-category scene. b, Stimuli of the functional localizer. Stimuli included images of faces, bodies, outdoor scenes, nonliving objects, and scrambled objects. These stimuli were used to assess the magnitude of category selectivity of each voxel to each category.

Functional localizer stimuli

Functional localizer stimuli were grayscale images of faces, headless bodies, outdoor scenes, nonliving objects, and scrambled images of these objects (Fig. 2b). Each category consisted of 80 different images. The size of the stimuli was ∼5.5 × 5.5° of visual angle.

Apparatus and procedure

fMRI acquisition parameters

fMRI data were acquired in a 3 T Siemens MAGNETOM Prisma MRI scanner at Tel Aviv University, using a 64-channel head coil. Echo-planar volumes were acquired with the following parameters: repetition time (TR) = 1 s, echo time = 34 ms, flip angle = 60°, 66 slices per TR, multiband acceleration factor = 6, slice thickness = 2 mm, field of view = 20 cm, and 100 × 100 matrix, resulting in a voxel size of 2 × 2 × 2 mm. Stimuli were presented with MATLAB (MathWorks) and Psychtoolbox (Brainard, 1997; Kleiner et al., 2007) and displayed on a 32″ high-definition LCD screen (NordicNeuroLab) viewed by the participants at a distance of 155 cm through a mirror located in the scanner. Anatomical MPRAGE images were collected with 1 × 1 × 1 mm resolution, echo time = 2.45 ms, and TR = 2.53 s.

Experimental procedure

The study included a single recording session with six runs of the main experiment and three runs of the functional localizer. Each of the six main experiment runs included 15 pseudorandomized miniblocks, three for each of the following experimental conditions: face, body, chair, room, and the multi-category scene, as described in the Stimuli section (Fig. 2a). Each miniblock included eight stimuli: seven different images and one repeated image for the 1-back task. Each miniblock lasted 6 s and was followed by 12 s of fixation. Each stimulus was displayed for 0.375 s, with an interstimulus interval of 0.375 s. Subjects performed a 1-back task (one repeated stimulus in each block). Each run began with a 6 s (6 TRs) fixation (dummy scan) and lasted a total of 276 s (276 TRs). Subjects were instructed to maintain fixation throughout the run, and their eye movements were recorded with an eye tracker (EyeLink®).

Each functional localizer run included 21 blocks: five baseline fixation blocks and four blocks for each of the five experimental conditions: faces, bodies, nature scenes, objects, and scrambled objects. Each block presented 20 stimuli drawn from 18 different images, of which two were repeated for the 1-back task. Each stimulus was presented for 0.4 s with a 0.4 s interstimulus interval. Each block lasted 16 s. Each run began with a 6 s fixation (6 TRs) and lasted a total of 342 s (342 TRs).
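The reported run durations follow directly from the block timings above; a quick arithmetic check (our own consistency check, not part of the original analysis):

```python
# Main experiment: 6 s dummy-scan fixation, then 15 miniblocks of
# 6 s stimulation + 12 s fixation (TR = 1 s, so seconds equal TRs).
miniblock = 8 * (0.375 + 0.375)       # 8 stimuli at 0.375 s on + 0.375 s off
main_run = 6 + 15 * (miniblock + 12)
print(miniblock, main_run)            # 6.0 s per miniblock, 276.0 s per run

# Functional localizer: 6 s dummy scan, then 21 blocks of
# 20 stimuli at 0.4 s on + 0.4 s off each.
localizer_block = 20 * (0.4 + 0.4)
localizer_run = 6 + 21 * localizer_block
print(localizer_block, localizer_run)  # 16 s per block, 342 s per run
```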

Data analyses

fMRI data analysis and preprocessing

fMRI analysis was performed using SPM12 software, custom MATLAB (MathWorks) and R (R Development Core Team, 2011) scripts, and FreeSurfer (Dale et al., 1999), with pySurfer (https://pysurfer.github.io) and custom Python (http://www.python.org) scripts for surface generation and presentation. The code used for data analyses is available at https://github.com/gylab-TAU/Multi_Category_Scenes_fMRI_analysis. The first six volumes in each run were acquired during a blank-screen display and were discarded from the analysis as "dummy scans." The data were then preprocessed using realignment to the mean of the functional volumes and coregistration to the anatomical image (rigid body transformation), followed by spatial normalization to the MNI space. Spatial smoothing was performed for the localizer data only (5 mm). A GLM was performed with separate regressors for each run and each condition, including 24 nuisance motion regressors for each run (six rigid body motion parameters, six motion derivatives, six squared motion parameters, and six squared motion derivatives) and a baseline regressor for each run. In addition, a "scrubbing" method (Power et al., 2012) was applied for every volume with framewise displacement >0.9 by adding a nuisance regressor with a value of 1 for that specific volume and zeros for all other volumes. The percent signal change (PSC) for each voxel was calculated for each experimental condition in each run by dividing the beta weight for that regressor by the beta weight of the baseline for that run.
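The scrubbing step can be sketched as follows. This is an illustrative reimplementation (the function name and the example FD values are ours, not the study's code): each volume whose framewise displacement exceeds the threshold gets its own spike regressor.

```python
import numpy as np

def scrubbing_regressors(framewise_displacement, threshold=0.9):
    # One nuisance ("spike") regressor per flagged volume, following
    # Power et al. (2012): 1 at the high-motion volume, 0 elsewhere.
    fd = np.asarray(framewise_displacement)
    bad = np.flatnonzero(fd > threshold)
    regs = np.zeros((fd.size, bad.size))
    regs[bad, np.arange(bad.size)] = 1.0
    return regs

fd = [0.10, 0.25, 1.30, 0.05, 0.95]   # hypothetical FD trace (mm)
R = scrubbing_regressors(fd)
print(R.shape)  # (5, 2): volumes 2 and 4 exceed the 0.9 threshold
```

These columns would be appended to the run's design matrix alongside the motion and baseline regressors.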

Linear model fitting

The mean PSC across runs to the face, the body, the room, the chair, and the multi-category scene conditions from the main experiment data was extracted for each voxel of each subject. For each subject, we defined a moving mask of a sphere of 27 voxels (i.e., a 3 × 3 × 3 grid). We used a relatively small sphere of 27 voxels to assure that voxels within each sphere are homogeneous in terms of their category selectivity. For each sphere, we fitted a linear model with its voxel data as features (i.e., the PSC in each of these voxels) to predict the response to the multi-category scene based on the response to the isolated categories:

Multi-category ScenePSC = βFace · FacePSC + βBody · BodyPSC + βRoom · RoomPSC + βChair · ChairPSC + ε. (1)

The beta coefficients of these models represent the contribution of each of the isolated categories to the response to the multi-category scene of each sphere of each subject. Note that the beta coefficients of the multi-category response model are not the same as the betas derived from the standard fMRI GLM analysis. The betas from the latter analysis are used to determine the PSC to each of the single- and multi-category stimuli as a measure of the fMRI response to these stimuli.
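A minimal sketch of the per-sphere fit of Equation 1 (our own illustration: the PSC values are synthetic random numbers, with the 27-voxel sphere simulated as 27 observations):

```python
import numpy as np

def fit_sphere(face, body, room, chair, scene):
    # Fit Equation 1 for one 27-voxel sphere: predict each voxel's scene
    # PSC from its PSCs to the four isolated categories (no intercept,
    # matching the form of Eq. 1).
    X = np.column_stack([face, body, room, chair])
    betas, *_ = np.linalg.lstsq(X, scene, rcond=None)
    return betas  # [β_Face, β_Body, β_Room, β_Chair]

rng = np.random.default_rng(0)
face, body, room, chair = rng.normal(1.0, 0.3, size=(4, 27))
# Synthetic "scene" response dominated by the face and body, as the
# normalization model predicts for face- and body-selective spheres.
scene = 0.60 * face + 0.50 * body + 0.10 * room + 0.05 * chair
print(np.round(fit_sphere(face, body, room, chair, scene), 2))
# recovers the generating weights [0.6, 0.5, 0.1, 0.05]
```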

Anatomical regions of interest (anatomical ROI) definition

We defined voxels that belong to the ventrotemporal cortex (VTC) and lateral occipital cortex in the right hemisphere by using a mask based on the Harvard-Oxford Atlas (Frazier et al., 2005; Desikan et al., 2006; Makris et al., 2006; Goldstein et al., 2007). We used the max-probability mask (threshold = 0) with a voxel size of 2 × 2 × 2 mm. The ventrotemporal mask included the following areas from the Harvard-Oxford Atlas: inferior temporal gyrus, posterior division; inferior temporal gyrus, temporo-occipital part; parahippocampal gyrus, anterior division; parahippocampal gyrus, posterior division; temporal fusiform cortex, anterior division; temporal fusiform cortex, posterior division; temporal occipital fusiform cortex; and occipital fusiform gyrus (Labels 14–15, 33–34, and 36–39, respectively). The lateral occipital mask included the following areas: middle temporal gyrus, temporo-occipital part; lateral occipital cortex, superior division; and lateral occipital cortex, inferior division (Labels 12 and 21–22, respectively).

We selected the area labeled frontal pole (Label 0) as a control nonvisual area. For each participant, the number of spheres randomly selected for inclusion in this control area equaled the average number of spheres across that participant's category-selective areas.

Voxels definition by category selectivity

Based on the functional localizer data, we estimated the selectivity of each voxel of individual subjects for faces, bodies, and places by using contrast t-maps of face > object, body > object, and outdoor scenes > object, respectively. We used only these three categories since their definitions share the same baseline (i.e., they are all compared with objects). We excluded the general object-selective region since the common definition of these areas (objects > scrambled objects) would result in areas that are not category specific but are similarly responsive to all object categories. Within each anatomical ROI (i.e., VTC and lateral occipital cortex), we defined several types of voxels based on their selectivity for these three categories. Face-selective voxels were defined as voxels that are selective only for faces over objects (p < 0.0001) and for faces over bodies (p < 0.0001) and not selective for bodies or places (p > 0.01). These criteria assured that the majority of the neurons within these voxels are face selective. One subject did not have face-selective voxels (i.e., voxels selective only for faces and not for other categories). Similarly, we defined body-selective voxels as voxels selective only for bodies over objects (p < 0.0001) and for bodies over faces (p < 0.0001) but not selective for faces or places over objects (p > 0.01). In addition, we defined face- and body-selective voxels by selecting voxels that are selective for both faces and bodies (but not for places). These voxels contain clusters of face-selective neurons and body-selective neurons. All participants showed voxels that were selective for both the face and the body in the VTC. In the lateral occipital cortex, only 5 out of 15 participants showed voxels selective for both the face and the body. Because the main novel contribution of this work is the response of the ROI selective for both faces and bodies, we focused only on the ventrotemporal ROIs and did not include the lateral occipital ROIs in our analyses.

The contribution of each category to the multi-category scene representation

For each subject, we calculated the betas of the model from Equation 1 for spheres of 27 voxels in the category-selective areas described above (see model fitting description above). To reduce statistical dependency resulting from the overlapping moving mask, we calculated the mean using an interleaved mask, taking only spheres whose centers are not immediately adjacent to one another. We computed the mean beta across all spheres in an ROI for each participant and across participants. We calculated the variance inflation factor (VIF), which provides a measure of multicollinearity of the beta coefficients, and removed spheres in which the VIF was larger than 10. We then performed repeated-measures ANOVAs to examine the contribution of the isolated categories to the multi-category scene for voxels selective for pairs of categories within the VTC. We used category (face, body, room, and chair) and ROI selectivity (face selective, body selective, and face and body selective) as within-subject factors and the beta coefficients of the multi-category response model as a dependent variable. To test our specific hypotheses with respect to the representation of faces and bodies relative to the other categories in the different ROIs, we used paired t tests. One subject did not have face-selective voxels (i.e., voxels selective only for faces and not for other categories), and one subject did not have body-selective voxels (i.e., voxels selective only for bodies and not for other categories) based on the abovementioned criteria; these subjects were therefore not included in the statistical analyses comparing these ROIs but were included in the analyses of the other category-selective areas.

Defining nonsaturated voxels

To test whether the BOLD response to the multi-category scene was saturated, we conducted the following analysis. For each voxel, we compared the maximum response (PSC) to a single category with the response to the multi-category scene:

Nonsaturated voxels: max{FacePSC, BodyPSC, ChairPSC, RoomPSC} > Multi-category ScenePSC.

A nonsaturated response to the multi-category scene is evident in voxels that show a higher response to a single-category stimulus than to the multi-category stimulus.
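This criterion amounts to a simple voxelwise comparison, sketched here with hypothetical PSC values (the function name and the numbers are ours):

```python
import numpy as np

def nonsaturated_mask(face, body, chair, room, scene):
    # A voxel is nonsaturated when its largest single-category PSC
    # exceeds its PSC to the multi-category scene.
    single_max = np.max(np.stack([face, body, chair, room]), axis=0)
    return single_max > scene

# Two hypothetical voxels: the first responds more to the face alone
# than to the scene (nonsaturated); the second does not.
face  = np.array([1.2, 0.4])
body  = np.array([0.3, 0.5])
chair = np.array([0.2, 0.2])
room  = np.array([0.1, 0.3])
scene = np.array([1.0, 0.9])
print(nonsaturated_mask(face, body, chair, room, scene))  # [ True False]
```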

Predictions

We measured the fMRI response to a multi-category scene of a person in a room with a chair and to each of its isolated categories (see Fig. 2 and Materials and Methods). The predictions of the response to the multi-category scene according to the normalization model in the different category-selective voxels are specified in Figure 1 (see Appendix for complete mathematical derivations and predictions). The normalization model predicts that in voxels that are selective for either the face or the body—therefore containing one homogeneous population of face- or body-selective neurons—the representation of multi-category scenes will be biased to the preferred category, decluttering nonpreferred stimuli in the scene. In addition, in voxels that are selective to both the face and the body—therefore containing two homogeneous populations of face- and body-selective neurons—the representation of multi-category scenes will be biased to both preferred face and body categories, decluttering nonperson stimuli (chair and room) within the scene.

Results

A clutter-tolerant representation of the whole person in face- and body-selective areas

We defined three types of voxels in the ventrotemporal area based on their selectivity for the isolated categories (see Materials and Methods): (1) face voxels, selective for faces but not for bodies, nonliving objects, or places; (2) body voxels, selective for bodies but not for faces, nonliving objects, or places; and (3) face–body-selective voxels, selective for both faces and bodies but not for nonliving objects or places (usually located at the border between face and body areas; see Materials and Methods and Fig. 1). To assess the contribution of each category to the representation of the multi-category scene, subjects viewed a different, independent set of stimuli containing a multi-category visual scene of a whole person standing next to a chair located inside a room, as well as stimuli of each of the components of the scene shown separately (Fig. 2a). We extracted the mean PSC response from each voxel for each of the isolated-component stimuli. We then used these voxel-level PSCs of each component of the multi-category scene as predictors in a linear model, with the PSC response to the multi-category scene as the predicted variable (see Equation 1).

The beta coefficients of the above model represent the contribution of each of the isolated categories to the response to the multi-category scene. For each subject, we defined a moving mask of a sphere of 27 (3 × 3 × 3) voxels. For each sphere, we fitted a linear model with its voxel data as features to predict the response to the multi-category scene from the response to each of its components. We included only interleaved spheres to avoid high overlap and statistical dependency between overlapping spheres. Note that the beta coefficients of the multi-category response model indicate the predicted contribution of each category to the fMRI response to the multi-category scene; they are not the betas derived from the standard fMRI GLM analysis.

Figure 3a–c depicts the contribution of each of the isolated categories to the response to the multi-category scene (i.e., the beta coefficients of the linear model in Eq. 1) for the face-selective, body-selective, and face- and body-selective voxels of each participant and averaged across participants. We performed a repeated-measures ANOVA with category (face, body, chair, and room) and voxel selectivity (face-selective voxels, body-selective voxels, and face- and body-selective voxels) as within-subject factors and the contribution to the complex scene representation (i.e., the beta coefficients of the linear model) as the dependent variable. We found a significant effect of category [F(3,36) = 35.363; p < 0.0001; ηG² = 0.526] as well as a significant category × voxel selectivity interaction [F(6,72) = 8.625; p < 0.0001; ηG² = 0.275].

Figure 3.

The contribution of single categories to the representation of a multi-category scene in face- and body-selective voxels. a–c, The contribution of each isolated category to the representation of the multi-category scene in the right ventrotemporal face- and body-selective voxels as depicted by the β coefficients of the linear model (Eq. 1) in (a) face-selective voxels, (b) body-selective voxels, and (c) face- and body-selective voxels. Each dot indicates the mean β of a single subject. Gray lines connect the β's of the same subject. Diamonds and error bars indicate the group mean and SEM, respectively. Note that the β's of the linear model are not the betas extracted from the GLM that evaluates the correspondence between the fMRI hemodynamic response and stimulus presentation but indicate the contribution of each category to the response to the multi-category scene.

We performed paired t tests to compare the contribution of the preferred relative to the nonpreferred categories. In line with the predictions of the normalization model (Fig. 1), the contribution of the face to the representation of the multi-category scene in face-selective voxels was higher than the contribution of each of the other categories [βFace−βBody : mean = 0.264, t(12) = 3.274, p = 0.0200, Cohen's d = 0.908, 95% CI (0.088, 0.440); βFace−βRoom : mean = 0.452, t(12) = 6.707, p < 0.0001, Cohen's d = 1.860, 95% CI (0.305, 0.598); βFace−βChair : mean = 0.588, t(12) = 6.347, p = 0.0001, Cohen's d = 1.760, 95% CI (0.386, 0.790); all p values are corrected for multiple comparisons; Fig. 3a]. Similarly the contribution of the body was higher than the contribution of each of the other categories in the body-selective voxel categories [βBody−βFace : mean = 0.366, t(12) = 5.443, p = 0.0004, Cohen's d = 1.510, 95% CI (0.220, 0.513); βBody−βRoom : mean = 0.422, t(12) = 5.226, p = 0.0006, Cohen's d = 1.450, 95% CI (0.246, 0.598); βBody−βChair : mean = 0.421, t(12) = 4.979, p = 0.0010, Cohen's d = 1.381, 95% CI (0.237, 0.606); all p values are corrected for multiple comparisons; Fig. 3b]. Finally, the contribution of the preferred face and body was higher than the contribution of each of the nonpreferred categories in the face- and body-selective voxels [(βFace+βBody)/2−βRoom : mean = 0.516, t(12) = 8.729, p < 0.0001, Cohen's d = 2.421, 95% CI (0.387, 0.645); (βFace+βBody)/ 2−βChair : mean = 0.447, t(12) = 5.789, p = 0.0003, Cohen's d = 1.605, 95% C.I. (0.279, 0.615); all p values are corrected for multiple comparisons; Fig. 3c]. The difference between the contribution of the face and the body in these voxels was not statistically significant [βFace−βBody : mean = −0.012, t(12) = −0.164, p = 0.872 (not corrected), Cohen's d = −0.046, 95% CI (−0.1707, 0.146)]. 
A Bayesian t test comparing the null model against the alternative preferred the null model (BF = 0.282). In addition, the sum of the betas within each type of voxel is slightly above 1, as predicted by the mathematical model [see Appendix; βFace+βBody+βRoom+βChair: face only: mean = 1.302, SD = 0.301; body only: mean = 1.012, SD = 0.128; face–body: mean = 1.268, SD = 0.222].
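For concreteness, the per-sphere estimation of Eq. 1 can be sketched as an ordinary least-squares fit of the multi-category response pattern on the four single-category patterns. The sketch below is a minimal numpy illustration on synthetic data, not the authors' analysis code; the function name, data dimensions, and the no-intercept form (as in the appendix derivation) are our assumptions:

```python
import numpy as np

def fit_contributions(single, multi):
    """Fit Eq. 1: regress the multi-category response pattern on the
    response patterns to the isolated categories (no intercept).

    single : (n_voxels, 4) responses to face, body, room, chair
    multi  : (n_voxels,)   response to the multi-category scene
    Returns the four beta coefficients and the model R^2.
    """
    betas, *_ = np.linalg.lstsq(single, multi, rcond=None)
    pred = single @ betas
    ss_res = np.sum((multi - pred) ** 2)
    ss_tot = np.sum((multi - multi.mean()) ** 2)
    return betas, 1.0 - ss_res / ss_tot

# Synthetic "face-selective sphere": the scene response is dominated
# by the face, with a small body contribution and measurement noise.
rng = np.random.default_rng(0)
single = rng.normal(1.0, 0.3, size=(100, 4))    # face, body, room, chair
multi = 0.8 * single[:, 0] + 0.2 * single[:, 1] + rng.normal(0, 0.01, 100)
betas, r2 = fit_contributions(single, multi)
```

With this construction, the fitted betas recover the simulated face dominance, and the R2 plays the same role as the per-sphere goodness-of-fit values reported above.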

To assess the goodness of fit of the normalization model to the response to the multi-category scene, we computed the R2 for each sphere. Figure 4a–c shows the distribution of the R2 values in face- and body-selective areas. The overall median R2 = 0.817 indicates a good fit of the proposed model to the data from these areas. For comparison, we defined a control area in the frontal lobe that does not show visual category selectivity (the frontal pole; see Materials and Methods). The R2 in this region was much lower (median R2 = 0.333; Fig. 4d), indicating that, consistent with our predictions, the normalization model specifically accounts for the response in the visual category-selective cortex.

Figure 4.

The distribution and median of R2 values of all linear models (Eq. 1) calculated for each sphere and subject for (a) face-selective voxels, (b) body-selective voxels, (c) face- and body-selective voxels, and (d) control nonvisual area in the frontal pole.

The results reported so far show that voxels that are selective to both faces and bodies exhibit a different pattern of response than voxels that are selective to either faces or bodies alone. To further demonstrate that the three ROIs show distinct response characteristics, we plotted the difference between βFace and βBody (i.e., the difference in the contribution of the face and the body to the response to the multi-category scene) against the relative selectivity for faces and for bodies, which was measured independently by the functional localizer, for all spheres of all subjects (Fig. 5). This scatterplot demonstrates that the bias toward either the face or the body in the response to the multi-category scene is associated with the selectivity for these categories. Moreover, it shows that the equal contribution of the face and the body to the response to the multi-category scene is a feature of a subpopulation of voxels that are selective to both the face and the body.

Figure 5.

The contribution of the face and the body to the multi-category scene covaries with face–body selectivity. The scatterplot shows the covariability of the difference in the contribution of the face and the body to the multi-category scene (y-axis) and the difference between face and body selectivity that was independently measured by the functional localizer (x-axis) for each sphere of all subjects.

We performed a mixed-model linear regression with a random intercept for subjects and found a positive association between βFace−βBody and the relative selectivity for faces over bodies [χ² = 68.372 when compared with the null model, p < 0.0001; fixed-effect estimates: slope = 0.041, SD = 0.005, t = 8.697, p < 0.0001; intercept = −0.059, SD = 0.040, t = −1.481, p = 0.158]. Similar findings were reported by Kliger and Yovel (2020), who presented the face, body, and whole person in isolation.
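The structure of this regression can be sketched in a few lines. The example below is our own synthetic illustration, not the authors' code: it approximates the random-intercept mixed model with per-subject fixed intercepts (dummy coding) and recovers a shared slope (the true slope of 0.04 and all distribution parameters are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_spheres = 13, 40
subj = np.repeat(np.arange(n_subj), n_spheres)   # subject index per sphere

# Synthetic per-sphere data: face>body selectivity (x) and beta difference (y)
selectivity = rng.normal(0.0, 5.0, n_subj * n_spheres)
true_slope = 0.04                                # assumed for illustration
intercepts = rng.normal(-0.06, 0.2, n_subj)      # subject-specific offsets
beta_diff = (true_slope * selectivity + intercepts[subj]
             + rng.normal(0.0, 0.1, subj.size))

# Approximate the random-intercept model with per-subject fixed intercepts:
# design matrix = [one dummy column per subject | shared selectivity slope]
X = np.column_stack([np.eye(n_subj)[subj], selectivity])
coef, *_ = np.linalg.lstsq(X, beta_diff, rcond=None)
slope = coef[-1]                                 # shared slope of interest
```

A full random-intercept fit (e.g., with a mixed-model library) additionally shrinks the subject intercepts toward their common mean, but the shared slope estimate behaves similarly in this balanced design.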

Taken together, these findings are consistent with predictions of the normalization model (Fig. 1), indicating that voxels that are selective for both the face and the body generate a representation that is biased to the whole person, while voxels that are selective for either of the single categories generate a representation that is biased to their preferred category.

Testing an alternative account to the normalization model: a summation model

An alternative summation account for our findings, which does not rely on the assumption of normalization, suggests that if a mixture of category-selective neurons responds independently, their combined response would be the sum of their individual responses to their preferred categories. This would lead each beta value in the model (Eq. 1) to equal 1. Our results do not support this summation prediction (Fig. 3). Still, a response lower than the sum of the single-category responses may reflect saturation of the BOLD signal rather than evidence against a summation model. However, under the summation model, the response to a single category would never be higher than the response to the multi-category scene. We therefore examined whether such a pattern of activation exists. A voxel-wise analysis revealed a large proportion of voxels showing this pattern: in contrast to the summation account, 65.74% of the voxels in VTC that are selective to faces or bodies (face selective, 61.76%; body selective, 77.95%; face–body selective, 55.81%) showed a higher response to a single category than to the multi-category scene (Fig. 6a–c). Subjects who did not have enough voxels in specific areas were excluded from the ANOVA that examined all ROIs, but whenever available, their data are shown in Figure 6d–i and tested with planned t tests (two additional subjects with no face-only voxels, one additional subject with no body-only voxels, three subjects with no face–body voxels).
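The logic of this voxel-wise check can be expressed as a simple comparison. The sketch below is our own illustration, not the authors' code: with strictly positive responses, a pure summation model can never produce a voxel whose single-category response exceeds its multi-category response, whereas a weighted-mean (normalization-like) response produces exactly that pattern:

```python
import numpy as np

def nonsaturated_fraction(single, multi):
    """Fraction of voxels whose response to at least one isolated
    category exceeds their response to the multi-category scene,
    a pattern a pure summation of positive responses cannot produce.

    single : (n_voxels, n_categories), multi : (n_voxels,)
    """
    return np.mean(single.max(axis=1) > multi)

rng = np.random.default_rng(2)
single = rng.uniform(0.1, 1.0, size=(500, 4))   # positive responses
summed = single.sum(axis=1)      # summation model: multi = sum of singles
averaged = single.mean(axis=1)   # normalization-like weighted mean
```

Under summation every voxel's scene response exceeds its best single-category response, so the fraction is 0; under averaging, the best single category exceeds the scene response in essentially every voxel.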

Figure 6.

The contribution of single categories to the representation of a multi-category scene in nonsaturated face- and body-selective voxels. a–c, The proportion of voxels that show a higher response to single-category than multi-category scenes (nonsaturated voxels) in (a) face-selective, (b) body-selective, and (c) face- and body-selective areas. d–f, The contribution of each isolated category to the representation of the multi-category scene in face- and body-selective voxels as depicted by the β's of the linear model (Eq. 1) in (d) nonsaturated face-selective voxels, (e) nonsaturated body-selective voxels, and (f) nonsaturated face- and body-selective voxels. Each dot indicates the mean β of a single subject. Gray lines connect the β's of the same subject. Diamonds and error bars indicate the group mean and SEM, respectively. g–i, The distribution and median of R2 values of all models (Eq. 1) for all spheres and all subjects for (g) nonsaturated face-selective voxels, (h) nonsaturated body-selective voxels, and (i) nonsaturated face- and body-selective voxels.

We performed a similar analysis to the one described above (Fig. 3), only for the nonsaturated voxels, that is, voxels that showed a higher response to single-category than to multi-category stimuli. Figure 6d–f depicts the contribution of each of the isolated categories to the response to the multi-category scene (i.e., the beta coefficients of the linear model in Eq. 1) for the nonsaturated face-selective, body-selective, and face–body-selective voxels of each participant, averaged across participants. Results are similar to those of the previous analysis (Fig. 3). Figure 6g–i depicts the distribution of the R2 values of all models calculated for all spheres centered on nonsaturated voxels for all subjects. The R2 values (median = 0.790) were as high as those reported above for all voxels. Statistical analysis yielded similar results as well. A repeated measures ANOVA showed a significant effect of category [F(3,21) = 13.176; p < 0.0001; ηG² = 0.422] as well as a significant category × voxel selectivity interaction [F(6,42) = 5.699; p ≤ 0.0002; ηG² = 0.277]. The contribution of the preferred categories was also significantly higher than that of the nonpreferred categories: we performed paired t tests to compare the contribution of the preferred relative to the nonpreferred categories. In line with the predictions of the normalization model (Fig. 1), the contribution of the face to the representation of the multi-category scene was higher than the contribution of all other categories in face-selective voxels [βFace−βBody: mean = 0.293, t(11) = 2.83, p = 0.0491, Cohen's d = 0.817, 95% CI (0.065, 0.522); βFace−βRoom: mean = 0.445, t(11) = 4.337, p = 0.0035, Cohen's d = 1.252, 95% CI (0.219, 0.670); βFace−βChair: mean = 0.533, t(11) = 4.70, p = 0.0019, Cohen's d = 1.357, 95% CI (0.283, 0.782); all p values are corrected for multiple comparisons; Fig. 6d].
Similarly, the contribution of the body was higher than the contribution of all other categories in the body-selective voxels [βBody−βFace: mean = 0.352, t(12) = 4.875, p = 0.0011, Cohen's d = 1.351, 95% CI (0.194, 0.509); βBody−βRoom: mean = 0.357, t(12) = 5.363, p = 0.0005, Cohen's d = 1.487, 95% CI (0.212, 0.503); βBody−βChair: mean = 0.299, t(12) = 5.593, p = 0.0004, Cohen's d = 1.551, 95% CI (0.182, 0.415); all p values are corrected for multiple comparisons; Fig. 6e]. For the face- and body-selective voxels, the contribution of the preferred face and body was higher than the contribution of the nonpreferred categories [(βFace+βBody)/2−βRoom: mean = 0.511, t(11) = 6.081, p = 0.0002, Cohen's d = 1.755, 95% CI (0.326, 0.696); (βFace+βBody)/2−βChair: mean = 0.422, t(11) = 5.485, p = 0.0006, Cohen's d = 1.583, 95% CI (0.253, 0.591); all p values are corrected for multiple comparisons; Fig. 6f]. The difference between the contribution of the face and the body in these voxels was not statistically significant [βFace−βBody: mean = 0.069, t(11) = 0.783, p = 0.450 (not corrected), Cohen's d = 0.226, 95% CI (−0.124, 0.261)]. When comparing the null model with the alternative using a Bayesian t test, the null model was preferred over the alternative (BF = 0.373). These findings are consistent with the prediction of the normalization model (Fig. 1) and with the results reported above that include all the data (Fig. 3), indicating that a simple summation account cannot explain the observed findings.

Finally, we examined behavioral measures collected during fMRI scanning for the different stimulus categories. We measured performance on the 1-back task across the different categories. Performance was at ceiling for all categories. Mean accuracy was as follows: multi-category scene = 0.99 (SD = 0.010); face = 0.98 (SD = 0.018); body = 0.98 (SD = 0.014); room = 0.97 (SD = 0.019); and chair = 0.98 (SD = 0.015). In addition, we examined the eye fixation patterns for each category. Figure 7 shows an overall similar pattern of fixations across the different stimuli, indicating that participants followed the instruction to focus on the fixation dot, which was presented at the same location on the screen across the different conditions.

Figure 7.

Eye tracker fixation patterns. Heat map of fixation durations across all experimental trials of all subjects. The background image is the picture at its original location and size, with the fixation point, as presented in the experiment. The data are displayed for the different experimental conditions: (a) face, (b) body, (c) room, (d) chair, and (e) multi-category scene.

Discussion

The functional properties of face- and body-selective areas have been extensively investigated in numerous neuroimaging and neurophysiological studies in humans and monkeys over the past two and a half decades. Still, the functional significance of their separation and adjacent locations has remained unclear (for recent reviews, see Hu et al., 2020; Taubert et al., 2022). Our study provides a mechanistic account for this long-standing puzzle by considering the operation of a well-established normalization model on distinct face- and body-selective regions that reside in proximal anatomical locations. Consistent with predictions of the normalization model (see Fig. 1 and mathematical derivations in the Appendix), we found that the representation of the multi-category scene was dominated by the face in face-selective areas, by the body in body-selective areas, and by both the face and the body (i.e., the whole person) at the border between the face-selective and body-selective areas, which is selective to both categories, filtering out the nonpreferred categories. To reveal this pattern of response, our study presented the whole person within a multi-category scene (Fig. 1), unlike previous studies that presented an isolated face, body, or whole person (Song et al., 2013; Bernstein et al., 2014; Kaiser et al., 2014; Fisher and Freiwald, 2015; Kliger and Yovel, 2020; Zafirova et al., 2022). This enabled us to measure the relative contributions of both the preferred face and body categories and the nonpreferred categories to the multi-category scene.

Consistent with predictions of the normalization model (Fig. 1), we found that the proximal location of face- and body-selective areas enables the generation of a clutter-tolerant representation of the meaningful combination of the face and body (the whole person). This machinery eliminates the need for a dedicated population of neurons that are selective to the combined whole-person stimulus. The generation of a clutter-tolerant representation of the whole person in neighboring face and body areas is accomplished through the same normalization mechanism that declutters the preferred single categories within their category-selective cortex (Reddy et al., 2009; Bao and Tsao, 2018; Kliger and Yovel, 2020). According to the normalization model, when the normalization pool of a face-selective neuron is selective to bodies (or when the normalization pool of a body-selective neuron is selective to faces), the response to the multi-category scene will be a weighted mean of the responses to the two categories, with a reduced response to the nonpreferred categories (see Appendix for mathematical derivations), essentially generating a clutter-tolerant representation of the whole person. Furthermore, we demonstrated that an alternative model, which predicts summation instead of normalization, was not supported by the data.

Voxels that are selective to both the face and the body typically reside at the border between the face- and body-selective areas. Whereas our design does not allow us to determine whether these voxels contain two populations of face- and body-selective neurons or one population of person-selective neurons, previous studies suggest that the former might be the case. Kaiser et al. (2014) used multivoxel pattern analysis to ask whether a person-selective region (a region that shows a higher response to person than to object stimuli) is composed of one population of person-selective neurons or of nearby face-selective and body-selective neurons. Their results support the latter conclusion, though as noted by the authors, they do not provide conclusive evidence for the absence of person-selective neurons in this region. It should be noted that neurons responsive to the whole person were reported in the upper bank of the STS (Wachsmuth et al., 1994; see also fMRI findings by Fisher and Freiwald, 2015). This upper bank of the monkey STS may be the homolog of the human STS (Yovel and Freiwald, 2013), which is known to show selectivity to the biological motion of the whole person (Thompson et al., 2005). Another recent study found neurons responding to the face and body in a patch that is selective to the natural whole-person configuration in the anterior IT (Zafirova et al., 2024). Our findings specifically concern the border between the face- and body-selective areas in the posterior VTC, where, to the best of our knowledge, such person-selective neurons have not been reported. Future studies recording from neurons that reside at the border between face- and body-selective regions are required to further support this claim.

In a recent extensive review of the literature on the response of the visual cortex to faces, bodies, and the whole person, Taubert et al. (2022) proposed four hypotheses regarding the organization of face- and body-selective regions: separate networks, weakly integrated networks, strongly integrated networks, or a single network. They concluded that current data do not fully support any of these hypotheses and called for future studies that combine the face and body to address these open questions. Our study goes beyond their hypotheses by predicting that the significance of this functional organization becomes evident when the whole person is presented in multi-category scenes. Thus, our findings show the benefit of processing faces and bodies both by separate networks, which enhance the representation of the face or the body, and by integrated networks, which enhance the representation of the whole person when presented in multi-category scenes.

The question of whether faces and bodies are processed by separate or integrated systems has also been discussed in a recent review by Hu et al. (2020). According to their suggested model, faces and bodies are processed separately in posterior brain areas (OFA and EBA) but are integrated into the whole person in more anterior regions, the FFA and FBA (Song et al., 2013; Bernstein et al., 2014; Fisher and Freiwald, 2015). This model is consistent with our findings, as the adjacent face and body voxels are primarily located in the fusiform gyrus (Fig. 1), whereas the more lateral and posterior face- and body-selective voxels (i.e., OFA and EBA) are located farther from each other. Indeed, we found that the face- and body-selective regions were proximal in only a third of the participants in the lateral occipital cortex, whereas in the VTC, the face and body regions reside adjacently in all our participants (Weiner and Grill-Spector, 2010, 2013). We suggest that this functional organization enables both independent and integrated processing of the face and body, by clusters of neurons that are located more remotely from one another (mostly posteriorly) and by nearby face and body regions (mostly ventrally), respectively.

The well-established organization of the category-selective visual cortex has generated different hypotheses about why category-selective areas emerge in particular locations in the high-level visual cortex (Saygin et al., 2016; Deen et al., 2017; van den Hurk et al., 2017). Recently, Op de Beeck et al. (2019) proposed three main factors that determine where category-selective areas emerge in the visual cortex: (1) preexisting feature selectivity, (2) computational hierarchy, and (3) domain-specific connectivity to areas outside the visual stream. The current study goes beyond the representation of single categories and highlights the benefit of positioning different category-selective regions in proximity, in particular for the representation of multi-category scenes. Theories that attempt to account for the predetermined locations of category-selective areas should also consider the functional significance of their relative proximity for resolving the computational challenges of representing objects in clutter.

The present study proposes a bottom-up mechanism that can bias the response toward certain significant categories by clustering homogeneous category-selective neurons. Yet other mechanisms have also been suggested to bias the response to specific stimuli. For example, bottom-up, stimulus-driven mechanisms based on stimulus saliency can allocate resources toward a specific target (Beck and Kastner, 2005). Furthermore, a normalization operation was also shown to account for top-down mechanisms of selective attention that resolve competition among multiple stimuli (Desimone and Duncan, 1995; Reddy et al., 2009; Reynolds and Heeger, 2009). Thus, the proposed hardwired mechanism acts in concert with other bottom-up and top-down mechanisms to resolve the challenge of processing rich, multi-category scenes (Pessoa et al., 2003; McMains and Kastner, 2011).

The biased representation toward the whole person that we revealed is in line with behavioral studies reporting preferential processing of the whole person. For example, Mayer et al. (2015) showed that whole-person stimuli pop out in cluttered scenes relative to other nonhuman stimuli, and Downing et al. (2004) showed that bodies capture attention even when they are unattended. Privileged detection of whole persons and faces relative to objects was also found in continuous flash suppression tasks (Stein et al., 2012). The clutter-tolerant representation that we revealed here for the whole person may underlie these behavioral effects.

To summarize, our study offers a unified mechanistic account for long-standing questions about the neural representations of the face, the body, and the whole person in the high-level visual cortex. We explain how the same normalization mechanism enables the generation of a clutter-tolerant representation of each socially significant component (face or body) and their meaningful combination (whole person), thanks to the neighboring cortical locations of distinct clusters of face- and body-selective neurons. More generally, our study reveals a new mechanism that is used by the visual system to resolve the challenging task of processing socially meaningful stimuli in cluttered scenes.

Data Availability Statement

Data that were collected in this study will be available at https://openneuro.org. Tables of preprocessed data as well as the code that was used to generate the analysis, figures, and statistics are available at https://github.com/gylab-TAU/Multi_Category_Scenes_fMRI_analysis.

Appendix: Mathematical derivations of a model predicting the representation of a multi-category scene composed of four categories

According to the normalization model (Reynolds and Heeger, 2009), the measured response of a specific neuron (i.e., neuron j) to a multi-category stimulus is divided by the sum of the responses of the surrounding neurons. For a multi-category stimulus composed of four categories, denoted A–D, such as the stimulus shown in Figure 1b, this is described by the following equation:

$$R_j(A+B+C+D)=\gamma\frac{A_j+B_j+C_j+D_j}{\sigma+\sum_k A_k+\sum_k B_k+\sum_k C_k+\sum_k D_k},$$

where the measured response of neuron j to the stimuli presented together, $R_j(A+B+C+D)$, equals the response of the neuron to the sum of the stimuli, $A_j+B_j+C_j+D_j$, divided by the sum of the responses of the surrounding neurons (the normalization pool) to the stimuli, $\sum_k A_k+\sum_k B_k+\sum_k C_k+\sum_k D_k$, plus a constant, σ. The constants γ and σ are free parameters that are fitted to the data.

We apply this equation to the multi-category scene used in our study, which is composed of a person, a chair, and a room. The scene therefore contains four categories, denoted as follows: a face (F), a body (B), a chair (C), and a room (i.e., a place, P):

$$R_j(F+B+C+P)=\gamma\frac{F_j+B_j+C_j+P_j}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}.$$

We can rewrite the normalization equation to express the response to a multi-category scene as a linear combination of the responses to each of the isolated categories composing the scene (Fig. 1c).

We can separate the right side of the equation into two parts, yielding

$$R_j(F+B+C+P)=\frac{\gamma F_j}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}+\frac{\gamma(B_j+C_j+P_j)}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}.$$

Next, we multiply the face term by a factor equal to 1:

$$R_j(F+B+C+P)=\frac{\sigma+\sum_k F_k}{\sigma+\sum_k F_k}\cdot\frac{\gamma F_j}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}+\frac{\gamma(B_j+C_j+P_j)}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}.$$

Rearranging, the equation becomes

$$R_j(F+B+C+P)=\frac{\sigma+\sum_k F_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}\cdot\frac{\gamma F_j}{\sigma+\sum_k F_k}+\frac{\gamma(B_j+C_j+P_j)}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}.$$

Since the response to the isolated face according to the normalization model is given by

$$R_j(F)=\frac{\gamma F_j}{\sigma+\sum_k F_k},$$

the response to the multi-category scene becomes

$$R_j(F+B+C+P)=\beta_{Face}\cdot R_j(F)+\frac{\gamma(B_j+C_j+P_j)}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},$$

where

$$\beta_{Face}=\frac{\sigma+\sum_k F_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}$$

is the coefficient of the face in this linear combination. Note that the coefficient depends only on the selectivity of the surrounding neurons (i.e., the normalization pool) for the categories in the multi-category scene, and not on the selectivity of neuron j itself.

For simplicity, we showed only the derivation for the face weight, but similar derivations can be performed for all other categories, yielding

$$R_j(F+B+C+P)=\beta_{Face}\cdot R_j(F)+\beta_{Body}\cdot R_j(B)+\beta_{Chair}\cdot R_j(C)+\beta_{Room}\cdot R_j(P),$$

$$\beta_{Face}=\frac{\sigma+\sum_k F_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},\qquad
\beta_{Body}=\frac{\sigma+\sum_k B_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},$$

$$\beta_{Chair}=\frac{\sigma+\sum_k C_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},\qquad
\beta_{Room}=\frac{\sigma+\sum_k P_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},$$

where the coefficient of each category depends only on the ratio between the normalization pool's selectivity for that category and its selectivity for all the categories.

A face-selective neuron that resides in a face-selective area responds more to faces than to each of the other categories, that is,

$$F_j \gg B_j, C_j, P_j,$$

and is surrounded by neurons that are mostly face selective, that is,

$$\sum_k F_k \gg \sum_k B_k, \sum_k C_k, \sum_k P_k.$$

Based on the normalization equation, we can predict that the response to the multi-category scene will be dominated by the response to the face (Fig. 1d):

$$\beta_{Face} > \beta_{Body}, \beta_{Chair}, \beta_{Room}.$$

In addition, some category-selective areas reside in neighboring locations, and the border between them contains two populations of neighboring neurons that are selective for either one of the two categories, for example, an area with a similar proportion of neurons that are selective for faces and bodies but not for chairs and places, such as the border between the face and body areas, that is,

$$\sum_k F_k \approx \sum_k B_k,\qquad \sum_k F_k, \sum_k B_k \gg \sum_k C_k, \sum_k P_k.$$

Based on the normalization equation, we can predict that the response to the multi-category scene will be dominated by the responses to both the face and the body (Fig. 1f):

$$\beta_{Face}, \beta_{Body} > \beta_{Chair}, \beta_{Room},\qquad \beta_{Face} \approx \beta_{Body}.$$

In other words, in an area that is selective for faces and bodies (but not places or chairs), the response to the multi-category scene would be biased toward the whole person, while filtering out the other categories presented in the multi-category scene, that is, the chair and the room.

We can further see that the difference between the coefficients of two categories, for example, the face and the body, is given by

$$\beta_{Face}-\beta_{Body}=\frac{\sigma+\sum_k F_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}-\frac{\sigma+\sum_k B_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}=\frac{\sum_k F_k-\sum_k B_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},$$

that is, the difference between the coefficients is determined by the difference in the selectivity of the normalization pool for the two categories. Thus, as shown before, for areas that contain proximal face- and body-selective clusters of neurons, the response to the multi-category scene is equally biased toward the two categories (i.e., the face and the body), while the chair and the room are filtered out with coefficients close to zero:

$$\beta_{Chair}, \beta_{Room} \approx 0 \quad \text{when} \quad \sum_k C_k, \sum_k P_k \approx 0.$$

Finally, the sum of the coefficients is given by

$$\beta_{Face}+\beta_{Body}+\beta_{Chair}+\beta_{Room}=\frac{4\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}=\frac{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}+\frac{3\sigma}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}=1+\frac{3\sigma}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},$$

that is, the sum of the coefficients is slightly higher than 1, being equal to 1 plus a small positive term (σ is usually a small positive number; Reynolds and Heeger, 2009).
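These derivations can be checked numerically. The sketch below is our own simulation with assumed tuning distributions, not taken from the paper: it builds a border-like normalization pool, computes the predicted coefficients, and verifies both the linear decomposition of a neuron's scene response and the coefficient-sum identity (the four numerators contribute 4σ in total, so the sum equals 1 plus 3σ over the common denominator):

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, sigma = 2.0, 0.5        # free parameters of the normalization model

# Normalization pool at a face/body border: mostly face- and body-selective
# neurons, few chair- or place-selective ones (columns: F, B, C, P).
pool = np.column_stack([
    rng.gamma(4.0, 1.0, 200),  # face
    rng.gamma(4.0, 1.0, 200),  # body
    rng.gamma(0.2, 1.0, 200),  # chair
    rng.gamma(0.2, 1.0, 200),  # place (room)
])
pool_sums = pool.sum(axis=0)   # sum_k F_k, sum_k B_k, sum_k C_k, sum_k P_k
denom = sigma + pool_sums.sum()

# Predicted coefficients: beta_X = (sigma + sum_k X_k) / denom
betas = (sigma + pool_sums) / denom

# A single neuron j with its own tuning to the four isolated categories
tuning = np.array([5.0, 1.0, 0.3, 0.2])          # F_j, B_j, C_j, P_j (assumed)
r_scene = gamma * tuning.sum() / denom           # R_j(F+B+C+P)
r_single = gamma * tuning / (sigma + pool_sums)  # R_j(F), R_j(B), R_j(C), R_j(P)
```

In this border-like pool, the face and body coefficients come out approximately equal and much larger than the chair and place coefficients, matching the prediction for face- and body-selective voxels.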

When using fMRI to measure the response to a multi-category stimulus, we measure the BOLD signal, which is an estimate of the response of thousands of neurons in a small patch of cortex (e.g., a 2 × 2 × 2 mm³ voxel). The response of all the neurons in a voxel can be written as

$$\sum_j R_j(F+B+C+P)=\sum_j\left(\gamma\frac{F_j+B_j+C_j+P_j}{\sigma+\sum_{k(j)}F_{k(j)}+\sum_{k(j)}B_{k(j)}+\sum_{k(j)}C_{k(j)}+\sum_{k(j)}P_{k(j)}}\right),$$

where k(j) indicates the k-th neuron in the normalization pool of neuron j.

Assuming that all neurons in a given voxel have a similar normalization pool, that is, a similar surrounding, we can rewrite the equation such that k is no longer a function of j:

$$\sum_j R_j(F+B+C+P)\approx\sum_j\left(\gamma\frac{F_j+B_j+C_j+P_j}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}\right).$$

Now, following the exact same derivations made for a single neuron, we can rewrite the expected response as a linear combination of the summed responses of the neurons, with the same coefficients used for a single neuron:

$$\sum_j R_j(F+B+C+P)=\beta_{Face}\cdot\sum_j R_j(F)+\beta_{Body}\cdot\sum_j R_j(B)+\beta_{Chair}\cdot\sum_j R_j(C)+\beta_{Room}\cdot\sum_j R_j(P),$$

$$\beta_{Face}=\frac{\sigma+\sum_k F_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},\qquad
\beta_{Body}=\frac{\sigma+\sum_k B_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},$$

$$\beta_{Chair}=\frac{\sigma+\sum_k C_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k},\qquad
\beta_{Room}=\frac{\sigma+\sum_k P_k}{\sigma+\sum_k F_k+\sum_k B_k+\sum_k C_k+\sum_k P_k}.$$

These coefficients depend on the local selectivity for each of the isolated categories, which can be measured effectively by fMRI.
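The voxel-level extension follows by linearity: summing the responses of neurons that share a normalization pool preserves the same coefficients. A short numerical check (our own illustration, with assumed pool sums and tuning distributions):

```python
import numpy as np

rng = np.random.default_rng(4)
gamma, sigma = 2.0, 0.5

# Shared normalization pool for all neurons in the voxel
# (assumed selectivity sums for F, B, C, P)
pool_sums = np.array([800.0, 750.0, 40.0, 35.0])
denom = sigma + pool_sums.sum()
betas = (sigma + pool_sums) / denom

# Tuning of 1,000 neurons inside the voxel (columns: F, B, C, P)
tuning = rng.gamma(2.0, 1.0, size=(1000, 4))

# Summed voxel responses to the scene and to each isolated category
voxel_scene = (gamma * tuning.sum(axis=1) / denom).sum()
voxel_single = (gamma * tuning / (sigma + pool_sums)).sum(axis=0)
```

The summed scene response equals the same linear combination of the summed single-category responses, which is what licenses fitting Eq. 1 to voxel-level BOLD estimates.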

Footnotes

  • This work was supported by grants from the Israel Science Foundation (446/16, 917/21).

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Libi Kliger at libi.kliger{at}gmail.com or Galit Yovel at gality{at}tauex.tau.ac.il.

SfN exclusive license.

References

1. Baeck A, Wagemans J, de Beeck HP (2013) The distributed representation of random and meaningful object pairs in human occipitotemporal cortex: the weighted average as a general rule. NeuroImage 70:37–47. https://doi.org/10.1016/j.neuroimage.2012.12.023
2. Baldassano C, Beck DM, Fei-Fei L (2016) Human–object interactions are more than the sum of their parts. Cereb Cortex 27:2276–2288. https://doi.org/10.1093/cercor/bhw077
3. Bao P, Tsao DY (2018) Representation of multiple objects in macaque category-selective areas. Nat Commun 9:1774. https://doi.org/10.1038/s41467-018-04126-7
4. Beck DM, Kastner S (2005) Stimulus context modulates competition in human extrastriate cortex. Nat Neurosci 8:1110–1116. https://doi.org/10.1038/nn1501
5. Bernstein M, Oron J, Sadeh B, Yovel G (2014) An integrated face–body representation in the fusiform gyrus but not the lateral occipital cortex. J Cogn Neurosci 26:2469–2478. https://doi.org/10.1162/jocn_a_00639
6. Brainard DH (1997) The psychophysics toolbox. Spat Vis 10:433–436. https://doi.org/10.1163/156856897X00357
7. Carandini M, Heeger DJ (2012) Normalization as a canonical neural computation. Nat Rev Neurosci 13:51–62. https://doi.org/10.1038/nrn3136
8. Dale AM, Fischl B, Sereno MI (1999) Cortical surface-based analysis. NeuroImage 9:179–194. https://doi.org/10.1006/nimg.1998.0395
9. Deen B, Richardson H, Dilks DD, Takahashi A, Keil B, Wald LL, Kanwisher N, Saxe R (2017) Organization of high-level visual cortex in human infants. Nat Commun 8:13995. https://doi.org/10.1038/ncomms13995
10. Desikan RS, et al. (2006) An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage 31:968–980. https://doi.org/10.1016/j.neuroimage.2006.01.021
11. Desimone R, Duncan J (1995) Neural mechanism of selective visual attention. Annu Rev Neurosci 18:193–222. https://doi.org/10.1146/annurev.ne.18.030195.001205
12. Downing PE, Bray D, Rogers J, Childs C (2004) Bodies capture attention when nothing is expected. Cognition 93:27–38. https://doi.org/10.1016/j.cognition.2003.10.010
13. Fisher C, Freiwald WA (2015) Whole-agent selectivity within the macaque face-processing system. Proc Natl Acad Sci U S A 112:201512378. https://doi.org/10.1073/pnas.1512378112
14. Foster C, Zhao M, Bolkart T, Black MJ, Bartels A, Bülthoff I (2021) Separated and overlapping neural coding of face and body identity. Hum Brain Mapp 42:4242–4260. https://doi.org/10.1002/hbm.25544
15. Frazier JA, et al. (2005) Structural brain magnetic resonance imaging of limbic and thalamic volumes in pediatric bipolar disorder. Am J Psychiatry 162:1256–1265. https://doi.org/10.1176/appi.ajp.162.7.1256
16. Goldstein JM, Seidman LJ, Makris N, Ahern T, O’Brien LM, Caviness VS, Kennedy DN, Faraone SV, Tsuang MT (2007) Hypothalamic abnormalities in schizophrenia: sex effects and genetic vulnerability. Biol Psychiatry 61:935–945. https://doi.org/10.1016/j.biopsych.2006.06.027
17. Harry BB, Umla-Runge K, Lawrence AD, Graham KS, Downing PE (2016) Evidence for integrated visual face and body representations in the anterior temporal lobes. J Cogn Neurosci 28:1178–1193. https://doi.org/10.1162/jocn_a_00966
18. Heeger DJ (2011) Normalization of cell responses in cat striate cortex. Vis Neurosci 9:51–61. https://doi.org/10.1038/nrn3136
19. Hu Y, Baragchizadeh A, O’Toole AJ (2020) Integrating faces and bodies: psychological and neural perspectives on whole person perception. Neurosci Biobehav Rev 112:472–486. https://doi.org/10.1016/j.neubiorev.2020.02.021
20. Kaiser D, Peelen MV (2018) Transformation from independent to integrative coding of multi-object arrangements in human visual cortex. NeuroImage 169:334–341. https://doi.org/10.1016/j.neuroimage.2017.12.065
21. Kaiser D, Strnad L, Seidl KN, Kastner S, Peelen MV (2014) Whole person-evoked fMRI activity patterns in human fusiform gyrus are accurately modeled by a linear combination of face- and body-evoked activity patterns. J Neurophysiol 111:82–90. https://doi.org/10.1152/jn.00371.2013
    OpenUrlCrossRefPubMed
  22. ↵
    1. Kleiner M,
    2. Brainard DH,
    3. Pelli DG,
    4. Broussard C,
    5. Wolf T,
    6. Niehorster D
    (2007) What’s new in Psychtoolbox-3? Perception 39:189. https://doi.org/10.1068/v070821
    OpenUrl
  23. ↵
    1. Kliger L,
    2. Yovel G
    (2020) The functional organization of high-level visual cortex determines the representation of complex visual stimuli. J Neurosci 40:7545–7558. https://doi.org/10.1523/JNEUROSCI.0446-20.2020 pmid:32859715
    OpenUrlAbstract/FREE Full Text
  24. ↵
    1. MacEvoy SP,
    2. Epstein RA
    (2009) Decoding the representation of multiple simultaneous objects in human occipitotemporal cortex. Curr Biol 19:943–947. https://doi.org/10.1016/j.cub.2009.04.020 pmid:19446454
    OpenUrlCrossRefPubMed
  25. ↵
    1. MacEvoy SP,
    2. Epstein RA
    (2011) Constructing scenes from objects in human occipitotemporal cortex. Nat Neurosci 14:1323–1329. https://doi.org/10.1038/nn.2903 pmid:21892156
    OpenUrlCrossRefPubMed
  26. ↵
    1. Makris N,
    2. Goldstein JM,
    3. Kennedy D,
    4. Hodge SM,
    5. Caviness VS,
    6. Faraone SV,
    7. Tsuang MT,
    8. Seidman LJ
    (2006) Decreased volume of left and total anterior insular lobule in schizophrenia. Schizophr Res 83:155–171. https://doi.org/10.1016/j.schres.2005.11.020
    OpenUrlCrossRefPubMed
  27. ↵
    1. Mayer KM,
    2. Vuong QC,
    3. Thornton IM
    (2015) Do people ‘pop out’? PLoS One 10:e0139618. https://doi.org/10.1371/journal.pone.0139618 pmid:26441221
    OpenUrlPubMed
  28. ↵
    1. McMains S,
    2. Kastner S
    (2011) Interactions of top-down and bottom-up mechanisms in human visual cortex. J Neurosci 31:587–597. https://doi.org/10.1523/JNEUROSCI.3766-10.2011 pmid:21228167
    OpenUrlAbstract/FREE Full Text
  29. ↵
    1. Miller EK,
    2. Gochin PM,
    3. Gross CG
    (1993) Suppression of visual responses of neurons in inferior temporal cortex of the awake macaque by addition of a second stimulus. Brain Res 616:25–29. https://doi.org/10.1016/0006-8993(93)90187-R
    OpenUrlCrossRefPubMed
  30. ↵
    1. Op de Beeck HP,
    2. Pillet I,
    3. Ritchie JB
    (2019) Factors determining where category-selective areas emerge in visual cortex. Trends Cogn Sci 23:784–797. https://doi.org/10.1016/j.tics.2019.06.006
    OpenUrl
  31. ↵
    1. Pessoa L,
    2. Kastner S,
    3. Ungerleider LG
    (2003) Neuroimaging studies of attention: from modulation of sensory processing to top-down control. J Neurosci 23:3990–3998. https://doi.org/10.1523/jneurosci.23-10-03990.2003 pmid:12764083
    OpenUrlFREE Full Text
  32. ↵
    1. Pinsk MA,
    2. Arcaro M,
    3. Weiner KS,
    4. Kalkus JF,
    5. Inati SJ,
    6. Gross CG,
    7. Kastner S
    (2009) Neural representations of faces and body parts in macaque and human cortex: a comparative fMRI study. J Neurophysiol 101:2581–2600. https://doi.org/10.1152/jn.91198.2008 pmid:19225169
    OpenUrlCrossRefPubMed
  33. ↵
    1. Pinsk MA,
    2. DeSimone K,
    3. Moore T,
    4. Gross CG,
    5. Kastner S
    (2005) Representations of faces and body parts in macaque temporal cortex: a functional MRI study. Proc Natl Acad Sci U S A 102:6996–7001. https://doi.org/10.1073/pnas.0502605102 pmid:15860578
    OpenUrlAbstract/FREE Full Text
  34. ↵
    1. Power JD,
    2. Barnes KA,
    3. Snyder AZ,
    4. Schlaggar BL,
    5. Petersen SE
    (2012) Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage 59:2142–2154. https://doi.org/10.1016/J.NEUROIMAGE.2011.10.018 pmid:22019881
    OpenUrlCrossRefPubMed
  35. ↵
    1. Premereur E,
    2. Taubert J,
    3. Janssen P,
    4. Vogels R,
    5. Vanduffel W
    (2016) Effective connectivity reveals largely independent parallel networks of face and body patches. Curr Biol 26:3269–3279. https://doi.org/10.1016/j.cub.2016.09.059
    OpenUrlCrossRefPubMed
  36. ↵
    R Development Core Team (2011) R: a language and environment for statistical computing. In R foundation for statistical computing.
  37. ↵
    1. Reddy L,
    2. Kanwisher N
    (2007) Category selectivity in the ventral visual pathway confers robustness to clutter and diverted attention. Curr Biol 17:2067–2072. https://doi.org/10.1016/j.cub.2007.10.043 pmid:17997310
    OpenUrlCrossRefPubMed
  38. ↵
    1. Reddy L,
    2. Kanwisher NG,
    3. Vanrullen R
    (2009) Attention and biased competition in multi-voxel object representations. Proc Natl Acad Sci U S A 106:21447–21452. https://doi.org/10.1073/pnas.0907330106 pmid:19955434
    OpenUrlAbstract/FREE Full Text
  39. ↵
    1. Reynolds JH,
    2. Heeger DJ
    (2009) The normalization model of attention. Neuron 61:168–185. https://doi.org/10.1016/j.neuron.2009.01.002 pmid:19186161
    OpenUrlCrossRefPubMed
  40. ↵
    1. Rolls ET,
    2. Tovee MJ
    (1995) The responses of single neurons in the temporal visual cortical areas of the macaque when more than one stimulus is present in the receptive field. Exp Brain Res 114:149–162. https://doi.org/10.1007/BF00241500
    OpenUrl
  41. ↵
    1. Saygin ZM,
    2. Osher DE,
    3. Norton ES,
    4. Youssoufian DA,
    5. Beach SD,
    6. Feather J,
    7. Gaab N,
    8. Gabrieli JDE,
    9. Kanwisher N
    (2016) Connectivity precedes function in the development of the visual word form area. Nat Neurosci 19:1250–1255. https://doi.org/10.1038/nn.4354 pmid:27500407
    OpenUrlCrossRefPubMed
  42. ↵
    1. Schwarzlose RF,
    2. Baker CI,
    3. Kanwisher N
    (2005) Separate face and body selectivity on the fusiform gyrus. J Neurosci 25:11055–11059. https://doi.org/10.1523/JNEUROSCI.2621-05.2005 pmid:16306418
    OpenUrlAbstract/FREE Full Text
  43. ↵
    1. Song Y,
    2. Luo YLL,
    3. Li X,
    4. Xu M,
    5. Liu J
    (2013) Representation of contextually related multiple objects in the human ventral visual pathway. J Cogn Neurosci 25:1261–1269. https://doi.org/10.1162/jocn_a_00406
    OpenUrlCrossRefPubMed
  44. ↵
    1. Stein T,
    2. Sterzer P,
    3. Peelen MV
    (2012) Privileged detection of conspecifics: evidence from inversion effects during continuous flash suppression. Cognition 125:64–79. https://doi.org/10.1016/j.cognition.2012.06.005
    OpenUrlCrossRefPubMed
  45. ↵
    1. Taubert J,
    2. Ritchie JB,
    3. Ungerleider LG,
    4. Baker CI
    (2022) One object, two networks? Assessing the relationship between the face and body-selective regions in the primate visual system. Brain Struct Funct 227:1423–1438. https://doi.org/10.1007/s00429-021-02420-7
    OpenUrl
  46. ↵
    1. Thompson JC,
    2. Clarke M,
    3. Stewart T,
    4. Puce A
    (2005) Configural processing of biological motion in human superior temporal sulcus. J Neurosci 25:9059–9066. https://doi.org/10.1523/JNEUROSCI.2129-05.2005 pmid:16192397
    OpenUrlAbstract/FREE Full Text
  47. ↵
    1. van den Hurk J,
    2. Van Baelen M,
    3. Op de Beeck HP
    (2017) Development of visual category selectivity in ventral visual cortex does not require visual experience. Proc Natl Acad Sci U S A 114:E4501–E4510. https://doi.org/10.1073/pnas.1612862114 pmid:28507127
    OpenUrlAbstract/FREE Full Text
  48. ↵
    1. Wachsmuth E,
    2. Oram MW,
    3. Perrett DI
    (1994) Recognition of objects and their component parts: responses of single units in the temporal cortex of the macaque. Cereb Cortex 4:509–522. https://doi.org/10.1093/cercor/4.5.509
    OpenUrlCrossRefPubMed
  49. ↵
    1. Weiner KS,
    2. Grill-Spector K
    (2010) Sparsely-distributed organization of face and limb activations in human ventral temporal cortex. NeuroImage 52:1559–1573. https://doi.org/10.1016/j.neuroimage.2010.04.262 pmid:20457261
    OpenUrlCrossRefPubMed
  50. ↵
    1. Weiner KS,
    2. Grill-Spector K
    (2013) Neural representations of faces and limbs neighbor in human high-level visual cortex: evidence for a new organization principle. Psychol Res 77:74–97. https://doi.org/10.1007/s00426-011-0392-x pmid:22139022
    OpenUrlCrossRefPubMed
  51. ↵
    1. Yovel G,
    2. Freiwald WA
    (2013) Face recognition systems in monkey and human: are they the same thing? F1000Prime Rep 5:10. https://doi.org/10.12703/P5-10 pmid:23585928
    OpenUrlCrossRefPubMed
  52. ↵
    1. Zafirova Y,
    2. Bognár A,
    3. Vogels R
    (2024) Configuration-sensitive face–body interactions in primate visual cortex. Prog Neurobiol 232:102545. https://doi.org/10.1016/j.pneurobio.2023.102545 pmid:38042248
    OpenUrlPubMed
  53. ↵
    1. Zafirova Y,
    2. Cui D,
    3. Raman R,
    4. Vogels R
    (2022) Keep the head in the right place: face–body interactions in inferior temporal cortex. NeuroImage 264:119676. https://doi.org/10.1016/j.neuroimage.2022.119676
    OpenUrl
  54. ↵
    1. Zoccolan D,
    2. Cox DD,
    3. DiCarlo JJ
    (2005) Multiple object response normalization in monkey inferotemporal cortex. J Neurosci 25:8150–8164. https://doi.org/10.1523/JNEUROSCI.2058-05.2005 pmid:16148223
    OpenUrlAbstract/FREE Full Text
Distinct Yet Proximal Face- and Body-Selective Brain Regions Enable Clutter-Tolerant Representations of the Face, Body, and Whole Person
Libi Kliger, Galit Yovel
Journal of Neuroscience 12 June 2024, 44 (24) e1871232024; DOI: 10.1523/JNEUROSCI.1871-23.2024

Keywords

  • category selectivity
  • clutter
  • face perception
  • high-level visual cortex
  • normalization
  • person perception

Copyright © 2025 by the Society for Neuroscience.
JNeurosci Online ISSN: 1529-2401
