Abstract
Understanding social interaction requires processing social agents and their relationships. Recent results show that much of this process is solved in vision: visual areas can represent multiple people, encoding emergent information about their interaction that is not explained by the response to the individuals alone. A neural signature of this process is an increased response in visual areas to face-to-face (seemingly interacting) people, relative to the same people presented as unrelated (back-to-back). This effect has highlighted a network of visual areas for representing relational information. How is this network organized? Using functional MRI, we measured the brain activity of healthy female and male humans (N = 42), in response to images of two faces or two (head-blurred) bodies, facing toward or away from each other. Taking the facing > non-facing effect as a signature of relation perception, we found that relations between faces and between bodies were coded in distinct areas, mirroring the categorical representation of faces and bodies in the visual cortex. Additional analyses suggest the existence of a third network encoding relations between (nonsocial) objects. Finally, a separate occipitotemporal network showed generalization of relational information across body, face, and nonsocial object dyads (multivariate pattern classification analysis), revealing shared properties of relations across categories. In sum, beyond single entities, the visual cortex encodes the relations that bind multiple entities into relationships; it does so in a category-selective fashion, thus respecting a general organizing principle of representation in high-level vision. Visual areas encoding relational information can reveal the processing of emergent properties of social (and nonsocial) interaction, which trigger inferential processes.
Significance Statement
Understanding social interaction requires representing the actors as well as the relation between them. We show that the earliest, rudimentary representation of a social interaction is formed in the visual cortex. Using fMRI on healthy adults, we measured the brain responses to two faces or two (head-blurred) bodies and found that, beyond representing faces and bodies, the visual cortex represents their relations, distinguishing between seemingly interacting (face-to-face) and noninteracting (back-to-back) faces/bodies. Moreover, we found that information about face and body relations is represented in separate networks, in line with the general organizing principle of categorical representation in the visual cortex. The brain network encoding visual relational information may represent emergent properties of interacting people, which underlie the cognitive representation of social interaction.
Introduction
The core meaning of a social interaction is not so much in the individual entities involved in the interaction, as in the relationship holding between them. Much is known about how the representation of social agents comes about, starting with the visual representation of faces and bodies. Much less is known about the representation of higher-level relations such as face-to-face social interaction, which bind people together in scenes/events, but are not explained by the information about individuals alone, or exhausted by low-level relational properties of stimuli (proximity, similarity, contours, symmetry, etc.).
Recent research suggests that the representation of social interaction is grounded in visual perception. A neural signature of social interaction perception is an increased response in visual areas to two people presented face-to-face, relative to the same people presented as independent (Walbrin and Koldewyn, 2019; Abassi and Papeo, 2020; Bellot et al., 2021). This univariate facing > non-facing effect is intriguing because it denotes not only discrimination between two stimuli but also additional integrative processing emerging from the representation of seemingly related (i.e., facing) individuals (Adibpour et al., 2021; Goupil et al., 2023).
In those previous studies, the body was treated as a “monolithic” structure with all the parts oriented in the same direction, providing consistent information about an individual's relation with another (facing or not). In a body, however, different effectors can indicate an individual's direction of social attention and interaction (Emery, 2000; Nummenmaa and Calder, 2009). Depending on the context, individuals can rely on gaze, face, or rest-of-the-body orientation, independently.
Here, we considered two such effectors, the face/head (hereafter, face) and the rest of the body (hereafter, body), and asked whether spatial relations between faces and relations between bodies are encoded in different regions of the visual cortex. A categorical organization of relational information would match a general organizing principle of the visual cortex, which emerges from its anatomical and functional properties, as extensively studied in object representation (Grill-Spector and Weiner, 2014; Op de Beeck et al., 2019). Such an organization could also be behaviorally relevant, as different effectors of interindividual relationships can provide different, even conflicting, information: one can look at another person while walking in the opposite direction!
Using functional MRI (fMRI), we measured brain activity in healthy human adults in response to images of two human faces or two (head-blurred) human bodies, facing toward or away from each other. Building on previous results, we used the increased response to facing versus non-facing dyads to identify areas for spatial relation perception across the whole brain and in a set of functionally defined visual areas specialized in face or body perception (Kanwisher et al., 1997; Downing et al., 2001; Puce et al., 1996; Schwarzlose et al., 2005) and previously implicated in encoding relations between people (Walbrin and Koldewyn, 2019; Abassi and Papeo, 2020, 2022). In a subgroup of subjects, category selectivity was further investigated by measuring brain responses to pairs of nonsocial objects with anteroposterior morphology (i.e., machines and chairs), presented face-to-face or back-to-back. Since we found evidence for a category-selective organization in the visual cortex, with segregated representation of relations between faces and between bodies, we performed two additional analyses. First, we tested to what extent relational information for a category (e.g., facing vs non-facing faces) overlapped with the visual representation of the corresponding category (e.g., faces vs other objects). Second, we tested whether, besides category-selective areas, there were also visual areas that encoded relations (facing/non-facing) in a stimulus-independent manner, generalizing across dyads of bodies and faces, as well as objects.
In sum, this work examines a division of the visual cortex sensitive to socially relevant spatial relations between people, its internal organization, and its relationship with the face- and body-perception visual networks. In doing so, it lays the groundwork for investigating a new function of vision, and a new visual brain network, which can reveal the earliest perception-based stages toward understanding of social interaction.
Materials and Methods
Participants
A total of 42 subjects took part in the main study (24 females; mean age 24.9 years ± 3.3 SD). Among them, a subgroup of 20 was tested with body and face dyads only, and a subgroup of 22 was tested with body and face dyads and, in addition, pairs of nonsocial objects. All subjects had normal or corrected-to-normal vision and reported no history of psychiatric or neurological disorders or use of psychoactive medications. They were screened for contraindications to fMRI and gave written informed consent before participation. All procedures were approved by the local ethics committee (CPP Sud Est V, CHU de Grenoble).
Stimuli
Using Daz3D (Daz Productions) and the image processing toolbox of MATLAB (MathWorks), we created and edited 16 grayscale renderings of human bodies in profile view, with blurred heads and in various biomechanically possible poses, and 16 grayscale renderings of human faces in profile view and in various poses and expressions. Additionally, 16 grayscale renderings of nonsocial artificial objects (eight chairs, four automated teller machines, and four game machines, i.e., slot machines or arcade game machines) were imported from the TurboSquid database (www.turbosquid.com) and edited with the image processing toolbox of MATLAB. All stimuli were matched for luminance and contrast using the SHINE toolbox (Willenbockel et al., 2010). From the 16 single bodies, eight facing body dyads were created, with each single body appearing in only one pair. Non-facing body dyads were created by swapping the position of the two bodies in each facing dyad. Likewise, eight facing face dyads were created from the 16 single faces, with each single face appearing in only one pair. Non-facing face dyads were created by swapping the position of the two faces in each facing dyad. Finally, four pairs of machines and four pairs of chairs positioned face-to-face were created from the eight machines and eight chairs, with each single object appearing in only one pair, yielding a total of eight pairs of facing objects. Non-facing object dyads were created by swapping the position of the two objects in each facing dyad. A mirror version of each dyad was obtained by flipping it horizontally, which yielded a total of 32 body dyads, 32 face dyads, and 32 nonsocial object dyads (for each category, 16 facing and 16 non-facing). The distance between the two items in a dyad (i.e., between their two closest points) was identical (47 pixels) for all three categories of stimuli (bodies, faces, and nonsocial objects) and for facing and non-facing stimuli.
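For concreteness, the dyad-construction logic can be outlined with a minimal MATLAB sketch; file names, image sizes, and the pairing scheme below are illustrative, not the actual stimulus files:

```matlab
% Minimal sketch of dyad construction: pair single exemplars, build the facing
% dyad, create the non-facing dyad by swapping positions, and add mirror-flipped
% versions. File names and the pairing scheme are hypothetical.
singles = cell(1, 16);
for i = 1:16
    singles{i} = imread(sprintf('single_%02d.png', i));   % grayscale renderings
end
gapPx = 47;                                  % distance between the two closest points
for p = 1:8
    left  = singles{2*p - 1};                % exemplar shown in the left position
    right = singles{2*p};                    % exemplar shown in the right position
    gap   = uint8(255 * ones(size(left, 1), gapPx));       % blank gap between items
    facingDyad      = [left, gap, right];    % face-to-face arrangement
    nonFacingDyad   = [right, gap, left];    % positions swapped: back-to-back
    facingMirror    = fliplr(facingDyad);    % horizontally flipped versions
    nonFacingMirror = fliplr(nonFacingDyad);
end
```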
Procedures
fMRI data acquisition
Imaging was conducted on a MAGNETOM Prisma 3 T scanner (Siemens Healthcare) run by CERMEP at Primage. T2*-weighted functional volumes were acquired using a gradient-echo echo-planar imaging sequence (GRE-EPI; TR/TE = 2,000/30 ms, flip angle = 80°, acquisition matrix = 96 × 92, FOV = 210 × 201 mm, 56 transverse slices, slice thickness = 2.2 mm, no gap, multiband acceleration factor = 2, and phase encoding set to anterior/posterior direction). Acquisition of high-resolution T1-weighted anatomical images was performed in the middle of the main experiment and lasted 8 min (MPRAGE; TR/TE/TI = 3,000/3.7/1,100 ms, flip angle = 8°, acquisition matrix = 320 × 280, FOV = 256 × 224 mm, slice thickness = 0.8 mm, 224 sagittal slices, GRAPPA acceleration factor = 2). The acquisition of two field maps was performed at the beginning of the fMRI session.
Main experiment
The experiment consisted of two parts: the main fMRI experiment and a functional localizer task, which we describe below.
Runs
In the main experiment, 20 of the 42 subjects saw 3 runs with (facing and non-facing) body dyads and 3 runs with (facing and non-facing) face dyads. Each run lasted 5.23 min and consisted of 2 sequences of 16 blocks (8 facing and 8 non-facing), for a total of 32 blocks of 6 s each and a total duration of the experiment of 31.38 min. Runs with body dyads and runs with face dyads were presented in pseudorandom order to avoid more than two consecutive runs of the same stimulus group. Each run began with a warm-up block (10 s) and ended with a cooldown block (16 s), during which a central fixation cross was presented. The blocks in the first sequence were presented in random order, and the blocks in the second sequence were presented in counterbalanced (i.e., reversed) order relative to the first sequence. Thus, the blocks that were presented at the end of the first sequence were shown at the beginning of the second sequence, and vice versa. In each of the three runs, there were two blocks for each dyad, for a total of six blocks for each dyad across the whole experiment. For the remaining 22 subjects, everything was identical to the above, except that 3 runs of nonsocial object pairs were presented in addition to 3 runs of body dyads and 3 runs of face dyads. In order to limit the duration of the scanning session while increasing the number of stimuli, the duration of a run was reduced to 4.17 min by shortening the block duration (see below), and the experiment lasted 37.53 min.
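The within-run block ordering (random first sequence, reversed second sequence) can be summarized with a short MATLAB sketch; variable names are illustrative:

```matlab
% Minimal sketch of the within-run block order: the 16 dyad blocks of the first
% sequence are randomized; the second sequence reverses that order, so blocks
% shown late in sequence 1 appear early in sequence 2.
nBlocks  = 16;                      % 8 facing + 8 non-facing dyad blocks
seq1     = randperm(nBlocks);       % first sequence: random order
seq2     = flip(seq1);              % second sequence: reversed (counterbalanced) order
runOrder = [seq1, seq2];            % 32 blocks per run
```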
Blocks
For the 20 subjects who saw body and face dyads only, each block featured 5 repetitions of the same dyad, randomly alternating between 1 view and its flipped version. Within a run, the onset time of each block was jittered (range of inter-block interval duration, 2–6 s; total inter-block time for each run, 96 s) to remove the overlap from the estimate of the hemodynamic response (Dale, 1999). Jittering was optimized using the optseq tool of FreeSurfer (Fischl, 2012). During each block, a black cross was always present in the center of the screen, while stimuli appeared for 500 ms, separated by an interval of 875 ms. For the remaining 22 subjects who also saw nonsocial object pairs, in order to limit the duration of the scanning session while increasing the number of stimuli, the duration of each block was reduced to 4 s, by modifying its structure (4 repetitions of a stimulus, each shown for 520 ms, separated by an interval of 640 ms).
Task
For both groups, in a subset (37.5%) of stimulus and fixation blocks, the cross changed color (from black to red). Subjects were instructed to fixate the cross throughout the experiment and detect and report the color change by pressing a button with their right index finger. This task was used to minimize eye movements away from the center and maintain vigilance in the scanner. During fMRI acquisition, stimuli were back-projected onto a screen by a liquid crystal projector (frame rate, 60 Hz; screen resolution, 1,024 × 768 pixels; screen size, 40 × 30 cm). For all the stimuli, the center of the image overlapped with the center of the screen. Subjects, lying down inside the scanner, viewed the stimuli binocularly (∼7° of visual angle) through a mirror above their head. Stimulus presentation, response collection, and synchronization with the scanner were controlled with the Psychtoolbox (Brainard, 1997) through MATLAB (MathWorks).
Functional localizer task
All subjects completed a functional localizer task prior to the main experiment, with stimuli and tasks adapted from the fLoc package (Stigliani et al., 2015). During this task, subjects saw 180 grayscale photographs of the following five object categories: (1) body stimuli (headless bodies in various views and poses, and body parts); (2) faces (adults and children); (3) places (houses and corridors); (4) inanimate objects (various exemplars of cars and musical instruments); and (5) scrambled objects. Stimuli were presented over two runs (5.27 min each). Each run began with a warm-up block (12 s), ended with a cooldown block (16 s), and included 72 blocks of 4 s each: 12 blocks for each object class with 8 images per block (500 ms per image without interruption), randomly interleaved with 12 baseline blocks featuring an empty screen. To minimize low-level differences across categories, the view, size, and retinal position of the images varied across trials, and each item was overlaid on a 10.5° phase-scrambled background generated from another image of the set, randomly selected. During blocks containing images, some images were presented twice, separated by one intervening image (two-back task). Subjects had to press a button when they detected the repetition.
Preprocessing of fMRI data
Functional images were preprocessed and analyzed using MATLAB (MathWorks), in combination with SPM 12 (Friston et al., 2007) and the CoSMoMVPA toolbox (Oosterhof et al., 2016). The first four volumes of each run were discarded to account for initial scanner gradient stabilization (Soares et al., 2016). Preprocessing of the remaining volumes involved despiking (SPMUP; https://github.com/CPernet/spmup/), slice time correction, geometric distortion correction using field maps, spatial realignment, and motion correction using the first volume of each run as reference. Anatomical volumes were co-registered to the mean functional image; segmented into gray matter, white matter, and cerebrospinal fluid in native space; and aligned to the probability maps in the Montreal Neurological Institute (MNI) space. The DARTEL method (Ashburner, 2007) was used to create a flow field for each subject and an intersubject template, which was registered to MNI space and used for the normalization of functional images. The final steps included spatial smoothing with a Gaussian kernel of 6 mm FWHM for univariate analyses and 2 mm FWHM for multivariate analyses, and removal of low-frequency drifts with a temporal high-pass filter (cutoff 128 s).
Analyses
Whole-brain univariate analysis
Considering the runs with face and body dyads in the main experiment (N = 42), the blood oxygen level-dependent (BOLD) signal of each voxel in each subject was estimated in two random-effect general linear model (RFX GLM) analyses, each including two regressors for the experimental conditions (facing and non-facing body dyads or facing and non-facing face dyads), one regressor for fixation blocks, and six regressors for the movement correction parameters as nuisance covariates. Separately for bodies and faces, we ran the RFX GLM contrasts facing > non-facing and non-facing > facing. Data of the two groups (group 1 with face and body stimuli and group 2 with face, body, and nonsocial stimuli) were normalized separately using min–max normalization (Ahammed et al., 2021), to control for changes in signal intensity due to the different designs, and then analyzed together. With the same methods, we analyzed the runs involving nonsocial object pairs, presented to a subgroup of 22 participants. For all whole-brain analyses, the statistical significance of second-level effects was determined using a voxelwise threshold of p ≤ 0.001 and correction for multiple comparisons with false discovery rate (FDR) at the cluster level, taking into account all preprocessing steps, including smoothing, as implemented in SPM 12 (Friston et al., 2007).
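As a minimal sketch, the min–max normalization applied to each group's data before combining the groups takes the following form (the variable name is illustrative):

```matlab
% Min-max normalization of one group's data (e.g., beta estimates), rescaling
% values to the [0, 1] range before the two groups are analyzed together.
x     = groupBetas;                                    % illustrative variable
xNorm = (x - min(x(:))) ./ (max(x(:)) - min(x(:)));    % rescaled to [0, 1]
```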
Definition of ROIs
We used data from the functional localizer task to define, for each subject, the following visual areas: extrastriate body area (EBA), fusiform body area (FBA), fusiform face area (FFA), occipital face area (OFA), object-selective lateral occipital cortex (obj-LOC), and place-specific parahippocampal place area (PPA). The first four ROIs are selective for body (EBA and FBA; Downing et al., 2001; Taylor et al., 2007) or face perception (FFA and OFA; Kanwisher et al., 1997; Pitcher et al., 2011a,b) and have also been implicated in processing multiple-person scenarios (Abassi and Papeo, 2020, 2022), thus appearing to be plausible candidates for processing relations between faces/bodies. The other two ROIs (obj-LOC and PPA) were included as control areas, as they are involved in general object and scene processing but show no selectivity for faces or bodies. To define those ROIs, individual data were entered into a general linear model with five regressors for the five object-class conditions (bodies, faces, places, objects, and scrambled objects), one regressor for baseline blocks, and six regressors for the movement correction parameters as nuisance covariates. Four bilateral masks of the middle occipitotemporal cortex (MOTC), the inferior occipital cortex (IOC), the occipitotemporal fusiform cortex (OTFC), and the inferior parahippocampal cortex (PHC) were created using FSLeyes (McCarthy, 2018) and the Harvard–Oxford Atlas (Desikan et al., 2006) through FSL (Jenkinson et al., 2012). For each subject, within each mask, we selected the voxels with significant activity (threshold, p = 0.05) for the contrasts of interest: EBA in the MOTC with the contrast bodies > (objects + faces + places), FBA in the OTFC with the contrast bodies > (objects + faces + places), OFA in the IOC with the contrast faces > (objects + bodies + places), FFA in the OTFC with the contrast faces > (objects + bodies + places), obj-LOC in the MOTC with the contrast objects > (bodies + faces + places), and PPA in the PHC with the contrast places > (objects + faces + bodies). For each ROI, all the voxels within the bilateral mask that passed the threshold were ranked by activation level (t values), and the final ROI included up to the best 100 voxels across the right and left hemispheres.
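The voxel-selection step can be summarized with the following MATLAB sketch (a simplified outline with illustrative variable names; in practice the selection was run on the subject's t map within each bilateral anatomical mask):

```matlab
% Minimal sketch of subject-specific ROI definition: keep the voxels within the
% bilateral mask that pass the threshold for the contrast of interest, rank them
% by t value, and retain (up to) the best 100.
candidates = maskIdx(tMap(maskIdx) > tThresh);             % voxels passing p = 0.05
[~, order] = sort(tMap(candidates), 'descend');            % rank by selectivity
roiIdx     = candidates(order(1:min(100, numel(order))));  % final ROI voxels
```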
ROI analyses
From the six ROIs of each subject in the main experiment, we extracted the mean neural activity values (mean β-weights minus baseline) for facing and non-facing bodies and facing and non-facing faces and analyzed them in a 2 category (body, face) × 2 configuration (facing, non-facing) × 6 ROI (EBA, FBA, OFA, FFA, obj-LOC, and PPA) repeated-measures ANOVA. We then used two-tailed t tests for pairwise comparisons addressing the difference between the target effect of relation (facing vs non-facing) for bodies versus faces in each ROI. In a secondary analysis controlling for the selectivity of the effects for faces and bodies, we extracted from the above ROIs the β values (mean β-weights minus baseline) associated with facing and non-facing nonsocial objects and used pairwise t tests to measure the effect of nonsocial object configuration in each ROI. Cohen's d was used as a measure of effect size, with exact confidence intervals calculated with alpha = 0.05.
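A minimal MATLAB sketch of one ROI-level pairwise comparison is given below; variable names are illustrative, and Cohen's d is computed on the paired differences:

```matlab
% Paired comparison of facing vs non-facing betas in one ROI, across subjects,
% with Cohen's d for paired data as the effect size.
d_scores = betaFacing - betaNonFacing;         % one difference per subject
[~, p, ci, stats] = ttest(d_scores);           % two-tailed paired t test
cohens_d = mean(d_scores) / std(d_scores);     % effect size for the paired contrast
```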
Multivariate searchlight cross-decoding
We used searchlight MVPA (Kriegeskorte et al., 2006, 2008) to identify, across the whole brain, areas that encoded relational (facing vs non-facing) information, generalizing across body and face dyads. Separately for each subject, in each sphere of a three-voxel radius centered on each voxel across the brain, we trained a support vector machine classifier (LIBSVM; Chang and Lin, 2011) to discriminate between the β-patterns for facing versus non-facing body dyads (48 patterns per condition) and tested the classification accuracy on the β-patterns for facing versus non-facing face dyads (48 patterns per condition), and vice versa (train on faces, test on bodies). For each subject, we obtained one map by averaging the two accuracy maps (one for training on bodies and testing on faces, the other for training on faces and testing on bodies). Individual accuracy maps were tested, at the group level, with a one-sample t test against chance (50%). The statistical significance of the group-level map was determined using a voxelwise threshold of p ≤ 0.001, corrected for multiple comparisons using FDR at the cluster level.
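For illustration, the computation performed in a single searchlight sphere can be sketched in MATLAB, here using fitcsvm as a stand-in for the LIBSVM classifier used in the actual analysis; Xbody and Xface are hypothetical pattern-by-voxel matrices for the sphere, and yBody and yFace label facing versus non-facing:

```matlab
% One searchlight sphere: train on body dyads, test on face dyads, and vice
% versa; the sphere's accuracy is the average of the two directions.
mdl    = fitcsvm(Xbody, yBody);                  % linear SVM, trained on bodies
accB2F = mean(predict(mdl, Xface) == yFace);     % tested on faces
mdl    = fitcsvm(Xface, yFace);                  % trained on faces
accF2B = mean(predict(mdl, Xbody) == yBody);     % tested on bodies
sphereAccuracy = (accB2F + accF2B) / 2;          % value assigned to the center voxel
```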
Spatial relations between brain networks
We addressed the anatomical relation between the effect of configuration (facing > non-facing) and the activity for face and body perception. In particular, we identified the peak of each effect to test whether the processing of social objects (faces and bodies) and their relations could recruit dissociable neuronal populations. We tested separately the relation between the representation of bodies and their relations and the representation of faces and their relations.
We asked whether, for each subject, the voxels with the strongest facing > non-facing effect overlapped with the voxels with the highest selectivity for single bodies or faces. For each subject, from each anatomical region (MOTC, IOC, and OTFC) with significant clusters for the contrast facing > non-facing body dyads, facing > non-facing face dyads, bodies > (objects + faces + places), or faces > (objects + bodies + places), we extracted the best 100 voxels (based on t values) using group-constrained, subject-specific localization (Saxe et al., 2006; Fedorenko et al., 2010). More precisely, for effects related to bodies, we created two bilateral masks based on group-level data, one corresponding to the most significant cluster for the contrast facing > non-facing body dyads and one corresponding to the most significant cluster for bodies > (objects + faces + places) in the MOTC; then, for each subject, from each mask, we extracted the best 100 voxels for each contrast, based on t values. For effects related to faces, we created bilateral masks based on group-level data, one corresponding to the most significant cluster for the contrast facing > non-facing face dyads and one corresponding to the most significant cluster for the contrast faces > (objects + bodies + places) in the IOC and OTFC; then, for each subject, from each mask, we extracted the best 100 voxels for each contrast, based on t values. Finally, for each subject, within each mask, we computed the number of voxels shared by the two contrasts (i.e., between the relational contrast and the corresponding category-selective contrast).
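The subject-level peak definition and overlap count reduce to a few operations, sketched here with illustrative variable names (tRelation and tCategory stand for a subject's t maps for the relational and category-selective contrasts, restricted to one group-constrained mask):

```matlab
% Best 100 voxels per contrast within the mask, and their overlap.
[~, r1]  = sort(tRelation(maskIdx), 'descend');
[~, r2]  = sort(tCategory(maskIdx), 'descend');
peakRel  = maskIdx(r1(1:100));                  % peak of facing > non-facing
peakCat  = maskIdx(r2(1:100));                  % peak of category selectivity
nOverlap = numel(intersect(peakRel, peakCat));  % voxels shared by the two peaks
```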
Since the analysis of individual subjects is more informative in characterizing functionally distinct regions of the cortex (Saxe et al., 2006; Fedorenko et al., 2010), we considered group-level effects for illustration purposes. On group-level maps, we plotted the voxels with the highest selectivity for the facing > non-facing body dyads, for facing > non-facing face dyads, for single bodies, and for single faces. The peak for facing > non-facing body dyads was defined considering the 100 most significant voxels in the group-level maps (based on t values), within the MOTC (data from the main experiment); the peak for the body-selective response was defined considering the 100 most significant voxels in the group-level map (based on t values) within the MOTC for the contrast bodies > (objects + faces + places) (data from function localizer task). We finally computed the number of overlapping voxels between the two effects. For the relation between processing faces and processing relations between faces, we used the same method as above, but in two different anatomical regions of the visual cortex, the IOC and the OTFC, and using the contrast facing > non-facing face dyads (data from main experiment) and faces > (objects + bodies + places) (data from the functional localizer task). Finally, the above analysis was repeated to investigate the spatial relationship between voxels with the highest selectivity for the effect facing > non-facing for nonsocial object dyads, and voxels with the highest selectivity for single nonsocial objects, within the MOTC, using the contrast facing > non-facing object dyads (data from main experiment) and objects > (scrambled objects) (data from the functional localizer task).
Results
Higher-level relations are represented in a category-selective fashion in the visual cortex
Building on previous research (Isik et al., 2017; Walbrin and Koldewyn, 2019; Abassi and Papeo, 2020, 2022; Bellot et al., 2021), we used the increased response to facing versus non-facing dyads to identify areas sensitive to socially relevant relational information. Going beyond previous research, we tested whether relations between two different effectors of interindividual relationships (faces and bodies) were processed in separate areas of the visual cortex.
For bodies (Fig. 1a; Table 1), the contrast facing > non-facing dyads showed effects in a bilateral cluster peaking in the anterior MOTC and overlapping with the (group-level) body-selective EBA, and in a left cluster in the posterior MOTC (Fig. 1c). The contrast non-facing > facing yielded a cluster in the left cuneus. For faces (Fig. 1b; Table 1), the contrast facing > non-facing dyads showed an effect in a bilateral cluster peaking in the fusiform gyrus and overlapping with the (group-level) face-selective FFA, as well as in a cluster peaking in the posterior MOTC, extending into the IOC and overlapping with the (group-level) face-selective OFA (Fig. 1c). The contrast non-facing > facing for faces revealed two clusters peaking in the right lingual gyrus and in the left cuneus.
Whole-brain univariate analyses for body and face dyads. a, Left and right group random-effect maps (N = 42) for the contrast facing > non-facing body dyads. b, Left and right group random-effect maps (N = 42) for the contrast facing > non-facing face dyads. c, Overlap (purple) of the clusters for the contrast facing > non-facing body (red) and face (blue) dyads. Statistical maps are corrected for multiple comparisons using FDR at the cluster level. For illustration purposes, ROIs correspond to the group-level random-effect contrasts (N = 42) of bodies > (objects + faces + places) for the EBA and FBA, faces > (objects + bodies + places) for the OFA and FFA, objects > scrambled objects for the obj-LOC, and places > (bodies + faces + objects) for the PPA. However, in the actual analyses, we used anatomically constrained activations in individual subjects, rather than group-level activations.
Activations for the whole-brain contrasts. Location and significance of clusters showing stronger responses to facing dyads relative to non-facing dyads and non-facing dyads relative to facing dyads, separately for bodies, faces, and nonsocial objects (machines/chairs). Peak coordinates are in MNI space
Body- and face-perception ROIs represent relational information in a category-selective fashion
Category-selective organization of relational information in the visual cortex was confirmed by the ROI analyses. The individual β values extracted for each condition, from each ROI, were entered in a 2 category (body, face) × 2 configuration (facing, non-facing) × 6 ROI (EBA, FBA, OFA, FFA, obj-LOC, PPA) repeated-measures ANOVA. The results showed main effects of category, F(1, 41) = 23.60, p < 0.001, ηp² = 0.37; configuration, F(1, 41) = 20.70, p < 0.001, ηp² = 0.34; and ROI, F(5, 205) = 3.45, p = 0.005, ηp² = 0.08, along with significant interactions between category and configuration, F(1, 41) = 6.25, p = 0.017, ηp² = 0.13; category and ROI, F(5, 205) = 65.48, p < 0.001, ηp² = 0.61; and configuration and ROI, F(5, 205) = 11.67, p < 0.001, ηp² = 0.22. All effects were qualified by a significant three-way interaction, F(5, 205) = 16.55, p < 0.001, ηp² = 0.29.
We explained this interaction through pairwise comparisons showing higher activity for facing than non-facing bodies in the EBA and FBA but not in other ROIs [EBA, t(41) = 4.22, p < 0.001, d = 0.65, CI = (0.02;0.07); FBA, t(41) = 2.20, p = 0.033, d = 0.34, CI = (0.01;0.03); OFA, t(41) = 0.63, p > 0.250, d = 0.10, CI = (−0.02;0.03); FFA, t(41) = 0.62, p > 0.250, d = 0.10, CI = (−0.02;0.03); obj-LOC, t(41) = 0.59, p > 0.250, d = 0.09, CI = (−0.02;0.03); PPA, t(41) = 0.38, p > 0.250, d = 0.06, CI = (0.03;0.07)] and higher activity for facing than non-facing faces in the EBA, FBA, OFA, FFA, and obj-LOC, but not in the PPA [EBA, t(41) = 3.04, p = 0.004, d = 0.47, CI = (0.01;0.03); FBA, t(41) = 4.40, p < 0.001, d = 0.68, CI = (0.02;0.07); OFA, t(41) = 9.46, p < 0.001, d = 1.46, CI = (0.06;0.10); FFA, t(41) = 7.38, p < 0.001, d = 1.14, CI = (0.07;0.12); obj-LOC, t(41) = 4.93, p < 0.001, d = 0.76, CI = (0.03;0.07); PPA, t(41) = 1.04, p > 0.250, d = 0.16, CI = (−0.04;0.01)]. Crucially, we found that the facing > non-facing effect (Fig. 2) was higher for bodies than for faces in the EBA, t(41) = 2.29, p = 0.027, d = 0.35, CI = (0.00;0.05), and for faces than for bodies in the OFA, t(41) = 5.52, p < 0.001, d = 0.85, CI = (0.05;0.10), and FFA, t(41) = 5.71, p < 0.001, d = 0.88, CI = (0.06;0.12), as well as in the obj-LOC, t(41) = 3.19, p = 0.003, d = 0.49, CI = (0.06;0.12). The difference was not significant in the FBA, t(41) = 1.37, p = 0.178, d = 0.21, CI = (−0.04;0.01), and PPA, t(41) = 0.84, p > 0.250, d = 0.13, CI = (−0.03;0.07).
Results of ROI analyses. The facing > non-facing effect for bodies in dark gray (N = 42) and faces in light gray (N = 42) in each ROI. Boxes depict the median, lower quartile, and upper quartile while the whiskers depict the nonoutlier minimum and maximum. The circles represent outliers, and the dots represent all subject datapoints. *p < 0.05; **p < 0.01; ***p < 0.001.
Contribution of low-level visual properties to the facing > non-facing effect
In the following analysis, we sought to shed light on the features that contribute to the facing/non-facing distinction in high-level visual areas. In particular, we quantified to what extent the facing/non-facing distinction relied on low-level differences between the stimuli. To this end, we used a Gabor-jet model (Lades et al., 1993) to measure the V1-like visual properties of each stimulus image (eight facing and eight non-facing dyads of bodies and faces), represented as a vector (Yue et al., 2012; Haghighat et al., 2015; Foster et al., 2022). Then, separately for each category (bodies and faces), we used Pearson correlations (1 − Pearson's r) between Gabor-based vectors to create representational dissimilarity matrices (RDMs) reflecting low-level differences (i.e., dissimilarities) between stimuli. Upon visual inspection, those Gabor-based RDMs revealed a clear distinction between facing and non-facing pairs for both bodies (Fig. 3a) and faces (Fig. 3b), suggesting that low-level image information aids facing/non-facing discrimination. To quantify this effect, we tested whether the facing/non-facing distinction in the EBA, OFA, FBA, and FFA remained after removing low-level visual information. Separately for bodies and faces, we created a synthetic RDM where cells had a value of 0 (for two stimuli of the same type, e.g., facing) or 1 (for two stimuli of different types). For each subject, for each ROI, we extracted the pattern of activity for each stimulus (eight facing and eight non-facing dyads) and created an RDM using Pearson correlations (1 − Pearson's r), to measure stimulus dissimilarities in terms of neural patterns. Finally, we computed the Pearson correlation between each activity-based RDM (from the EBA, OFA, FBA, and FFA) and the above synthetic RDM, regressing out similarity based on low-level information (i.e., the Gabor model RDM was used as a regressor). For each ROI, for each subject, we obtained a correlation coefficient r. Normalized (Fisher-transformed) correlation coefficients, for each ROI separately, were tested against chance with a one-tailed t test. The results showed no significant correlation between the activity-based and synthetic RDMs after removing low-level visual information, in any of the tested ROIs except the EBA [EBA, t(41) = 4.00, p < 0.001, d = 0.62, CI = (0.03;0.10); FBA, t(41) = 0.15, p > 0.250, d = 0.02, CI = (−0.03;0.03); OFA, t(41) = 1.24, p = 0.111, d = 0.19, CI = (−0.01;0.04); FFA, t(41) = 0.03, p > 0.250, d = 0.00, CI = (−0.03;0.03)]. This means that, in those higher-level ROIs, the facing/non-facing distinction was built almost exclusively on the low-level visual properties of the stimuli, in line with the view that information in the physical structure of the stimuli is already relevant to the representation of social interaction (McMahon et al., 2023). The fact that the EBA encodes the facing/non-facing distinction beyond the low-level visual properties of the stimuli suggests functional differences between the ROIs, with the EBA serving a distinctive role in the early transformative processing of related bodies (Walbrin and Koldewyn, 2019; Gandolfo et al., 2023; Abassi and Papeo, 2022). In sum, this analysis showed that the facing/non-facing distinction in higher-level visual areas primarily leverages low-level visual information. What remains to be understood is the precise computation that, based on such a distinction, yields a stronger response to facing (vs non-facing) dyads. This computation, in our framework, would encode emergent properties of related/interacting entities.
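The core of this analysis can be sketched as a partial correlation between the neural RDM and the facing/non-facing model RDM, controlling for the Gabor-model RDM. Below is a minimal MATLAB outline with illustrative variable names; each RDM is a 16 × 16 matrix over the eight facing and eight non-facing dyads:

```matlab
% For one subject and one ROI: correlate the neural RDM with the synthetic
% facing/non-facing RDM while regressing out the Gabor-based (low-level) RDM.
lt        = find(tril(true(16), -1));     % lower-triangle indices (unique pairs)
neuralVec = neuralRDM(lt);                % 1 - Pearson r between activity patterns
modelVec  = modelRDM(lt);                 % 0 = same type, 1 = different type
gaborVec  = gaborRDM(lt);                 % 1 - Pearson r between Gabor-jet vectors
r = partialcorr(neuralVec, modelVec, gaborVec);   % low-level similarity removed
z = atanh(r);                             % Fisher transform, entered in the group t test
```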
RDMs representing stimulus dissimilarities based on low-level visual features and neural activity patterns. a, RDMs for body dyads based on V1-like visual features measured with a Gabor-jet model (Gabor model) and based on neural activity patterns extracted from EBA and FBA; b, RDMs for face dyads based on V1-like visual features measured with a Gabor-jet model (Gabor model) and based on neural activity patterns extracted from OFA and FFA. For illustration purposes, each RDM has been separately rescaled between 0 (very similar) and 2 (very dissimilar) values. The labels fc1 to fc8 correspond to the eight facing pairs; the labels Nfc1 to Nfc8 correspond to the eight non-facing pairs.
An occipitotemporal network encodes spatial relations across face and body dyads
We used searchlight MVPA (Kriegeskorte et al., 2006) to identify areas, across the whole brain, that encoded relational (facing vs non-facing) information, generalizing across body and face dyads. Searchlight cross-decoding across face and body dyads showed effects in the bilateral medial and lateral occipitotemporal cortex and in the right inferior occipital gyrus. The effect in the lateral occipitotemporal cortex fell between the EBA, OFA, and obj-LOC and was partly segregated from the areas showing the facing > non-facing effect in the MOTC (Fig. 4; Table 2).
Cross-decoding of spatial relations (facing vs non-facing) across dyads of bodies and faces. The results of the cross-decoding analysis using searchlight MVPA to identify areas that encode spatial relations across face and body dyads. Areas bounded by colored lines correspond to the ROIs defined with the group-level random-effect contrasts (N = 42): bodies > (objects + faces + places) for the EBA and FBA, faces > (objects + bodies + places) for the OFA and FFA, objects > scrambled objects for the obj-LOC, and places > (bodies + faces + objects) for the PPA. Statistical maps are corrected for multiple comparisons using FDR at the cluster level.
Searchlight MVPA analyses. Location and significance of clusters showing significant cross-decoding of spatial relations (facing vs non-facing) across dyads of bodies and faces and across dyads of persons and nonsocial objects. Peak coordinates are in MNI space
Distinct neuronal populations represent bodies, faces, and their relations
Above, we showed that socially relevant relations between bodies were encoded in a network that involved key body-perception structures (notably the EBA). Next, we addressed the relationship between the two processes by asking whether, in each subject, the 100 voxels that showed the highest selectivity for bodies were also those that showed the strongest facing > non-facing effect. The analyses, constrained to the bilateral MOTC, revealed largely distinct peaks for the two processes, with no overlap in the right hemisphere and a modest overlap (14.74 voxels ± 11.73 SD) in the left hemisphere (Fig. 5). Since the analysis of individual subjects is more informative for characterizing functionally distinct regions of the cortex (Saxe et al., 2006; Fedorenko et al., 2010), we considered group-level effects for illustration purposes only. Consistent with the individual-level analysis, the group-level effects showed two largely distinct peaks for the two processes, with no overlap in the right hemisphere and a modest overlap of 23 voxels in the left hemisphere (Fig. 6a).
Subject-by-subject peaks of activity for body perception and the facing > non-facing effect for body dyads. For each of the 42 subjects, we show the best 100 voxels for the contrast facing > non-facing bodies (green) and bodies > (objects + faces + places) (red) constrained within the anatomical MOTC. Overlapping voxels are highlighted in orange.
Peaks of (group-level) effects of body/face perception and relation perception. a, Peaks (100 best voxels) for the group-level effects of the contrasts facing > non-facing bodies and bodies > (objects + faces + places). b, Peaks (100 best voxels) for the group-level effects of the contrasts facing > non-facing faces and faces > (objects + bodies + places).
The same subject-by-subject analysis, considering the face-selective activity and the facing > non-facing effect for face dyads in the IOC (Fig. 7), showed largely segregated peaks with minimal overlap in the left hemisphere (1.64 voxels ± 1.79) and modest overlap in the right hemisphere (13.81 voxels ± 12.53). Likewise, the two effects showed only modest overlap in the fusiform gyrus (left, 18.21 ± 14.41; right, 17.52 ± 14.61). Consistent with the effects at the individual level, group-level peaks were largely segregated (Fig. 6b) with an overlap of 4 voxels in the left IOC, 16 voxels in the right IOC, 34 voxels in the left fusiform gyrus, and 39 voxels in the right fusiform gyrus.
Subject-by-subject peaks of activity for face perception and the facing > non-facing effect for face dyads. For each of the 42 subjects, we show the best 100 voxels for the contrast facing > non-facing faces (blue) and faces > (objects + bodies + places) (yellow) constrained within the anatomical IOC and fusiform gyrus. Overlapping voxels are highlighted in green.
The visual processing of nonsocial object pairs
A subgroup of subjects (N = 22) saw pairs of nonsocial objects with clear anteroposterior morphology, facing toward or away from each other, in addition to face and body dyads. We used these data to further investigate category-selective effects in relation perception; in particular, to study whether the effects described above for faces and bodies could generalize to other visual object categories. The whole-brain contrast facing > non-facing objects yielded a cluster in the left posterior MOTC, overlapping with the obj-LOC, but distinct from the areas encoding relations between bodies, and partly segregated from the areas encoding relations between faces (Fig. 8a; Table 1). The contrast non-facing > facing objects showed no effects. Furthermore, we found that, as for faces and bodies, the peak (100 best voxels at the group level) of the facing > non-facing effect for object dyads in the left posterior MOTC was spatially segregated from the (group-level) peak of object-selective activity found with the contrast intact > scrambled objects in the functional localizer task (Fig. 8c). Note that, for objects, we performed this analysis only at the group level because, in our approach, subject-specific peaks were defined within group-constrained activations, which did not overlap at all between the effect of facing > non-facing objects (main experiment) and the effect of intact > scrambled objects (functional localizer task).
Results for nonsocial object pairs. a, Whole-brain univariate analyses for nonsocial object dyads. Left and right group random-effect map (N = 22) for the contrast facing > non-facing nonsocial object dyads (machines and chairs). The results of the group-level (N = 42) random-effect contrasts of facing > non-facing body dyads (in pink), facing > non-facing face dyads (in blue), and objects > scrambled objects for the obj-LOC (in black) are also highlighted. b, Results of the ROI analyses. The facing > non-facing effect (N = 22) in each functionally localized ROI. The boxes depict the median, lower quartile, and upper quartile while the whiskers depict the nonoutlier minimum and maximum. The circles represent outliers, and the dots represent all subject datapoints. c, Peaks of (group-level) effects of object perception and relation perception. Peaks (100 best voxels) for the group-level effects of the contrast facing > non-facing object dyads and objects > scrambled objects. d, Cross-decoding of spatial relations (facing vs non-facing) across dyads of people (bodies and faces) and objects. The results of the cross-decoding analysis using searchlight MVPA to identify areas that encode spatial relations across people (bodies and faces) and objects. For comparison, areas represented in red-to-yellow are the same as shown in Figure 4 (cross-decoding of face and body dyads). All statistical maps are corrected for multiple comparisons using FDR at the cluster level.
The effect of relations between objects was further investigated in each of the category-selective ROIs based on the functional localizer task. Supporting a category-specific representation of relational information, there was no effect of object relation (facing > non-facing) in the body- or face-specific ROIs [EBA, t(21) = 1.21, p = 0.119, d = 0.26, CI = (−0.02;0.06); FBA, t(21) = 0.28, p > 0.250, d = −0.06, CI = (−0.03;0.02); OFA, t(21) = 0.50, p > 0.250, d = −0.11, CI = (−0.05;0.03); FFA, t(21) = 0.85, p = 0.204, d = −0.18, CI = (−0.04;0.02)] or in PPA [t(21) = 0.78, p = 0.222, d = 0.17, CI = (−0.02;0.04)] but a trend in the obj-LOC [t(21) = 1.41, p = 0.086, d = 0.30, CI = (−0.01;0.05); Fig. 8b].
In summary, while the focus of this study was on social relationships and the analysis of nonsocial objects lacked statistical power relative to the analysis of body and face dyads, the exploration of the effects associated with nonsocial objects provided converging evidence for category-selective effects of relational information. In particular, nonsocial object pairs evoked no significant facing > non-facing effect in any of the face- or body-processing areas. Moreover, the whole-brain effect, as well as the trend for an effect of relation in the obj-LOC, suggested that the network of visual areas sensitive to relational information may extend beyond the areas identified with body and face dyads, to other meaningful, category-selective subdivisions (Roberts and Humphreys, 2010; Kim and Biederman, 2011; Baeck et al., 2013; Kaiser et al., 2014a, 2019; Kubilius et al., 2015).
Finally, since we found a network of areas that encoded relational information across faces and bodies, we carried out a second searchlight MVPA analysis to explore to what extent those areas also encoded relations between nonsocial objects. To do so, we used the data from the 22 subjects who saw faces, bodies, and nonsocial objects. For each subject, in each sphere of a three-voxel radius centered on each voxel across the brain, we trained a support vector machine classifier (LIBSVM) to discriminate between facing and non-facing people and tested the classifier on the facing/non-facing discrimination using the β-patterns associated with (facing and non-facing) object pairs, and vice versa (train on objects, test on people). As we had 96 patterns for the people condition (48 body dyad patterns and 48 face dyad patterns) and 48 patterns in the objects condition, we used half of the body patterns (24) and half of the face patterns (24) for the people condition in a first classification and the remaining halves in a second classification (see the sketch below). For each classification, we averaged the two accuracy maps obtained with training on people and testing on objects and training on objects and testing on people. We then averaged the two maps from the two classification analyses. Individual accuracy maps were tested at the group level with a one-sample t test against chance (50%). The statistical significance of the group-level map was determined using a voxelwise threshold of p ≤ 0.001, corrected for multiple comparisons using FDR at the cluster level. The results showed effects in the left medial occipitotemporal cortex and in the bilateral primary visual cortex (Fig. 8d; Table 2). As shown in Figure 8d, the effects of the cross-decoding of people and objects overlapped with the effects of the cross-decoding of bodies and faces in the more posterior (occipital) aspects. The more anterior (occipitotemporal) effects found with the cross-decoding of bodies and faces did not appear in the cross-decoding of people and objects. These results suggest that neuronal populations in the occipitotemporal cortex are selectively recruited for processing relations between people or person orientation (Centelles et al., 2011; Walbrin et al., 2018; Hochmann and Papeo, 2021; Foster et al., 2022).
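For clarity, the pattern-balancing step referred to above can be sketched as follows (illustrative MATLAB; the assignment of patterns to halves is hypothetical and could also be randomized):

```matlab
% Balance the number of patterns per class: each of the two classifications uses
% 24 body + 24 face patterns for "people" against the 48 object patterns.
peopleHalf1 = [Xbody(1:24, :);  Xface(1:24, :)];    % first classification
peopleHalf2 = [Xbody(25:48, :); Xface(25:48, :)];   % second classification
% Each half is cross-decoded against the object patterns (train/test in both
% directions), and the resulting accuracy maps are averaged per subject.
```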
Discussion
Understanding how relational information is computed, where in the brain and in what format, is a challenge for cognitive and computational neuroscience (Green and Hummel, 2004; Hochmann, 2022; Malik and Isik, 2023). Here, we addressed the role of the visual cortex in encoding visual relational information en route to the representation of social interaction.
Building on previous work, we targeted the facing > non-facing effect, under the hypothesis that this effect captures the representation of emergent properties of seemingly related (i.e., facing) bodies (Bellot et al., 2021; Abassi and Papeo, 2022). The results showed that a set of visual areas responded more strongly to facing dyads than to non-facing dyads, in a way that respected a central organizing principle of high-level vision: object category. In particular, with whole-brain analyses, we found that the facing > non-facing effect for face and body dyads recruited segregated networks, mirroring the categorical object representation in the visual cortex. The effect for body dyads encompassed the bilateral MOTC, overlapping with the EBA; the effect for face dyads encompassed the bilateral fusiform gyrus and middle occipital gyrus, overlapping with the FFA and OFA. Consistent with the results of the whole-brain analyses, ROI analyses showed a double dissociation, with a stronger facing > non-facing effect for body dyads than face dyads in the body-processing EBA and a stronger facing > non-facing effect for face dyads than body dyads in the face-processing FFA and OFA. Supporting category selectivity, nonsocial object pairs evoked no significant facing > non-facing effect in any of the face- or body-processing areas. Additional analyses considering Gabor-filtered image properties showed that the facing > non-facing effect in the EBA, FBA, and OFA is built starting from a facing/non-facing discrimination based on low-level visual properties of the stimuli. Finally, we found that the peaks of the facing > non-facing effects were spatially distinct from the activity peaks for face/body (and object) perception. This anatomical shift might explain why relational effects for a category encroached on the territory of another category (i.e., why the facing > non-facing face effect was found in face-selective but also in body- and object-selective ROIs). At the same time, this finding provides an important indication for future research, showing a possible limitation of targeting classic object-perception areas (EBA, FFA, obj-LOC, etc.) to study multiple-object perception.
Category-specific and category-general relational effects
The representation of bodies and faces is dissociated in many aspects in the visual cortex: distinct areas represent body versus face parts, configurations, or motions (Kanwisher et al., 1997; Downing et al., 2001; Pitcher et al., 2009, 2019; Atkinson et al., 2012). Category selectivity, which is especially marked in the case of faces and bodies, is the result of anatomical and functional features of neurons in high-level visual areas. Our results showed that, in exploiting the same substrate, the representation of relational information in the visual cortex preserves one key function of high-level vision: categorization (Grill-Spector and Weiner, 2014; compare Konkle and Alvarez, 2022).
Since faces and bodies are independent effectors of interpersonal relationships (Emery, 2000), segregated processing of face- and body-relational information can be behaviorally relevant. At the same time, an architecture that processes interpersonal relationships also needs to combine category-specific information into a whole-body representation. Again, such organization would mirror the organization of face- and body-related information in the visual cortex, where face- and body-selective areas coexist with areas that integrate category-selective signals into whole-person representation (Bernstein et al., 2014; Kaiser et al., 2014b; Hu et al., 2020; see also Taubert et al., 2021). We found that an occipitotemporal network showed the generalization of facing/non-facing dyadic information across body dyads and face dyads. This network, in its anterior part, overlaps with a network reported by Foster et al. (2022), showing shared neural code for face and body orientation (i.e., front, leftward, rightward). These results could come closer to the definition of visual areas that encode spatial relations between things with respect to the observer's viewpoint. An open question remains whether these effects reflect abstraction from category-specific signals or low-level visuospatial features/processing that are common to facing/non-facing things and are then read by other (higher-order) areas for abstraction and integration (Hochmann and Papeo, 2021).
Another open question remains: what kinds of object categories and relations can trigger the effects found for face and body dyads? We found the facing > non-facing effect for nonsocial objects, in a cluster adjacent to the obj-LOC, and a trend for the same effect in the functionally localized obj-LOC. Moreover, the occipitotemporal network that classified facing/non-facing relations across faces and bodies, in its posterior part, also did so with nonsocial object pairs. These observations lend credit to the hypothesis that there are other category-selective divisions of the visual cortex for processing object–object or human–object relations (Roberts and Humphreys, 2010; Kim and Biederman, 2011; Baeck et al., 2013; Kaiser et al., 2014a, 2019; Wurm and Caramazza, 2019), together with common mechanisms for generalization of relational structures across social and nonsocial interaction events (Karakose-Akbiyik et al., 2023).
Visual versus non-visual effects of spatial relations
By implying interaction, facing people could recruit more attention and/or deeper inferential processing. If so, the facing > non-facing effect in the visual cortex could reflect top-down signals enhancing the perceptual response to facing dyads, rather than inherently different visual representations of facing versus non-facing dyads. We consider a number of facts that favor the visual origin of the effect. First, we found effects not only for the contrast facing > non-facing bodies/faces but also for the contrasts non-facing > facing bodies/faces and facing > non-facing objects (Table 1), which already rule out some kind of general bias for facing people. Second, we designed tasks and stimuli to single out visuospatial processes while minimizing the recruitment of inferential processes and attention toward the stimuli. In particular, attention was balanced across conditions using an orthogonal task (color change detection on central fixation); accordingly, we found no univariate or multivariate effects in areas that could be specifically related to attentional processing (Posner and Dehaene, 1994; Thiebaut de Schotten et al., 2011). Moreover, while preserving prototypical features of social interaction (facingness and spatial proximity), our facing dyads did not give rise to any familiar, meaningful interactions. Accordingly, univariate and multivariate analyses revealed selective effects in the visual cortex, but no activity in higher-order temporoparietal areas, associated with the processing of dynamic representations of meaningful social interactions (Centelles et al., 2011; Isik et al., 2017; Walbrin et al., 2018; see also McMahon et al., 2023).
In summary, there is no indication in our results that effects in the visual cortex are driven by processes occurring upstream. Our results are rather in line with studies where paradigms tackling visual perception performance were used to demonstrate more efficient processing of facing (vs non-facing) dyads (Papeo et al., 2017; Papeo, 2020), associated with differences in the visual cortex activations (Abassi and Papeo, 2022).
This discussion is not intended to suggest that the processing of relations between people (or objects) begins and ends in the visual cortex. We rather propose that visual areas compute the earliest stages of social interaction processing, encoding cues of relationship (e.g., facingness), which are precursors to, but are not yet a representation of, social interaction (Hochmann and Papeo, 2021; McMahon et al., 2023). Spatial positioning (facing/non-facing) is one such cue, but others (e.g., distance, synchrony, coordination) await study. Moreover, by using richer, more meaningful, and dynamic depictions of social interactions, it should be possible to trigger the whole social interaction processing network, revealing where information is channeled beyond the visual cortex. Candidate targets of the relational information encoded in visual areas are temporoparietal areas, which appear to be preferentially driven by dynamic stimuli (Isik et al., 2017; Bellot et al., 2021; Pitcher and Ungerleider, 2021; Landsiedel et al., 2022). Another possible target to explore is the attention brain network. As discussed above, we found no evidence that the facing > non-facing effect arises outside of the visual cortex. However, there may be an important role for attention in a process that conceivably has much to do with individuals' attention orientation, gaze following, and the like. fMRI combined with other methods will reveal feedforward and feedback dynamics within and between networks that contribute to the understanding of social interaction.
Conclusions
This study shows that, beyond face/body recognition, the visual cortex encodes information that is relevant to constructing the representation of meaningful relationships (e.g., a social interaction) between people. It does so in a category-selective fashion, thus respecting a general organizing principle of representation in high-level vision. A separate network of visual areas tends to generalize relational information (facing/non-facing) between body and face dyads, as well as between people and objects. Overall, our results open the possibility that segregated networks in the visual cortex represent objects and their visuospatial relations. Understanding the organization and functioning of these previously uncharted divisions of the visual cortex can unravel visual processes that, beyond single face/body recognition, compute emergent properties of complex, multi-person (or multi-object) scenarios, implementing the early stages of compositionality toward the representation of events.
Footnotes
This work was supported by a European Research Council Starting Grant to L.P. (THEMPO-758473).
The authors declare no competing financial interests.
Correspondence should be addressed to Etienne Abassi at etienne.abassi@gmail.com or Liuba Papeo at liuba.papeo@isc.cnrs.fr.