Abstract
We appraise other people’s emotions by combining multiple sources of information, including somatic facial/body reactions and the surrounding context. A wealth of literature has revealed how people take contextual information into account when interpreting facial expressions, but the mechanisms mediating this influence remain to be fully investigated. Across two experiments, we mapped the neural representations of distinct (but comparably unpleasant) negative states, pain and disgust, as conveyed by naturalistic facial expressions or contextual sentences. Negative expressions led to shared activity in the fusiform gyrus and superior temporal sulcus. In contrast, pain contexts recruited the supramarginal, postcentral, and insular cortex, whereas disgust contexts triggered the temporoparietal cortex and hippocampus/amygdala. When pairing the two sources of information, we found a higher likelihood of classifying an expression according to the sentence preceding it. Furthermore, networks specifically involved in processing contexts were re-enacted whenever a face followed said context. Finally, the perigenual medial prefrontal cortex (mPFC) showed increased activity for consistent (vs inconsistent) face–context pairings, suggesting that it integrates state-specific information from the two sources. Overall, our study reveals the heterogeneous nature of face–context integration, which operates according to both a state-general and a state-specific principle, the latter mediated by the perigenual mPFC.
Significance Statement
With the aid of a controlled database and a comprehensive paradigm, our study provides new insights into the brain and behavioral processes mediating contextual influences on emotion-specific face processing. Our results reveal that context operates in both a face-independent and a face-conditional fashion: it biases the interpretation of any face toward the state implied by the associated context and also triggers processes that monitor the consistency between the different sources of information. Overall, our study unveils key neural processes underlying the coding of state-specific information from both face and context and sheds new light on how they are integrated within the medial prefrontal cortex.
Introduction
People appraise others’ affect by integrating multiple pieces of information. In particular, facial expressions are not processed exclusively from the inspection of perceivable muscular displacements but also according to their consistency with the surrounding context (Righart and de Gelder, 2008; Aviezer et al., 2012; Wieser et al., 2012; Stewart et al., 2019). For instance, expressions like disgust, fear, and joy are classified more rapidly/accurately when preceded by a short text providing congruent information (Carroll and Russell, 1996; Stewart et al., 2019). Likewise, individuals underestimate the intensity of painful expressions if told that the displayed person has been successfully treated (Lamm et al., 2007), that he/she is simulating the facial reaction (Zhao et al., 2021), or that the pain could not be explained by any medical condition (De Ruddere et al., 2016). Accordingly, faces of pain are likely to be judged as more intense if embedded in a consistent posture or background (Aviezer et al., 2012). Overall, these effects suggest that facial expressions have a degree of ambiguity, especially if evoked by states of comparable unpleasantness, like pain and disgust (Kunz et al., 2013; Dirupo et al., 2022). As such, context represents a critical source of disambiguation (Carroll and Russell, 1996; Stewart et al., 2019). This raises the question of which subprocess is influenced by contextual information: does it involve the neural mechanisms of facial expression decoding, or higher-order representations of affective states arising from multiple sources of information?
It is known that static faces are processed by ventral portions of the occipital and fusiform cortex (Haxby et al., 2000; Duchaine and Yovel, 2015), while the middle temporo-occipital cortex and the superior temporal sulcus seem to process dynamic information (Deen et al., 2015; Duchaine and Yovel, 2015; Schobert et al., 2018). Critically, the sight of facial expressions of pain and disgust implicates the anterior insular cortex and inferior frontal gyrus (Jauniaux et al., 2019; Gan et al., 2022). These regions might encode both domain-general and domain-specific information, with some components being specific for pain and disgust and others processing supraordinal dimensions such as unpleasantness (Corradi-Dell’Acqua et al., 2016).
Previous studies have investigated the degree to which the neural response to faces in these regions is influenced by contextual information. For instance, Vrticka et al. (2009, 2011) found that portions of the fusiform gyrus, amygdala, temporo-occipital cortex, and inferior frontal gyrus responded more strongly to facial expressions when associated with contextual cues of opposite valence, possibly reflecting error-like signals about the inconsistency. Furthermore, the anterior insula exhibited altered connectivity with the supramarginal gyrus and the olfactory midbrain when painful and disgusted faces, respectively, were associated with cues suggesting that the expressions were simulated (Zhao et al., 2021, 2022). Of particular interest, however, is the perigenual portion of the medial prefrontal cortex (mPFC), as this region exhibits activity patterns that respond coherently to specific emotional states across different sources of information (faces, postures, comic-like vignettes, etc.; Peelen et al., 2010; Skerry and Saxe, 2014). Importantly, however, the perigenual mPFC response might be influenced by supraordinal coding of valence, as previous studies employed only positive versus negative comparisons (Skerry and Saxe, 2014) or found the strongest pattern differentiation between positive and negative states (Peelen et al., 2010). It is therefore unclear whether the perigenual mPFC integrates contextual and facial information according to state-specific or valence coding.
In the present study, we used fMRI to investigate the behavioral and neural mechanisms underlying contextual influences on facial expression processing. To this aim, we ran two experiments where video clips of naturalistic facial expressions of pain and disgust (matched for unpleasantness) were associated with contextual sentences either consistent or inconsistent with face information. Hence, we tested contextual effects independently from supraordinal coding of unpleasantness. Based on the literature reviewed above, we hypothesized that the integration of contextual and facial state-specific information involves the mPFC in its perigenual section.
Materials and Methods
Population
Thirty-eight participants (20 males; mean age, 24.13 ± 7.53 SD) were recruited for Experiment 1, whereas 26 (10 males; mean age, 23.88 ± 3.91) were recruited for Experiment 2. All were native French speakers, declared no history of psychological/psychiatric illness, and were naive to the purpose of the study. Furthermore, they signed an informed consent prior to the experiment. This research was conducted in accordance with the Declaration of Helsinki and was approved by the local ethical committee.
Stimuli
We used a video database of naturalistic facial reactions of individuals exposed to comparably unpleasant painful or disgusting stimulations. It comprises 81 video clips, organized into 27 triplets in each of which the same person reacts to a painful thermal stimulation (fP), a disgusting olfactory stimulation (fD), or a thermal/olfactory stimulation eliciting a neutral reaction (fN). fP and fD were matched for unpleasantness from the point of view of both the video-recorded person and an independent sample of observers. Furthermore, they were sufficiently similar to be confused at times with one another but sufficiently different to be discriminated with ∼65% accuracy, thus minimizing potential ceiling/floor effects in the main tasks. We also used a database of 81 phrases describing contextual scenarios of individuals in situations eliciting pain (cP; e.g., “walking on a sharp nail”), disgust (cD; “walking on cat vomit”), or a neutral situation (cN; “walking on a soft carpet”). The sentences from these three categories were comparable in length and lexical frequency. Furthermore, pilot validation ensured that cP sentences elicited a larger association with pain than the other two categories, whereas cD sentences elicited a larger association with disgust. Finally, cP and cD sentences elicited similar unpleasantness ratings, both reliably larger than those associated with a neutral context.
Facial expressions database
We used a video database of naturalistic facial expressions of pain, disgust, and a neutral state. Full details about how these videos were created and validated are available in previous research (Dirupo et al., 2020, 2022). In summary, 29 participants (10 males; average age, 25.00; SD, 3.46) were video recorded while undergoing olfactory and thermal stimuli triggering disgust and pain, respectively, at different levels of unpleasantness. These recordings were used to create a pool of 123 short clips with no sound, organized in 41 triplets in which the same person was video-recorded while experiencing pain and disgust events matched as closely as possible for unpleasantness, plus a third thermal/olfactory stimulation with unpleasantness rated as close as possible to the ideal point of 0 (corresponding to a neutral state). These videos were validated by an independent sample of 24 participants (seven males; average age, 23.54; SD, 4.12), who underwent a classification task in which they had to guess whether the portrayed person experienced pain, disgust, or a neutral state. For pain/disgust choices, participants were also asked to subsequently rate the associated unpleasantness (Dirupo et al., 2020). Based on the performance of this independent sample, we selected a subset of 81 videos (organized in 27 triplets) which were matched for unpleasantness from the point of view of both the video-recorded person and the independent sample of observers. Furthermore, spontaneous expressions of pain and disgust are sufficiently similar to be confused at times with one another and yet sufficiently different to be discriminated with ∼65% accuracy. This minimizes the emergence of ceiling/floor effects in the main experiment. See Table 1 and Figure 1 for full details on the video database.
Validation of experimental materials and experimental setup. A, Sentence validations. Participants' ratings of unpleasantness (left subplot), pain (middle subplot), and disgust (right subplot) associated with the contextual sentences included in the study. All ratings were carried out on a visual analog scale and subsequently converted into a value ranging from 0 (not painful/disgusting/unpleasant at all) to 10 (extremely painful/disgusting/unpleasant). For each graph, the distribution of ratings per condition is described in terms of box plots, in which the star represents the distribution mean, the horizontal line refers to the median, the box edges refer to the interquartile range, and whiskers refer to the overall data range within 1.5 times the interquartile range. Individual data points are also displayed as color-coded circles. Red data points refer to pain expressions/contexts, blue data points to disgust, and gray data points to the neutral condition. B, Face validations. The left subplot displays the self-reported unpleasantness of the people portrayed in the video clips. The middle subplot displays the unpleasantness attributed to the video clips by an independent sample of observers. The right subplot displays the observers' ability to classify video expressions. Note that, during the validation, only videos classified as painful/disgusting were subsequently rated in terms of unpleasantness. C, Trial structure for the face classification task (left side) and for the unpleasantness rating task (right side).
Results from the preliminary analysis testing dichotomous effects of expressions (painful vs disgusted, painful vs neutral, disgusted vs neutral) as a within-subject factor
Contextual sentences
We created short French-written sentences describing painful, neutral, and disgusting contexts in the infinitive form. As with the videos, the texts were organized in triplets describing individuals embedded in situations eliciting pain (e.g., “walking on a sharp nail”), disgust (“walking on cat vomit”), or a neutral situation (“walking on a soft carpet”). As our aim was to describe plausible contexts of affect, without directly describing the affective states themselves, we ensured that none of the sentences explicitly contained words like “pain” and “disgust” (or synonyms). Hence, in our experiment, we explored the association between a facial reaction and the context perceived with it, rather than the relationship between the lexical and facial representation of the same affective state, as is the case in priming-like experiments (Weingarten et al., 2016). Furthermore, for disgust sentences, we avoided descriptions related to moral transgressions or “violation of the body envelope” (Haidt et al., 1994), which might also evoke violence or physical harm. This dataset was validated on an independent sample of native French-speaking volunteers (Pilot 1: N = 24; 11 men; age, 27 ± 9.25), who were asked to evaluate the event described in each sentence in terms of pain, disgust, and unpleasantness on a visual analog scale (VAS). Following the results from this pilot, we selected a subset of 81 sentences (27 triplets containing one sentence for each state) with the following characteristics. First, painful contexts elicited larger pain ratings than disgust and neutral contexts, whereas disgust contexts elicited larger disgust ratings than pain/neutral contexts. Second, pain and disgust contexts elicited similar unpleasantness ratings, both reliably larger than those associated with neutral contexts. Finally, all three categories were matched in terms of text length (number of characters) and lexical frequency, with reference to the Lexique 3.83 database (New et al., 2004), which exploits a corpus of 218 books (135,000 words) published between 1950 and 2000. See Table 2 and Figure 1 for full details.
Contextual information
Experimental setup
We ran two experiments in which the video clips and sentences were combined such that the state described in the context and the one displayed in the face could be congruent or incongruent. Both experiments were programmed and run with Matlab R2012a (MathWorks) with the aid of the Cogent 2000 toolbox (Wellcome Department of Imaging Neuroscience).
Experiment 1
Experiment 1 was a behavioral study organized in two independent experimental sessions (Fig. 1C), the order of which was counterbalanced across participants. The first task (face classification) was chosen in keeping with prior research investigating contextual influences on facial expressions (Carroll and Russell, 1996; Stewart et al., 2019). Participants first read one contextual sentence and were subsequently shown a facial expression, which they had to classify by pressing one of three keys corresponding to “pain,” “disgust,” or “neutral” (the key–state association was counterbalanced across participants), within a time window of 5 s. Importantly, as the experiment required participants to evaluate only facial expressions, contexts were manipulated as task-irrelevant competing information. In other words, while we ensured attention toward both facial expressions and contexts (see catch trials described below), participants were explicitly instructed to ignore the sentence during the execution of the main task. Within this paradigm, the 81 contexts (27 per state) and the 81 expressions (27 per state) were paired pseudorandomly to yield nine independent conditions (each with nine repetitions), in which each facial expression type was associated with each type of context, leading to a “3 expressions (fP, fD, fN) × 3 contexts (cP, cD, cN)” factorial design.
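For illustration, the counterbalancing logic behind this pairing can be sketched as follows. This is a minimal sketch in R with hypothetical object names (the actual experiment was programmed in Matlab with Cogent 2000); it only shows how 81 sentences and 81 videos can be paired so that each of the nine expression-by-context cells contains nine trials.

```r
# Illustrative sketch (hypothetical names): pair 81 contextual sentences with
# 81 videos so that each of the 9 expression-by-context conditions has 9 trials.
set.seed(1)                                   # for a reproducible example

states <- c("pain", "disgust", "neutral")

contexts    <- data.frame(context_id = 1:81,
                          context_state = rep(states, each = 27))
expressions <- data.frame(video_id = 1:81,
                          face_state = rep(states, each = 27))

# For each context state, split its 27 sentences into 3 groups of 9,
# one group per facial expression type
trials <- do.call(rbind, lapply(states, function(cs) {
  ctx <- contexts[contexts$context_state == cs, ]
  ctx <- ctx[sample(nrow(ctx)), ]             # shuffle sentences of this state
  ctx$face_state <- rep(states, each = 9)
  ctx
}))

# Assign each video (used once) to a trial requiring its expression type
trials$video_id <- NA_integer_
for (fs in states) {
  idx <- which(trials$face_state == fs)
  trials$video_id[idx] <- sample(expressions$video_id[expressions$face_state == fs])
}

trials <- trials[sample(nrow(trials)), ]        # randomize final trial order
table(trials$context_state, trials$face_state)  # 9 trials per cell
```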
The second task (unpleasantness rating) was organized in an almost identical fashion to the classification paradigm, with the only difference being that participants were asked to quantify the degree of unpleasantness experienced by the person depicted in the video. The evaluation was made on a VAS whose two extremities were labeled “neutral” and “extremely unpleasant” (the side of the anchors was counterbalanced across participants) and along which a cursor could be moved by pressing two different keys. This rating task was chosen to account for a potential limitation of the classification task. Indeed, based on the literature, participants are expected to classify faces as a function of the preceding context (Carroll and Russell, 1996; Stewart et al., 2019). Such an effect, however, could be explained either in terms of contextual influences on facial processing (e.g., “I see more pain in the face”) or in terms of preactivation of a given response selection (e.g., “I am more ready to select ‘pain’”) regardless of the observed face. The unpleasantness rating task was designed in such a way that response selection occurred along a dimension that was orthogonal to (and matched between) the pain/disgust categories. This allowed us to test whether individuals respond to faces as a function of the preceding context without any response preselection confound.
As, in both tasks, participants were required to evaluate only facial expressions, we included a control condition to ensure that they also paid attention to the sentences presented before each video. We randomly embedded in each session nine “catch trials” in which contextual sentences were followed by a question testing participants’ comprehension of the situation described. These nine sentences were chosen from those excluded during the contextual validation pilot and therefore shared similar properties with the 81 used in the main conditions. The question was: “How many living beings are there in the situation described by this sentence?” The possible answers were “one living being” or “more than one living being.” Participants had 5 s to press one of two keys corresponding to the two possible answers (the key–response association was counterbalanced across participants). Overall, each session comprised 90 trials (81 experimental trials and 9 catch trials) and lasted approximately 25 min.
Experiment 2
In this experiment, we recorded neural activity through functional magnetic resonance imaging (fMRI) while participants underwent a modified version of the “unpleasantness rating” session from Experiment 1. In particular, we selected the unpleasantness rating task (as opposed to the classification task), as this would allow for the most unbiased investigation of contextual effects on facial processing (e.g., by testing differences in the neural response to the videos when these were congruent vs incongruent with the previous context) independent of any response preselection confound. Furthermore, the paradigm was modified to (1) overcome limitations from the previous experiment (see below, Behavioral data), (2) optimize the design sensitivity for the analysis of neural activity, and (3) include a high-level control condition where faces and contexts were presented in isolation.
Hence, the core of the experiment was simplified to a “2 expressions (fP, fD) × 2 contexts (cP, cD)” design with four independent conditions, in which pain and disgust expressions were displayed following pain or disgust contexts, in either a consistent or inconsistent fashion. Additionally, we included six high-level control conditions: in three of these, participants saw fP, fD, and fN in the absence of any previous context, whereas in the remaining three they read cP, cD, and cN phrases followed immediately by a rating scale. This led to a total of 10 independent conditions, with nine repetitions each. Trials in each of these conditions were followed by a jittered interstimulus interval ranging between 2 and 5 s. The same jittered interval was presented between contexts and faces in the main trials, in which the two sources of information were integrated. Please note that, as this modified paradigm contained context-only trials, participants knew that they had to pay attention to contexts as well throughout the experiment. It was therefore not necessary to include any control catch trial as in Experiment 1.
Procedure and apparatus
After having read and signed the consent form and the MRI security checklist (for Experiment 2), participants underwent the experiment as described above. In Experiment 1, they sat comfortably on an office chair and watched the stimuli displayed on a Dell PC screen. Keypresses were recorded on a Dell keyboard on which the relevant response keys were highlighted. In Experiment 2, participants lay supine in the scanner with their heads fixed by firm foam pads and underwent a single scanning session of ∼25 min. The visual stimuli were presented on a 23″ MRI-compatible LCD screen (BOLDScreen23; Cambridge Research Systems). Keypresses were recorded on an MRI-compatible bimanual response button box (HH-2X4-C; Current Designs). Following the experimental session, participants filled out demographic questionnaires and were formally debriefed. Experiment 1 took place at the Brain and Behavior Laboratory of the University of Geneva and required approximately 60 min. Experiment 2 took place at the Human Neuroscience Platform of the Campus Biotech in Geneva and required approximately 90 min.
Data analysis
Behavioral data
In the analysis of rating data from both experiments, the cursor position on the scale was converted into a scalar ranging from 1 (the position associated with the label “neutral”) to 10 (the opposite position, associated with “extremely unpleasant”). Behavioral data were analyzed through (generalized) linear mixed models with “expression” and “context” as fixed factors. As random factors, we modeled the identity of the participants, the identity of the people displayed in the video clips, and the contextual sentences. In particular, we privileged the converging models with the most complex random structure (Tables 1–3), in order to account for possible idiosyncratic effects of the experimental materials. For the analysis of correct response times (from the face classification session in Experiment 1) and ratings (from Experiments 1 and 2), we used a linear mixed model, and the significance of the fixed effects was calculated using the Satterthwaite approximation of the degrees of freedom. For the analysis of classification accuracy (from Experiment 1), we employed a generalized linear mixed model with a binomial distribution and Laplace approximation. The analysis was carried out as implemented in the lmerTest package (Kuznetsova et al., 2017) in R 3.4.4 (https://cran.r-project.org/).
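As an illustration, the structure of these models can be sketched in R as follows. This is a minimal sketch with a hypothetical data frame and column names, shown with random intercepts only; the random-effects structures actually retained are those reported in Tables 1–3.

```r
# Minimal sketch of the mixed-model analyses. Assumes a hypothetical data
# frame `dat` with columns: rating, accuracy, rt, expression, context,
# subject, video_identity, and sentence.
library(lmerTest)  # attaches lme4 as well; lmer() tests use Satterthwaite df

# Ratings (Experiments 1 and 2) and correct response times (Experiment 1):
# linear mixed models with crossed random intercepts for participants,
# video-recorded identities, and contextual sentences
m_rating <- lmer(rating ~ expression * context +
                   (1 | subject) + (1 | video_identity) + (1 | sentence),
                 data = dat)
anova(m_rating)    # F tests with Satterthwaite-approximated degrees of freedom

# Classification accuracy (Experiment 1): generalized linear mixed model with
# a binomial distribution and the Laplace approximation (glmer default)
m_acc <- glmer(accuracy ~ expression * context +
                 (1 | subject) + (1 | video_identity) + (1 | sentence),
               data = dat, family = binomial)
summary(m_acc)     # Wald z tests for the fixed effects
```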
Contextual effects on facial processing
Neural activity
In Experiment 2, brain structural and functional images were acquired by means of a 3 T Siemens Magnetom Prisma whole-body MRI scanner with a 64-channel head-and-neck coil. Functional images were acquired with a multiband sequence (repetition time, 1,100 ms; TE, 32 ms; flip angle, 50°; 66 interleaved slices; 112 × 112 in-plane resolution; 2 × 2 × 2 mm voxel size; no interslice gap; no in-plane parallel acquisition; multiband acceleration factor, 6). We estimated a field map based on the acquisition of two functional images with different echo times (short TE, 4.92 ms; long TE, 7.38 ms). A structural image of each participant was also recorded with a T1-weighted MPRAGE sequence (192 slices; TR, 2,300 ms; TE, 2.32 ms; flip angle, 8°; slice thickness, 0.9 mm; in-plane resolution, 256 × 256; 0.9 × 0.9 × 0.9 mm voxel size).
Statistical analysis was performed using the SPM12 software (http://www.fil.ion.ucl.ac.uk/spm/). For each subject, functional images were realigned, unwarped, and slice-time corrected. The Artifact Detection Tools (embedded in the CONN21 toolbox; Whitfield-Gabrieli and Nieto-Castanon, 2012) were then used to identify outlier scans in terms of excessive subject motion and signal intensity spikes. Finally, the images were normalized to a template based on 152 brains from the Montreal Neurological Institute, with a voxel-size resolution of 2 × 2 × 2 mm, and smoothed by convolution with an 8 mm full-width at half-maximum Gaussian kernel.
Preprocessed volumes were fed into a first-level analysis using the general linear model framework implemented in SPM. In particular, our design had seven kinds of face conditions: three conditions in which painful, disgusted, and neutral facial expressions were presented in the absence of preceding contextual information and four conditions in which painful and disgusted expressions were presented following either a consistent or an inconsistent context. These seven conditions were modeled through a boxcar function corresponding to each video's duration. Furthermore, the three kinds of contexts were modeled separately as 3 s events. We accounted for habituation effects in neural responses by using the time-modulation option implemented in SPM, which creates, for each condition, an additional regressor in which the trial order is modulated parametrically. This led to a total of 20 regressors (10 main conditions and 10 time modulators), which were convolved with a canonical hemodynamic response function and associated with regressors describing their first-order time derivative. To account for movement-related variance, physiological artifacts, and other sources of noise, we also included the six realignment parameters, dummy variables signaling outlier scans (from the Artifact Detection Tools), and an estimate of cardiac- and respiration-induced changes in the signal based on the PhysIO toolbox (Kasper et al., 2017). Low-frequency signal drifts were filtered using a cutoff period of 128 s. Serial correlations in the neural signal were accounted for through exponential covariance structures, as implemented in the “FAST” option of SPM. Global scaling was applied.
Functional contrasts, testing differential parameter estimate images associated with one experimental condition versus another, were then fed into a second-level one-sample t test using random-effects analysis. Effects were considered significant if they exceeded a threshold of p < 0.05, familywise error corrected for multiple comparisons at the cluster level, with an underlying height threshold of p < 0.001, uncorrected (Flandin and Friston, 2019).
Results
Behavioral data
Preliminary analysis
Of the 38 participants recruited for Experiment 1, eight did not carry out the unpleasantness rating task due to technical issues. For the remaining population, we first analyzed participants’ performance in the catch control condition, in which they were asked to respond to properties of the contextual phrases. The overall accuracy was 68% across the two sessions (for those who carried out only the classification task, accuracy was calculated in one session only). However, there was considerable interindividual variability in performance on this control, with eight individuals at chance level (50% or less), who were excluded from the final analysis. Hence, the final sample for Experiment 1 was 30 participants (17 males; mean age, 23.43 ± 4.57) for the face classification and 22 (12 males; mean age, 23.86 ± 5.06) for the unpleasantness rating. The high number of excluded participants reveals the suboptimal nature of the catch control condition of Experiment 1. Consequently, Experiment 2 was a modified version of the unpleasantness rating session from Experiment 1, without such a control, but with ∼30% of trials involving rating the unpleasantness of the contexts themselves rather than the facial expressions (see Materials and Methods). This ensured that participants also paid attention to contexts throughout the experimental session.
Classification task
Table 3 reports full details on the statistical analysis and associated results. When analyzing accuracy as a function of “expression,” we found no difference between the classification of pain (accuracy, 63.23% ± 17.87) and disgust (64.83% ± 19.00; fP–fD, z = −1.30, p = 0.193). In contrast, neutral expressions were classified with significantly higher accuracy (91.90% ± 13.65; fN–fD, z = 3.71, p < 0.001). Furthermore, accuracy was influenced by the preceding “context.” Specifically, when processing disgusted expressions, participants were less accurate when the video clips were preceded by a pain (58.60% ± 27.35) as opposed to a disgust context (71.08% ± 19.96; cDfD–cPfD, z = 2.12, p = 0.033). We found an effect in the opposite direction when participants processed painful faces, leading to a significant “expression–context” interaction [(cDfD–cPfD)–(cDfP–cPfP), z = 2.22, p = 0.026] revealing that pain facial expressions were processed with higher accuracy when preceded by a pain (66.84% ± 22.54) as opposed to a disgust context (62.98% ± 26.72). We found no interaction effect associated with neutral expressions.
This interaction effect confirms previous results showing that contextual sentences can influence the accuracy of subsequent face classification (Carroll and Russell, 1996; Stewart et al., 2019). In our task, participants were presented with two sources of information (the facial expression and a sentence providing contextual information). While they were explicitly instructed to answer only according to the former, our results suggest that the latter source (the contextual sentences) was also processed and contributed to participants’ performance. However, this effect could be interpreted in two different ways: on the one side, context might influence the evaluation of the expression, reflecting true context–face integration; on the other side, context could merely facilitate the preselection of a given response regardless of the facial information available. To shed more light on the mechanisms underlying contextual effects on facial processing, we repeated the analysis by modeling the occurrence of pain/disgust/neutral responses instead of accuracy. The results are described in Figure 2A and Table 3 and confirm that the response likelihood is influenced by “expression” and “context,” without any interaction. More specifically, participants were more likely to select pain responses when processing a pain expression, whatever the context (fP–fD: z = 4.81, p < 0.001), and, independently, when any face was preceded by painful contexts (cP–cD: z = 2.04, p = 0.042; Fig. 2A, left subplot). Likewise, participants were more likely to select disgust responses when processing a disgust expression, whatever the context (fD–fP: z = 4.40, p < 0.001), and, independently, when any face was preceded by disgusting contexts (cD–cP: z = 2.06, p = 0.039; Fig. 2A, middle subplot). In contrast, neutral responses were modulated exclusively by the facial expressions, with a higher likelihood of neutral responses when processing neutral faces (fN–fD: z = 6.61, p < 0.001; Fig. 2A, right subplot) and no context effect. Overall, contexts influenced the appraisal of videos in an “additive” fashion, that is, by increasing the likelihood of selecting the response suggested by the context, independently of the subsequent face. No significant effect was associated with the response times of correct responses.
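In practical terms, this follow-up analysis only changes the dependent variable of the binomial model described in Materials and Methods. A minimal sketch in R, assuming the same hypothetical data frame as above plus a column `response` holding the key pressed on each trial:

```r
# Recode the chosen response into binary indicators and model their
# occurrence (rather than accuracy) as a function of expression and context.
dat$resp_pain    <- as.integer(dat$response == "pain")
dat$resp_disgust <- as.integer(dat$response == "disgust")
dat$resp_neutral <- as.integer(dat$response == "neutral")

m_resp_pain <- glmer(resp_pain ~ expression * context +
                       (1 | subject) + (1 | video_identity) + (1 | sentence),
                     data = dat, family = binomial)
summary(m_resp_pain)  # main effects of expression and context, no interaction
```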
Behavioral data. A, Classification task from Experiment 1. Percentage of pain (left subplot), disgust (middle subplot), and neutral responses (right subplot), plotted as a function of the facial expression displayed (red, painful; blue, disgusted; gray, neutral) and the preceding context (cP, painful; cD, disgusting; cN, neutral). B, Rating task from Experiment 1. Unpleasantness rating values displayed as a function of facial expression and preceding context. C, Rating task from Experiment 2. The left subplot displays unpleasantness ratings evoked by the processing of facial expressions, as a function of the expressions themselves and the preceding context (none, faces presented without associated contextual information). The right subplot displays unpleasantness ratings evoked by the processing of contextual sentences, in the absence of any associated facial expression. For each graph, the distribution of ratings/responses per condition is described in terms of box plots, in which the star represents the distribution mean, the horizontal line refers to the median, the box edges refer to the interquartile range, and whiskers refer to the overall data range within 1.5 times the interquartile range. Individual data points are also displayed as color-coded circles.
Unpleasantness rating task
The unpleasantness rating task was devised as a more stringent (albeit indirect) way to assess whether context affected the processing of facial expressions. Indeed, as responses are labeled in terms of unpleasantness (matched between, and orthogonal to, the pain/disgust categories), any contextual influence on face processing could not be interpreted in terms of response preselection. In this view, both Experiments 1 and 2 confirm that unpleasantness was influenced exclusively by “expression,” with no difference between pain and disgust faces (fP–fD, t ≤ 0.91, p ≥ 0.378) but lower unpleasantness ratings for neutral expressions (fN–fD, t ≤ −6.16, p < 0.001; Fig. 2B,C, Tables 2, 3). In neither experiment were ratings influenced by the “context” main effect or by the “expression–context” interaction.
Neural activity
Facial expressions
In Experiment 2, we analyzed the neural activity evoked by the unpleasantness rating task. Table 4 lists the regions implicated in processing facial expressions in the absence of a preceding context. When compared with neutral expressions, both pain and disgust expressions bilaterally recruited the fusiform gyrus and middle temporal gyrus, extending to the inferior frontal gyrus (Fig. 3A). Pain expressions also recruited the right superior temporal sulcus and the right precentral gyrus extending to the inferior frontal gyrus. These same regions were also observed when directly contrasting pain versus disgust expressions (Fig. 3B), whereas no region displayed increased activity for the opposite contrast.
Brain surface showing a significant increase of neural activity associated with pain or disgust expressions. A, Suprathreshold effects for pain (red blobs) and disgust (blue blob) contrasted with neutral expressions. Regions involved in both pain and disgust are displayed in purple. B, Suprathreshold activity for pain versus disgust expressions. The average parameter estimates from the right middle temporal gyrus, fusiform gyrus, and inferior frontal gyrus are displayed as box plots and individual data.
Regions implicated when observing facial expressions in the absence of a preceding context
Contextual phrases
We also looked at the neural areas implicated in the processing of contextual sentences without associated facial expressions (Table 5). Pain contexts, as compared with neutral ones, implicated the supramarginal gyrus, postcentral gyrus, middle temporal gyrus, posterior insula, and frontal operculum (Fig. 4A, red blobs). Part of this network was also observed when contrasting pain contexts against disgust ones (Fig. 4B, red blobs). Disgust (vs neutral) contexts instead recruited portions of the middle temporal gyrus and frontal operculum already observed for pain (Fig. 4A, purple blobs), plus the amygdala, extending posteriorly to the hippocampus and parahippocampal gyrus (Fig. 4A, blue blobs). Furthermore, when directly contrasting disgust contexts against pain ones, we found a network involving the bilateral angular gyrus, temporal pole, precuneus, and dorsomedial prefrontal cortex (Fig. 4B, blue blobs).
Brain surface showing a significant increase of neural activity associated with pain or disgust contexts. A, Suprathreshold effects for pain (red blobs) and disgust (blue blob) contrasted with neutral contexts. Regions involved in both pain and disgust are displayed in purple. B, Suprathreshold activity for pain versus disgust contexts. The average parameter estimates from the right frontal operculum, left posterior insula, and angular gyrus are displayed as box plots and individual data.
Regions implicated when reading contextual sentences without any associated facial expression
Effects of context on face processing
Table 6 displays the brain regions implicated in processing facial expressions preceded by contextual phrases. As a first step, we tested for increased activity when an expression was preceded by a pain versus disgust context (cP–cD) and found increased activity in the precuneus and the supramarginal/postcentral gyrus, extending to the central operculum and posterior insula (Fig. 5, red blobs), within a subportion of the network implicated in the processing of contexts alone. The opposite contrast (cD–cP) instead showed increased activity in the inferior frontal gyrus. Furthermore, under a slightly less conservative threshold (FDR cluster correction at q < 0.05), we also found increased activity at the level of the angular gyrus (Fig. 5, blue blobs), over and around the area associated with the processing of disgust contexts alone.
Brain surface showing effects of contextual information on the neural response to facial expressions. The main effects of the previous context are displayed in red (cP–cD) and blue (cD–cP), whereas the green blobs refer to interaction effects showing increased activity for consistent face–context pairings relative to inconsistent ones. The average parameter estimates from the left postcentral gyrus, angular gyrus, and perigenual mPFC are displayed as box plots and individual data.
Brain structures whose response to facial expressions is influenced by the preceding context
As a last step, we tested the interaction, specifically the contrast comparing the neural response to faces associated with consistent versus inconsistent contexts. When correcting for multiple comparisons across the whole brain, no suprathreshold effect was observed. However, following studies that repeatedly implicated the perigenual mPFC in the integration of facial and nonfacial cues of affective states (Peelen et al., 2010; Skerry and Saxe, 2014), we computed a small volume correction analysis on a region of interest combining the medial portions of areas 10 and 14 (bilaterally) from the Brainnetome Atlas (Fan et al., 2016). Within this search area, we found a significant interaction effect (Fig. 5, green blobs). No region was implicated in the inverse contrast.
Discussion
We investigated the role played by contextual information in the processing of spontaneous facial expressions of pain and disgust. We found that contextual cues have an “additive” influence on the classification of faces, increasing the likelihood of selecting the response implied by the context regardless of the expression displayed. In a separate experiment, we found that contextual information influenced the neural processing of expressions in multiple ways. The postcentral cortex and angular gyrus, strongly sensitive to painful and disgusting contexts, respectively, were also strongly recruited when a face followed said contexts. Furthermore, the perigenual mPFC displayed increased activity when pain and disgust expressions followed consistent contexts, suggesting that the mPFC integrates state-specific information from both facial and nonfacial cues.
Networks for facial expressions
The sight of pain and disgust expressions triggered a common set of regions involving the ventral occipital cortex and posterior superior temporal structures. This converges with previous literature describing these regions as part of a core network for face processing (Said et al., 2010; Deen et al., 2015; Duchaine and Yovel, 2015; Schobert et al., 2018). Furthermore, pain expressions preferentially activated the superior temporal sulcus along its whole length. This possibly reflects the differential facial response patterns between the two states, as pain triggers mouth movements more frequently than disgust (Kunz et al., 2013; Dirupo et al., 2022), and the anteroventral superior temporal sulcus has been associated with movements of the lower portion of the face (Schobert et al., 2018). Alternatively, the pain-preferential activity might underlie a representation of the painful characteristics of the face, as suggested for activity in the inferior frontal gyrus and neighboring insula (see Timmers et al., 2018; Ding et al., 2019; and Jauniaux et al., 2019 for meta-analyses). Contrary to previous experiments (Jabbi et al., 2007; Timmers et al., 2018; Jauniaux et al., 2019; Zhao et al., 2021, 2022), our study found little insular and middle cingulate response to affective facial expressions (especially in the case of disgust). It should be stressed, however, that our dataset was characterized by entirely spontaneous expressions (without any extrafacial supporting information), whereas previous studies often relied on actors, which could have led to more pronounced and stereotypical facial configurations and, in turn, different neural activations.
Networks for contextual information
The analysis of contextual sentences revealed a dissociation between the supramarginal gyrus, postcentral gyrus, and posterior insula, sensitive to pain-related phrases, and the angular gyrus, temporoparietal junction, and hippocampus/amygdala, sensitive to disgust contexts. The bilateral frontal operculum appeared implicated in both states. These results converge with previous studies on verbal descriptions of physical pain (Gu and Han, 2007; Bruneau et al., 2012, 2013; Corradi-Dell’Acqua et al., 2014, 2020; Jacoby et al., 2016), which are thought to trigger neural responses similar to those observed for self-directed experiences (Corradi-Dell’Acqua et al., 2014, 2023). A similar interpretation could fit the hippocampus/amygdala, often implicated in firsthand experiences of core disgust (Sharvit et al., 2020; Gan et al., 2022, 2023). As for the frontal operculum, previous studies suggest that the neural response of this region (and the neighboring dorsal anterior insula) might underlie a broad coding of unpleasantness shared between pain and disgust (Corradi-Dell’Acqua et al., 2016).
Previous studies consistently reported a dissociation between the supramarginal, postcentral, and insular structures, responding to sentences describing pain and unpleasant somatic sensations, and the angular gyrus and temporoparietal cortex, responding to nonsomatic affective states (Bruneau et al., 2012, 2013; Corradi-Dell’Acqua et al., 2014) and mental states like thoughts and beliefs (Saxe and Powell, 2006; Mar, 2011). It has been suggested that temporoparietal regions are involved in processing people’s affective states via their beliefs/thoughts (Corradi-Dell’Acqua et al., 2014). This interpretation fits with our findings, as disgust is grounded in evaluations about potential intoxication/contamination (Rozin et al., 1999), and therefore inferring it in others might rely on our representation of their beliefs about those risks.
Contextual influence in networks for facial expressions
We found that the precuneus and the supramarginal, postcentral, and opercular gyri showed an “additive” effect of contextual cues, with enhanced activity when a facial expression was preceded by pain (vs disgust) contexts. Importantly, this activation (Fig. 5, red blob) is part of a larger cluster involved in pain-related sentences alone (Fig. 4, red blobs), suggesting that the representation of a context is subsequently reinstated when processing an expression potentially in line with such information. Our results are in keeping with Zhao et al. (2021), who showed that contextual cues informing about whether painful expressions were genuine (vs simulated) enhanced supramarginal/postcentral activity. Hence, the combined information from present and previous research suggests that this region plays a key role in interpreting facial information in light of pain-relevant prior knowledge, possibly reflecting a broader mechanism for matching pain representations from different sources of information (Lamm et al., 2016).
Disgust contexts also exerted an “additive” effect on face-processing networks, enhancing activity at the level of the angular gyrus and inferior frontal gyrus (IFG). These results converge partly with Zhao et al. (2022), who tested how reliability cues influenced the processing of facial responses to disgusting odors and likewise found that IFG activity was higher when contexts suggested the true nature of the expression. Importantly, this prior study also implicated other structures, like the midbrain olfactory cortex. Note, however, that in our research disgust-related contexts described an ample range of eliciting events (visual, auditory, gustatory, etc.), whereas only facial expressions were manipulated through olfaction. Hence, our contextual modulations at the level of the angular gyrus and temporoparietal cortex should be interpreted as part of a general mechanism for disgust and nonsomatic emotion appraisal (Corradi-Dell’Acqua et al., 2014) which is not idiosyncratic to one sensory channel.
Most critically, the perigenual mPFC showed enhanced activity whenever a facial expression was paired with a consistent (vs inconsistent) context. Hence, the mPFC operates in a state-conditional way, by distinguishing whether different sources of information are coherent with one another. Our results are in keeping with previous studies suggesting that this region represents people’s affect through state-specific patterns, independently of the stimulus source (Peelen et al., 2010; Skerry and Saxe, 2014). However, previous effects could have been driven by a more general representation of valence, as Skerry and Saxe (2014) compared exclusively positive versus negative affect, and Peelen et al. (2010) implemented a wide range of emotions but found the strongest differentiation in the mPFC between happiness and all negative states. In this perspective, the present study provides direct evidence that the mPFC represents specific, but comparably unpleasant, states in others across multiple integrated sources of information.
Further considerations and overall conclusions
Overall, context influenced the networks for face processing in both an “additive” and a “multiplicative” fashion. This partially mirrors the “additive” results from Experiment 1, whereby context increased the likelihood of the corresponding classification regardless of the displayed face (Fig. 2). However, whereas the classification results from Experiment 1 could also be explained in terms of response preselection, this is not the case for the neuroimaging data, as in the unpleasantness rating task response selection occurs along a dimension orthogonal to the “pain” and “disgust” categories. Hence, Experiment 2 provides more stringent evidence that context influences facial processing in an “additive” fashion, also unveiling the neural structures that promote specific face categorization (e.g., supramarginal/postcentral for pain). Unfortunately, despite its inherent interpretational advantages, Experiment 2 does not allow us to link brain responses directly with the overt interpretation of facial expressions.
In this study, we exploited a dataset of spontaneous dynamic facial expressions in which rubber cannulas are connected to the nostrils of the portrayed person (Fig. 1; see Dirupo et al., 2020, 2022, for more details). This might have negatively influenced the plausibility of the experimental setup, as none of the manipulated contexts involved odorants delivered through tubes. We believe that the cannulas (constantly present in all videos) and potential plausibility considerations had negligible influence on our results, especially considering that participants were not required to explicitly compare faces with the previous sentences, which operated instead as task-irrelevant information.
Finally, although our findings provided converging evidence with prior neuroimaging results (see above), it is unclear how to interpret discrepancies, as some of these studies adopted different approaches and manipulated contexts as task-relevant information. Future research will need to examine more thoroughly the role played by task demands in the networks mediating contextual-facial integration.
In conclusion, our study is a systematic investigation of the cognitive and neural processes mediating contextual influences on affective face processing. Across two experiments, we found that individuals partly classify expressions based on contextual information, regardless of the facial information displayed. This effect was further supported by evidence that neural structures specifically implicated in pain and disgust contexts were subsequently reactivated for any expression following said context. Additionally, we found that the perigenual mPFC discriminated between face–context pairings that were consistent (vs inconsistent) with one another. Overall, our study unveils key neural processes underlying the coding of state-specific information from both face and context and sheds new light on how they are integrated within the mPFC.
Footnotes
The work was funded by the Swiss National Science Foundation (Grant Numbers PP00P1_157424 and PP00P1_183717) awarded to Prof. Corrado Corradi-Dell’Acqua. The funding sources had no involvement in any step of the project, including the study design, data collection, analyses, interpretation of the data, or writing.
The authors declare no competing financial interests.
Correspondence should be addressed to Giada Dirupo at giada.dirupo@gmail.com.