Abstract
Humans possess a remarkable ability to rapidly access diverse information from others’ faces with just a brief glance, which is crucial for intricate social interactions. While previous studies using event-related potentials/fields have explored various face dimensions during this process, the interplay between these dimensions remains unclear. Here, by applying multivariate decoding analysis to neural signals recorded with optically pumped magnetometer magnetoencephalography (OPM-MEG), we systematically investigated the temporal interactions between invariant and variable aspects of face stimuli, including race, gender, age, and expression. First, our analysis revealed unique temporal structures for each face dimension with high test–retest reliability. Notably, expression and race exhibited a dominant and stably maintained temporal structure according to temporal generalization analysis. Further exploration of the mutual interactions among face dimensions uncovered age effects on gender and race, as well as expression effects on race, during the early stage (∼200–300 ms after face presentation). Additionally, we observed a relatively late effect of race on gender representation, peaking ∼350 ms after the stimulus onset. Taken together, our findings provide novel insights into the neural dynamics underlying the multidimensional aspects of face perception and illuminate the promising future of utilizing OPM-MEG for exploring higher-level human cognition.
- face perception
- magnetoencephalography
- multivariate decoding
- neural dynamics
- optically pumped magnetometer
- temporal generalization
Significance Statement
In everyday social activities, people can quickly interpret a wide range of information from others’ faces. Although converging evidence has shed light on the neural substrates underpinning the perception of invariant and variable aspects of faces, such as race, gender, age, and expression, it is still not fully understood how information about one face dimension alters the perception of another. In this study, we applied multivariate decoding analysis to neural activity captured through optically pumped magnetometer magnetoencephalography during face perception. Our approach enabled a comprehensive exploration of the temporal interactions among different face dimensions, providing an improved understanding of the temporally structured neural dynamics that support multidimensional face perception in the human brain.
Introduction
Humans possess an ability fundamental to social living: extracting a vast amount of information from others’ faces within just a few hundred milliseconds (Rossion, 2014). Some face dimensions (e.g., gender) remain relatively invariant over time, facilitating identity recognition, while other dimensions (e.g., expression) can dynamically change based on the social context (Bruce and Young, 1986; Campbell, 1996). Although the neural substrates underpinning variable and invariant face dimensions have been thoroughly studied (Kanwisher et al., 1997; Haxby et al., 2000; Hoffman and Haxby, 2000; Grill-Spector et al., 2004; Winston et al., 2004; Calder and Young, 2005; Duchaine and Yovel, 2015), the temporal structures and interactions associated with these face dimensions remain poorly understood. Specifically, it remains unknown whether the perception of one face dimension influences the perception of others and, if so, when these interactions occur. Addressing these questions will deepen our understanding of the rapid process of face perception.
A typical way to address the temporal dynamics of various face dimensions is to use univariate event-related potential/field (ERP/ERF) analyses (Liu et al., 2002; Ito and Urland, 2005; Harris and Nakayama, 2007; F. Sheng et al., 2016; Pang et al., 2024). This method has revealed various neural components related to face cognition, e.g., P100/M100 for general responses to faces (Liu et al., 2002; Herrmann et al., 2005), N170 for categorical face perception (Bötzel et al., 1995; Sagiv and Bentin, 2001), N250 for face familiarity (Tanaka et al., 2006; Sommer et al., 2021), and P200 for racial categorization (Kubota and Ito, 2007; Wiese, 2012; Zhou et al., 2020). However, the relatively low sensitivity of univariate analysis (Haynes and Rees, 2006; Norman et al., 2006) makes it challenging to assess the fine-grained temporal dynamics of face perception. Furthermore, traditional event-related analyses do not consider the relationships between information processed at different times, i.e., temporal organization (King and Dehaene, 2014).
More recently, efforts have been made to uncover the temporal profiles of face perception using multivariate pattern analysis (MVPA; Cauchoix et al., 2014; Haxby et al., 2014; Barragan-Jason et al., 2015; Dobs et al., 2019; Muukkonen et al., 2020; Wardle et al., 2020; Ambrus et al., 2021; Li et al., 2022). This decoding-based approach offers increased sensitivity and requires fewer assumptions compared with univariate analyses (Haynes and Rees, 2006; Norman et al., 2006), facilitating the comparison of temporal characteristics across various dimensions. Using MVPA, recent studies have revealed that information related to gender and age is processed earlier than that of identity, following a coarse-to-fine scheme (Dobs et al., 2019). Additionally, it is suggested that arousal is elicited before valence during expression recognition (Li et al., 2022). However, the exploration of temporal interactions between variable and invariant face dimensions is limited, with existing studies only considering how familiarity influences the processing of other face dimensions (Ambrus et al., 2019; Dobs et al., 2019).
In this study, we systematically explored the intricate temporal structures of variable (expression) and invariant (race, gender, and age) face dimensions and, more importantly, their temporal interactions, through a series of multivariate decoding analyses on optically pumped magnetometer magnetoencephalography (OPM-MEG) recordings. Our findings reveal that age information can influence gender and race perception at different stages. Additionally, race representation can be affected by expression, and race information can impact the perception of gender. Together, these results suggest complex interactions within a multifaceted face coding system, involving both variable and invariant dimensions. Notably, our decoding results from OPM-MEG recordings demonstrate a high level of test–retest reliability across different experiment sessions. In summary, this study provides a comprehensive understanding of the temporal interactions associated with variable and invariant face dimensions, emphasizing the promising potential of OPM-MEG in unveiling the neural dynamics of higher-level human cognition.
Materials and Methods
Experimental design
Subjects
A total of 23 right-handed Chinese participants were recruited for the OPM-MEG experiment. Two were excluded because they reported difficulty stabilizing their bodies and heads in the cylindrical shield. Thus, data from the remaining 21 participants (10 females; 11 males; age, 22.29 ± 2.12 years, denoted as mean ± SD in the subsequent text) were included in the following analyses. All participants had normal or corrected-to-normal vision and were naive to the purpose of this study. None reported a history of neurological or psychiatric symptoms. Participants were compensated for their time, with a potential bonus based on task performance. This study was approved by the Institutional Review Board of Peking University.
Stimuli
The stimuli were artificially generated by a face generator based on StyleGAN2 (available via https://github.com/a312863063/generators-with-stylegan2) while manipulating four face dimensions, each with two classes: race (Asian vs Caucasian), gender (female vs male), age (young vs elderly), and expression (neutral vs joyful). The generator was pretrained with diverse real face images encompassing various facial categories such as Asian and elderly faces, enabling the generation of artificial faces across different categories. For each combination of these four dimensions (i.e., 2^4 = 16 face conditions), we created four stimuli, yielding 64 different face images in total. Subsequently, the stimuli were background-removed using Remove.bg (https://www.remove.bg) and were converted to grayscale.
All face stimuli were controlled for low-level visual features to avoid confounding effects on subsequent decoding analyses, including luminance, root mean square (RMS) contrast, face size, and face position. The pixels of the stimuli were normalized to ensure consistent mean pixel intensities (luminance) and RMS contrasts. Next, the location and size of faces within the stimuli were estimated by employing a bounding box drawn via the built-in face detection algorithm in OpenCV-python. The stimuli were then translated and scaled to standardize the location and size of the faces, ensuring consistency both within and between dimensions. Lastly, the face stimuli were embedded into scrambled backgrounds randomly selected from scrambled images within the fLoc functional localizer dataset (Stigliani et al., 2015).
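For illustration, a minimal sketch of the luminance and RMS contrast matching step is given below; the target mean and contrast values are hypothetical placeholders, not the exact values used in the study.

```python
import numpy as np
from PIL import Image

# Hypothetical targets; the study matched stimuli to common values,
# but the exact numbers are not specified here.
TARGET_MEAN, TARGET_RMS = 127.5, 40.0

def normalize_face(path):
    """Normalize a grayscale face image to fixed mean luminance and RMS contrast."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    img -= img.mean()                    # zero-mean
    img *= TARGET_RMS / img.std()        # fixed RMS contrast (std of pixel values)
    img += TARGET_MEAN                   # fixed mean luminance
    # Clipping to the valid range may slightly perturb mean and contrast
    return np.clip(img, 0, 255).astype(np.uint8)
```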
To ensure that the face pose did not contribute to the decoding results, we estimated post hoc the yaw (defined as the head shaking angle) and pitch (defined as the head nodding angle) of each face stimulus and found no significant difference between the two classes of each dimension (independent sample t test, all p > 0.05).
Task procedure
Before the main experiment, participants were instructed to complete 100 practice trials on the stimulus computer. In the main experiment, each run consisted of 270 trials. Each trial consisted of a face stimulus (4.96 × 4.96° of visual angle in size) presented for 200 ms at the center of a gray screen (RGB, 127.5, 127.5, 127.5), followed by a green fixation cross (RGB, 0, 255, 0; 0.21 × 0.21° in size; 0.02° in thickness) with a duration uniformly jittered between 900 and 1,100 ms (Fig. 1A). Within each run, the 64 face stimuli were repeated four times as non-catch trials, during which the participants fixated on the center of the face stimulus. Additionally, 14 catch trials were included, presenting the same face stimulus as in the previous trial (i.e., one-back), prompting participants to respond with a button press. The order of face stimuli in each run was pseudorandomized. Catch trials were inserted in an unpredictable but relatively uniform manner to avoid consecutive button presses. Each run lasted ∼5.5 min. Participants had a 1 min break between consecutive runs. The main experiment had two sessions on the same day. In Session A, 18 participants completed 10 consecutive runs, while the remaining three participants completed 6, 7, and 12 runs, respectively. Following a 1 h break, 15 out of the total 21 participants proceeded to Session B, with 13 participants completing 10 runs each and the remaining 2 participants completing 7 and 8 runs, respectively. Participants included in subsequent analyses (N = 21) achieved an average reaction time of 557.30 ± 62.09 ms, and the mean sensitivity index was d’ = 3.45 ± 0.55.
Experiment design and time-resolved multivariate decoding analysis. A, Experimental design. Each trial consisted of a face stimulus with a duration of 200 ms and a green fixation cross with a duration jittered between 900 and 1,100 ms. For ∼5% of trials within each run, images were identical to the previous ones (i.e., one-back trial), prompting a button press response. B, Picture depicting the participant lying on the scanning bed. C, The pipeline of time-resolved multivariate decoding analysis. For each time point t, we collected the signals from all OPM-MEG channels of all trials corresponding to each pair of faces (e.g., face j and face k). Then we performed pairwise SVM classification in a cross-validated manner, resulting in an average decoding accuracy assigned to location (j, k) in the nRDMs at time t. D, Group-averaged nRDMs at 0, 100, 200, 300, 400, and 500 ms relative to the stimulus onset. E, Time course of grand average face image decoding accuracy, which was obtained by averaging all elements in the nRDM at each time point. The gray shadow indicates standard error. The black horizontal line above the x-axis indicates significant time points (cluster-based permutation test, cluster-defining threshold p < 0.05; cluster-corrected significance level p < 0.05). The gray vertical line indicates the stimulus onset; gray horizontal line indicates the chance level of classification accuracy (i.e., 50% for two-class SVM classification). Exemplar face stimuli in A and C are artificially generated.
OPM-MEG acquisition
During the main experiment, participants were positioned in a supine orientation on the scanning bed inside the cylindrical shield of the PyraMag Epoch 64 OPM-MEG system (Quanmag Healthcare; see Fig. 1B for the experiment setup). To ensure optimal performance, the OPM sensors were positioned close to the participants’ scalp and covered with a thin layer of gel for thermal insulation (J. Sheng et al., 2017). The cylindrical shield, based on our previous design (He et al., 2019), had its longitudinal axis perpendicular to the geomagnetic field, minimizing potential confounding effects from the Earth's magnetic field. For stimulus delivery, we utilized a high-precision stimulation and feedback system (SA-9800E, Shenzhen Sinorad Medical Electronics). Stimuli were projected onto a screen located at the open end of the cylindrical shield. The delay of the projector (33.07 ± 0.22 ms) was corrected post hoc in OPM-MEG recordings. The experiment program was implemented in MATLAB R2023b (MathWorks) using Psychtoolbox 3.0.19 (Brainard, 1997; Kleiner et al., 2007). Equipment control and data collection were managed with customized software based on LabView (National Instruments). Raw OPM-MEG data were recorded at a sampling rate of 1,000 Hz with a 24-bit digital acquisition system (Quanmag Healthcare). All data were collected at Changping Laboratory.
OPM-MEG data preprocessing
Raw OPM-MEG data were converted to an MNE-python (version 1.4; Gramfort et al., 2013)-compatible FIFF format. Notch filters were applied using a zero-phase finite impulse response (FIR) filter at 44 Hz (electronic system noise) and from 50 to 250 Hz in steps of 50 Hz (line noise and its harmonics). Data were then bandpass filtered between 1 and 100 Hz using an FIR filter, with the output shifted backward to compensate for the phase delay introduced by filtering. We chose a 1 Hz high-pass cutoff frequency to remove slow drift in the raw data, as adopted in previous OPM-MEG literature (Hill et al., 2020; Westner et al., 2021; Iivanainen et al., 2023; Rier et al., 2023). Channels with high within-channel standard deviations of the filtered OPM-MEG signal were discarded using a generalized extreme studentized deviate (ESD) test at a 0.05 significance threshold (Rosner, 1983), as implemented in standardized M/EEG preprocessing pipelines (e.g., OSL). Moreover, further visual inspection was conducted to remove the remaining bad channels for each run. Bad trials within each run were also automatically removed using a generalized ESD test at a 0.05 significance threshold. To ensure a consistent number of channels in the two sessions for training support vector machine (SVM) classifiers, we excluded the union of bad channels, independently for each participant, across all runs and sessions, yielding 45.62 ± 2.55 good channels on average across participants. Note that the number of available channels in our study was sufficient for conducting MVPA, as suggested by a recent study (Bezsudnova and Jensen, 2023). Then, we applied independent component analysis (ICA) and removed independent components (ICs) related to heartbeat, ocular activity, or movement. Based on the de-artifacted continuous data, trials were extracted from −200 to 800 ms relative to the stimulus onset. Trial-wise baseline correction was applied by removing the average magnetic field between −200 and 0 ms relative to the stimulus onset, and linear detrending was applied to account for slow drift. All data processing was conducted using MNE-python and custom Python (version 3.11.6; Python Software Foundation) code.
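The pipeline above can be approximated with standard MNE-python calls. The sketch below is a minimal illustration, not the authors' exact code: the file name, ICA settings, and excluded IC indices are placeholders, and the ESD-based channel/trial rejection is omitted.

```python
import mne

raw = mne.io.read_raw_fif("sub-01_run-01_meg.fif", preload=True)  # placeholder file

# Notch filters: 44 Hz system noise, plus 50 Hz line noise and its harmonics
raw.notch_filter(freqs=[44, 50, 100, 150, 200, 250], method="fir", phase="zero")

# Band-pass 1-100 Hz with a zero-phase FIR filter
raw.filter(l_freq=1.0, h_freq=100.0, method="fir", phase="zero")

# ICA-based removal of heartbeat, ocular, and movement components
ica = mne.preprocessing.ICA(n_components=20, random_state=0)  # placeholder settings
ica.fit(raw)
ica.exclude = [0, 3]  # artifactual ICs chosen by inspection (illustrative indices)
raw = ica.apply(raw)

# Epochs from -200 to 800 ms, baseline-corrected over the prestimulus window,
# with linear detrending to account for slow drift
events = mne.find_events(raw)  # assumes a stimulus trigger channel
epochs = mne.Epochs(raw, events, tmin=-0.2, tmax=0.8,
                    baseline=(-0.2, 0.0), detrend=1, preload=True)
```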
Data analysis and statistical inference
Time-resolved multivariate decoding
To characterize the temporal dynamics of different face dimensions, we performed time-resolved multivariate decoding using linear SVM (Fig. 1C for the pipeline). The first step involved constructing a neural representational dissimilarity matrix (nRDM) at each time point (i.e., time-resolved), resulting in symmetric matrices with an undefined diagonal. Elements within the nRDMs indexed the extent to which OPM-MEG signals distinguished each pair of face stimuli at a given time point. To build the nRDM, we extracted whole-brain OPM-MEG signals corresponding to face stimuli j and k at time t. Next, we trained SVM classifiers with fivefold cross-validation for the binary classification of each face pair, yielding a classification accuracy. Note that in SVM decoder training, we averaged the trials within each fold to further reduce noise. This process was repeated 1,000 times, and the average accuracy was assigned to the (j, k) position in the nRDM at time t. This calculation spanned all time points from −200 to 800 ms relative to the stimulus onset and all pairs of face stimuli, yielding 1,001 nRDMs (Fig. 1D).
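A minimal sketch of computing one nRDM entry follows; for brevity it omits the within-fold trial averaging and the 1,000 repetitions described above, and the variable names are illustrative.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold

def pairwise_accuracy(X_j, X_k, t, n_folds=5):
    """Cross-validated linear-SVM accuracy for one face pair at time point t.

    X_j, X_k: (n_trials, n_channels, n_times) arrays for faces j and k."""
    X = np.vstack([X_j[:, :, t], X_k[:, :, t]])
    y = np.r_[np.zeros(len(X_j)), np.ones(len(X_k))]
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    accs = []
    for train, test in StratifiedKFold(n_folds, shuffle=True).split(X, y):
        clf.fit(X[train], y[train])
        accs.append(clf.score(X[test], y[test]))
    return np.mean(accs)  # assigned to nRDM[j, k] at time t
```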
Next, we performed a representational similarity analysis (RSA; Kriegeskorte et al., 2008). For each face dimension, we constructed a model RDM by assigning 0 to pairs of face stimuli within the same class (“within”) and 1 to pairs belonging to different classes (“between”; Fig. 2, Column 1). These model RDMs separately represented the ideal pattern of nRDMs when face stimuli were distinguished solely by a given face dimension. To account for the effects of low-level visual features, we used the HMAX model (Riesenhuber and Poggio, 1999; Serre et al., 2005) to construct a low-level visual feature model RDM by calculating pairwise dissimilarities (defined as the Euclidean distance between two images’ activations in the C2 layer of HMAX). To quantify the extent to which the nRDMs could be explained by the model RDMs, we calculated partial Spearman's ρ between the nRDMs and the five model RDMs (i.e., four face dimensions and the low-level visual features) over time, yielding temporal profiles for each face dimension. Following previous conventions (Giari et al., 2020; Muukkonen et al., 2020), partial correlation was used to control for confounding from the other model RDMs. Note that to ensure that our results were not biased by various kinds of low-level visual information, we also used five different dissimilarity metrics, including pixel-wise similarity, the structural similarity index measure (Wang et al., 2003), the feature similarity index measure (L. Zhang et al., 2011), the learned perceptual image patch similarity (LPIPS; R. Zhang et al., 2018) metric based on VGG, and LPIPS based on AlexNet, which yielded highly similar RSA results. To reduce noise, we smoothed the original decoding time courses using a zero-lag Butterworth low-pass filter with a cutoff frequency of 30 Hz. We also applied two-dimensional multidimensional scaling (MDS) to the nRDMs to visualize the representational structures of specific face dimensions at their peak latencies.
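Partial Spearman's ρ can be computed by rank-transforming all RDM vectors and regressing the covariate model RDMs out of both variables of interest; a self-contained sketch under these assumptions is shown below. At each time point, the lower triangle of the nRDM (e.g., `rdm[np.tril_indices_from(rdm, k=-1)]`) would be correlated with each model RDM vector in turn.

```python
import numpy as np
from scipy.stats import rankdata

def partial_spearman(neural, model, covariates):
    """Partial Spearman correlation between a vectorized nRDM and one model RDM,
    controlling for the remaining model RDMs.

    neural, model: (n_pairs,) lower-triangle vectors; covariates: list of vectors."""
    Z = np.column_stack([np.ones(len(neural))] +
                        [rankdata(c) for c in covariates])

    def residualize(v):
        r = rankdata(v)  # rank transform, then regress out the covariate ranks
        beta, *_ = np.linalg.lstsq(Z, r, rcond=None)
        return r - Z @ beta

    return np.corrcoef(residualize(neural), residualize(model))[0, 1]
```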
Temporal dynamics of four face dimensions unveiled by time-resolved multivariate decoding. Rows (A–D) represent age, expression, gender, and race, respectively. Column 1, Binary model RDMs for individual face dimensions, with zeros (black) denoting the dissimilarity between two faces within the same class and ones (white) denoting the dissimilarity between two faces from different classes. Column 2, Time courses of partial Spearman's ρ between nRDMs and model RDMs. These temporal profiles show the extent to which nRDMs are explained by each model RDM over time. The black curve indicates the average time course. The gray shadow around the average time course indicates standard error. The light purple curves indicate the time courses of individual participants. The black horizontal line above the x-axis indicates significant time points (cluster-based permutation test; cluster-defining threshold p < 0.05; cluster-corrected significance level p < 0.05). The gray vertical line indicates stimulus onset, and the gray horizontal line indicates null correlation. Exemplar face stimuli are shown at the top-right corner of each subplot. Column 3, Structures of nRDMs at peak latencies, visualized by 2D MDS. The two classes within a face dimension with better decoding performance are more distinguishable at its peak latency and vice versa. Exemplar face stimuli in Column 2 are artificially generated.
For statistical inference, we conducted nonparametric statistical tests without prior assumptions on the distribution of the data (Nichols and Holmes, 2002; Pantazis et al., 2005; Maris and Oostenveld, 2007). To identify periods with a significant effect, we employed cluster-based permutation tests. For face image decoding based on SVM, decoding strength (characterized by average decoding accuracies) was tested against 50% (i.e., chance level for binary classification). For each face dimension, decoding strength characterized by the partial Spearman's ρ (between a given model RDM and nRDMs while partialling out the other model RDMs) was compared with zero (i.e., null correlation). In each permutation, we flipped the sign of the decoding strength time course in randomly selected participants; then we calculated the t value for each time point and the maximal cluster mass (i.e., the sum of t values of consecutive time points exceeding a cluster-forming threshold of p = 0.05). This procedure was conducted 1,000 times (i.e., 1,000 permutations). The p value of a suprathreshold cluster was calculated based on the null distribution of maximal cluster mass from all permutations (with a significance threshold of cluster-wise p = 0.05). Onset latency was characterized as the first time point poststimulus onset showing a significant effect. Peak latency was identified as the time point corresponding to the highest decoding strength within the window of 0 to 220 ms poststimulus onset. The choice of 220 ms was based on our experimental setup to avoid potential bias from the stimulus offset response (Carlson et al., 2011; Dobs et al., 2019). Duration was defined as the length of the interval between the onset and the offset of significant effects. We conducted 2,000 bootstrap resamplings of the time course across participants, resulting in empirical distributions of the onset, peak, and duration for each face dimension. The 95% confidence intervals were defined by the 2.5th and 97.5th percentiles of these distributions.
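This sign-flip cluster-based permutation test is closely mirrored by MNE-python's one-sample cluster test; the sketch below uses that function (rather than our custom implementation) on synthetic data for illustration.

```python
import numpy as np
from mne.stats import permutation_cluster_1samp_test

rng = np.random.default_rng(0)
data = rng.normal(0.02, 0.05, size=(21, 1001))  # synthetic (n_subjects, n_times);
# in practice: decoding strength with chance level subtracted
# (50% for accuracies, 0 for partial correlations)

# Sign-flip permutations; with threshold=None the cluster-forming
# t-threshold is derived from p = 0.05
t_obs, clusters, cluster_pv, _ = permutation_cluster_1samp_test(
    data, n_permutations=1000, tail=1, seed=0)
sig_clusters = [c for c, p in zip(clusters, cluster_pv) if p < 0.05]
```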
Temporal generalization analysis
To depict how the neural representation of face stimulus and each face dimension evolved over time, we conducted the temporal generalization analysis, which is a two-dimensional extension of the decoding analysis above (Fig. 3A for the pipeline). This analysis was done by applying the SVM classifier trained at a specific time point to all the other time points. If the classifier obtained at time t1 yielded performance significantly surpassing the chance level at time t2, it suggested that the neural representation at time t1 could be generalized to time t2. The analysis generated nRDMs trained at t1 and tested at t2 for all possible combinations of t1 and t2. For image decoding, we averaged the values within each nRDM, resulting in a temporal generalization matrix (TGM). In contrast to previous studies that only examined the TGM of image decoding without considering specific features or dimensions encoded in the images, we took a step further by segregating and identifying the unique pattern of TGM for each face dimension. This was achieved by calculating partial Spearman's ρ between nRDMs and model RDMs and assigning the values to the TGM for each face dimension. To reduce computational costs, we adopted a leave-one-out strategy for cross-validation, which was in line with a previous study (Cichy et al., 2014); also, this analysis was conducted on the OPM-MEG data resampled to 100 Hz (i.e., the size of TGM was 100 × 100). To further reduce noise, the original TGMs underwent smoothing with a 20 × 20 ms convolutional kernel.
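A bare-bones sketch of the train-at-t1, test-at-t2 procedure is given below; for brevity it drops the leave-one-out cross-validation and the pairwise nRDM construction described above (MNE-python's GeneralizingEstimator offers a comparable, production-ready alternative).

```python
import numpy as np
from sklearn.svm import SVC

def temporal_generalization(X, y):
    """Train a linear SVM at each time point and test it at every time point.

    X: (n_trials, n_channels, n_times); y: binary class labels.
    Returns an (n_times, n_times) matrix of accuracies (training x testing time)."""
    n_times = X.shape[2]
    tgm = np.zeros((n_times, n_times))
    for t_train in range(n_times):
        clf = SVC(kernel="linear").fit(X[:, :, t_train], y)
        for t_test in range(n_times):
            tgm[t_train, t_test] = clf.score(X[:, :, t_test], y)
    # Note: no cross-validation here, so near-diagonal scores are inflated;
    # the actual analysis used leave-one-out cross-validation.
    return tgm
```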
Temporal structures of the neural representation for different face dimensions revealed by temporal generalization analysis. A, The pipeline of the temporal generalization analysis. Similar to the pipeline in Figure 1C, except that the SVM classifiers trained at a given time point were applied to all the time points, thus yielding a 100 × 100 temporal generalization matrix. To obtain the TGM of the face image, we averaged the corresponding nRDM for each pair of time points. To obtain the TGM for each face dimension, we further computed partial Spearman's ρ between nRDMs and model RDMs. Note that as special cases, the diagonal line of each TGM is the average time course in Figure 2. Exemplar face stimuli are artificially generated. B, Schematic illustration of four models of generalization principles included in BMS. C–G, TGM for face image, age, expression, gender, and race, respectively. The x-axis shows the training time relative to the stimulus onset for SVM classifiers, while the y-axis indicates the testing time. Color bars indicate decoding accuracy (for face image decoding) or partial Spearman's ρ between nRDMs and model RDMs (for each face dimension). Horizontal and vertical dotted lines denote the stimulus onset. The significant clusters are outlined by black contours (cluster-based permutation test; cluster-defining threshold p < 0.05; cluster-corrected significance level p < 0.05).
To identify significant clusters in each TGM, we conducted two-dimensional cluster–based permutation tests, which is a 2D extension of that used in the multivariate decoding analysis described above. Cluster-defining threshold and cluster-corrected significance level were both set to p = 0.05.
Bayesian model selection
In previous studies, TGMs were often qualitatively interpreted based on group averages, which could lead to inaccurate conclusions that overlook individual differences. Additionally, the absence of statistical tests on TGMs hindered the direct comparison of results across different dimensions. To address these issues, we employed random-effects Bayesian model selection (RFX-BMS; Stephan et al., 2009) using functions adapted from the VBA toolbox (Daunizeau et al., 2014).
First, we created four regressors of interest, representing (1) sustained, (2) chain, (3) reactivated, and (4) oscillatory models, as outlined in previous literature (Fig. 3B; King and Dehaene, 2014). These models aimed to capture potential generalization principles underpinning TGMs. For all four generalization models, we defined starting and ending time points that positioned the hypothesized pattern of each model on TGMs. For chain, reactivated, and oscillatory models, we further defined the “thickness” of the pattern along the diagonal line, corresponding to the number of “neural generators.” We performed a grid search on these parameters by calculating correlation coefficients between generalization models and TGMs. The parameters yielding the maximum correlation were selected for each model. Subsequently, we fitted these four generalization models to TGMs of race, gender, age, and expression, respectively, producing estimations for log model evidence (LME). Using LME estimations, we conducted RFX-BMS to calculate exceedance probabilities (XP) and estimated model frequencies (MF). XP indicated the likelihood that a specific model accounted more for the neural patterns than other competing models, while MF demonstrated the proportion of individual patterns explained by the model. Finally, we tested MF against chance level (i.e., 0.25 for four competing models) and conducted a Z test on MF to ascertain whether a TGM exhibited a significant tendency to be explained by a specific generalization model.
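To make the four generalization models concrete, the sketch below constructs schematic TGM regressors on a 100 × 100 grid. The shapes follow King and Dehaene (2014), while the start, end, width, and lag values are illustrative stand-ins for the grid-searched parameters.

```python
import numpy as np

def model_tgm(kind, n=100, start=20, end=80, width=8, lag=25):
    """Schematic regressor for one generalization model on an n x n TGM grid."""
    T1, T2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    inside = (T1 >= start) & (T1 < end) & (T2 >= start) & (T2 < end)
    d = np.abs(T1 - T2)  # distance from the diagonal
    if kind == "sustained":       # square block: one stable neural code
        mask = inside
    elif kind == "chain":         # thick diagonal: a sequence of transient codes
        mask = inside & (d < width)
    elif kind == "reactivated":   # diagonal plus an off-diagonal return of the code
        mask = inside & ((d < width) | (np.abs(d - lag) < width))
    elif kind == "oscillatory":   # codes recurring periodically with period `lag`
        mask = inside & (np.minimum(d % lag, lag - d % lag) < width)
    else:
        raise ValueError(kind)
    return mask.astype(float)
```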
Mutual interactions between face dimensions
To further investigate how information about a given face dimension affects the perception of other dimensions, we employed time-resolved decoding, as described earlier, but focused on distinct subsets of the nRDMs. For example, to test whether information about race affected the perception of gender (i.e., an effect of race on gender), we created the Asian and Caucasian subsets of nRDMs by selecting all elements corresponding to face pairs in which both faces were Asian or both were Caucasian, respectively. Meanwhile, the same elements were selected from the gender model RDM. Then we calculated the correlation between each subset of nRDMs and the corresponding subset of the gender model RDM while partialling out the model RDMs of the other dimensions, which resulted in a time course of Spearman's ρ indicating the processing of gender while perceiving Asian or Caucasian faces. Finally, we tested whether there was any temporal cluster with a significant difference between these two time courses via cluster-based permutation tests (both cluster-defining threshold and cluster-corrected significance level were set to p = 0.05). A significant temporal cluster suggested that race information impacted the temporal dynamics of gender. This procedure was repeated to test the mutual interactions between the four face dimensions.
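A sketch of this subset analysis for the race-on-gender case is given below; it reuses the hypothetical `partial_spearman` helper sketched earlier, and the array layout is an assumption.

```python
import numpy as np

def subset_time_course(nrdms, gender_rdm, race_labels, target_race, covariates):
    """Time course of gender decoding strength within one race subset.

    nrdms: (n_times, n_faces, n_faces) array; gender_rdm and covariates: model RDMs;
    race_labels: (n_faces,) race code per stimulus."""
    idx = np.flatnonzero(race_labels == target_race)
    tri = np.tril_indices(len(idx), k=-1)        # lower triangle of the subset
    model = gender_rdm[np.ix_(idx, idx)][tri]
    covs = [c[np.ix_(idx, idx)][tri] for c in covariates]
    return np.array([partial_spearman(m[np.ix_(idx, idx)][tri], model, covs)
                     for m in nrdms])
```

The two resulting time courses (e.g., `target_race` set to Asian vs Caucasian) would then be contrasted with the cluster-based permutation test described above.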
Test–retest reliability of OPM-MEG decoding results
We evaluated the test–retest reliability of OPM-MEG decoding results. Following the procedure described in the time-resolved multivariate decoding analysis, we trained classifiers using data from one session and tested them using data from the other (or the same) session. Given that the two sessions were recorded on the same day, there were four possible train–test combinations (i.e., A-B, B-A, A-A, B-B), which resulted in four sets of results for decoding face image, race, gender, age, and expression. Permutation tests were applied to these four sets of time courses and the differences among them. We also calculated the intraclass correlation (ICC), indicating the consistency across the two sessions, in a time-by-time manner, which is a well-established metric for measuring test–retest reliability (Caceres et al., 2009; Ge et al., 2023).
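The time-resolved ICC can be computed with pingouin's intraclass_corr; the sketch below assumes a two-way random-effects, absolute-agreement variant (ICC2), treating face pairs as targets and test–retest conditions as raters. The specific ICC variant is our assumption, not stated in the original analysis.

```python
import numpy as np
import pandas as pd
import pingouin as pg

def icc_at_time(rdm_vectors):
    """ICC across test-retest conditions for vectorized nRDMs at one time point.

    rdm_vectors: list of (n_pairs,) lower-triangle nRDM vectors, one per
    condition (e.g., A-A, A-B, B-A, B-B)."""
    n_cond, n_pairs = len(rdm_vectors), len(rdm_vectors[0])
    df = pd.DataFrame({
        "target": np.tile(np.arange(n_pairs), n_cond),   # face pair
        "rater": np.repeat(np.arange(n_cond), n_pairs),  # test-retest condition
        "score": np.concatenate(rdm_vectors),
    })
    res = pg.intraclass_corr(df, targets="target", raters="rater", ratings="score")
    return res.set_index("Type").loc["ICC2", "ICC"]
```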
Data and code accessibility
Raw OPM-MEG data from this study, as well as the codes used for carrying out the experiment, analyzing data, and reproducing related figures in the manuscript, are available in the following OpenNeuro (Markiewicz et al., 2021) repository: https://openneuro.org/datasets/ds005107. Our dataset adheres to the Brain Imaging Data Structure standard, ensuring compatibility and ease of use for neuroimaging researchers.
Results
Temporal dynamics of face perception unveiled by MVPA
Here we used time-resolved multivariate decoding (Fig. 1C for the pipeline) to characterize the temporal dynamics of face perception across different dimensions. The nRDMs, composed of average accuracies for pairwise SVM classification (Fig. 1D), were first averaged across all face pairs at each time point and then averaged across participants. This process generated a time course of average decoding accuracies (Fig. 1E), reflecting the discriminability of all pairs of face stimuli at each time point, irrespective of any specific face dimension. Face image decoding accuracy was around the chance level (i.e., 50% for binary classification) until 54 ms after stimulus onset (cluster-based permutation test; cluster-defining threshold p < 0.05; cluster-wise p < 0.05; the same hereinafter); then it exhibited a steep ascent, reaching its peak at 107 ms (95% confidence interval across 2,000 bootstrap samples, 104–114 ms, denoted in square brackets in the subsequent text). Image decoding accuracy remained above the chance level for 746 ms in total. It is noteworthy that a second peak was observed at 280 ms, which might be attributed to stimulus offset effects (Carlson et al., 2011).
Next, we conducted RSA to investigate how the perception of different face dimensions unfolded over time. Our findings revealed that the four face dimensions of interest exhibited unique temporal profiles (Fig. 2, Column 2). The age dimension displayed the most prominent and earliest effect, initiating at 56 ms, reaching its peak at 145 [114–150] ms poststimulus onset, and persisting for 504 ms, while the gender effect emerged at 63 ms, peaked at 176 [97–219] ms after the stimulus onset, and endured for 513 ms. Expression discrimination showed an onset latency of 76 ms, a peak latency of 122 [111–149] ms, and a duration of 465 ms. Lastly, neural dissimilarity due to different races was characterized by an onset latency of 124 ms, a peak latency of 191 [137–210] ms, and a duration of 260 ms. Note that although we used a zero-lag low-pass filter to avoid shifting these timings, the onset and duration of significant effects could still be affected by the decoding strength; thus they should be interpreted with caution. Moreover, we performed 2D MDS to visually represent the structures of nRDMs at the peak latency of each face dimension (Fig. 2, Column 3). Dimensions with stronger effects exhibited more distinguishable patterns between their two classes and vice versa. In addition, individual decoding results showed larger variance for dimensions with weaker effects (such as race).
Our findings are consistent with existing research. Notably, earlier studies also found that the processing of age information precedes that of gender (Dobs et al., 2019; Li et al., 2022) and expression (Li et al., 2022). The observation of a more pronounced effect of age compared with expression is also in line with prior research (Li et al., 2022). Interestingly, we observed that the emergence of the race effect was delayed (∼190 ms after the stimulus onset) compared with the other three dimensions we investigated. This might be linked to the N170 response, which is a neural correlate of processing own- and other-race faces (Ofan et al., 2011; Chen et al., 2013).
Temporal generalization of neural representations of different face dimensions
Although MVPA probed the neural representation of a specific face dimension at individual time points, it lacked insight into the evolving dynamics of neural representation over time. To address this limitation, we conducted temporal generalization analysis (Fig. 3A for the pipeline) to uncover the temporal structures of different face dimensions and gain a better understanding of the temporal dynamics of neural representations over time (Sandhaeger and Siegel, 2023). This method involved training a decoder to distinguish between neural patterns associated with different conditions/classes (e.g., young and old faces for the age dimension) at a specific time point and applying it to the neural data at other time points (King and Dehaene, 2014). The resulting TGMs showed each classifier's performance across all time points, allowing us to assess the persistence or change in distinguishable neural representations over time.
We found a prominent diagonal pattern in the TGM of face image decoding, indicating that the neural representation used to distinguish different faces underwent constant updates as time progressed (Fig. 3C). Additionally, the temporal structures of neural representation for the four face dimensions revealed distinct patterns, as illustrated by the significance contours at the group level in Figure 3D–G. The TGM of expression displayed a neat, square-like pattern that endured for a relatively extended period poststimulus onset. The TGM of age exhibited a diagonal pattern despite occasional off-diagonal reactivations. However, the patterns for gender and race TGMs were less clear, possibly due to their lower decoding strength.
Moving beyond the qualitative interpretation based on the group-level results, our RFX-BMS provided a quantitative assessment of whether the TGMs across individual participants shared a common generalization model for each face dimension. Specifically, we observed a significant sustained pattern in the TGM of expression (MF = 0.95 ± 0.22; t(20) = 14.75; p < 10⁻⁴, one-tailed; see Table 1 for BMS fitting parameters of all generalization models and face dimensions). Besides, the TGM of the race dimension also exhibited a significant sustained pattern (MF = 0.62 ± 0.50; t(20) = 3.39; p < 0.01, one-tailed). However, for the TGMs of age and gender, no significant generalization model was identified due to substantial variance among participants. Moreover, when comparing the XP, which quantified the extent to which TGMs were predominantly explained by a single generalization model, we discovered that the TGMs of expression and race exhibited dominant sustained patterns (for expression, XP = 1.00; for race, XP = 0.97; both exceeding the 0.95 threshold). In the case of gender, the tendency toward a sustained pattern (XP = 0.49) was compromised by the presence of a chain-like pattern in its TGM, although this effect was not statistically significant. For the age dimension, there was a tendency toward sustained and reactivated patterns in the TGM, which was not significant, either.
Results of BMS for the temporal generalization patterns of four face dimensions
Our findings indicate that the whole-brain neural representation related to expression and race, captured by OPM-MEG, remained notably stable during face perception. Alternatively, this observation might suggest that their cognitive process involves the sustained activation of a specific brain network (see Discussion). Despite accumulating evidence showing that information related to expression is processed in various brain regions and at different times, our results suggest the existence of a generalizable representation that becomes observable across time at the whole-brain level. For the age and gender dimensions, although their TGMs seemed to exhibit specific patterns, conclusions should be drawn cautiously. The primary reason is the absence of a dominant pattern based on XP, signifying that no single model could comprehensively explain most individuals’ TGMs.
In brief, we revealed distinctive temporal structures for race, gender, age, and expression, among which the processing of expression and race exhibited a dominant sustained neural representation over time.
Temporal interactions between different face dimensions
Beyond uncovering the unique temporal patterns associated with various face dimensions during face perception, we explored the potential influence of one dimension on the temporal dynamics of another, i.e., temporal interactions between different face dimensions. It is noteworthy that previous studies primarily focused on the impact of familiarity on the temporal dynamics of other face dimensions (Ambrus et al., 2019; Dobs et al., 2019), leaving the temporal interactions among multiple face dimensions largely unexamined.
Specifically, following previous literature (Dobs et al., 2019), we conducted MVPA separately for the two classes of a given face dimension, which enabled us to examine the influence of this dimension over the other dimensions by contrasting the decoding strength of its two classes. We observed a significantly higher decoding strength for gender in the Asian subset (own-race condition) compared with the Caucasian subset (other-race condition) during the 343–388 ms time window, suggesting a race effect on gender during face perception (Fig. 4D, Column 1; cluster-based permutation test, cluster-defining threshold p < 0.05; cluster-corrected significance level p < 0.05; as denoted henceforth). Furthermore, our analysis revealed an effect of expression over race, as evidenced by the enhanced decoding performance for race in the neutral expression subset during the 227–298 ms period, in contrast to the joyful subset (Fig. 4B, Column 1). Additionally, neural representations of race were distinguishable between young and elderly faces at 203–257 ms, while the temporal dynamics of gender differed between young and elderly faces during the 179–230 and 378–428 ms intervals (Fig. 4A, Column 2), which suggests an effect of age over race and gender, respectively.
Temporal interactions between different face dimensions. Rows (A–D) respectively represent the effect of age, expression, gender, and race on the other face dimensions. For each dimension of interest, time-resolved multivariate decoding analysis was conducted for the two classes of this dimension separately. Their differences reflect the influence of this dimension on the dimensions being decoded. In each subplot, the two classes in the dimension of interest are indicated by different colors, while the title indicates the dimension being decoded. The gray shadow around the average time course indicates standard error. The gray vertical line indicates the stimulus onset; gray horizontal line indicates null correlation. The black horizontal line above the x-axis shows significant time points for the face dimension of interest, as shown in Figure 2. The dark gray horizontal line above the x-axis shows time points when a significant difference is found between the two classes of the dimension of interest, while colored horizontal lines above the x-axis show significant time points for each of its two classes (cluster-based permutation test; cluster-defining threshold p < 0.05; cluster-corrected significance level p < 0.05).
These findings align well with existing literature. Specifically, the better decoding performance for the own-race condition compared with the other-race condition suggests enhanced processing of gender information in faces of the own race, which could be attributed to the own-race bias (Brigham and Malpass, 1985; Meissner and Brigham, 2001). The effect of race over gender emerged ∼400 ms after the stimulus onset, which is consistent with the timing of the N400 response associated with the own-race bias (Willadsen-Jensen and Ito, 2008; Tanaka and Pierce, 2009; Proverbio et al., 2020).
Similarly, improved gender decoding performance was observed in the young subset compared with the elderly subset. This could be explained by the own-age effect, suggesting that individuals are more proficient at recognizing faces of a similar age (Perfect and Moon, 2005; M. G. Rhodes and Anastasi, 2012), given that all participants in this experiment were young adults. The temporal dynamics of the age effects on gender in our work (179–230 and 378–428 ms) had comparable timing with those reported in previous studies (Wiese et al., 2008), which demonstrated an own-age bias in face memory using ERPs (in young participants, the right occipitotemporal N250 and the centroparietal old/new recognition effect at ∼400 ms). Also, the period of the aforementioned N250 fell within the interval of age effects on race in our study (203–257 ms), although with a reversed effect. The effect of age over gender might be caused by the degendering of elderly faces, where older faces tend to appear more gender-neutral, as evidenced in behavioral experiments (Quinn and Macrae, 2005; Fitousi, 2021).
To conclude, we systematically examined whether and when the temporal dynamics of a given face dimension differed between the two subsets of another face dimension, reflecting its influence over the processing of other face dimensions. Our results complement previous studies on the interaction between face dimensions by highlighting the mutual interactions between race, gender, age, and expression in a complex face coding system over time.
Test–retest reliability of OPM-MEG decoding results
OPM-MEG has demonstrated its emerging potential in cognitive neuroscience and is considered a next-generation methodology for functional neuroimaging (Boto et al., 2018; Qin and Gao, 2021; Brookes et al., 2022). This is supported by literature comparing the performance of OPM-MEG with SQUID-MEG (Hill et al., 2020; Marhl et al., 2022; Iivanainen et al., 2023) and studies replicating classical experiments (Tierney et al., 2018; Barry et al., 2019; Lin et al., 2019; Bénar et al., 2021; Ru et al., 2022; N. Rhodes et al., 2023). However, previous literature has paid limited attention to its test–retest reliability (Rier et al., 2023), i.e., whether OPM-MEG provides robust results in studying human cognition. Here, we assessed the test–retest reliability of OPM-MEG decoding results across sessions. Decoders were trained in one session and tested in another (or the same) session, allowing us to evaluate the extent to which the same decodable face-related information was preserved in the OPM-MEG data recorded in the two sessions.
Figure 5A–E shows the temporal dynamics of face image decoding and each face dimension across the four test–retest conditions (2 training sessions × 2 test sessions). We observed similar results across the four conditions and did not find any significantly different temporal clusters between the time courses of any two conditions (cluster-based permutation test, cluster-defining threshold p < 0.05; cluster-corrected significance level p < 0.05). Temporal profiles with higher decoding strength (e.g., face image, age dimension) exhibited relatively more overlap across sessions, and vice versa. These findings suggest that the decoding results from OPM recordings remained consistent across sessions.
Test–retest reliability of OPM-MEG decoding results. A–E, Temporal dynamics of four face dimensions under four test–retest conditions, similar to the pipeline in Figure 1, except that SVM classifiers were trained using data from Sessions A or B and were tested using data from either session, respectively. Colored horizontal lines above the x-axis indicate significant time points (cluster-based permutation test; cluster-defining threshold p < 0.05; cluster-corrected significance level p < 0.05), with colors representing different train–test combinations. The gray vertical line indicates the stimulus onset; gray horizontal line indicates the chance level of decoding accuracy (i.e., 50%, for image decoding) or correlation coefficients (i.e., 0, for face dimensions). For face image decoding and individual face dimensions, no significant differences between any pair of test–retest conditions were found. F, ICC of nRDMs across participants at each time point. The gray shadow indicates the standard error. The gray vertical line indicates the stimulus onset.
Additionally, we assessed test–retest reliability at the level of neural representation by calculating the ICC of nRDMs among the four test–retest conditions over time (Fig. 5F), which reached a relatively high peak of 0.78 at 116 ms after the stimulus onset. Interestingly, the baseline period (i.e., −200 to 0 ms relative to the stimulus onset), even without face stimuli presented, showed an average ICC of 0.48, which might be due to the specific neural patterns of individual participants irrelevant to the face stimuli.
To conclude, our findings demonstrate that the decoding results from OPM-MEG exhibit high test–retest reliability across different experimental sessions, indicating the potential of using MVPA on OPM-MEG to explore the neural dynamics of human cognition.
Discussion
In this study, we investigate two fundamental aspects of rapid face processing in the brain using OPM-MEG: (1) the temporal structures of individual face dimensions and (2) their mutual interactions during face perception. Our findings reveal early processing of information related to age, expression, gender, and race, emerging within the first 200 ms after the onset of face stimuli. These four face dimensions are characterized by distinct neural dynamics and different neural patterns generalized over time. In particular, we find a sustained pattern for the neural representation of expression once activated, which lasts for ∼500 ms. As for the temporal interactions between face dimensions, we observe that expression and age can modulate the temporal dynamics of race perception, while the temporal profiles of gender can be influenced by race and age. Furthermore, we assess the test–retest reliability of the decoding results, offering robust evidence for the reliability of using OPM-MEG to explore complex cognitive processes.
The temporal dynamics of face perception, as revealed by the time-resolved decoding approach, are consistent with existing research. For example, we find that facial information is detectable as early as 54 ms after stimulus onset, consistent with a study that found human fusiform gyrus activity decodable as early as 50–75 ms (Ghuman et al., 2014). Our results indicating that race information is processed at a later stage might be linked to the observation that the effect of own-race discrimination also occurs relatively late, since the Asian participants in this study could be more familiar with Asian faces than with Caucasian faces. We also found that expression decoding had an onset (76 ms) comparable with the fast amygdala response, which begins at 74 ms after stimulus presentation (Mendez-Bertolo et al., 2016). Moreover, previous studies have consistently reported that age information is processed earlier than expression (Li et al., 2022) and gender (Dobs et al., 2019), a pattern that we have also observed. It is noteworthy that the onsets of face dimensions should be interpreted with caution, as some of them are earlier than previous electrophysiological results (e.g., M100; Liu et al., 2002). This may be due to the presence of low-level features related to specific face dimensions that might not be fully captured by the low-level visual feature models we partialled out. It might also be attributed to the fact that stronger decoding signals yield relatively earlier onsets (Grootswagers et al., 2017) or to the usage of low-pass filters (VanRullen, 2011). Hence, it might be more robust to index facial information processing with peak latency.
Interestingly, in our work, gender showed a relatively weak effect among the dimensions examined, with an earlier onset at 63 ms and a delayed peak at 176 ms, compared with a previous study on face perception (Dobs et al., 2019). These different results might be attributed to the differences between the face stimuli used in the two studies. In Dobs et al. (2019), half of the stimuli were celebrities familiar to the participants. Crucially, their results showed that familiarity significantly influenced the decoding of gender, i.e., gender decoding in familiar faces showed significantly higher decoding strength and a steeper decoding time course. In our study, by contrast, all face stimuli were artificially generated and unfamiliar to participants. Thus, the difference in the familiarity of face stimuli is a possible reason for the different gender decoding results.
Although previous studies on the spatiotemporal dynamics of face perception have addressed “when” and “where” face stimuli are processed in the brain (Bentin et al., 1996; Kanwisher et al., 1997; Seeck et al., 1997; Grill-Spector et al., 2004), our study contributes a unique perspective by delving into the temporal structure of face processing, which offers a novel way to understand how a certain kind of information is manipulated and transformed in the brain (King and Dehaene, 2014). Besides, in contrast to traditional activation-based interpretations in electrophysiological studies, our MVPA approach adopts an information-based stance, providing richer insights and complementing earlier literature that predominantly focused on potential/field amplitudes (Grootswagers et al., 2017; Hebart and Baker, 2018). Furthermore, our work incorporates BMS to quantitatively measure the contributions of four predefined models of temporal generalization principles to each face dimension. Although we did not find a dominant pattern for the generalization of age and gender over time (possibly due to the relatively large variance among participants), they tended to show reactivated and chain patterns, respectively, at the group level (Fig. 3D,F). In contrast, both expression and race exhibited a significant sustained temporal generalization pattern. This may indicate that the neural codes of expression and race are represented in a persistent form on the scale of whole-brain activity during the first few hundred milliseconds of face perception; alternatively, it may suggest a single brain network recruited and sustained throughout the generalization time window (King et al., 2014). Future work is expected to explore the local temporal generalization patterns in more specific brain regions (e.g., FFA) to uncover the more nuanced temporal structures of various face dimensions. It is also worth exploring the neural substrates underlying the various generalization patterns identified by the data-driven temporal generalization approach.
We leverage the high sensitivity of multivariate decoding to directly investigate the temporal interactions among different face dimensions, which differs from relevant ERP/ERF studies (Ito and Urland, 2005; Wild-Wall et al., 2008; Melinder et al., 2010; Wiese, 2012). We show that the decoding performance for gender is significantly higher in the Asian (own-race) condition at ∼400 ms after the stimulus onset. This could be explained by the own-race bias and indexed by the N400 (Willadsen-Jensen and Ito, 2008), supplementing previous literature reporting a race effect on gender categorization that peaks earlier, at 150–200 ms (i.e., P2; T. Zhang et al., 2023a). Additionally, we observe a significant distinction in gender decoding performance between young and old faces at ∼200 and 400 ms; this age effect on gender is consistent with a previous study showing an effect at ∼200–400 ms during explicit age and gender discrimination (Mouchetant-Rostaing and Giard, 2003). Besides, we identify an effect of expression on race, which occurred at ∼220 ms. This might be indexed by the P200, implying interdependent processing of these two dimensions in a social context (Tortosa et al., 2013). Notably, through the multivariate decoding approach, we discover asymmetric interactions between face dimensions, such as the effect of race on the dynamics of gender at ∼340 ms, with no reverse effect observed. This enriches conventional ERP/ERF studies that typically infer nondirectional interactions between two factors based on amplitude.
Despite the rapid development of OPM-MEG and its applications in the field of cognitive neuroscience (Brookes et al., 2022), the technique is still in its infancy. Its integration with research on human cognition has yet to be fully realized because it faces various practical challenges, such as interference suppression (Seymour et al., 2022). In this study, we took a pioneering step in leveraging the benefits of OPM-MEG's high temporal resolution and improved signal strength to systematically address critical issues in face perception. Our work demonstrates OPM-MEG decoding results with high test–retest reliability, suggesting a promising future for integrating high-density OPM-MEG with machine learning or deep learning methodologies to facilitate the flexible application of OPM-MEG to different topics in cognitive neuroscience and to brain–computer interfaces based on wearable OPM-MEG.
Finally, in this study, our focus is to investigate the temporal structure of neural representations for different face dimensions at the whole-brain level from all OPM channels. Future studies could enhance spatial precision by employing temporal generalization within specific face-selective regions following MEG source localization (Z. Zhang et al., 2023b). Besides, the current study aims to investigate automatic face perception, with a one-back task that only requires the individuation of the faces. This can be extended by manipulating social task demands or contexts, which are crucial for the social categorization of faces (T. Zhang et al., 2023a). Lastly, given the importance of diversity in research samples for enhancing the generalizability of findings across populations, future work could be extended to participants of various races to replicate the findings in the current study.
In summary, our study seamlessly integrates multivariate decoding with OPM-MEG, shedding light on the temporal structures of neural representations for various face dimensions during rapid face perception. This approach provides a comprehensive understanding of how various face dimensions mutually influence each other. Furthermore, our investigation into the test–retest reliability of OPM-MEG underscores its potential for the exploration of the intricate neural basis of higher-level human cognition.
Footnotes
This work was financially supported by the National Natural Science Foundation of China (81727808), the National Science and Technology Innovation 2030 Major Program (2021ZD0200506, 2021ZD0200500, and 2022ZD0206000), Changping Laboratory, and the Collaborative Research Fund of the Chinese Institute for Brain Research, Beijing (No. 2020-NKX-PT-02). We also thank the National Center for Protein Sciences at Peking University in Beijing, China, for the assistance on data acquisition.
*W.X. and B.L. contributed equally to this work.
The authors declare no competing interests.
Correspondence should be addressed to Jia-Hong Gao at jgao@pku.edu.cn.