Abstract
Humans can recognize their whole-body movements even when displayed as dynamic dot patterns. The sparse depiction of whole-body movements, coupled with our lack of visual experience watching ourselves move in the world, has long implicated nonvisual mechanisms in self-action recognition. Using general linear modeling and multivariate analyses of human brain imaging data from male and female participants, we aimed to identify the neural systems supporting this ability. First, we found that cortical areas linked to motor processes, including frontoparietal and primary somatomotor cortices, exhibit greater engagement and functional connectivity when recognizing self-generated versus other-generated actions. Next, we show that these regions encode self-identity based on motor familiarity, even after regressing out idiosyncratic visual cues using multiple regression representational similarity analysis. Last, we found the reverse pattern for unfamiliar individuals: encoding localized to occipitotemporal visual regions. These findings suggest that self-awareness from actions emerges from the interplay of motor and visual processes.
Significance Statement
We report that self-recognition from visual observation of our whole-body actions implicates brain regions associated with motor processes. Using functional neuroimaging, we found greater activity and distinct representational patterns in brain areas and networks linked to motor processes when participants viewed their own actions relative to the actions of others. These findings highlight an important role of motor mechanisms in differentiating the self from others.
Introduction
Self-recognition is possible even from visually minimalistic dot displays (Johansson, 1973; Cutting and Kozlowski, 1977; Loula et al., 2005). These displays, called point-light displays (PLDs), depict whole-body actions with around a dozen moving dots (Johansson, 1973; Cutting and Kozlowski, 1977; Loula et al., 2005). While glimpses of our whole bodies may be captured in videos or mirrors, they are far less observable than the rich visual experiences we have watching the movements of close friends or family members. Yet, humans recognize their own movements in PLDs better than those of familiar others (Beardsworth and Buckner, 1981; Loula et al., 2005). This self-recognition advantage persists across viewpoints (Jokisch et al., 2006; Prasad and Shiffrar, 2009), task judgments (Knoblich and Flach, 2001; Bischoff et al., 2012), body parts (Daprati and Sirigu, 2002; Frassinetti et al., 2009; Conson et al., 2010), and action types (Burling et al., 2019; Kadambi et al., 2024), suggesting that self-action recognition relies on modalities beyond vision alone. Despite consistent behavioral evidence, the underlying neural mechanisms remain untested, representing a crucial gap in understanding human self-awareness.
Neuroimaging studies in visual neuroscience often omit the self, focusing instead on the neural mechanisms coding other people's actions (Astafiev et al., 2004; Vangeneugden et al., 2014; Lingnau and Downing, 2015). These studies show that recognizing others' actions engages a distributed network of cortical areas, termed the action observation network (AON). This network consists of occipitotemporal regions [OT; posterior superior temporal sulcus (pSTS), extrastriate body area (EBA), fusiform gyri] and frontoparietal circuits engaged during action production, including the inferior parietal lobe (IPL), premotor cortex (PM), inferior frontal cortex (IFC), and supplementary motor area (SMA). The crucial link between OT and frontoparietal regions is via direct pSTS→IPL connections, bridging visually driven action recognition with cognitive theories of action simulation (Grèzes et al., 2003; Ürgen et al., 2019).
While OT regions encode actions irrespective of identity, frontoparietal and somatomotor regions may be critical for self-recognition (Uddin et al., 2005). These regions are attributed action simulation, or mirroring, functions that map observed actions onto one's own motor system. For instance, single- and multiunit recordings, obtained first in frontoparietal regions of macaques (Di Pellegrino et al., 1992; Fogassi et al., 2005) and later in the medial frontal cortex (likely pre-SMA) of humans (Mukamel et al., 2010), show similar spiking activity during action observation and action production. This correspondence extends to systems-level activity in these regions measured with brain imaging and is modulated by the observer's motor familiarity with the action (Rizzolatti et al., 1996; Rizzolatti and Craighero, 2004; Calvo-Merino et al., 2006; Iacoboni, 2009). Since self-generated actions are the most motorically familiar, this could be one mechanism that helps differentiate self and other actions.
To date, only a few neuroimaging studies have investigated self-action recognition from PLDs, despite the fact that the self is fundamentally a moving and acting agent (van den Bos and Jeannerod, 2002). These studies support the involvement of frontoparietal regions but used isolated body parts (Macuga and Frey, 2011; Bischoff et al., 2012) or actions that were not self-generated but merely associated with self-identity (Woźniak et al., 2022). Hence, the neural mechanisms supporting self-recognition of whole-body actions remain untested. Moreover, beyond regional univariate activity, representational markers are needed to elucidate the featural space supporting self-recognition. Representational similarity analysis (RSA; Kriegeskorte et al., 2008) offers a viable tool to localize and infer the type of information encoded in neural activity patterns.
In the present study, we asked the following: what is the neural basis for self-recognition from whole-body actions? Does self-action recognition rely more on motor mechanisms than recognition of other identities, even after accounting for distinctive visual features of the actions? To address these questions, we conducted a multimodal imaging study across two sessions. In Session 1, we motion-captured a range of actions performed by participants and a close friend of the same sex. These actions were performed under both visual instruction (imitation) and verbal instruction (freely performed). After a delay period, participants returned for Session 2, in which they underwent fMRI while performing an identity recognition task on PLDs of themselves, their friends, and strangers.
We hypothesized that the AON would be engaged during action observation for all identities (self, friend, stranger), with identity encoded in occipitotemporal regions. However, we expected that frontoparietal regions associated with motor processes would be more strongly engaged for the self, controlling for visual familiarity (friend) and person identity (stranger). Moreover, if these regions encode motor information to achieve self-recognition, then activity patterns in frontoparietal and motor regions should relate to motor familiarity with the actions, captured over and above visual feature contributions.
Materials and Methods
Participants
Twenty right-handed undergraduate participants (mean age = 20.55 years; SD = 1.73; 12 females, 8 males) were recruited from around the University of California, Los Angeles area using convenience sampling. All participants were paid for their participation. Sample size was based on prior fMRI studies most similar to ours using biological motion (Saygin et al., 2004; Engelen et al., 2015; Chang et al., 2021) and self-generated point-light displays (Bischoff et al., 2012). The study was approved by the UCLA Institutional Review Board. All participants were naive to the purpose of the study. Participants had normal or corrected-to-normal vision and no physical disabilities.
Apparatus
The Microsoft Kinect V2.0 and Kinect SDK were used for motion capture of actions, as in previous studies on self-action recognition (Burling et al., 2019; Kadambi et al., 2024). Customized software developed in our lab was used to enhance movement signals and to carry out additional processing and trimming for actions presented later in the testing phase (van Boxtel and Lu, 2013). Three-dimensional (x–y–z) coordinates of the key joints were extracted at a rate of ∼33 frames per second. Each action was trimmed to the start and stop of a T-position signaled by the participant and normalized to scale for use in the experimental task. Note that while motion capture accuracy was high, the Kinect occasionally produced jitter in the stimuli, with sudden frame-to-frame jumps in joint positions. To remove this jitter, we applied a manual correction to affected frames (i.e., replacing each with the closest preceding jitter-free frame).
Stimuli
Twelve actions were selected from our previous work on self-action recognition (Burling et al., 2019; Kadambi and Lu, 2019; Kadambi et al., 2024). These actions conveyed a range of variability in action planning. Six of the actions (i.e., argue, wash windows, get attention, hurry up, stretch, and play guitar) were categorized as "verbally instructed actions," characterized by a high degree of motor goal complexity as defined in our previous work (Burling et al., 2019; Kadambi et al., 2024). These actions were verbally instructed to the participant (e.g., please perform the action: "to argue"). The remaining six actions were visually instructed (imitation) actions, depicting a range of simple and complex goals (i.e., jumping jacks, basketball, digging, chopping, laughing, directing traffic). For these actions, participants observed a stick figure performing an action without any verbal label and were instructed to "imitate the movements of the action." The stick figure actions were selected from the Carnegie Mellon University (CMU) Graphics Lab Motion Capture Database (http://mocap.cs.cmu.edu), generated from actors whose motions were already precaptured. PLDs were thus created using the above method for each participant, a sex-matched friend, and a sex-matched stranger. The stranger's action was randomly selected from one of three possible distractors for each sex (six total), precaptured from two of the experimenters and research assistants. Beyond providing variability in action goals, the categorization of action types allowed us to explore secondary analyses contrasting actions involving less motor familiarity, due to copying someone else's motor plan (visual instruction), with actions involving more motor familiarity, due to freely performing the action (verbal instruction).
Procedure
Behavioral session
In the first session, participants' body movements were recorded using the Microsoft Kinect V2.0 and Kinect SDK in a quiet testing room. Participants were instructed to perform the actions within a rectangular space, which allowed flexibility in how to perform each action while remaining within recording distance. The Kinect was placed 1.5 m above the floor and 2.59 m away from the participant. Participants naturalistically performed the 12 actions described above while being recorded by our motion capture system, signaling the start and stop of each performance with an outstretched T-pose of the arms. The recorded actions were then converted to point-light stimuli for use in the fMRI session.
Each of the 20 participants also brought a close friend of the same sex, who was separately recorded with the same paradigm. None of the participants were informed of the study's focus on self-recognition; instead, they were told the study was about general visual action processing. We used the recordings of the close friend in the fMRI session to assess the impact of visual familiarity. After the recording session, participants completed several questionnaires, including the Autism-Spectrum Quotient (AQ; Baron-Cohen et al., 2001), the Schizotypal Personality Questionnaire (SPQ; Raine, 1991), and the Vividness of Movement Imagery Questionnaire-2 (VMIQ-2; Roberts et al., 2008). These questionnaires were selected because they measure motor simulation ability (VMIQ-2) or disturbances in sensorimotor self-recognition (SPQ, AQ).
fMRI session
After a delay of 2–3 weeks (mean delay = 18.55 d; SD = 2.87), participants returned for fMRI brain imaging in Session 2 (Fig. 1, trial structure). During brain imaging, participants passively observed a point-light display consisting of 25 joints. These joints included the head (head, neck, clavicle; 3 dots), arms (biceps, elbows, wrists; 6 dots), hands (fingers; 6 dots), stomach (1 dot), hips (3 dots), knees (2 dots), and legs (shins, feet; 4 dots). Each point-light display showed the participant's own action (self), a same-sex familiar friend's action, or a same-sex stranger's action for a 5 s duration. The same-sex stranger was selected at random (out of two options) between participants. Once selected, the same stranger was used for all actions in the experiment for that participant. Following the 5 s observation of the action, participants were prompted on the next screen to identify whether the action video was their own, their friend's, or a stranger's within a 2 s maximum response period. Participants responded with their right hand by pressing one of three keys, with the index, middle, and ring fingers resting on the first, second, and third keys, respectively. One identity was assigned to each key, and the identity–key mapping was counterbalanced across subjects to reduce effects of trial structure and motor preparation or planning demands. Each response was followed by a jittered intertrial interval (ITI) with a mean of 5 s. There were four runs per participant, each consisting of 36 trials (12 trials per identity condition) in an event-related design. Within each run, experimental conditions were pseudorandomized to reduce stimulus autocorrelation related to order and sequence effects as well as correlated noise, such as scanner drift. The experimental task during functional brain imaging lasted ∼24 min; total brain imaging lasted ∼45 min.
Trial structure and timing. Participants centrally attended to a white fixation cross until the action (self/friend/other) appeared for 5 s. On a subsequent screen, participants had 2 s to make their identity judgment, followed by a variable ITI (mean of 5 s). The response mapping of self, friend, and other was counterbalanced to reduce any impact of motor order.
Experimental design and statistical analysis
MRI acquisition
The Siemens 3 Tesla Prisma Fit scanner at the Staglin IMHRO Center for Cognitive Neuroscience, equipped with a 32-channel head coil, was used for magnetic resonance imaging. Structural data were acquired using a T1-weighted MPRAGE protocol (1.0 mm isotropic resolution; repetition time, 2,000 ms). Functional data were acquired using a T2*-weighted gradient-recalled echo sequence. Scanning parameters for the main task were as follows: repetition time, 700 ms; echo time, 33 ms; voxel size, 2.5 mm isotropic; field of view, 192 mm; and flip angle, 70°. Four dummy scans were acquired and discarded before each scan to account for scanner stabilization. Participants viewed the stimuli presented on a projector through a mirror mounted on the head coil in the scanner. Participants underwent four runs of 36 trials each; each run lasted ∼360 s.
Imaging analyses
Univariate analysis
Statistical analyses were conducted using FEAT (FMRI Expert Analysis Tool) version 6.00, part of FSL (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl), using the GLM approach. Individual functional scans were coregistered to the high-resolution structural image using boundary-based registration (Greve and Fischl, 2009). Registration of the high-resolution structural scan to the Montreal Neurological Institute (MNI) template was implemented using FSL's FLIRT (Jenkinson and Smith, 2001; Jenkinson et al., 2002) with a 12-DOF affine transformation. The following preprocessing steps were applied: motion correction using MCFLIRT (Jenkinson et al., 2002); slice-timing correction using Fourier-space time series phase-shifting; nonbrain tissue removal using BET (Smith, 2002); spatial smoothing using a Gaussian kernel of FWHM 5 mm; grand-mean scaling of the entire 4D dataset by a single multiplicative factor; and high-pass temporal filtering (Gaussian-weighted least squares straight line fitting, with sigma = 50.0 s). Regressors were defined based on the onsets and durations of the three identities (self, friend, stranger) across all actions. Individual runs were aggregated into a mixed effects higher-level model using FLAME (FMRIB's Local Analysis of Mixed Effects) stages 1 and 2 (Beckmann et al., 2003; Woolrich et al., 2004, 2009) for both within-session single-subject variance and between-session group-level variance. Significance of the statistical parametric maps was then assessed at the group level using two approaches in FSL: (1) randomise with threshold-free cluster enhancement (TFCE), FWE-corrected at p < 0.05 (Smith and Nichols, 2009; Winkler et al., 2014), and (2) random-field theory (RFT)-based thresholding at Z > 3.1, cluster corrected to a significance level of p < 0.05 (Worsley, 2001). Randomise served as our main approach to significance testing given its more conservative, specific, and sensitive significance criteria (Smith and Nichols, 2009). All figures and tables generated from the parametric RFT analysis are reported in Extended Data Figures 5-2, 5-4, and 5-5. Conjunction analysis to localize self-specific activity was also implemented in FSL using the easythresh_conj script on univariate activation maps for both the self > stranger and self > friend contrasts (Price and Friston, 1997; Nichols et al., 2005). The conjunction tested the "conjunction null hypothesis," that is, whether both contrasts showed significant functional activation (Z > 3.1; p < 0.05); the resulting conjunction regions were later used as seeds in the connectivity analyses.
Functional connectivity: psychophysiological interaction
To identify a neural circuitry prioritized for self-processing, we implemented psychophysiological interaction analysis (PPI; Friston et al., 1997) to assess task-specific changes in functional connectivity. PPI examines how the relationship between a seed region and voxels in other brain regions is modulated by the psychological state of the participant (task dependence). The degree to which the seed and sink (other brain regions) covary as a function of the task is measured by testing the significance of the β coefficient for the interaction between the experimental contrast vector and the seed time course. As our analyses focused on identifying a self-action circuitry, we constrained our seeds to regions determined by group-level functional activations for the self (i.e., the self > stranger and self > friend contrasts), using the conjunction analysis implemented in FSL with the easythresh_conj script described above. The seed region in the left IPL was generated by creating a sphere (8 mm radius) around the peak functional activation for the conjunction of the self > stranger and self > friend contrasts (centered at peak center of gravity, x, y, z = −56, −44, 42). We initially focused on the IPL in the left hemisphere, since the TFCE thresholding produced only left hemispheric activity in the IPL. However, to more comprehensively investigate IPL involvement during self-processing, we also conducted a PPI with a right hemisphere IPL seed. The seed regions were each defined in standard space and resampled to 2.5 mm isotropic voxel resolution. The resampled masks were then inversely transformed to native space using nearest neighbor interpolation. Time courses in the seed region were extracted using fslmeants (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Fslutils), which generated a vector of mean activity in the mask for each volume. This time course was then entered as the ROI time series regressor into the PPI GLM. Thus, the full GLM consisted of the interaction vector (PPI regressor), the main effects of the contrasts of interest (the psychological variables), and a vector representing the seed region time course (the physiological variable, Y regressor). At the group level, statistical parametric maps for the interaction term were thresholded at Z > 2.3, p < 0.01.
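To make the structure of the PPI GLM concrete, the following MATLAB sketch illustrates how an interaction regressor can be assembled from a seed time course and a contrast-coded task regressor. File names and variable names are illustrative assumptions; the actual analysis was carried out in FSL FEAT.

```matlab
% Minimal sketch of PPI regressor construction (illustrative; the actual
% analysis was run in FSL FEAT). Assumes one value per volume in each file.
seed_ts = load('seed_timecourse.txt');   % mean seed activity per volume (fslmeants output)
psych   = load('task_regressor.txt');    % contrast-coded task regressor (e.g., self = 1,
                                         % stranger = -1), convolved with the HRF

seed_ts = seed_ts - mean(seed_ts);       % demean the physiological regressor
psych   = psych - mean(psych);           % zero-center the psychological regressor
ppi     = seed_ts .* psych;              % the psychophysiological interaction regressor

% Full PPI design matrix: interaction, psychological, and physiological terms
X = [ppi, psych, seed_ts];
```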
Representational similarity analysis
Whole-brain representational dissimilarity analysis (RDA; Kriegeskorte et al., 2008; Haxby et al., 2014) was implemented using the CoSMoMVPA toolbox (http://www.cosmomvpa.org/; Oosterhof et al., 2016) and custom MATLAB scripts (R2020a). Regressors were defined based on the onsets and durations of the three experimental conditions (self actions, friend actions, or stranger actions) during the action observation period of the task. Using the Least-Squares Separate approach, β-series parameter estimates (Rissman et al., 2004; Mumford et al., 2012) were estimated iteratively per trial by modeling one regressor for the event of interest and one regressor for all other events within the run. Standard motion parameters were also included as regressors in each GLM. Preprocessing was identical to the univariate analysis, but no smoothing was applied. We generated multiple target representational dissimilarity matrices (RDMs) based on differences related to spatiotemporal movement distinctiveness (dynamic time warping, DTW), speed, acceleration, jerk, and body structure (limb segment length), as well as a theoretical RDM based on motor familiarity. To generate neural RDMs for each participant, we extracted 36 beta weights for each run, normalized each beta weight within run, computed the average for each of the 36 action targets across all runs, and then demeaned the data (i.e., subtracted the grand mean of all averaged targets from each averaged target). All RDMs (behavioral, theoretical, and neural) were square and symmetric and reflected the pairwise dissimilarity between each element in the matrix. Each RDM (motor familiarity, identity, movement distinctiveness, speed, acceleration, jerk, body structure) was either correlated separately with neural activity (standard RDA) or entered as a predictor into a multiple regression RDA with other RDMs. The multiple regression analyses included a subset of these RDMs: motor familiarity or identity (self, friend, or stranger), together with the visual feature-based models of movement distinctiveness (DTW) and speed. Each RDM was z-transformed prior to estimating the regression coefficients in the multiple regression analysis.
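As an illustration, the following MATLAB sketch shows the multiple regression step within a single searchlight, under the assumption that each 36 × 36 RDM is vectorized over its upper triangle; the variable names (neural_rdm, familiarity_rdm, dtw_rdm, speed_rdm) are hypothetical.

```matlab
% Minimal sketch of one multiple regression RDA step (illustrative variable
% names; the actual analysis used CoSMoMVPA). All RDMs are 36 x 36 matrices.
zs   = @(v) (v - mean(v)) ./ std(v);       % z-transform a vector
mask = triu(true(36), 1);                  % vectorize the upper triangle (no diagonal)

y = zs(neural_rdm(mask));                  % neural dissimilarities in this searchlight
X = [zs(familiarity_rdm(mask)), ...        % motor familiarity predictor
     zs(dtw_rdm(mask)), ...                % movement distinctiveness (DTW) predictor
     zs(speed_rdm(mask))];                 % speed predictor

b = [ones(size(y)), X] \ y;                % least-squares regression coefficients
% b(2), the motor familiarity coefficient, is mapped to the searchlight center.
```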
For the whole-brain searchlight RDA, each searchlight window was defined by a Gaussian sphere of 2 mm radius. Searchlights were centered on every voxel in the brain and included the neighboring voxels within the window. The standard searchlight RDA was implemented by correlating the target RDM with the neural RDM in each searchlight across the whole brain. The correlations were then Fisher z-transformed and mapped to the center of each searchlight to create individual similarity maps in native space as inputs to the higher-level nonparametric analyses. For the multiple regression searchlight RDA, a multiple regression analysis was conducted in each searchlight across the whole brain. For each participant in native space, the betas were mapped to the center of each searchlight to create individual similarity maps for each predictor as inputs to the higher-level nonparametric analyses. All individual maps were normalized to the MNI-152 template using FSL's FLIRT (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT) with trilinear interpolation for group analysis. One-sample t tests were computed at the group level, correcting for multiple comparisons using permutation-based TFCE with a corrected threshold of p < 0.01 (Smith and Nichols, 2009) and 10,000 Monte Carlo simulations.
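The standard (correlation-based) searchlight statistic can be sketched analogously, using the same upper-triangle vectorization as above; the choice of Spearman correlation is an assumption consistent with the rank-based RDM comparisons shown in Figure 2.

```matlab
% Minimal sketch of the standard searchlight statistic (illustrative names;
% Spearman correlation is assumed here).
r = corr(target_rdm(mask), neural_rdm(mask), 'type', 'Spearman');
z = atanh(r);   % Fisher z-transform, mapped to the searchlight center voxel
```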
Target RDMs
As shown in Figure 2, we constructed the following RDMs, used as predictors for both the standard and multiple regression RDAs:
Left panel, Representational (dis)similarity matrices (RDMs) used for each RDA, averaged across participants. RDMs reflect the Euclidean distance between identity and action categories for speed, movement distinctiveness, and body structure. For motor familiarity, identity was coded by the degree of motor dissimilarity to oneself (self-generated actions, i.e., verbal instruction: zero dissimilarity; self-imitated actions, i.e., visual instruction: small dissimilarity, 0.3; friend actions: medium dissimilarity, 0.6; strangers: most dissimilarity, 1). Brighter colors for all RDMs indicate more dissimilarity. Top right panel, Upper triangular pairwise dissimilarity (1 − Spearman's rho) between each of the group-level RDMs. Brighter colors indicate more dissimilarity. Bottom right panel, DTW illustration showing the movement trajectory of one joint from one actor's action time series (red dots indicating locations) with lines marking correspondences to the same joint in another actor's time series (green dots), reflecting the optimal temporal alignment that minimizes dissimilarity over time.
Movement distinctiveness
The behavioral RDM for movement distinctiveness was generated using the DTW algorithm to compare trajectory differences between pairs of actions. DTW measures the pairwise movement dissimilarity between action time series via an alignment procedure that accounts for variability in time series length or duration. DTW finds the lowest-cost warping path between a pair of action time series, stretching or compressing them in time to yield warped distances. Greater DTW values indicate greater movement dissimilarity between joint trajectories. A 36 × 36 RDM was created for each participant, containing the pairwise DTW dissimilarity between each of the 12 actions across each identity (self, friend, stranger). The following steps were implemented for the DTW analysis in MATLAB R2020a (a minimal sketch follows the steps):
1. For each participant's actions, the 3D positions of each of the 25 joints were extracted using the BioMotion toolbox (van Boxtel and Lu, 2013).
2. Each joint trajectory was centered at zero to remove the impact of global factors (e.g., global body displacement, limb length) on the similarity measures.
3. The action DTW algorithm (Pham et al., 2014) was implemented to search for a temporal warping function shared across all 25 joints.
4. After deriving the optimal warping function, the analysis computed the frame-by-frame Euclidean distances between the temporally warped joint trajectories of actions performed by different actors.
5. The DTW distance was computed as the sum of the distances between all joint trajectories, normalized by the number of frames of the target actor. This normalization accounts for the different durations across participants performing the same action.
6. For each participant, the dissimilarity of the target participant's performance of an action from all other identities was captured by a mean DTW distance, computed by averaging the pairwise DTW distances between the target participant and the other actors (friend, stranger) performing that action, to construct the 36 × 36 RDM.
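The sketch below illustrates steps 2–5 using MATLAB's built-in dtw function (Signal Processing Toolbox) as a stand-in for the action DTW algorithm of Pham et al. (2014); the variable names and data layout are assumptions.

```matlab
% Minimal sketch of the DTW dissimilarity between two actors performing the
% same action (illustrative; the study used the Pham et al., 2014 algorithm).
% traj_a, traj_b: 75 x T matrices (25 joints x 3 coordinates stacked as rows).
traj_a = traj_a - mean(traj_a, 2);   % center each trajectory at zero (step 2)
traj_b = traj_b - mean(traj_b, 2);

d = dtw(traj_a, traj_b);             % one warping shared across all rows (steps 3-4)
d = d / size(traj_b, 2);             % normalize by the target actor's frame count (step 5)
```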
Speed, acceleration, and jerk differences
To measure the contribution of movement speed to self-recognition, we calculated a speed distinctiveness value for each participant's individual actions in MATLAB R2020a. For each action, we computed the average 3D positional displacement across all frames and all 25 joints (the first-order derivative of position) extracted with the BioMotion toolbox (van Boxtel and Lu, 2013). We then computed the average pairwise Euclidean distance to all other identities and actions as a measure of speed distinctiveness, constructing the 36 × 36 RDM. Acceleration and jerk were computed identically, but taking the first and second derivatives of speed, respectively.
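A minimal sketch of these kinematic features follows, assuming joint positions are stored as a T × 25 × 3 array (frames × joints × coordinates); the array layout and variable names are illustrative.

```matlab
% Minimal sketch of the speed, acceleration, and jerk features for one action.
% pos: T x 25 x 3 array of joint positions over T frames (assumed layout).
vel = diff(pos, 1, 1);                    % first-order derivative of position
spd = squeeze(vecnorm(vel, 2, 3));        % (T-1) x 25 per-joint 3D displacement
speed_val = mean(spd(:));                 % average across frames and joints

acc = diff(spd, 1, 1);                    % first derivative of speed
accel_val = mean(abs(acc(:)));
jrk = diff(acc, 1, 1);                    % second derivative of speed
jerk_val = mean(abs(jrk(:)));
```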
Body structure (postural limb length)
The body structure RDM was computed based on the lengths of each of the 24 limb segments (connecting the 25 joints) of the PLD. Limb length was computed as the 3D Euclidean distance between the pair of joints making up each limb segment. Pairwise absolute dissimilarities were then calculated across participants for each limb and averaged across all limbs to comprise the 36 × 36 target RDM.
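A sketch of the limb length feature and one pairwise dissimilarity, assuming a 24 × 2 table of joint indices defining the limb segments; the names and the use of a single posture are illustrative assumptions.

```matlab
% Minimal sketch of the body structure (limb length) dissimilarity.
% joints_a, joints_b: 25 x 3 joint positions for two actors (e.g., a
% time-averaged posture); limb_pairs: 24 x 2 joint indices per segment.
len  = @(J) vecnorm(J(limb_pairs(:,1),:) - J(limb_pairs(:,2),:), 2, 2);  % 24 x 1
d_ab = mean(abs(len(joints_a) - len(joints_b)));   % averaged across all limbs
```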
Motor familiarity
We computed a simple theoretical RDM based on the theorized motor familiarity between the identities. This was motivated by common coding theory (Prinz, 1997), which posits a common representational platform and shared overlap between visual and motor codes. Thus, self-identity was coded as 0 (most familiarity due to prior motor experience; least dissimilarity). We coded friend as 0.6 to capture a low-to-medium level of familiarity, since participants had a high degree of visual familiarity with their friends' actions, translating to a small degree of motor familiarity. Note that the specific value of 0.6 was not critical, as the main findings (described in Results) held for a range of possible values. Since common coding theory posits shared or overlapping visual and motor codes, repeated visual exposure to friends' actions could establish partial motor simulation, whereby repeated observation of a familiar friend's common movements activates motor circuits even without direct execution of those actions (Rizzolatti and Craighero, 2004; Gallese, 2006). This would account for the stronger neural encoding seen for friends' actions compared with strangers'. Stranger was coded as 1 for all actions (no familiarity; most dissimilarity). Within self-identity, we further weighted the actions by their motor familiarity. Specifically, actions that were more motorically familiar to participants because they freely performed the action and self-generated the motor plan (i.e., verbal instruction) were coded as most similar (0). Actions that involved copying someone else's motor plan (i.e., imitation via visual instruction) were coded as less familiar (0.3). The other identities (friend, stranger) were coded uniformly across actions (friend, 0.6; stranger, 1). Thus, dissimilarity was computed between identities and weighted by motor familiarity to comprise the 36 × 36 theoretical RDM.
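One plausible construction of this matrix, consistent with the values reported above, assigns each of the 36 targets a familiarity code and takes pairwise absolute differences; the target ordering is an assumption for illustration.

```matlab
% Minimal sketch of the motor familiarity target RDM (assumed target ordering:
% 6 verbally instructed self, 6 visually instructed self, 12 friend, 12 stranger).
fam = [zeros(1,6), 0.3*ones(1,6), 0.6*ones(1,12), ones(1,12)];
rdm = abs(fam' - fam);    % 36 x 36 pairwise dissimilarity via implicit expansion
```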
Identity: self (motor familiarity), friend (visual familiarity), or stranger
We also computed theoretical RDMs specific to each identity (self actions, friend actions, or stranger actions). For each identity RDM, the identity of interest (e.g., self) was coded as 0 (most similar), while the other two identities (e.g., friend, stranger) were coded as equally dissimilar (1). Dissimilarity was computed only between identities (and not individual actions) to comprise a 36 × 36 theoretical RDM for each identity (self RDM, friend RDM, or stranger RDM).
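Under the same assumed target ordering, the self-identity RDM can be sketched as follows; the friend and stranger RDMs are built analogously by permuting the code vector. Note that, under this illustrative coding, pairs drawn from the two non-target identities are treated as mutually similar.

```matlab
% Minimal sketch of the self-identity target RDM (assumed target ordering:
% 12 self, 12 friend, 12 stranger).
code = [zeros(1,12), ones(1,24)];   % identity of interest = 0; others = 1
rdm  = abs(code' - code);           % self pairs similar; self vs others dissimilar
```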
Results
Identity recognition from sparse actions
First, we examined whether self-recognition was possible in visually sparse point-light displays. We found that participants could discriminate all identities (self, friend, stranger) significantly above chance (0.33): self, M = 0.563, SD = 0.180, t(19) = 5.789, p < 0.001, Cohen's d = 1.29; friend, M = 0.483, SD = 0.182, t(19) = 3.754, p = 0.001, d = 0.839; stranger, M = 0.505, SD = 0.172, t(19) = 4.554, p < 0.001, d = 1.01 (Fig. 3).
Behavioral results of identity recognition accuracy. Top, Self-recognition performance for different actions color coded by action type (verbal instruction, gray; visual instruction, blue). Light gray fill indicates bar plots for verbal instruction. Light blue fill indicates bar plot for visual instruction. Inference bands denote 95% Bayesian highest density interval with 1,000 iterations. Horizontal blue line indicates chance-level recognition accuracy (0.33). Bottom left panel, Confusion matrix for each identity. No significant misattributions were found for the self relative to other identities, though friend and stranger were more confused relative to the self (∼55% increase in misattributions for friend and strangers). Bottom right panel, Average recognition accuracy for each identity. All identities were recognized significantly above chance. Self actions were recognized significantly better than friend actions. Light gray fill indicates bar plots. Inference bands denote 95% Bayesian highest density interval with 1,000 iterations. Horizontal blue line indicates chance-level recognition accuracy (0.33). *p < 0.05, **p < 0.01, ***p < 0.001.
Recognition of self-generated actions (M = 0.563, SD = 0.180) was significantly higher than recognition of friends' actions (M = 0.483; SD = 0.182), t(19) = 2.673, padj = 0.049, d = 0.598, but not significantly higher than correct identification of strangers' actions (M = 0.505; SD = 0.172), t(19) = 1.353, padj = 0.192. Self-recognition accuracy was also modulated by motor planning, revealed by a significant interaction between action type and identity, F(2,19) = 7.546, p = 0.002.
As shown in the top panel of Figure 3, self-recognition was greatest for the stretch action (M = 0.788; SD = 0.412) and lowest for digging (M = 0.375; SD = 0.487). Across all actions, no relationships were found between self-recognition accuracy and distinctiveness related to speed (p = 0.747), acceleration (p = 0.380), postural length (p = 0.410), or movement dissimilarity (p = 0.174). These results confirm that action identity could be distinguished in the sparse visual displays, with an advantage for actions generated with one's own motor plan.
AON is engaged during identity recognition
Our main goal was to examine the neural mechanisms underlying self-recognition from whole-body movements. To do so, we first compared neural activity for each identity (self, friend, stranger) relative to baseline. We found bilateral recruitment of the AON for all identities (Fig. 4, overlaid in MNI space). The activity spanned regions classically reported in visual neuroscience, including the pSTS (right: x, y, z = 56, −42, 10; left: x, y, z = −52, −50, 10) and lateral occipital cortices, including the EBA (right x, y, z = 44, −60, 10; left x, y, z = −51, −69, 10), as well as regions with motor properties also described in the action observation literature (Rizzolatti and Craighero, 2004; Bonini et al., 2022), including the bilateral supplementary motor areas (right x, y, z = 12, 6, 56; left x, y, z = 4, −8, 52), premotor cortices (right x, y, z = 39, 1, 53; left x, y, z = −45, 2, 50), inferior frontal gyri (IFG; right x, y, z = 50, 15, 10; left x, y, z = −55, 16, 10), and IPLs (right x, y, z = 50, −40, 14; left x, y, z = −56, −44, 11). Due to the large cluster sizes obtained with TFCE, all peaks reported here were identified using manual thresholding (Fig. 4).
Group-level activity obtained using FSL's nonparametric permutation approach (randomise) with TFCE, p < 0.05. From left to right, Self versus baseline; friend versus baseline; and stranger versus baseline. +Large cluster sizes were obtained with TFCE due to the optimal cluster-defining threshold; hence cluster peaks are reported with visual interpolation using manual thresholding with a sliding scale. Abbreviations: IFC, inferior frontal cortex; STS, superior temporal sulcus; LOC, lateral occipital cortex; SMA, supplementary motor area; SMG, supramarginal gyrus; Ang, angular gyrus.
A frontoparietal network for self-action processing
Though visual and motor systems were involved during action observation of all identities, we expected greater activity in motor regions when participants observed their own actions, since self-generated actions are privileged by prior motor experience. According to common coding theory, vision and proprioception share a degree of functional equivalence, such that action recognition is facilitated by a matching process between these modalities (Prinz, 1997; Hommel et al., 2001).
Since visual and proprioceptive codes are most closely matched when observing our own actions relative to the actions of others, self-recognition should be facilitated in brain regions with motor properties that are also active during action observation (Knoblich and Flach, 2003; Limanowski and Blankenburg, 2016; Abdulkarim et al., 2023). Indeed, both self contrasts of interest (self > stranger and self > friend) uniquely evoked greater activity in frontoparietal regions with these properties. For self > stranger, activity was localized to the left posterior supramarginal gyrus (peak x, y, z = −62, −48, 28), extending into the angular gyrus, as well as the left insular cortex and the inferior frontal gyrus, pars opercularis (x, y, z = −42, 10, −8; Fig. 5). A few small clusters in the anterior cingulate cortex (ACC; x, y, z = −2, 20, 18; x, y, z = 4, 14, 28) and one small cluster in the right insular cortex (x, y, z = 40, 10, −2) were also observed. Self > friend similarly recruited the left posterior SMG of the IPL (x, y, z = −54, −50, 30), spanning the angular gyrus (Fig. 5, right panel). For friend > stranger and stranger > friend, FSL's randomise approach did not yield significant activity. All peak clusters from these analyses are reported in Extended Data Tables 5-1–5-3.
Univariate group-level activity for self > stranger (left) and self > friend (right) using the FSL randomise permutation approach, cluster corrected with TFCE (p < 0.05). The violin plot shows mean parameter estimates (PE) in the left posterior supramarginal gyrus (SMG) for all identities. PEs in the left SMG significantly differed for self versus stranger (p = 0.001) and self versus friend (p = 0.005), but not friend versus stranger (p = 0.821). Extended Data Figures 5-1 and 5-3 report the activity maps and peak clusters for both TFCE contrasts, as well as RFT cluster-corrected results (Extended Data Figs. 5-2 and 5-5). Abbreviations: IFC, inferior frontal cortex; Ins, insula; IPL, inferior parietal lobule; ACC, anterior cingulate cortex.
Figure 5-1
Self > stranger (TFCE, p < 0.05 FWE-corrected). Download Figure 5-1, TIF file.
Figure 5-2
Self > stranger (Z = 3.1, p < 0.05; RFT cluster correction). Download Figure 5-2, TIF file.
Figure 5-3
Self > friend (TFCE, p < 0.05 FWE-corrected). Download Figure 5-3, TIF file.
Figure 5-4
Self > friend (Z = 3.1, p < 0.05; RFT cluster correction). Download Figure 5-4, TIF file.
Figure 5-5
Stranger > self (Z = 3.1, p < 0.05; RFT cluster correction). Download Figure 5-5, TIF file.
Coactivation in these regions does not necessarily implicate a network for self-processing. Thus, we further measured network-related activity during self-processing using task-based functional connectivity (PPI; Friston et al., 1997). The bilateral IPL (peak spheres from the group-level conjunction maps for self-processing: left x, y, z = −56, −44, 42; right x, y, z = 54, −38, 40) served as seed regions in separate PPIs, given the IPL's important role in motor simulation and its hub status in action processing. All peak PPI clusters are reported in Table 1.
PPI results with bilateral IPL seeds
We found very similar results across both hemispheric seeds. For both seed regions, we observed strengthened frontoparietal and parietovisual connectivity for the self-processing contrasts (self > stranger and self > friend). The left IPL seed for self > stranger showed the greatest peak connectivity with parietovisual regions: the right lateral occipital cortex (x, y, z = 54, −50, −2) and the left occipitotemporal fusiform area (x, y, z = −52, −70, −12). We also found strengthened frontoparietal connectivity, specifically with the bilateral inferior frontal cortices (left x, y, z = −54, 16, 30; right x, y, z = 46, 18, 20) and the bilateral intraparietal sulcus spanning the somatomotor cortex (left x, y, z = −26, −50, 44; right x, y, z = 32, −36, 44; Fig. 6). For the right IPL seed, we found connectivity patterns similar to the left. For self > friend with the right IPL seed, we found the greatest frontoparietal functional connectivity between the right IPL and the bilateral IFC (peak x, y, z = −36, 30, 34), extending from the middle frontal gyrus to the IFG pars opercularis and spanning the primary motor and premotor cortices. Additional activity was found in the right pre-SMA (x, y, z = 4, 12, 58) as well as bilateral occipitotemporal regions, with peaks in the right occipitotemporal cortex (x, y, z = 46, −56, −2) and left superior temporal sulcus (x, y, z = −62, −50, 8). For self > stranger, we observed strengthened parieto-occipitotemporal connectivity, with peaks in the left lateral occipitotemporal cortex (x, y, z = −46, −68, 12) and right fusiform area (x, y, z = 42, −40, −20). Additionally, we found strengthened connectivity with the frontal lobe, with peaks in the bilateral IFC, spanning the premotor and primary motor regions. No activity was found for friend > stranger. All activity maps were cluster corrected at Z > 2.3, p < 0.01.
Task-modulated functional connectivity of left and right IPL. Left IPL (top panel) seed showed increased connectivity with bilateral occipitotemporal regions, bilateral superior and inferior parietal areas, and bilateral inferior frontal cortex during self > stranger. For self > friend, functional connectivity analysis revealed greater connectivity with the bilateral inferior frontal cortices and occipitotemporal regions. Task-modulated functional connectivity of the right IPL (bottom panel) showed a similar activity pattern to the left: strengthened frontoparietal and parieto-occipital connectivity for both contrasts. All activity cluster corrected at Z > 2.3, p < 0.01. Abbreviations: IPL, inferior parietal lobule; IPS, intraparietal sulcus; IFC, inferior frontal cortex; OT, occipitotemporal regions; EBA, extrastriate body area; STS, superior temporal sulcus.
Evaluating a visuomotor representational space for self-processing
Based on the strengthened frontoparietal connectivity for self-processing, the analyses below focused on the underlying representational structure. Specifically, we examined the extent to which self-recognition relied on factors resembling motor familiarity, while accounting for visual signatures of the actions, across the whole brain using multiple regression RDA. We opted for whole-brain analyses because frontoparietal regions often comprise multiple brain networks (e.g., AON, central executive network) and because additional regions associated with motor functions may also encode self-processing. If self-recognition relies on motor mechanisms, then encoding patterns may further span other regions with motor properties, such as the somatomotor cortex. We conducted four multiple regression RDAs with the following predictors of interest: (1) motor familiarity and (2) identity, with separate regression models for (2a) self, (2b) friend, and (2c) stranger, each accounting for visual features related to speed and movement distinctiveness.
Multiple regression motor familiarity RDA: somatomotor cortex and occipitotemporal regions
The motor familiarity RDM was computed based on the theorized motor familiarity between each of the identities (self as most motorically familiar, friend as medium, and stranger as least). Within self-identity, we further weighted the actions by their degree of motor familiarity. Actions that were most motorically familiar to participants due to self-generating the motor plan were coded as most similar. Actions that involved copying someone else's motor plan (i.e., imitated via visual instruction) were coded as less familiar.
As shown in Figure 7, we found robust encoding in the somatomotor, frontoparietal, and lateral-occipital cortices. Specifically, the motor familiarity multiple regression RDA (accounting for differences in speed and movement distinctiveness) revealed the largest pattern of encoding in the bilateral primary motor cortex (M1), spanning the primary somatosensory cortex (S1), with stronger representation in the left hemisphere (left peak x, y, z = −46, −22, 50) than in the right (right peak x, y, z = 52, 1, 34). Activity patterns were also found in frontoparietal regions, including the inferior parietal lobule (right peak x, y, z = 54, −36, 36; left peak x, y, z = −46, −66, 34) and a large cluster spanning the anterior cingulate, mid-superior frontal areas, and supplementary motor areas (right peak x, y, z = 11, 50, 17; left peak x, y, z = −18, 3, 41). Activity patterns were also observed in occipital and lateral-occipital regions, extending into the bilateral lingual gyrus, precuneus, and cuneus (right peak x, y, z = 22, −61, −2). Together, these results reveal a gradation of encoding in motor-related regions as a function of identity-based motor familiarity: encoding was strongest when viewing self-generated actions, followed by friend actions, and then stranger actions. An exhaustive table of all activity patterns is reported in Extended Data Table 7-1.
Multiple regression searchlight RDA results for motor familiarity. This figure depicts the z-transformed activity map for significant correlations between the motor familiarity RDM and the neural RDM based on activity patterns for actions (self encoded as least dissimilar, with action separation to account for motor familiarity between action types; friend as medium dissimilarity, stranger as most), after accounting for speed and movement distinctiveness (DTW). Activation map reflects brain activity after 10,000 nonparametric Monte Carlo simulations, using TFCE and p < 0.01. Regions, bilateral somatomotor cortex: primary motor cortex, primary somatosensory cortex, superior parietal lobule; frontoparietal cortex: inferior parietal lobule, inferior frontal cortex, medial prefrontal cortex; occipitotemporal cortex: inferior temporal cortex, superior temporal sulcus and gyrus. All activity patterns are reported in Extended Data Table 7-1.
Table 7-1
Extended data table for RDA results for motor familiarity (p < 0.01). Identity was based on the degree of motor dissimilarity to oneself (self-generated actions, i.e., verbal instruction: zero dissimilarity; self-imitated actions, i.e., visual instruction: small dissimilarity, 0.3; friend actions: medium dissimilarity, 0.6; strangers: most dissimilarity, 1). Download Table 7-1, DOCX file.
Multiple regression identity RDAs: stronger representation in somatomotor cortex and mPFC
We then tested whether the representational encoding found in these regions was specialized for self-identity, comparing activity patterns from the multiple regression RDA that specified self actions as the predictor of interest with those from the corresponding RDAs for the other identities (friend or stranger).
The self-identity RDA generated the largest activity patterns in the bilateral somatomotor regions, with its peak in the left hemisphere (left peak x, y, z = −30, −23, 57) and a visually identified peak in the right hemisphere (right peak x, y, z = 40, −12, 50; Fig. 8). We also found large activity patterns in frontoparietal regions, spanning the IPL (left peak x, y, z = −37, −64, 40; right peak x, y, z = 60, −36, 27), supplementary motor area (left peak x, y, z = −8, −7, 58; right peak x, y, z = 11, 15, 58), and lateral to medial prefrontal cortices (peak x, y, z = 46, 50, 4). These results suggest that the somatomotor and frontoparietal regions, both associated with motor simulation, primarily encoded self actions relative to the actions of others. Further, the strength of encoding in the somatomotor and frontoparietal cortices systematically degraded as a function of identity: the friend RDA produced weaker encoding, and the stranger RDA produced no significant encoding in these regions. Activity patterns were also most visually distributed for the self, followed by friend, and then stranger (examined at a reduced threshold, p < 0.05).
Multiple regression searchlight RDA results for each identity (self, friend, stranger). Activation maps reflect TFCE-corrected brain activity after 10,000 nonparametric Monte Carlo simulations, p < 0.01 for self and friend; p < 0.05 for stranger. Dissimilarity matrices reflect dissimilarity based on identity across all actions. Regions, Frontoparietal: inferior parietal lobule; superior frontal gyrus, lateral and medial prefrontal cortices. Somatomotor: primary motor cortex (M1), primary somatosensory cortex (S1). Occipitotemporal: superior temporal sulcus, middle temporal gyrus, extrastriate body area. Activity patterns are reported in Extended Data Tables 8-1–8-4 and Figure 8-1.
Table 8-1
Extended data table for multiple regression RDA results specifying self-identity as the representational dissimilarity matrix (RDM; p < 0.01), accounting for distinctiveness in speed and movement similarity (Dynamic Time Warping). Self was encoded as least dissimilar (0: no dissimilarity in RDM); friend and stranger were each encoded as most dissimilar (1: high dissimilarity in RDM). Download Table 8-1, DOCX file.
Table 8-2
Extended data table for multiple regression RDA results specifying friend-identity as the representational dissimilarity matrix (RDM; p < 0.01), accounting for distinctiveness in speed and movement similarity (Dynamic Time Warping). Friend was encoded as least dissimilar (0: no dissimilarity in RDM); self and stranger were each encoded as most dissimilar (1: high dissimilarity in RDM). Download Table 8-2, DOCX file.
Table 8-3
Extended data table for multiple regression RDA results specifying stranger-identity as the representational dissimilarity matrix (RDM; p < 0.01), accounting for distinctiveness in speed and movement similarity (Dynamic Time Warping). Stranger was encoded as least dissimilar (0: no dissimilarity in RDM); self and friend were each encoded as most dissimilar (1: high dissimilarity in RDM). Download Table 8-3, DOCX file.
Additional activity patterns unique to self-identity were also found in the bilateral parahippocampal gyri (left peak x, y, z = −16, −13, −20; right peak x, y, z = 32, −28, −4), with much smaller activity patterns in the left occipital pole (left peak x, y, z = −23, −98, −14), bilateral temporal pole (right peak x, y, z = 46, 4, −33; left peak x, y, z = −32, −39, 16), thalamus (peak x, y, z = 14, −22, 18), and precuneus (peak x, y, z = 8, −38, 6). For the friend RDA, the activity patterns were noticeably sparser and largely overlapped with self-identity but were mostly constrained to the cortical midline. These regions spanned the precentral gyri, SMA, IPL, insula (peak x, y, z = −46, −30, 23), the left calcarine and occipitotemporal regions (peak x, y, z = −16, −61, 16), and thalamus (peak x, y, z = −8, 34, 0). For the stranger RDA, only sparse activity patterns were found in visual regions: the right middle temporal gyrus and occipitotemporal cortex (peak x, y, z = 62, −47, 6) at a reduced threshold (Z > 1.96). See Extended Data Tables 8-1–8-3 for an exhaustive report of all clusters from all RDAs, visually depicted in Figure 8. Voxel number comparisons between identity RDAs are reported in Table 2.
Number of voxels in regions of interest for each main identity RDA
Finally, to account for any effect of motor planning of the button responses producing the large motor cluster in the left hemisphere for the self-RDA, we conducted an additional RDA for self-identity that included the timing of the motor responses as a covariate in the multiple regression analysis. The results preserved the original findings of the self-RDA; in particular, the largest cluster was again observed in the left somatomotor cortex (left peak x, y, z = −42, −20, 46). See Extended Data Table 9-1 for an exhaustive report of all clusters from this RDA, visually depicted in Figure 9.
Multiple regression searchlight RDA results for self-identity, regressing out motor responses. Activation maps reflect TFCE-corrected brain activity after 10,000 nonparametric Monte Carlo simulations, p < 0.01 for self. Dissimilarity matrix reflects dissimilarity based on self-identity across all actions. Regions, frontoparietal: inferior parietal lobule; superior frontal gyrus, lateral and medial prefrontal cortices. Somatomotor: primary motor cortex (M1), primary somatosensory cortex (S1). Occipitotemporal: superior temporal sulcus, middle temporal gyrus, extrastriate body area. Activity patterns are reported in Extended Data Table 9-1.
Table 9-1
Supplementary table for multiple regression RDA results specifying self-identity as the representational dissimilarity matrix (RDM; p < 0.01), accounting for distinctiveness in speed, movement similarity (Dynamic Time Warping), and motor responses. Self was encoded as least dissimilar (0: no dissimilarity in RDM); friend and stranger were each encoded as most dissimilar (1: high dissimilarity in RDM). Download Table 9-1, DOCX file.
Combined with results from the motor familiarity RDA, these findings lend support to motor simulation accounts (Jeannerod, 2001; Jeannerod and Pacherie, 2004). Self-processing, due to its high degree of motor familiarity, would be expected to have the strongest degree of motor simulation during action observation, reflected by the largest activity patterns in motor-related regions, followed by friend, and then stranger. This aligns with prevalent accounts suggesting that action observation of others involves an internal simulation of the action onto our own motor systems (Rizzolatti and Craighero, 2004; Iacoboni et al., 2005; Iacoboni, 2008).
Discussion
Our study investigated the neural correlates of self-recognition from whole-body movements. Using functional brain imaging, we found that merely observing our whole bodies in motion evokes greater activity in neural systems traditionally construed as having motor functions, in comparison with observing the actions of others.
While the boundaries between visual and motor functions have been increasingly blurred over the last few decades of systems neuroscience research, frontoparietal areas are traditionally conceived as having motor functions, whereas occipitotemporal areas are typically construed as visual. Here, we found that both sets of areas were involved in action observation for all identities. However, unique to self-action observation, we observed greater activity in frontoparietal regions (left IPL and IFC) and strengthened functional connectivity between these regions and occipitotemporal areas. Note that significance for the univariate subtraction contrasts was assessed using nonparametric TFCE, which has been shown to be more sensitive yet less prone to false positives (Smith and Nichols, 2009). This resulted in left-lateralized activity for self-processing. However, bilateral involvement of these regions was clearly observed when using FSL's standard RFT cluster correction (Z > 3.1; p < 0.05), as well as in our multivariate analyses. To avoid false positives, we interpret the nonparametric results but do not make strong claims about the observed laterality.
Action simulation accounts posit a central role of the motor system during action observation (Gallese and Goldman, 1998; Rizzolatti and Sinigaglia, 2010). The degree of motor experience with actions (Hohmann et al., 2011) is thought to parametrically modulate activity in frontoparietal and motor regions during action observation (even across modalities; Blakemore and Frith, 2003; Kaplan et al., 2008; Kirsch and Cross, 2015). Since self-generated actions benefit from prior motor experience, action simulation could be one candidate mechanism for the increased activity and connectivity in these regions. However, these regions, notably frontoparietal cortices, also support functions beyond action simulation, including working memory (Baddeley, 2003), cognitive control (Corbetta and Shulman, 2002), and multisensory integration (Macaluso and Driver, 2005). While we are unaware of any direct links between cognitive control and self-recognition in a visual perception task, multisensory integration, particularly in the IPL, could facilitate self-action recognition by combining visual and proprioceptive information for bodily self-awareness (Bréchet et al., 2018) and agency (Jackson and Decety, 2004). Similarly, working memory could facilitate retention of the action in order to differentiate identity, implicating the intraparietal sulcus and numerous occipitotemporal regions (Woźniak et al., 2022).
It is important to note that merely observing actions may not engage the same cognitive and neural resources as action simulation proper. For instance, while action observation can engage sensorimotor areas, it may not trigger the internal model mechanisms that would predict somatosensory attenuation during action production, as action simulation accounts expect (Kilteni et al., 2021). Conversely, other processes, such as motor imagery, can engage these mechanisms (Kilteni et al., 2018). Hence, we do not make strong claims about the functional mechanism supported by these areas but highlight action simulation as one plausible candidate.
Strengthened connectivity during self-action recognition was also observed between the bilateral IPL and the portion of the IFC anterior to the premotor cortex. Action simulation accounts often implicate both the IFC and IPL, two anatomically and functionally connected areas. Other proposals suggest that anterior parcellations of the IFC may be involved in more abstract aspects of action understanding, such as goal selection, intention inference, and semantic understanding (Devue et al., 2007; Morin and Michaud, 2007; Liakakis et al., 2011; Soch et al., 2017; Sokolov et al., 2018). During self-action recognition, the IFC (including its more anterior portions) could support the integration of action observation with higher-order cognitive processes. Information may flow from strengthened parieto-occipitotemporal functional connectivity during action processing and either be decoded in the IPL (Ürgen et al., 2019; Tholen et al., 2019) or passed on to the IFC (in both anterior and posterior IFG in our data) for more conceptual action understanding.
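As a generic illustration of seed-based functional connectivity (not the exact task-based connectivity model used here), coupling between two regions can be estimated from their extracted time series; the coordinates and file name below are hypothetical placeholders.

import numpy as np
from nilearn.maskers import NiftiSpheresMasker

# Hypothetical MNI seed coordinates for left IPL and left IFC and a placeholder
# functional run; these are illustrative values, not our ROIs or data.
seeds = [(-48, -38, 44), (-50, 20, 10)]
masker = NiftiSpheresMasker(seeds, radius=6, standardize=True, detrend=True)
timeseries = masker.fit_transform("sub-01_task-actionobs_bold.nii.gz")

# Pearson correlation between the two seed time series estimates IPL-IFC
# coupling; task-dependent changes would require a PPI-style model instead.
r = np.corrcoef(timeseries.T)[0, 1]
print(f"IPL-IFC connectivity: r = {r:.2f}")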
Our results also highlight the role of parieto-occipitotemporal regions in action observation. These regions may distinguish fine-grained visual features that facilitate discrimination between identities (e.g., Downing et al., 2001; Peelen and Downing, 2005; Downing et al., 2006a,b; Downing and Peelen, 2011; Bracci et al., 2015). Together with the IPL and the IFC (Kilner, 2011), this set of areas may comprise an expanded AON for self-recognition (Jeannerod, 2004). That is, occipitotemporal regions may first decode coarse visual identity based on low- and mid-level action features (including for person perception in the superior temporal sulcus; Yovel and O'Toole, 2016; Isik et al., 2017) and category representations (Wurm and Lingnau, 2015; Wurm et al., 2017), while frontoparietal regions process self-actions at deeper motoric and proprioceptive levels (Rizzolatti and Craighero, 2004; Rizzolatti et al., 2014).
In addition to the frontoparietal and occipitotemporal regions engaged during self-action observation, the multivariate results revealed the largest activity patterns in bilateral somatomotor regions. Neural activity in the primary motor cortex has previously been observed during the perception of human movement (Orgs et al., 2016). Activity for both the motor familiarity and self-identity representational (dis)similarity analyses (RSA) spanned the primary motor, primary somatosensory, supplementary motor, and premotor cortices. Further, the strength of encoding in the somatomotor and frontoparietal cortices systematically degraded as a function of identity: these regions most strongly encoded self-identity, moderately encoded friend-identity, and did not encode stranger-identity, for which activity patterns were instead found primarily in occipitotemporal regions. This roughly parametric degradation of somatomotor and frontoparietal encoding as a function of person identity lends further support to action simulation accounts.
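The simpler correlation form of this identity RSA can be sketched as follows, assuming hypothetical trial-by-voxel patterns from a region of interest and per-trial identity labels; this is an illustrative sketch rather than our searchlight pipeline.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Hypothetical inputs: trial-by-voxel patterns from one region and per-trial
# identity labels (e.g., 0 = self, 1 = friend, 2 = stranger); placeholders only.
patterns = np.load("roi_patterns.npy")     # shape (n_trials, n_voxels)
identity = np.load("identity_labels.npy")  # shape (n_trials,)

# Neural RDM: correlation distance between every pair of trial patterns.
neural_rdm = pdist(patterns, metric="correlation")

# Model RDM: pairs sharing an identity are predicted similar (0), all other
# pairs dissimilar (1).
model_rdm = pdist(identity[:, None], metric=lambda a, b: float(a[0] != b[0]))

# Encoding strength: rank correlation of model with neural dissimilarities.
rho, p = spearmanr(model_rdm, neural_rdm)
print(f"identity encoding: rho = {rho:.3f}, p = {p:.4g}")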
While these frontoparietal and somatomotor regions are often implicated in motor production (Muir and Lemon, 1983), as well as in the control, attention, and working memory processes noted earlier, they are also functionally implicated in tasks that plausibly involve action simulation, including action observation (Gallese and Goldman, 1998; Keysers and Gazzola, 2007), motor imagery (Pfurtscheller and Neuper, 1997; Schnitzler et al., 1997; Porro et al., 2000; Ehrsson et al., 2003; Pilgramm et al., 2016), action prediction (Blakemore and Frith, 2003; Lamm et al., 2007), motor memory (Romo et al., 2012), and motor planning (Gale et al., 2021). Moreover, coactivation of premotor and posterior parietal areas appears to depend on the match between motor and visual information that supports one's sense of body ownership (Abdulkarim et al., 2023). A greater match between common visual and proprioceptive codes may provide the heightened bodily awareness needed to facilitate self-recognition. This is reflected in the greater signal encoding in these regions for the self, which degraded with visuomotor familiarity of the person (i.e., weaker for friend, absent for stranger).
The RSA results also revealed that neural encoding was most distributed for self-identity, followed by friend-identity, and least distributed for stranger-identity, where it was primarily localized to occipitotemporal regions. A substantial body of research suggests that self-processing engages more distributed, systems-wide activity than processing other identities (Turk et al., 2003; Molnar-Szakacs and Uddin, 2013; Yeshurun et al., 2021). Indeed, at the network level, self-processing involves strong interactions between low-level feature-based processing and higher-level conceptual processing, supporting a sense of identity grounded in the wealth of information we have stored about ourselves (Uddin et al., 2007; Molnar-Szakacs and Uddin, 2013).
Results from the self-identity RSA also revealed distributed encoding patterns in other regions (Extended Data Table 8-1). These activity patterns spanned regions traditionally associated with mentalizing (Frith and Frith, 2006) and with higher-order reflective and conceptual self- and other-processing, including the bilateral posterior cingulate cortex, medial (and lateral) prefrontal cortex, bilateral hippocampus, and precuneus. These regions engage not only during mentalizing about others, but also during conceptual mentalizing about oneself (Lombardo et al., 2010; Qin and Northoff, 2011) and conscious self-awareness (Tacikowski et al., 2017; Lieberman et al., 2019). Well-known action frameworks (Keysers and Gazzola, 2007) posit a degree of dynamic connectivity between simulative motor representations and abstracted, self-reflective judgments. These regions may store action representations in memory, or motor schemas, which are later accessed for comparison against the visual consequences of an action during observation (Schmidt, 1975; Arbib, 1981; Arbib, 1992). That is, rather than identifying our bodies solely from visual cues, to which we generally lack access in daily life, we may access stored proprioceptive schemas at a more abstract level of processing (i.e., “remembered selves”; Neisser, 1988) that interact with action observation to provide the visuo-proprioceptive match needed for self-recognition.
Finally, a cluster of activity in the ACC was observed in the RSA, along with a small cluster in the univariate contrast of self > stranger actions. While ACC engagement may reflect multiple processes, including multimodal integration of self-related information (Apps and Tsakiris, 2014; Morita et al., 2014) and salience network involvement (Asakage and Nakano, 2023), a key account of ACC function concerns cognitive conflict (Braver et al., 2001). Prior research has shown that the ACC is involved in discriminating one's own touch from external touch, with activity linked to the conflict between expected and actual sensorimotor feedback (Blakemore et al., 1999; Stetson et al., 2006; Kilteni and Ehrsson, 2024). A similar conflict mechanism may operate when participants merely view their own and other people's actions: the brain holds well-established representations of self-generated actions, and viewing these actions may generate conflict between internal sensorimotor expectations and stimulus-driven visual feedback during action observation. This conflict should be less pronounced, or even absent, when viewing others' actions, since internal sensorimotor predictions for others' actions are less accessible.
In summary, our three main analyses (univariate activity, functional connectivity, and RSA) converge on a cortical ensemble of visuomotor regions, spanning frontoparietal, somatomotor, and occipitotemporal areas, that appears prioritized for self-recognition of whole-body actions. These regions, notably the frontoparietal and somatomotor cortices, are often linked to simulative motor functions during action observation, which may provide a functional explanation for the increased motor-related activity we observed. Together, our findings reveal an important contribution of motor processes to human self-awareness, helping to support the basic differentiation between ourselves and others.
Data Availability
All analysis scripts, behavioral data, and results from the imaging analyses can be downloaded from our GitHub repository: https://github.com/akilakada/self-fmri. Raw NIfTI data can be shared upon request to the corresponding author, subject to UCLA Institutional Review Board guidelines.
Footnotes
We thank Sophia Baia and Kelly Xue for assistance with data collection and stimuli creation, and Elinor Yeo, Jolie Wu, Kelly Nola, Nicolas Jeong, Danya Elghebagy, David Lipkin, and Shahan McGahee for assistance with stimuli creation. We thank Jeff Chiang, Burcu Ürgen, and Giuseppe Marrazzo for helpful advice on the analyses. We thank Lisa Aziz-Zadeh and Sofronia Ringold for helpful feedback on an earlier draft of this manuscript. This work was supported by National Science Foundation BCS-2142269 to H.L., a UCLA faculty research grant to H.L., a Tiny Blue Dot Foundation Grant to M.M.M., and an APA Dissertation Award to A.K. Preliminary versions of this project were presented at the Virtual Society for Neuroscience (2020), V-Vision Sciences Society (2020), Society for Neuroscience (2022), and the Association for the Scientific Study of Consciousness (2023).
The authors declare no competing financial interests.
Correspondence should be addressed to Akila Kadambi at akadambi@ucla.edu.