Communicative intentions are transmitted by many perceptual cues, including gaze direction, body gesture, and facial expressions. However, little is known about how these visual social cues are integrated over time in the brain and, notably, whether this binding occurs in the emotional or the motor system. By coupling magnetic resonance and electroencephalography imaging in humans, we were able to show that, 200 ms after stimulus onset, the premotor cortex integrated gaze, gesture, and emotion displayed by a congener. At earlier stages, emotional content was processed independently in the amygdala (170 ms), whereas directional cues (gaze direction with pointing gesture) were combined at ∼190 ms in the parietal and supplementary motor cortices. These results demonstrate that the early binding of visual social signals displayed by an agent engaged the dorsal pathway and the premotor cortex, possibly to facilitate the preparation of an adaptive response to another person's immediate intention.
During social interactions, facial expressions, gaze direction, and gestures are crucial visual cues to the appraisal of other people's communicative intentions. The neural bases for the perception of each of these social signals have been described but mostly separately (Haxby et al., 2000; Rizzolatti et al., 2001; Hoffman et al., 2007). However, these social signals can take on new significance once merged. In particular, processing of these social signals will vary according to their self-relevance, e.g., when coupled with direct gaze, angry faces are perceived to be more threatening (Sander et al., 2007; Hadjikhani et al., 2008; N′Diaye et al., 2009; Sato et al., 2010). So far, it remains unclear how these social signals are integrated in the brain.
At the neural level, there is some evidence that emotion and gaze direction interact in the amygdala (Adams and Kleck, 2003; Hadjikhani et al., 2008; N′Diaye et al., 2009; Sato et al., 2010), a key structure for the processing of emotionally salient stimuli (Adolphs, 2002). The amygdala may thus sustain early binding of visually presented social signals. Electroencephalography (EEG) studies suggest that the interaction between emotion and gaze direction occurs at ∼200–300 ms (Klucharev and Sams, 2004; Rigato et al., 2010), but direct implication of the amygdala in such a mechanism has yet to be provided.
It has also been established that, when one observes other people's bodily actions, there is activity in motor-related cortical areas (Grèzes and Decety, 2001; Rizzolatti et al., 2001) and that activity reaches these areas 150–200 ms after the onset of a perceived action (Nishitani and Hari, 2002; Caetano et al., 2007; Tkach et al., 2007; Catmur et al., 2010). Its activity being modulated by social relevance (Kilner et al., 2006) and by eye contact (Wang et al., 2011), the motor system is thus another good neural candidate for the integration of social cues.
Here, we set out to experimentally address whether the emotional system or the motor system sustains early binding of social cues and when such an operation occurs. We manipulated three visual cues that affect the appraisal of the self-relevance of social signals: gaze direction, emotion, and gesture. To induce a parametric variation of self-involvement at the neural level, our experimental design capitalized on the ability to change the number of social cues displayed by the actors toward the self (see Fig. 1a), i.e., one (gaze direction only), two (gaze direction and emotion or gaze direction and gesture), or three (gaze direction, emotion, and gesture) visual cues. We then combined functional magnetic resonance imaging (fMRI) with EEG [recording of event-related potentials (ERPs)] to identify the spatiotemporal characteristics of the mechanism that binds social cues. First, we analyzed the ERPs to identify the time course of early binding of social cues. We expected a temporal marker of their integration at ∼200 ms (Klucharev and Sams, 2004; Rigato et al., 2010). Then, we quantified the parametric variation of self-involvement on the neural sources of the ERPs by combining the ERPs with fMRI data.
Materials and Methods
Twenty-two healthy volunteers (11 males, 11 females; mean age, 25.0 ± 0.5 years) participated in an initial behavioral pretest to validate the parametric variation of self-involvement in our paradigm. Twenty-one healthy volunteers participated in the final experiment (11 males, 10 females; mean age, 23.4 ± 0.5 years). All participants had normal or corrected-to-normal vision, were right-handed, and had no neurological or psychiatric history.
Stimuli consisted of photographs of 12 actors (six males). For each actor, three social parameters were manipulated: (1) gaze direction [head, eye gaze, and bust directed toward the participant (direct gaze condition) or rotated by 30° to the left (averted gaze condition)]; (2) emotion (neutral or angry); and (3) gesture (pointing or not pointing). This manipulation resulted in eight conditions of interest for each actor. For each of the actors, we created an additional photograph in which they had a neutral expression, arms by their sides, and an intermediate eye direction of 15°. This position is hereafter referred to as the “initial position.” For all stimuli, right- and left-side deviation was obtained by mirror imaging. Thus, each actor was seen under 16 conditions: 2 gaze directions (direct/averted) × 2 emotions (anger/neutral) × 2 gestures (pointing/no pointing) × 2 directions of gaze deviation (rightward/leftward), resulting in 192 stimuli. For each photograph, the actor's body was cut and pasted on a uniform gray background and displayed in 256 colors. Each stimulus was shown in such a way that the actor's face covered the participant's central vision (<6° of visual angle both horizontally and vertically) while the actor's body covered a visual angle of <15° vertically and <12° horizontally.
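The factorial structure of the stimulus set can be checked with a short enumeration. The following is only an illustrative sketch: the actor labels and dictionary keys are hypothetical names introduced here, not identifiers used in the study.

```python
from itertools import product

# Hypothetical labels for the 12 actors (not taken from the study).
ACTORS = [f"actor_{i:02d}" for i in range(1, 13)]
GAZE = ("direct", "averted")
EMOTION = ("anger", "neutral")
GESTURE = ("pointing", "no_pointing")
SIDE = ("rightward", "leftward")  # obtained by mirror imaging

# One entry per photograph: 12 actors x 2 x 2 x 2 x 2.
stimuli = [
    {"actor": a, "gaze": g, "emotion": e, "gesture": p, "side": s}
    for a, g, e, p, s in product(ACTORS, GAZE, EMOTION, GESTURE, SIDE)
]

# Conditions of interest ignore the side of deviation.
conditions_of_interest = {(s["gaze"], s["emotion"], s["gesture"]) for s in stimuli}
per_actor = sum(1 for s in stimuli if s["actor"] == ACTORS[0])

print(len(stimuli), len(conditions_of_interest), per_actor)
```

The counts reproduce the 192 stimuli, eight conditions of interest, and 16 conditions per actor stated above.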
Each trial began with a 500 ms presentation of a fixation area consisting of a central red fixation point and four red angles delimiting a square of 6° of central visual angle in the experimental context. This fixation area remained on the screen throughout the trial, until the appearance of a response screen. The participant was instructed to fixate the central point and to keep his/her attention inside the fixation area at the level of the central point during the trial, avoiding eye blinks and saccades (for additional details about instructions, see Conty and Grèzes, 2012). Given the importance of an ecologically valid approach (Zaki and Ochsner, 2009; Schilbach, 2010; Wilms et al., 2010), we kept our design as naturalistic as possible. To do so, an apparent movement was created by the consecutive presentation of two photographs on the screen (Conty et al., 2007). The first photograph showed an actor in the initial position for a random duration ranging from 300 to 600 ms. This was immediately followed by a second stimulus presenting the same actor in one of the eight conditions of interest (Fig. 1). This second stimulus remained on the screen for 1.3 s. Throughout the trial, the actor's face remained within the fixation area.
An explicit task on the parameter of interest, i.e., to judge the direction of attention of the perceived agent (Schilbach et al., 2006), was used. Thus, after each actor presentation, the participant was instructed to indicate whether the actor was addressing them or another. This was signified by a response screen containing the expressions “me” and “other.” The participant had to answer by pressing one of two buttons (left or right) corresponding to the correct answer. The response screen remained until 1.5 s had elapsed and was followed by a black screen of 0.5 s preceding the next trial.
Behavioral and EEG/fMRI experiments.
In a behavioral pretest, the above procedure was used, with the exception that each actor stimulus was presented with either the leftward or the rightward side of deviation (the assignment was reversed for half of the participants). Moreover, following the “me–other” task, participants had to judge the degree of self-involvement they felt on a scale of 0 to 9 (0, “not involved”; 9, “highly involved”). The response screen remained visible until the participant had responded.
In the scanner, the 192 trials were presented in an 18 min block, including 68 null events (34 black screens of 4.1 s and 34 of 4.4 s). The block was then repeated with a different order of trials within the block.
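As a consistency check on the timing, the block duration can be bounded from the trial structure described above. This is a rough sketch under the assumption that each trial comprises the 500 ms fixation period, the first photograph (300 to 600 ms), the second photograph (1.3 s), the response screen (1.5 s), and the 0.5 s black screen; the constant and function names are introduced here for illustration only.

```python
FIXATION_S = 0.5
FIRST_PHOTO_S = (0.3, 0.6)  # random duration, lower and upper bound
SECOND_PHOTO_S = 1.3
RESPONSE_S = 1.5
BLACK_SCREEN_S = 0.5
N_TRIALS = 192
NULL_EVENTS_S = [4.1] * 34 + [4.4] * 34  # 68 null events


def block_duration_bounds_min():
    """Return the shortest and longest possible block duration in minutes."""
    fixed = FIXATION_S + SECOND_PHOTO_S + RESPONSE_S + BLACK_SCREEN_S
    nulls = sum(NULL_EVENTS_S)
    shortest = N_TRIALS * (fixed + FIRST_PHOTO_S[0]) + nulls
    longest = N_TRIALS * (fixed + FIRST_PHOTO_S[1]) + nulls
    return shortest / 60.0, longest / 60.0


low, high = block_duration_bounds_min()
print(f"block lasts between {low:.1f} and {high:.1f} min")
```

Under these assumptions a trial lasts 4.1 to 4.4 s, which matches the two null-event durations, and the block lasts roughly 17.9 to 18.9 min, in line with the stated 18 min.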
Behavioral data analyses.
During both the behavioral pretest and the EEG/fMRI experiment, participants performed the me–other task at near-ceiling accuracy (behavioral: mean of reaction time = 622 ± 23 ms; mean of correct responses = 97 ± 0.8%; EEG/fMRI: mean of reaction time = 594 ± 18 ms; mean of correct responses = 99 ± 0.4%). These data were not further analyzed. For the behavioral pretest, repeated-measures ANOVA was performed on percentage of self-involvement, with gaze direction (direct/averted), emotion (anger/neutral), and gesture (pointing/no pointing) as within-subjects factors.
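Because every factor in this design has two levels, each main effect or interaction in a fully within-subjects ANOVA has one numerator degree of freedom, and its F value equals the squared one-sample t statistic computed on a per-subject contrast. The sketch below illustrates that equivalence with NumPy; it is not the software used in the study, and the array layout (subjects × gaze × emotion × gesture) and function names are assumptions made for the example.

```python
import numpy as np


def effect_contrast(gaze=False, emotion=False, gesture=False):
    """Build 2x2x2 contrast weights for a main effect or interaction."""
    levels = np.array([-1.0, 1.0])
    ones = np.ones(2)
    g = levels if gaze else ones
    e = levels if emotion else ones
    p = levels if gesture else ones
    return np.einsum("i,j,k->ijk", g, e, p)


def within_subject_f(scores, contrast):
    """scores: (n_subjects, 2, 2, 2) cell means per subject.

    Returns the F value and its degrees of freedom (1, n_subjects - 1).
    """
    per_subject = np.tensordot(scores, contrast, axes=([1, 2, 3], [0, 1, 2]))
    n = per_subject.size
    t = per_subject.mean() / (per_subject.std(ddof=1) / np.sqrt(n))
    return t ** 2, (1, n - 1)


# Synthetic ratings for 22 subjects with a built-in effect of the first factor.
rng = np.random.default_rng(0)
scores = rng.normal(size=(22, 2, 2, 2))
scores[:, 1] += 5.0
f_gaze, df = within_subject_f(scores, effect_contrast(gaze=True))
f_triple, _ = within_subject_f(scores, effect_contrast(True, True, True))
```

With 22 participants the degrees of freedom are (1, 21), as in the behavioral results; passing several flags to `effect_contrast` yields the corresponding interaction contrast.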
EEG data acquisition, processing, and analyses.
During fMRI acquisition, EEG was recorded at a sampling frequency of 5 kHz with an MR-compatible amplifier (Brain Products) placed inside the MR scanner. The signal was amplified and bandpass filtered online at 0.16–160 Hz. Participants were fitted with an electrode cap equipped with carbon wired silver/silver–chloride electrodes (Easycap). Vertical eye movement was acquired from below the right eye; the electrocardiogram was recorded from the subject's clavicle. Channels were referenced to FCz, with a forehead ground and impedances kept <5 kΩ. The EEG was downsampled offline to 2500 Hz for gradient subtraction and then to 250 Hz for pulse subtraction (using EEGlab version 7; sccn.ucsd.edu/eeglab). After recalculation to average reference, the EEG data were downsampled to 125 Hz and low-pass filtered at 30 Hz. Trials containing artifacts or blinks were manually rejected. To study the ERPs in response to the perception of the actor's movement, ERPs were computed for each condition separately between 100 ms before and 600 ms after the onset of the second photograph and baseline corrected.
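The epoching and baseline correction step can be sketched as follows. This is a minimal illustration, not the EEGlab pipeline used in the study: it assumes that the baseline is the prestimulus interval (the text does not specify it), and the function and argument names are hypothetical.

```python
import numpy as np


def epoch_and_baseline(eeg, onsets, sfreq=125.0, pre_ms=100, post_ms=600):
    """Cut epochs around stimulus onsets and subtract the prestimulus mean.

    eeg: array of shape (n_channels, n_samples).
    onsets: sample indices of second-photograph onsets.
    Returns (epochs, times_ms) with epochs of shape (n_trials, n_channels, n_times).
    """
    n_pre = int(pre_ms * sfreq // 1000)    # whole samples before onset
    n_post = int(post_ms * sfreq // 1000)  # whole samples after onset
    offsets = np.arange(-n_pre, n_post + 1)
    times_ms = offsets * 1000.0 / sfreq
    epochs = np.stack([eeg[:, onset + offsets] for onset in onsets])
    # Assumption: baseline = mean over all prestimulus samples of each epoch.
    baseline = epochs[:, :, offsets < 0].mean(axis=2, keepdims=True)
    return epochs - baseline, times_ms


# Synthetic data: 4 channels, 1000 samples, with a constant offset to remove.
rng = np.random.default_rng(1)
eeg = rng.normal(size=(4, 1000)) + 5.0
epochs, times_ms = epoch_and_baseline(eeg, onsets=[100, 300, 500])
```

At 125 Hz one sample spans 8 ms, so the 100 ms prestimulus interval is covered by 12 whole samples (starting at −96 ms) in this sketch.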
P100-related activity was measured by extracting the mean activity averaged on four occipito-parietal electrodes around the wave peak between 112 and 136 ms in each hemisphere (PO7/PO3/P7/P5, PO8/PO4/P8/P6). Early N170-related activity was measured by extracting the mean activity averaged on four electrodes around the peak between 160 and 184 ms in each hemisphere (P5/P7/CP5/TP7, P6/P8/CP6/TP8). Late N170-related activity was measured similarly around the peak of the direct attention condition between 176 and 200 ms. P200-related activity was measured by extracting the mean activity averaged on six frontal electrodes around the peak between 200 and 224 ms (F1/AF3/Fz/AFz/F2/AF4). Repeated-measures ANOVA was performed on each measure with gaze direction (direct/averted), emotion (anger/neutral), gesture (no pointing/pointing), and, when relevant, hemisphere (right/left) as within-subjects factors (the analyses pooled over rightward and leftward sides of actor's deviation).
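The window-based measures described above amount to averaging the signal over a set of electrodes and a time window for each trial. A minimal sketch follows; the window and electrode values are taken from the text, whereas the function name, the array layout, and the inclusive treatment of window endpoints are assumptions.

```python
import numpy as np

# Time windows (ms) as given in the text.
WINDOWS_MS = {
    "P100": (112, 136),
    "early_N170": (160, 184),
    "late_N170": (176, 200),
    "P200": (200, 224),
}
P200_SITES = ["F1", "AF3", "Fz", "AFz", "F2", "AF4"]


def mean_amplitude(epochs, times_ms, ch_names, sites, window_ms):
    """Average over the given electrodes and time window, one value per trial.

    epochs: (n_trials, n_channels, n_times); window endpoints are inclusive.
    """
    lo, hi = window_ms
    t_mask = (times_ms >= lo) & (times_ms <= hi)
    ch_idx = [ch_names.index(site) for site in sites]
    return epochs[:, ch_idx][:, :, t_mask].mean(axis=(1, 2))


# Synthetic example: two trials, six frontal channels, 8 ms sampling step.
times_ms = np.arange(-12, 76) * 8.0
epochs = np.zeros((2, 6, times_ms.size))
epochs[0, :, (times_ms >= 200) & (times_ms <= 224)] = 2.0
p200 = mean_amplitude(epochs, times_ms, P200_SITES, P200_SITES, WINDOWS_MS["P200"])
```

All window boundaries are multiples of 8 ms, which is consistent with the 125 Hz sampling rate of the processed EEG. The same per-trial values are what a trial-by-trial analysis would take as input.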
fMRI data acquisition and processing.
Gradient-echo T2*-weighted transverse echo-planar images (EPIs) with blood oxygen-level dependent (BOLD) contrast were acquired with a 3 T Siemens whole-body scanner. Each volume contained 40 axial slices (repetition time, 2000 ms; echo time, 50 ms; 3.0 mm thickness without gap yielding isotropic voxels of 3.0 × 3.0 × 3.0 mm; flip angle, 78°; field of view, 192 mm; resolution, 64 × 64), acquired in an interleaved manner. We collected a total of 1120 functional volumes for each participant.
Image processing was performed using Statistical Parametric Mapping (SPM5; Wellcome Department of Imaging Neuroscience, University College London, London, UK; www.fil.ion.ucl.ac.uk/spm) implemented in MATLAB (MathWorks). For each subject, the 1120 functional images acquired were reoriented to the anterior commissure–posterior commissure line, corrected for differences in slice acquisition time using the middle slice as reference, spatially realigned to the first volume by rigid body transformation, spatially normalized to the standard Montreal Neurological Institute (MNI) EPI template to allow group analysis, resampled to an isotropic voxel size of 2 mm, and spatially smoothed with an isotropic 8 mm full-width at half-maximum Gaussian kernel. To remove low-frequency drifts from the data, we applied a high-pass filter using a standard cutoff period of 128 s.
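High-pass filtering of fMRI time series is commonly implemented by regressing out a set of slow discrete cosine functions. The sketch below illustrates this idea with NumPy, assuming that the cutoff corresponds to a period of 128 s and a repetition time of 2 s; it is written in the spirit of the SPM approach but is not SPM code, and the function name is hypothetical.

```python
import numpy as np


def dct_highpass(y, tr=2.0, cutoff_s=128.0):
    """Remove slow drifts from a time series by regressing out low-order
    discrete cosine functions. The constant term is not removed."""
    n = y.shape[0]
    # Number of cosine functions (including the constant) slower than the cutoff.
    k = int(2 * n * tr / cutoff_s + 1)
    t = np.arange(n)
    basis = np.column_stack(
        [np.cos(np.pi * (2 * t + 1) * j / (2 * n)) for j in range(1, k)]
    )
    beta, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return y - basis @ beta


# Synthetic series of 560 volumes: slow drift + fast component + constant.
n_vol = 560
t = np.arange(n_vol)
drift = np.cos(np.pi * (2 * t + 1) * 3 / (2 * n_vol))
fast = np.cos(np.pi * (2 * t + 1) * 100 / (2 * n_vol))
filtered = dct_highpass(2.0 * drift + fast + 10.0)
```

In this synthetic example the slow drift is removed while the fast component and the mean level are preserved.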
Joint ERP–fMRI analysis.
Statistical analysis was performed using SPM5. At the subject level, all the trials taken into account in the EEG analyses were modeled at the appearance of the second photograph with a duration of 0 s. Trials rejected from EEG analyses were modeled separately. The times of the fixation area (192 trials of 500 ms duration) of the first photograph (192 trials of between 300 and 600 ms) and of the response (192 trials of 1.5 s duration) as well as six additional covariates capturing residual movement-related artifacts were also modeled. To identify regions in which the percentage signal change in fMRI correlated with the ERP data, we extracted the mean amplitude of each ERP peak, trial by trial, subject by subject, and introduced them as parametric modulators of the trials of interest into the fMRI model. This resulted in four parametric modulators (P100, early N170, late N170, and P200) that were automatically orthogonalized by the software. Effects of the ERP modulators were estimated at each brain voxel using a least-squares algorithm to produce four condition-specific images of parameter estimates. At the group level, we performed four t tests, corresponding to P100, early N170, late N170, and P200 image parameter estimates obtained at the subject level.
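The orthogonalization of the four ERP modulators can be illustrated with a serial (Gram–Schmidt-like) scheme, in which each mean-centered modulator is residualized with respect to those entered before it. This is a hedged sketch of the general technique with NumPy, not SPM's implementation; the function name and the synthetic data are assumptions.

```python
import numpy as np


def orthogonalize_serially(modulators):
    """Mean-center each column, then remove from each column the part
    explained by the columns that precede it.

    modulators: (n_trials, n_modulators), e.g. columns ordered as
    P100, early N170, late N170, P200.
    """
    centered = modulators - modulators.mean(axis=0)
    out = np.empty_like(centered, dtype=float)
    for j in range(centered.shape[1]):
        col = centered[:, j].astype(float)
        if j > 0:
            prev = out[:, :j]
            beta, *_ = np.linalg.lstsq(prev, col, rcond=None)
            col = col - prev @ beta
        out[:, j] = col
    return out


# Synthetic, correlated single-trial amplitudes for 192 trials.
rng = np.random.default_rng(2)
shared = rng.normal(size=(192, 1))
mods = shared + 0.5 * rng.normal(size=(192, 4))
orth = orthogonalize_serially(mods)
gram = orth.T @ orth
```

One consequence of a serial scheme is that the order of entry matters: the first modulator keeps all of its variance, whereas later ones retain only what is not shared with earlier ones.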
A significance threshold of p ≤ 0.001 (uncorrected for multiple comparisons) for the maximal voxel level and of p < 0.05 at the cluster level (corresponding to an extent threshold of 150 contiguously active voxels) was applied for late N170 and P200 contrasts. A small volume correction (p < 0.05 corrected for familywise error) approach was also applied to bilateral amygdala using an anatomical mask from SPM Anatomy Toolbox (version 17) for P100, early N170, late N170, and P200 contrasts. The Anatomy Toolbox (version 17) was also used to identify the localization of active clusters. Coordinates of activations were reported in millimeters in the MNI space.
As expected, we found that our stimuli were judged more self-involving when displaying direct compared with averted gaze (F(1,21) = 56.7, p < 0.001), angry compared with neutral facial expression (F(1,21) = 9.2, p < 0.01), and pointing compared with no pointing (F(1,21) = 21.7, p < 0.001). Interestingly, interactions were also observed between gaze direction and emotion (F(1,21) = 8.5, p < 0.01) and between gaze direction and gesture (F(1,21) = 4.6, p < 0.05). Post hoc analyses showed that the effect of emotion was greater when the participant was the target of attention (F(1,21) = 12.4, p < 0.01; mean effect = 15.4 ± 5.2%) than when this was not the case (F(1,21) = 4.3, p < 0.05; mean effect = 6.1 ± 2.1%). Pointing actors were also judged more self-involving when the participant was the target (F(1,21) = 23.2, p < 0.001; mean effect = 12.1 ± 1.5%) than when this was not the case (F(1,21) = 6.2, p < 0.05; mean effect = 5.5 ± 1.4%). The triple interaction between gaze direction, emotion, and gesture failed to reach significance (F(1,21) = 3.6, p < 0.07). However, post hoc analyses revealed, as expected, that the feeling of self-involvement increased with the number of self-relevant cues (all t(21) > 2.4, all p < 0.05; see Fig. 3). As a result, we succeeded in creating a parametric paradigm in which the self-relevance increased with the number of self-oriented social signals.
Time course of social visual cue processing and integration
Our first step in analysis was to address the time course of social signal processing and their integration. The sequence of short electric brain responses was indexed by three classical and successive generic ERP components: the occipital P100, the occipito-temporal N170, and the frontal P200 (Ashley et al., 2004; Vlamings et al., 2009). As also reported in the literature (Puce et al., 2000; Conty et al., 2007), we observed that N170 in response to direct attention peaked later than in the other conditions (184 ms vs a mean of 168 ms). Thus, N170 was divided into an early component and a late component.
We observed a main effect of each factor of interest on P100 activity. Direct gaze (F(1,20) = 4.52, p < 0.05), anger (F(1,20) = 9.16, p < 0.01), and pointing (F(1,20) = 17.62, p < 0.001) conditions induced greater positive activity than the averted gaze, neutral emotion, and no-pointing conditions, respectively. However, no interactions between factors were observed (all F < 1).
Analysis on the early N170 revealed first greater activity in the right than the left hemisphere (F(1,20) = 10.55, p < 0.01). Moreover, anger (F(1,20) = 13.27, p < 0.01) and pointing gesture (F(1,20) = 29.53, p < 0.001) induced greater negative activity when compared with the neutral and no-pointing gesture conditions, respectively. However, no interactions between factors were observed (F < 1).
Analyses run on late N170 revealed a main effect of all the factors. The activity was globally greater in the right than in the left hemisphere (F(1,20) = 6.56, p < 0.05). Direct gaze (F(1,20) = 4.52, p < 0.05), anger (F(1,20) = 25.94, p < 0.001), and pointing condition (F(1,20) = 19.78, p < 0.001) induced greater negative activity than, respectively, averted gaze, neutral, and no-pointing condition. The first interaction between gaze direction and gesture emerged on this component (F(1,20) = 12.27, p < 0.005). The condition in which the actor pointed and looked toward the subject induced greater activity than all other conditions (all t > 3.6, all p < 0.01). The late N170 on temporo-parietal sites thus marked the integration of directional social cues (Fig. 2).
On the frontal P200, we observed a main effect of angry expressions (F(1,20) = 5.51, p < 0.03) and direct attention (F(1,20) = 5.02, p < 0.05). Importantly, however, a triple interaction between gaze direction, emotional expressions, and pointing gesture was detected (F(1,20) = 4.71, p < 0.05). The most self-relevant condition, in which the actor expressed anger, looked, and pointed toward participants, induced greater positive activity than all other conditions (all t(20) > 2.15, all p < 0.05). Moreover, P200 activity tended to increase with the number of self-directed social cues (Fig. 3). Thus far, our data suggest that the integration between three main social signals is achieved just after 200 ms in frontal sites, yet they do not provide information about the neural source of such integration.
Brain network involved in the integration of self-relevant visual social cues
To explore the brain sources that positively covary with the amplitude of previously identified ERPs, we performed a joint EEG–fMRI analysis. At the subject level, mean amplitudes of P100, early N170, late N170, and P200 peaks (extracted trial × trial) were introduced as four parametric modulators in the fMRI model. This method enables us to search for brain regions in which the percentage signal change in fMRI is correlated with the ERP data without a priori assumptions regarding the location (Ritter and Villringer, 2006). At the group level, we calculated t tests for P100, early N170, late N170, and P200 and looked for brain areas in which the percentage signal change in fMRI correlated with ERP amplitudes.
The goal of the present study was to identify the spatiotemporal course of social visual signal binding. Thus, we first concentrated on the late N170 and the P200, when integration occurred. The right parietal operculum (PFop), extending to somatosensory cortex SII (hereafter labeled PF/SII), and the right supplementary motor area (SMA), extending to primary motor area 4a, positively covaried with the late N170 amplitude implicated in the integration of self-relevant directional signals (attention and gesture pointing toward the self). The source of P200 modulations involved in the integration of all available self-relevant cues (directional signals toward the self with emotional expression) was found in the right premotor cortex (PM) (Fig. 4, Table 1). In humans, the border between the ventral PM and dorsal PM is located at z coordinates between 40 and 56 (Tomassini et al., 2007). The present source of P200 ranges from z = 34 to z = 58 and is thus located in the dorsal part of the ventral PM. This region is likely equivalent to the macaque area F5c (Rizzolatti et al., 2001). It is strongly connected to the SMA, primary motor area M1, PFop, and SII (Luppino et al., 1993; Rozzi et al., 2006; Gerbella et al., 2011) and hosts visuomotor representations (Rizzolatti et al., 2001).
To assess whether the emotional system also participates in early binding of gaze, emotion, and gesture, we tested whether ERP components modulated by the emotional content of stimuli (P100, N170, and P200) (Batty and Taylor, 2003; Blau et al., 2007; van Heijnsbergen et al., 2007) were associated with activity in the amygdala, known to be highly involved in threat (Adolphs, 1999) and self-relevance processing (Sander et al., 2007; N′Diaye et al., 2009; Sato et al., 2010). To do so, we used that structure bilaterally as a region of interest. BOLD responses in the left amygdala significantly covaried with changes in the early component of N170 (Fig. 4, Table 1). This finding validates our approach by replicating previous results using intracranial ERPs (Krolak-Salmon et al., 2004; Pourtois et al., 2010) and surface EEG (Pourtois and Vuilleumier, 2006; Eimer and Holmes, 2007), showing that information about the emotional content of a perceived facial expression quickly reaches the amygdala (140–170 ms), in parallel with the processing of other facial cues within the visual cortex. Here, we show that emotional processing in the amygdala occurs just before the integration of directional social signals (gaze and pointing toward the self) detected on the late component of N170.
By coupling fMRI with EEG, we demonstrate for the first time that the integration of gaze direction, pointing gesture, and emotion is completed just after 200 ms in the right PM, possibly to facilitate the preparation of an adaptive response to another's immediate intention. We confirm that activity within motor-related cortical areas arises 150–200 ms after the onset of a perceived action (Nishitani and Hari, 2002; Caetano et al., 2007; Tkach et al., 2007; Catmur et al., 2010) and that the interaction between gaze direction and emotion takes place at ∼200–300 ms (Klucharev and Sams, 2004; Rigato et al., 2010). However, in contrast to recent accounts of human amygdala function in social cue integration (Sander et al., 2007; N′Diaye et al., 2009; Cristinzio et al., 2010; Sato et al., 2010), we found that emotional content is processed earlier within the amygdala and independently of other cues.
Early binding of social cues in the PM 200 ms after stimulus onset may relate to an embodied response that serves evaluative functions of others' internal states (Jeannerod, 1994; Gallese, 2006; Keysers and Gazzola, 2007; Sinigaglia and Rizzolatti, 2011). The emotional convergence between the emitter and the observer enhances social and empathic bonds and thus facilitates prosocial behavior and fosters affiliation (Chartrand and Bargh, 1999; Lakin and Chartrand, 2003; Yabar et al., 2006; Schilbach et al., 2008), yet strict motor resonance processing cannot explain the present activation in the PM. Indeed, anger expressions directed at the observer are perceived as clear signals of non-affiliative intentions and are thus less mimicked than averted anger expressions (Hess and Kleck, 2007; Bourgeois and Hess, 2008).
Activity in the PM may relate to the estimation of prior expectations about the perceived agent's immediate intent. Hierarchical models of motor control purport that higher and lower motor modules are reciprocally connected to each other (Wolpert and Flanagan, 2001; Kilner et al., 2007). Within such perspectives, the generative models used to predict the sensory consequences of one's own actions are also used to predict another's behavior. Backward connections inform lower levels about expected sensory consequences, i.e., the visual signal corresponding to the sight of another's action. Conversely, the inversion of the generative models allows for the inference of what motor commands have caused the action, given the visual inputs. The extraction of prior expectations about another's intention corresponds to the inverse model (Wolpert et al., 2003; Csibra and Gergely, 2007), which needs to be estimated from available cues. Crucially, this estimation is proposed to be implemented in the bottom-up path from the temporal cortex to the inferior parietal lobule (PF) to the PM during the observation of the beginning of an action (Kilner et al., 2007). Thus, the present activity in the PM may reflect prior expectations about another's communicative intention, first built from directional cues (gaze and pointing gesture) in the dorsal pathway before integrating the emotional content in the PM. Only then could prior expectations influence, through feedback mechanisms, the perception of ongoing motor acts via a top-down activation of perceptual areas, generating expectations and predictions of the unfolding action (Wilson and Knoblich, 2005; Kilner et al., 2007). The above-mentioned mechanisms will not be relevant for novel, unexpected, and complex actions for which the goal needs to be estimated from the context without the involvement of low-level motor systems (Csibra, 2007; Csibra and Gergely, 2007).
Indeed, these mechanisms rely on the equivalence assumption that the observed actor shares the same motor constraints as the observer, and may thus only apply to actions that are in the observer's motor repertoire, such as those manipulated in the present study.
The question arises as to why P200 and PM activity was greater when the actor expressed anger, looked, and pointed toward participants. One possible explanation for this pattern of activity is that information is filtered as a function of its social salience (Kilner et al., 2006; Schilbach et al., 2011; Wang et al., 2011) before the estimation of prior expectations. An alternative and complementary hypothesis is related to the role of the PM in using sensory information to specify currently available actions to deal with an immediate situation (Cisek, 2007). Prior expectations about the perceived agent's immediate intent would thus afford the perceiver specific types of interactions (Gangopadhyay and Schilbach, 2011; Schilbach et al., 2011). Hence, the highest level of activity in the PM reflects the highest degree of potential social interaction, which corresponds here to facing an angry person pointing and looking toward oneself. Indeed, the expression of direct anger signals a probable physical and/or symbolic attack (Schupp et al., 2004), is perceived as threatening (Dimberg and Ohman, 1983; Dimberg, 1986; Strauss et al., 2005), and triggers adaptive action in the observer (Frijda, 1986; Pichon et al., 2008, 2009, 2012; Grèzes et al., 2011; Van den Stock et al., 2011). In accordance with such a view, defensive responses in monkeys are elicited by electrical stimulation at the border between the ventral and dorsal PM (Cooke and Graziano, 2004; Graziano and Cooke, 2006) and are supposed, in humans, to be facilitated within a 250 ms timeframe after the perception of a danger signal (Williams and Gordon, 2007). Here, emotional signals were processed first in the amygdala at ∼170 ms. Interestingly, a substantial number of studies have shown that lesions of the amygdala not only disrupt the ability to process fear signals (LeDoux, 2000) but can also abolish characteristic defensive behavior in primates (Emery et al., 2001). 
In this model, the amygdala plays a critical role in initiating adaptive behavioral responses to social signals via its connections with subcortical areas and the PM (Avendano, 1983; Amaral and Price, 1984). Thus, we propose that, after having been processed in the amygdala, emotional information is integrated with self-directed directional cues in the PM, enabling prior expectations to be developed about another's intentions and the preparation of one's own action.
At ∼170 ms, emotional processing occurs in the amygdala, independently of self-directed directional cues (gaze direction and pointing gesture). The activation of the amygdala while observers perceived bodily expressions of anger replicates previous studies (Pichon et al., 2009) and supports its proposed role in the automatic detection of threat (Emery and Amaral, 2000; LeDoux, 2000; Amaral et al., 2003; Feinstein et al., 2011). Amygdala damage diminishes the brain's response to threatening faces at both the ∼100–150 and ∼500–600 ms time ranges (Rotshtein et al., 2010), and, in both infants and adults, the interaction between gaze direction and emotion takes place at ∼200–300 ms (Klucharev and Sams, 2004; Rigato et al., 2010). Furthermore, previous fMRI studies manipulating self-involvement during face perception revealed that facial expression and gaze direction are integrated in the medial temporal poles (Schilbach et al., 2006; Conty and Grèzes, 2012) or in amygdala (Adams and Kleck, 2003; Hadjikhani et al., 2008; N′Diaye et al., 2009; Sato et al., 2010). Here, we show that the binding of emotion with gaze direction and pointing gesture arises at ∼200 ms in the PM. This suggests that the pattern of integration revealed previously using fMRI could reflect later rather than early processes.
Before being integrated with emotional content in the PM, self-directed directional cues (gaze direction and pointing gesture) are first merged at ∼190 ms in the parietal areas (PF/SII) and in the SMA. Could the absence of interaction at an early stage between directional cues and emotion have been attributable to some feature of the present stimuli and task? First, when present, pointing gesture always indicated the same direction of attention as did gaze. Second, the participant's task was to judge the actor's direction of attention (toward the self or another) regardless of the emotional content. This may have led participants to prioritize task-relevant directional cues and thus their integration in the PF/SII and in the SMA for response selection and preparation (Passingham, 1993; Rushworth et al., 2003), independently of emotion. However, higher activity in the PF/SII and in the SMA for self-directed compared with other-directed social cues, and right-lateralized activations for right-handed participants, do not fully support such an explanation. Rather, right-lateralized activations suggest processing related to representation of another's action (Decety and Chaminade, 2003).
In conclusion, the current data clearly demonstrate that the early binding of visual social cues displayed by a congener is achieved in the motor system rather than in the emotional system. We propose that this would allow one to expedite the preparation of an adaptive response, particularly for self-relevant social cues—in this case, another's threatening intention toward oneself.
This work was supported by the European Union Research Funding NEST Program Grant FP6-2005-NEST-Path Imp 043403, Inserm, and Ecole de Neuroscience de Paris and Région Ile-de-France.
The authors declare no competing financial interests.
Correspondence should be addressed to either of the following: Dr. Laurence Conty, Laboratory of Psychopathology and Neuropsychology, EA 2027, Université Paris 8, 2 rue de la Liberté, 93526 Saint-Denis, France; or Dr. Julie Grèzes, Cognitive Neuroscience Laboratory, Inserm, Unité 960, Ecole Normale Supérieure, 29 Rue d'Ulm, 75005 Paris, France.