Abstract
Successful communication between two people depends first on the recognition of the intention to communicate. Such intentions may be conveyed by signals directed at the self, such as calling a person's name or making eye contact. In this study we use functional magnetic resonance imaging to show that the perception of these two signals, which differ in modality and sensory channel, activates common brain regions: the paracingulate cortex and temporal poles bilaterally. These regions are part of a network that has been consistently activated when people are asked to think about the mental states of others. Activation of this network is independent of arousal as measured by changes in pupil diameter.
- Theory of Mind
- mentalizing
- eye gaze
- faces
- names
- medial prefrontal cortex
- paracingulate cortex
- temporal poles
- fMRI
- autism
Introduction
The cognitive process underlying our ability to attribute intentions to self and others has been termed the “Theory of Mind” (Premack and Woodruff, 1978), the “intentional stance” (Dennett, 1987), or “mentalizing” (Frith et al., 1991). Far from being a complex process of conscious inferences, mentalizing is thought to be an automatic cognitive process (Leslie, 1987; Scholl and Leslie, 1999) and may not require a deliberate decision to attend.
Mentalizing appears to rest on a dedicated neural system, as a number of neuroimaging studies indicate (Fletcher et al., 1995; Goel et al., 1995; Baron-Cohen et al., 1999; Gallagher et al., 2000; Brunet et al., 2000; Castelli et al., 2000; Vogeley et al., 2001; McCabe et al., 2001). The three cortical regions most consistently activated during mentalizing are the paracingulate cortex, the temporal poles, and the superior temporal sulcus at the temporoparietal junction (Frith, 2001). Individuals with autism, who typically fail mentalizing tasks, show reduced activation during mentalizing in these regions (Baron-Cohen et al., 1985; Castelli et al., 2002), and in the paracingulate region in particular (Happé et al., 1996).
The question we wished to address was whether the neural circuit involved in mentalizing is also engaged in the initial stage of communication, when the intention to communicate is signaled. Typically, a subject has to recognize that such a signal is directed at himself. If recognizing the communicative intention of another toward oneself triggers the mentalizing mechanism, then perception of a variety of signals, normally associated with the intention to communicate, should activate the neural circuit implicated in mentalizing. Furthermore, if this process is automatic, then deliberate attention to gestures signaling communicative intention should not be required. Processing should be implicit, and activation of this neural circuit should occur even when the two participants do not subsequently interact. Critically, these signals, directed at the self, should access this neural system independently of the modality used. We conjectured that the two most common gestures used to initiate communication (calling someone's name and looking directly at someone), which differ in most low-level features, would converge on the same neural substrate, specifically the regions implicated in mentalizing. We used event-related functional magnetic resonance imaging (fMRI) to test these hypotheses.
Subjects were scanned while seeing faces or hearing voices. The faces either looked directly at the subjects or away from them (Figure 1). As a baseline condition, scrambled images of the same luminance and spatial frequency as the faces were included. The voices called the subject's first name or the first name of someone else twice, for example: “John, hey John!” Half of the recordings called the subject's first name and half a control name. Again, we also included a baseline condition made of superimposed speech recordings of 10 subjects. This made individual sentences unintelligible but preserved the tonal qualities of human voices and some rhythmic aspects of the English language. The subjects' explicit task was to press a button when they detected faces with closed eyes or heard surnames rather than first names.
Materials and Methods
Subjects. Sixteen healthy, paid, right-handed subjects (8 male, 8 female; age, 20–31 years) gave written consent to take part in this study, which was approved by the National Hospital for Neurology and Neurosurgery Ethics committee.
Visual stimulus preparation and presentation. Forty faces (half male, half female) were photographed with the subjects looking either straight into the camera or at a point deviated horizontally by 30°, using a digital video camera. Facial expression was neutral. We expected that the predicted effect of seen gaze direction should be found independently of head orientation (George et al., 2001). Thus, for both gaze directions two views were taken (head frontal or head deviated to the side by 30°). Scrambled, nonface control images were derived using an in-house program that preserved the brightness and spatial frequencies of the original faces.
During scanning, every subject was presented with two sessions of 80 images (counterbalanced for head orientation and sex) and 40 scrambled images (null events) in a randomized event-related design (Josephs and Henson, 1999), 3.5 sec stimulus onset asynchrony (SOA): 1 sec scrambled image, 1.2 sec face (or null event), 1.3 sec scrambled image. Half of the stimulus faces looked directly at the subject (eye contact) and half to the right or left of the subject (non-eye-contact condition). No task was required from the subjects while they viewed these images, but to secure and test the subjects' attention, six target images (faces with closed eyes) were included, for which the subjects had to press a button. In the statistical analysis of the fMRI data these latter images were modeled as events of no interest and excluded from additional evaluation.
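The trial structure described above can be illustrated with a minimal sketch of one randomized visual session. It is not the authors' presentation code. The per-session counts (40 direct-gaze faces, 40 averted-gaze faces, 40 null events, 6 targets), the condition labels, the fixed seed, and the function name `build_schedule` are assumptions made for illustration only.

```python
import random

SOA = 3.5                        # seconds between trial onsets, as stated in the text
PRE, STIM, POST = 1.0, 1.2, 1.3  # scrambled image, face (or null event), scrambled image


def build_schedule(n_direct=40, n_averted=40, n_null=40, n_target=6, seed=0):
    """Return a shuffled list of (stimulus_onset_sec, condition) pairs.

    The counts are assumed to be per session; the text does not say so explicitly.
    """
    conditions = (
        ["direct_gaze"] * n_direct
        + ["averted_gaze"] * n_averted
        + ["null"] * n_null
        + ["target_closed_eyes"] * n_target
    )
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    rng.shuffle(conditions)
    # Each trial opens with the scrambled image, so the face (or null event)
    # appears PRE seconds after the trial onset.
    return [(i * SOA + PRE, cond) for i, cond in enumerate(conditions)]


schedule = build_schedule()
# The three trial phases must add up to the stated SOA.
assert abs(PRE + STIM + POST - SOA) < 1e-9
print(len(schedule), "events;", "first stimulus onset at", schedule[0][0], "sec")
```

In such a design the target events would later be modeled as events of no interest, as the text describes.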
Acoustic stimulus preparation and presentation. Voice recordings of 20 different subjects (half male, half female) calling the subject's name (e.g., “John, hey John!”) or a different name (control condition) twice were made and cut to 1.2 sec with a commercial program (COOLEDIT2000; Syntrillium Software, Phoenix, AZ). The loudness of the stimuli “calling the subject's name” and “calling a different name” was normalized with COOLEDIT so that they were identical. Usually the first name used as a stimulus for one subject was used in rotation as a control name for other subjects. As a baseline condition, speech recordings of 10 subjects were made and superimposed. This made individual sentences unintelligible but preserved the tonal qualities of human voices and some rhythmic aspects of the English language. When subjects were asked about the baseline recording the most common answer was that it sounded like the background din in a noisy pub. It was clearly thought to be English, but understanding was not possible beyond the occasional comprehension of some single words.
During scanning, every subject was presented with two sessions of 80 voice recordings. Half of the voices called the subject's name and half a different control name (one of two alternatives, to parallel the two non-eye-contact conditions, i.e., faces looking to the right or to the left). Again 40 null events (only the background of overlapping voices audible) were included. The SOA and timing were kept strictly parallel to the visual task. Again, six target voice recordings (surnames) were included, which the subjects had to identify and react to by pressing a button.
Data acquisition. Scanning occurred in alternating sessions for the visual and the acoustic stimuli. Every subject had to perform two sessions with each stimulus type. Across subjects, the session type performed first (visual or acoustic) was randomized. A session lasted ∼7 min. During a session, 153 gradient echo planar imaging T2*-weighted functional volumes [volume repetition time, 2.75 sec; 32 transverse slices; 3 × 3 × 3 mm; repetition time (TR) = 86 msec; echo time (TE) = 40 msec; flip angle = 90°] were acquired using a Siemens (Erlangen, Germany) VISION system at 2 T. The first six volumes of each session were discarded to eliminate magnetic saturation effects, yielding 588 volumes per subject that were entered into the analysis.
Alertness was controlled by registering the subjects' reactions to the irrelevant target stimuli and additionally by monitoring the subjects' eye gaze with an infrared video camera while scanning. On-line monitoring showed that the subjects consistently looked at the faces with particular attention to the eyes. In addition, in 14 of the 16 subjects it was possible to sample pupil diameter continuously in the scanner at 60 Hz as a measure of arousal.
Data preprocessing and analysis. All fMRI data were preprocessed and analyzed using the SPM99b software package (Wellcome Department of Cognitive Neurology, London, UK; www.fil.ion.ucl.ac.uk/spm/). Standard time correction, linear image realignment, linear normalization to the stereotactic anatomical space, and spatial smoothing [three-dimensional Gaussian kernel, 8 mm full-width at half maximum (FWHM)] were successively performed for each subject using standard statistical parametric mapping (SPM) methods (Ashburner and Friston, 1997). Low-pass (Gaussian FWHM, 4.0 sec) and high-pass (minimum cutoff period, 53 sec) frequency filters were applied to the time series. Individual events were modeled by a synthetic hemodynamic response and its temporal derivative. The data were analyzed using the General Linear Model to obtain parameter estimates of event-related activity at each voxel, for each condition and each subject, and to generate statistical parametric maps of the t statistic resulting from linear contrasts between different conditions (Friston et al., 1995). These were transformed to a normal distribution and thresholded at p = 0.05 for main effects and corrected for multiple comparisons across the whole brain. The data were then analyzed in a random-effects model to allow population inferences. Based on previous imaging studies, activation was predicted in regions implicated in mentalizing, specifically the paracingulate cortex, the temporoparietal junction, and the temporal poles. At these locations significance was tested using a small volume correction (Worsley et al., 1996), constraining our analysis to a sphere of 16 mm, centered on the region of interest derived from previous studies. Both the individual contrast “eye contact versus averted gaze” and “hearing one's own name versus a different name” activated the paracingulate cortex and the temporal poles (see Results). 
To formally identify regions that were activated across both contrasts we used the statistical parametric t map of the random-effects analysis of the first contrast as an inclusive mask in the random-effects analysis of the second contrast. Both the first and second contrasts were thresholded at the single-voxel level at p = 0.01. This yields voxels whose probability of being active by chance across both contrasts is 0.001 or less (using the Fisher method for combining p values). Functional images were superimposed on the Montreal Neurological Institute template of 152 averaged T1 images (spoiled FLASH sequence, TE = 10, TR = 18, FA = 30) provided within SPM.
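The combined probability quoted above can be reproduced with a short calculation. The sketch below applies Fisher's method to two p values of 0.01, assuming the two contrasts are independent. The function name `fisher_combined_p` is illustrative and is not part of SPM.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For an even number of
    degrees of freedom the tail probability has a closed form, so no
    statistics library is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # P(chi2 with 2k df > X) = exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    return math.exp(-half_x) * sum(
        half_x ** j / math.factorial(j) for j in range(k)
    )


combined = fisher_combined_p([0.01, 0.01])
print(f"{combined:.5f}")  # prints 0.00102
```

A voxel that sits exactly at p = 0.01 in both contrasts therefore has a combined probability of about 0.001, and voxels that are more significant in either contrast fall below it.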
Pupil-diameter data. Eye blinks were removed by interpolating the pupil diameter at the onset and end of a blink. No filtering was applied. The data were mean-corrected and binned into 2.6 sec intervals, separately for each trial type. Subsequently, the event-specific responses were averaged across subjects.
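The pupil-data steps just listed (blink interpolation, mean correction, binning, averaging across subjects) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: blink samples are assumed to be marked as `None`, interpolation is assumed to be linear, incomplete trailing bins are dropped, and all function names are hypothetical. At the stated 60 Hz sampling rate, a 2.6 sec bin would hold 156 samples.

```python
def interpolate_blinks(trace):
    """Fill None samples (blinks) by linear interpolation between the last
    valid sample before the blink and the first valid sample after it.
    Assumes the trace contains at least one valid sample."""
    out = list(trace)
    i, n = 0, len(out)
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        left = out[start - 1] if start > 0 else out[i]  # blink at trace start
        right = out[i] if i < n else left               # blink at trace end
        gap = i - start + 1
        for j in range(start, i):
            out[j] = left + (right - left) * (j - start + 1) / gap
    return out


def mean_correct(trace):
    """Subtract the mean of the trace from every sample."""
    mean = sum(trace) / len(trace)
    return [x - mean for x in trace]


def bin_means(trace, samples_per_bin):
    """Average consecutive samples into bins; an incomplete last bin is dropped."""
    n_bins = len(trace) // samples_per_bin
    return [
        sum(trace[b * samples_per_bin:(b + 1) * samples_per_bin]) / samples_per_bin
        for b in range(n_bins)
    ]


def average_across(subject_responses):
    """Element-wise mean of equally long binned responses, one list per subject."""
    return [sum(vals) / len(vals) for vals in zip(*subject_responses)]


# Toy example with a two-sample blink; real data would use samples_per_bin=156.
raw = [4.0, None, None, 5.5, 6.0]
clean = interpolate_blinks(raw)
binned = bin_means(mean_correct(clean), samples_per_bin=2)
group = average_across([binned, binned])
print(clean, binned, group)
```

Each trial type would be processed separately in this way before the across-subject average is taken.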
Results
After scanning, subjects were debriefed and asked about their subjective reactions to the stimuli. In the visual scanning session, subjects reported feeling observed when eye gaze was directed at them. The long period of eye contact (1.2 sec) seemed to enhance this effect. The overall impression of the auditory scanning session, as described by the subjects, was that of a noisy environment like a pub with some people calling them or calling someone else. Several subjects mentioned feeling tempted to turn around when their name was called.
Areas activated when subjects perceived being looked at
Our specific prediction was that direct eye contact would elicit a stronger activation in the regions implicated in mentalizing, specifically the paracingulate cortex, the temporal poles, and the temporoparietal junction. Therefore, we used a small volume correction (svc) for these regions (Worsley et al., 1996), constraining our analysis to the region of interest. Because a fixed-effects analysis restricts inferences to the subjects studied, a second-level random-effects analysis was added to determine the extent to which these results could be generalized to the population at large. Activity associated with direct eye gaze compared with averted eye gaze was observed in right paracingulate cortex (x = 8, y = 50, z = 14; p ≤ 0.05, svc) and the left temporal pole (x = -46, y = 6, z = -36; p ≤ 0.05, svc) (Fig. 2, activated areas are in yellow).
Areas activated when the personal name of the subject was called
We then investigated which areas were activated when the subjects were called by their personal names compared with a different first name. As before, we tested our specific prediction that the ostensive act of calling someone's name would activate regions implicated in mentalizing. Again we performed a fixed-effects analysis on individual subjects and then a random-effects analysis across subjects. Significant activation was observed in the right paracingulate cortex (x = 8, y = 60, z = 22; p ≤ 0.05, svc), and in the right (x = 46, y = 4, z = -46; p ≤ 0.05, svc) and left temporal pole (x = -46, y = 2, z = -42; p = 0.001, uncorrected) (Fig. 2, blue).
Calling the subject's own name as opposed to a different name activated additional areas, specifically, the medial surface of the superior frontal gyrus (x = 0, y = 20, z = 58) (Fig. 2A, gray arrow) and the inferior frontal gyrus/insula on both sides (x = 48, y = 32, z = -2; x = -46, y = 18, z = -8; data not shown). Kiehl et al. (2001) saw activation in these regions when subjects detected novel stimuli compared with a baseline condition. Activation of these areas might be specific to the auditory domain, and was not seen in the contrast “faces with direct eye contact” versus “faces with averted gaze.”
Areas activated across both contrasts, both by direct eye gaze and by hearing one's own name
As just shown, both the individual contrast “eye contact versus averted gaze” and “hearing one's own name versus a different name” activated the paracingulate cortex and the temporal poles. To formally identify regions that were activated across both contrasts, we used the statistical parametric t map of the random-effects analysis of the first contrast as an inclusive mask in the random-effects analysis of the second contrast. Both the first and second contrasts were thresholded at the single-voxel level at p = 0.01. This yields voxels whose probability of being active by chance across both contrasts is 0.001 or less (using the Fisher method for combining p values). The sole regions conjointly activated at this threshold were the right paracingulate cortex (x = 6, y = 60, z = 20) (Fig. 2A) and the left temporal pole (x = -46, y = 2, z = -42) (Fig. 2B); activation maps of the conjunction analysis are overlaid on Figure 2 in green. Additional activations at other locations were not seen.
Activation of the paracingulate cortex and the temporal poles is independent of arousal
Mentalizing tasks have consistently activated the paracingulate cortex, the brain area extending from the geniculate paracingulate sulcus to Brodmann areas 9 and 10. However, arousal has been shown to activate the middle portion of the anterior cingulate cortex, a close, but clearly distinct area (Chua et al., 1999; Critchley et al., 2000, 2001). To demonstrate that activation of the paracingulate cortex by Theory of Mind processing is independent of arousal, we monitored pupil diameter as a measure of arousal while scanning (Fig. 3).
The subjects had been instructed to react and press a button when they detected faces with closed eyes or heard a surname instead of a first name called. As expected, there was a task-related increase in pupil diameter, indicating arousal, when subjects detected such stimuli. When no stimulus appeared, arousal remained constant in both conditions, because the subjects continued to wait. When a face with open eyes was presented, pupil diameter decreased and stayed low for the time the stimulus was presented. Subjects may have recognized that they did not have to respond during this trial and relaxed. This decrease in pupil diameter was seen both when the eyes were directed at the subject and when eye gaze was averted to one side. After the presentation of a face, arousal slowly increased again in anticipation of the next event. Hearing a name moderately increased pupil diameter. This was most likely attributable to the fact that stimuli were very loud and distinct above the background of scanner noise and the baseline condition of unintelligible voices. There was no difference in arousal when the subject's own name or a different name was called.
Discussion
Communication is achieved by encoding a message, which cannot travel, into a signal, which can, and by decoding this signal at the receiving end (Sperber and Wilson, 1995). To initiate communication a sender has to transfer the message “I want to communicate with you” to the receiver. Inherently such a message is “self-referential” to the receiver. The signal must show the recipient that he himself is meant, that he himself is the addressee of the signal that he just caught, that someone wants to communicate with him.
There are many ways by which communication can be initiated, and very different sensory channels can be used for this. Most unambiguously communication is initiated by calling someone: “Hey, you!” or more specifically his name: “Hey John.” But the information “I want to communicate with you” can also be transmitted using other sensory channels and nonverbal encoding. Examples are touching someone or looking directly at someone.
Mentalizing ability has been pinpointed as a crucial factor in everyday human communication and provides the mechanism needed to recognize the intention to communicate (Leslie and Happé, 1989; Sperber and Wilson, 1995; Bloom, 2000). Mentalizing is assumed to rest on a dedicated cognitive mechanism (Leslie, 1987; Scholl and Leslie, 1999), which appears to be subserved by a defined neural system, as demonstrated in a number of neuroimaging studies (for review, see Frith, 2001). As predicted from the theory, we show that two signals directed at a subject that may signal communicative intent, “prolonged eye contact” (versus averted gaze) and calling “someone's own name” (as compared with a different name), activate the mentalizing circuit. Both signals activated the paracingulate cortex and the temporal poles, two of the three critical cortical regions implicated in a variety of mentalizing tasks. Most of the earlier studies targeted explicit mentalizing and were “off line.” The subjects usually had to consider a scenario and retrospectively explain the behavior of the protagonists involved. Only a few studies have investigated “on-line” mentalizing (McCabe et al., 2001).
Our task was an implicit mentalizing task, and also an on-line task insofar as the sender's gesture triggered the intentional stance of the receiver. In a recent study of on-line mentalizing, when participants played the game of stone, paper, and scissors against a human opponent compared with a computer, the paracingulate cortex alone was activated (Gallagher et al., 2002). This suggests that it has a central role in the cortical circuits involved in attributing intentions to others or when these are highly relevant to the self's own action or reaction. The second region activated in the present study by the two self-referential signals was in the temporal pole on both sides, a region also consistently activated in previous mentalizing studies. However, a third region that has been activated in these studies, the temporoparietal junction, was not seen in the present study. Additional studies are needed to determine the exact function of these regions.
Individuals with autism are impaired in mentalizing (for review, see Baron-Cohen et al., 2000). The present study was inspired by the observation that autistic subjects, in whom the inability to mentalize appears to be a core deficit, also have huge difficulties recognizing when they are addressed, when they themselves are meant and requested to respond. Lack of orienting to their names is possibly the earliest feature that distinguishes autistic children from mentally retarded children (Osterling et al., 2002). Likewise, responses to eye gaze have long been reported to be abnormal in autistic children.
It appears that mentalizing is involved in understanding the signals that a sender emits to initiate communication with someone. It is likely that we attribute mental states such as beliefs, desires, and intentions to the sender while guessing the meaning of these signals. Through mentalizing we distinguish whether someone accidentally bumps us or whether she wants to communicate and signal something to us. Without mentalizing, the nature of these signals as referring to ourselves would not be recognized. A recipient needs to mentalize: I am Chris. I heard the word “Chris.” Is this “me Chris” that is meant? Or any other Chris? Does the person who just called “Chris” want to address me? Able autistic patients of the Asperger type, who show a delayed development of Theory of Mind and for whom this process stays laborious, have sometimes commented that it had been a real surprise when they discovered at the age of 10 or 12 that a person actually wanted to talk to and communicate with them when calling their name (Gerland, 1997).
For normal people, in contrast, mentalizing appears to be a rapid automatic process that does not require conscious effort. To underpin this observation, the present study targeted implicit processing, that is, the subject's explicit task was to detect certain stimuli (faces with eyes closed in the visual condition, surnames rather than first names in the auditory condition). These appeared infrequently and the subject had to press a button when they appeared. The subject was not asked to respond to the stimuli of apparently communicative intent (voices calling the subject's own first name or faces looking intently at the subject, in contrast to voices calling another first name or faces looking away).
The activation we demonstrate is independent of arousal. Mentalizing tasks have consistently activated the paracingulate cortex, the brain area extending from the geniculate paracingulate sulcus to Brodmann areas 9 and 10. This region is ∼3 cm anterior to, and clearly distinct from, the middle portion of the anterior cingulate cortex activated by arousal (Chua et al., 1999; Critchley et al., 2000, 2001). We demonstrated that activation of the paracingulate cortex by Theory of Mind processing is independent of arousal by monitoring pupil diameter. Pupil diameter changed only in relation to the explicit task, consisting of detecting the target stimuli. No change was observed between the two implicit contrasts of interest “direct eye contact versus averted gaze” and “hearing their own name versus a different name.”
In summary, the present study provides empirical support for the physiological basis of the initial stages of communication addressing the self and its close link to mentalizing. The task was on-line insofar as the signals automatically triggered the intentional stance of the receiver. The conjunction of both types of self-addressing signals, prolonged eye contact (versus averted gaze) and calling someone's first name (compared with a different name) activated two of the three critical cortical regions implicated in mentalizing.
Footnotes
This research was facilitated by the Medical Research Council Cooperative in “Analysis of cognitive impairment and imaging of cognition” at University College London. We thank Oliver Josephs for providing the program to create scrambled baseline images from the original faces.
Correspondence should be addressed to Dr. Knut Kampe, Neurologische Klinik, Haus S10, Martinistrasse 52, 20246 Hamburg, Germany. E-mail: k.kampe@uke.uni-hamburg.de.
Copyright © 2003 Society for Neuroscience 0270-6474/03/235258-06$15.00/0