Abstract
Neuroimaging and neuropsychological studies implicate both frontal and temporoparietal cortices when humans reason about the mental states of others. Here, we report an event-related potentials study of the time course of one such “theory of mind” ability: visual perspective taking. The findings suggest that posterior cortex, perhaps the temporoparietal cortex, calculates and represents the perspective of self versus other, and then, later, the right frontal cortex resolves conflict between perspectives during response selection.
Introduction
“Theory of mind” (ToM) judgments about what others see, know, or think require a range of functional processes and recruit a reliable set of brain regions (Frith and Frith, 2003; Carrington and Bailey, 2009). It is commonly supposed that this constitutes a ToM network, but we currently lack evidence about the timing of these functional and neural processes in real time. Interpretations of the first functional magnetic resonance imaging (fMRI) studies argued that mPFC was of primary importance for calculating and representing someone else's perspective (Gallagher et al., 2000; Frith and Frith, 2003), while later studies emphasized the importance of temporoparietal junction (TPJ) (Saxe and Kanwisher, 2003). The likely involvement of both of these regions converges with evidence from temporally sensitive, event-related potential (ERP) recordings. Several studies have shown that ToM judgments elicit a late slow activity over frontal (Sabbagh and Taylor, 2000; Liu et al., 2009) and right posterior areas of scalp (Liu et al., 2009), beginning 500–1000 ms after the experimental stimulus. However, since these frontal and posterior effects were observed in different ERP studies, it is not possible to make inferences about the relative involvement of these brain regions across time.
Recent fMRI studies have led to more nuanced claims about potential roles for both the TPJ and PFC (Perner et al., 2006; Schmitz and Johnson, 2007). Nonetheless, it remains unclear how these regions interact with each other or with functional and neural processes for executive function. For example, neuroimaging evidence suggests that lateral prefrontal cortex (lPFC), a region associated with inhibitory control (Vendrell et al., 1995; Ridderinkhof et al., 2004), is recruited when participants judge perspectives that differ from their own (Vogeley et al., 2001), and neuropsychological evidence suggests that strong egocentric interference may result from injury to right lPFC (Samson et al., 2005). However, it remains to be discovered whether lPFC supports calculation of another's perspective or the process of selecting between self and other perspectives to make a judgment. Evidence about the timing of activity associated with these different neural regions would be critically informative about the potential interrelationships of ToM processing and executive functions.
Here, we combine ERP recordings with a recently developed task in which participants make rapid ToM judgments about the visual perspectives of self and other, which could be mutually consistent or inconsistent (Samson et al., 2010). Findings from this paradigm suggest a temporal and functional separation between an earlier process of perspective calculation that does not require inhibitory control and a later process of perspective selection that does require inhibitory control (Qureshi et al., 2010). We expected that perspective calculation would yield an ERP component that discriminated Self trials from Other trials, the scalp topography of which would indicate whether frontal or temporoparietal systems were recruited. We further expected that perspective selection would yield an ERP component that discriminated between trials where perspectives were consistent versus inconsistent, and we predicted this to occur over lateral prefrontal regions of scalp, reflecting recruitment of neural systems for inhibitory control.
Materials and Methods
Overview
On each trial, participants viewed a picture of a room with discs on the wall, and an avatar whose position in the room meant he saw fewer discs than the participant (perspectives Inconsistent) or all of the discs that the participant could see (perspectives Consistent). On Other-perspective trials, the picture was preceded by a sentence describing the avatar's perspective (e.g., “He sees three”), and participants judged whether the avatar's perspective corresponded to what was said in the sentence (Samson et al., 2010) (Fig. 1). On Self-perspective trials, the picture was preceded by a sentence such as “You see three,” and participants judged the number of discs that they themselves could see. Therefore, this paradigm orthogonally varied Perspective judgments (Self vs Other) with the Consistency between self and other perspectives. Response times and scalp potentials were recorded from the onset of the test picture (Fig. 1).
Participants
Participants were 17 undergraduate students (11 female, 6 male; 2 left handed, 15 right handed) at the University of Birmingham. These participants had a mean age of 21.5 years (range: 18–38 years). Data from five additional participants were excluded from analysis when it was determined that they produced <30 artifact-free trials in one or more of the four conditions.
Design and procedure
Instructions included a detailed description of the procedure and an instruction to respond as quickly and accurately as possible. Practice trials were completed with feedback until the participant successfully answered at least one question for each of the four conditions (Self-Consistent, Self-Inconsistent, Other-Consistent, Other-Inconsistent).
On each trial, participants viewed three fixation stimuli that were presented without a perceptible interval. The first fixation cross (600 ms) was followed by a second fixation cross (1800 ms) accompanied by an auditory stimulus, followed by a third fixation cross for an additional variable interval of 150–350 ms. The auditory stimulus was either “He sees N” (for Other trials) or “You see N” (for Self trials), where N ranged from 1 to 3. The fixation stimuli were followed by a test picture, depicting an avatar in a room with between 1 and 3 discs on the wall, so that the number of discs was within the range that could be enumerated quickly and accurately via subitization (Trick and Pylyshyn, 1994). On half of the trials, the auditory stimulus matched the picture, and on half it did not. Participants pressed one of two response pad keys to indicate whether or not the auditory stimulus correctly described the picture (Correct = Key 1, Incorrect = Key 2). Response time was measured from the onset of the test picture. The picture was displayed on the screen until a response occurred or for a maximum of 1000 ms. Following practice trials, participants completed 768 test trials, equally divided between the four conditions (Self-Consistent, Self-Inconsistent, Other-Consistent, Other-Inconsistent). Self and Other trials were pseudorandomly mixed within each block of trials, such that no block contained more than three trials in a row without a change in consistency, perspective, response button, and direction of avatar. The experiment was presented using E-prime 2.0 (Psychology Software Tools).
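The trial timing and trial counts described above can be summarized in a short sketch. It is illustrative only: the names are hypothetical, and drawing the jitter uniformly in whole milliseconds is an assumption, since the text gives only the 150–350 ms range.

```python
import random

# Durations taken from the procedure description (milliseconds).
FIXATION_1_MS = 600          # first fixation cross
FIXATION_2_MS = 1800         # second fixation cross, with the auditory sentence
JITTER_RANGE_MS = (150, 350) # third fixation cross, variable interval

N_TEST_TRIALS = 768
CONDITIONS = [
    "Self-Consistent",
    "Self-Inconsistent",
    "Other-Consistent",
    "Other-Inconsistent",
]


def pre_picture_interval_ms(rng: random.Random) -> int:
    """Time from trial onset to test-picture onset.

    Assumption: the jitter is uniform over whole milliseconds.
    """
    return FIXATION_1_MS + FIXATION_2_MS + rng.randint(*JITTER_RANGE_MS)


# Test trials were divided equally between the four conditions.
trials_per_condition = N_TEST_TRIALS // len(CONDITIONS)

rng = random.Random(0)
example_interval = pre_picture_interval_ms(rng)
print(trials_per_condition, example_interval)
```

Under these assumptions each condition contributes 192 test trials, and the test picture appears between 2550 and 2750 ms after trial onset.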
EEG recording
EEG was recorded continuously using a 128 channel Hydrocel Geodesic Sensor Net (HCGSN, Electrical Geodesics) (Tucker, 1993), referenced to a single vertex electrode, Cz (sample rate = 500 Hz; on-line high-pass filter = 0.1 Hz). Electrode impedances were kept at <80 kΩ. Visual test stimuli were presented with dimensions of 20 cm in width and 11.8 cm in height on a video monitor at a viewing distance of 50 cm, and therefore subtended a visual angle of ∼22.6° horizontal by ∼13.5° vertical.
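The reported visual angles follow from the stimulus dimensions and the viewing distance. A minimal check, assuming the stimulus was centered on the line of sight (the function name is illustrative):

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) subtended by a stimulus of the given size,
    assuming it is centered on the line of sight at the given distance."""
    return math.degrees(2.0 * math.atan((size_cm / 2.0) / distance_cm))


horizontal = visual_angle_deg(20.0, 50.0)  # stimulus width at 50 cm
vertical = visual_angle_deg(11.8, 50.0)    # stimulus height at 50 cm
print(f"{horizontal:.1f} x {vertical:.1f} degrees")  # prints "22.6 x 13.5 degrees"
```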
EEG recordings were processed off-line using NetStation 4.2 software (Electrical Geodesics). The data were filtered (bandpass filter = 0.3–40 Hz, finite impulse response) and segmented to epochs beginning 100 ms before and continuing 850 ms after the presentation of the visual stimuli. Data were processed using an artifact detection tool that marked channels bad if the recording was poor for >20% of the time (threshold maximum − minimum, >100.00), if eye blinks occurred (threshold maximum − minimum, >100.00) and/or if eye movements occurred (threshold maximum − minimum, >55.00). Segments were marked bad if they contained >10 bad channels, eye blinks, and/or eye movements. Bad channels in the data were replaced using a spherical spline interpolation algorithm (Srinivasan et al., 1996). Each trial was then examined individually to remove any trials with remaining eye-blink or eye-movement artifacts from further analysis. The data were then averaged for each participant, re-referenced to an average reference, and baseline corrected to a 100 ms prestimulus interval.
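The core operations in this pipeline (max − min thresholding, average re-referencing, and baseline correction) can be illustrated on a toy epoch. This is a simplified sketch, not the NetStation implementation: the function names are hypothetical, an epoch is represented as a plain list of channels, and the separate eye-blink and eye-movement criteria are omitted.

```python
from statistics import mean


def peak_to_peak(samples):
    """Maximum minus minimum of one channel within a segment."""
    return max(samples) - min(samples)


def bad_channels(epoch, threshold):
    """Indices of channels whose max - min exceeds the threshold."""
    return [i for i, ch in enumerate(epoch) if peak_to_peak(ch) > threshold]


def segment_is_bad(epoch, threshold=100.0, max_bad_channels=10):
    """A segment is rejected if it contains more than `max_bad_channels`
    bad channels (blink and eye-movement checks are not modeled here)."""
    return len(bad_channels(epoch, threshold)) > max_bad_channels


def average_reference(epoch):
    """Subtract the across-channel mean from every channel at each sample."""
    n_samples = len(epoch[0])
    ref = [mean(ch[t] for ch in epoch) for t in range(n_samples)]
    return [[ch[t] - ref[t] for t in range(n_samples)] for ch in epoch]


def baseline_correct(epoch, n_baseline):
    """Subtract each channel's mean over its first `n_baseline` samples."""
    corrected = []
    for ch in epoch:
        base = mean(ch[:n_baseline])
        corrected.append([v - base for v in ch])
    return corrected


# Toy epoch: 3 channels x 4 samples, with the first 2 samples as baseline.
# At 500 Hz, the 100 ms prestimulus baseline would span 50 samples.
demo_epoch = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 1.0, 1.0, 1.0],
    [4.0, 2.0, 0.0, -2.0],
]
rereferenced = average_reference(demo_epoch)
corrected = baseline_correct(rereferenced, n_baseline=2)
print(bad_channels(demo_epoch, threshold=5.0), corrected)
```

After average re-referencing, the channels sum to zero at every sample; after baseline correction, each channel has a mean of zero over the baseline samples.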
Source analysis: estimation of equivalent current dipoles and current densities
Average MRI.
An average head was used for the analyses. The average came from 87 participants ranging in age from 20 to 24 years. The MRI data were collected on a Siemens Medical Systems 3T Trio with an overall duration of approximately 15 min. A 3D, T1-weighted, MPRAGE radio frequency-spoiled rapid flash scan in the sagittal plane and a T2-weighted, multislice, axial 3D, dual Fast Turbo spin-echo scan in the sagittal plane were used. The T1 scans had 1 mm³ resolution and sufficient FOV to cover from the top of the head down to the neck.
The T1-weighted MRI images were averaged with the procedures described by Sanchez et al. (2011) (compare with Avants et al., 2008). The procedure involved an iterative process where a tentative MRI average was made. The original MRI volumes were then registered to this tentative volume and transformed in size and orientation with nonlinear registration [using ANTS (Advanced Normalization Tools)] (Avants et al., 2008) into the tentative average space. A new average was constructed from the transformed MRI files. This new average then became the next reference template for the registrations. This procedure results in an average MRI template that is approximately the same volume as the average volume size from the individual MRIs. The average MRI was rotated so that its orientation was approximate to that of the ICBM-152 template, which is oriented to MNI space (Collins et al., 1994, MNI-305 defined; Mazziotta et al., 2001, ICBM-152 defined; Joshi et al., 2004).
Electrodes.
Electrodes for the average head were obtained from an average electrode map. This came from a database of 93 individuals whose “geodesic sensor net” (GSN, EGI) (Tucker, 1993) electrode positions were measured with a Geodesic Photogrammetry System (EGI) (Russell et al., 2005). The participants also had a T1-weighted structural MRI. The electrodes were mapped into the average MRI space by registering the T1-weighted MRI of each individual to the average MRI with a 12 degrees of freedom affine registration (Linear Image Registration Tool, Oxford Centre for Functional MRI of the Brain) (Jenkinson and Smith, 2001), transforming the electrodes into the average space, and averaging the electrode positions. This resulted in electrode positions located on the average MRI derived from the individual participants.
Processing in CURRY.
The average MRI with the electrode positions was used as the head model in Curry 6.0.16 (Neuroscan Compumedics), which allows for both dipole and S-LORETA-based source estimations. The volume conductor model was a three-compartment realistic boundary element model (Fuchs et al., 1998), modeling the surface of the skin, the outside/skull, and the inside of the skull (liquor) with a total of 6361 triangle nodes. The conductivities of the skin, skull, and liquor were set to 0.33, 0.042, and 0.33, respectively. The average head had segmented gray and white matter. Source locations for the current dipoles and current densities were constrained to 3 mm tetrahedra volumes of the gray matter. Talairach locations were obtained from CURRY, which translates the head size and positions of the average MRI into the head size and positions for the Talairach stereotaxic atlas (Talairach and Tournoux, 1988).
Results
Behavioral results
Response times
An ANOVA with Perspective and Consistency as within-subjects factors revealed a significant effect of Perspective (F(1,16) = 31.753; p < 0.001, ηp2 = 0.665; Self = 526.53 ms; Other = 569.07 ms), a significant effect of Consistency (F(1,16) = 210.534; p < 0.001, ηp2 = 0.929; Consistent = 518.02 ms; Inconsistent = 577.58 ms), and an interaction between Perspective and Consistency (F(1,16) = 7.337; p = 0.015, ηp2 = 0.314). The effect of Consistency was greatest for Other perspective-taking (t(16) = 13.444, p < 0.001; Consistent = 533.11 ms; Inconsistent = 605.03 ms), but also significant for Self perspective-taking (t(16) = 6.904, p < 0.001; Consistent = 502.93 ms; Inconsistent = 550.13 ms).
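The marginal means and effect sizes reported for response times can be recovered from the cell means and F statistics. The sketch below uses the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error); the variable and function names are illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Cell means (ms) as reported in the text.
cell_means_ms = {
    ("Self", "Consistent"): 502.93,
    ("Self", "Inconsistent"): 550.13,
    ("Other", "Consistent"): 533.11,
    ("Other", "Inconsistent"): 605.03,
}


def marginal_mean(level: str) -> float:
    """Unweighted mean over the cells that include the given factor level."""
    values = [v for key, v in cell_means_ms.items() if level in key]
    return sum(values) / len(values)


# Consistency cost (Inconsistent minus Consistent) within each perspective.
consistency_cost = {
    p: cell_means_ms[(p, "Inconsistent")] - cell_means_ms[(p, "Consistent")]
    for p in ("Self", "Other")
}

for label, f_value in [("Perspective", 31.753), ("Consistency", 210.534), ("Interaction", 7.337)]:
    print(label, round(partial_eta_squared(f_value, 1, 16), 3))
print({level: round(marginal_mean(level), 2) for level in ("Self", "Other", "Consistent", "Inconsistent")})
print({p: round(cost, 2) for p, cost in consistency_cost.items()})
```

The consistency cost is about 72 ms for Other judgments and about 47 ms for Self judgments, which is the pattern behind the Perspective-by-Consistency interaction.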
Errors
A similar ANOVA on errors revealed no significant effect of Perspective (F(1,16) = 0.001, p = 0.974, ηp2 < 0.001). There was a significant effect of Consistency (F(1,16) = 30.700, p < 0.001, ηp2 = 0.657; Consistent = 2.45%; Inconsistent = 9.84%). There was not a significant interaction between Perspective and Consistency (F(1,16) = 2.561, p = 0.129, ηp2 = 0.138).
ERP results
ERP components
Five ERP components were identified via the piloting of an initial eight adult participants whose data are not included in the current report. These components were early components peaking at ∼200 ms over the left and right lateral frontal (FL190; negative-going) and central occipital cortices (P200; positive-going), respectively; a middle-latency component peaking between ∼400 and 500 ms recorded from electrodes over the left and right temporoparietal cortices (TP450; positive-going); and a late frontal slow-wave (LFSW) component with a mean amplitude difference between 600 and 800 ms recorded over the right frontal cortex. These same components were also clearly observed in the 17 participants in the current experiment. Electrodes used to measure each component were determined by examination of both the grand average and individual subject data of the pilot participants, and then confirmation of these as appropriate electrodes for the final 17 participants whose data are reported here. The electrodes selected for pilot data analysis were also deemed to be the most appropriate for the final 17 participants, and are as follows: left frontal: 25, 26, 27, 32, 33; right frontal: 1, 2, 8, 122, 123; occipital: 69, 70, 73, 74, 75, 81, 82, 83, 88, 89; left temporoparietal: 30, 36, 37, 41, 42, 46, 47, 52, 53; right temporoparietal: 86, 87, 92, 93, 98, 102, 103, 104, 105. Peak amplitudes, latencies to peak amplitudes, and mean amplitudes (i.e., LFSW) were measured for each individual electrode in the relevant montages and then averaged within relevant regions for each participant. Time windows for each component were as follows: FL190: 160–240 ms; P200: 165–230 ms; TP450: 325–525 ms; LFSW: 600–800 ms. Only trials in which participants correctly responded “yes” were included in the analyses.
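The peak and mean-amplitude measures described above can be sketched for a single electrode as follows. This is an illustration on a synthetic waveform, not the analysis code used in the study: the function names are hypothetical, and window edges that fall between samples (e.g., 325 ms at 500 Hz) are assumed to be rounded inward.

```python
import math

SAMPLE_RATE_HZ = 500   # recording sample rate
EPOCH_START_MS = -100  # epochs begin 100 ms before picture onset


def window_indices(start_ms, end_ms):
    """Inclusive sample indices lying within [start_ms, end_ms]."""
    first = math.ceil((start_ms - EPOCH_START_MS) * SAMPLE_RATE_HZ / 1000)
    last = math.floor((end_ms - EPOCH_START_MS) * SAMPLE_RATE_HZ / 1000)
    return first, last


def peak_in_window(waveform, start_ms, end_ms, polarity=1):
    """Peak amplitude and its latency (ms) within a window.

    polarity=1 finds the most positive point (e.g., P200, TP450);
    polarity=-1 finds the most negative point (e.g., FL190).
    """
    first, last = window_indices(start_ms, end_ms)
    best = max(range(first, last + 1), key=lambda i: polarity * waveform[i])
    latency_ms = EPOCH_START_MS + best * 1000 / SAMPLE_RATE_HZ
    return waveform[best], latency_ms


def mean_amplitude(waveform, start_ms, end_ms):
    """Mean amplitude over a window (the measure used for the LFSW)."""
    first, last = window_indices(start_ms, end_ms)
    segment = waveform[first:last + 1]
    return sum(segment) / len(segment)


# Synthetic waveform: a positive deflection centered at 450 ms (arbitrary units),
# sampled every 2 ms from -100 ms to 850 ms.
n_samples = 476
times_ms = [EPOCH_START_MS + i * 1000 / SAMPLE_RATE_HZ for i in range(n_samples)]
demo_wave = [3.0 * math.exp(-((t - 450.0) / 60.0) ** 2) for t in times_ms]

tp450_amplitude, tp450_latency = peak_in_window(demo_wave, 325, 525)
lfsw_mean = mean_amplitude(demo_wave, 600, 800)
print(tp450_amplitude, tp450_latency)
```

In the procedure described in the text, these per-electrode values would then be averaged within each regional montage for each participant.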
Three-factor ANOVAs including Perspective (self, other), Consistency (consistent, inconsistent), and Hemisphere (left, right) as within-subjects factors were conducted on the latency and amplitude data for the temporoparietal and lateral frontal components. Two-factor ANOVAs including Perspective and Consistency as within-subjects factors were conducted on the latency and amplitude data for the central occipital component.
ERP component effects
The latencies of the early frontal cortex (FL190) component exhibited a Perspective-by-Hemisphere interaction (F(1,16) = 39.425, p < 0.001, ηp2 > 0.99), with longer latency for Self than Other over the right hemisphere, and the reverse pattern over the left hemisphere. The amplitudes of this same component also exhibited a Perspective-by-Consistency-by-Hemisphere interaction (F(1,16) = 9.969, p = 0.006, ηp2 = 0.842), whereby Self judgments on Inconsistent trials elicited a larger amplitude response over the right hemisphere only. At the same time, the early central occipital component (P200) also exhibited a Perspective-by-Consistency interaction (F(1,16) = 4.53, p = 0.049, ηp2 = 0.516), whereby Self judgments on Inconsistent trials elicited a larger amplitude response.
The latencies of the middle-latency temporoparietal component (TP450) exhibited a main effect of Perspective, whereby latencies for Other perspective judgments were longer than for Self-perspective judgments (F(1,16) = 53.315, p < 0.001, ηp2 = 1.0). However, this main effect was moderated by a Perspective-by-Consistency interaction (F(1,16) = 10.619, p = 0.005, ηp2 = 0.864), whereby latencies for Other perspective judgments were longer than for Self-perspective judgments, with Other-Inconsistent trials eliciting the longest latency responses (Self-Consistent = 417 ms, SE = 8.6; Self-Inconsistent = 409 ms, SE = 9.3; Other-Consistent = 444 ms, SE = 9.0; Other-Inconsistent = 476 ms, SE = 7.4). The amplitudes of the TP450 exhibited a main effect of Consistency (F(1,16) = 6.628, p = 0.020, ηp2 = 0.293) and a main effect of Hemisphere (F(1,16) = 9.903, p = 0.006, ηp2 = 0.382), whereby amplitudes were larger for Consistent compared with Inconsistent trials, and larger over the right hemisphere compared with the left. TP450 amplitudes also exhibited a Perspective-by-Hemisphere interaction (F(1,16) = 6.026, p = 0.026, ηp2 = 0.274), and a trend for a Consistency-by-Hemisphere interaction (F(1,16) = 4.217, p = 0.057, ηp2 = 0.209), where in both cases the pattern was for condition differences seen in Figure 2 to be larger in the left hemisphere compared with the right hemisphere. Most notably, the Self-Inconsistent condition exhibited right lateralized activity, whereas all other conditions were distributed more bilaterally (Fig. 2).
The LFSW (600–800 ms) exhibited a Consistency-by-Hemisphere interaction (F(1,16) = 39.425, p < 0.001, ηp2 > 0.99), whereby mean amplitudes differed for Consistent versus Inconsistent over the right hemisphere only (Fig. 3).
Source estimates
Although the TP450 effect was limited to a particular region of the scalp, it occurred simultaneously with activity of a larger posterior component. The larger component during this time window likely reflects occipital, temporal, and parietal activity that is shared across our conditions. However, the existence of this broad component simultaneously with the TP450 component of interest complicates unconstrained source analysis procedures, such as S-LORETA and unconstrained current source dipole analyses, particularly when deep sources are implicated. This complication was, in fact, observed in S-LORETA analyses. These analyses, which followed the constrained dipole analyses described here, confirmed that the source solutions using this unconstrained method were dominated by deep sources in the occipital, temporal, and parietal lobes in all conditions. Therefore, given our strong hypothesis that the differences observed in the TP450 component reflect differences in the posterior ToM region, the temporoparietal junction, we conducted confirmatory equivalent dipole analyses with dipoles located in the gray matter of the temporoparietal junction during the time of the TP450 component (fixed location, rotating amplitude vector) (Fig. 4). A model with a unilateral dipole in the right TPJ (rTPJ) (Talairach coordinates: 60, −40, 20) accounted for 63, 60, and 70% of variance for the Self-Consistent, Other-Consistent, and Other-Inconsistent conditions, respectively; adding a second dipole in the left TPJ (lTPJ) [Talairach coordinates: ±60, −40, 20 (i.e., bilateral TPJ dipoles)] added a significant amount of explained variance (27, 27, and 21%, respectively). For the Self-Inconsistent scalp ERP, the unilateral rTPJ model accounted for 76% of the variance, and a bilateral model added only 13% explained variance. For all of these models, additional dipoles located in the medial prefrontal lobes did not add significant additional explained variance (<4%).
This confirmatory source analysis for the TP450 component is consistent with a model of bilateral temporoparietal junction activity for Self-Consistent, Other-Consistent, and Other-Inconsistent; predominantly unilateral right temporoparietal junction involvement for Self-Inconsistent; and primarily posterior and lateral sources of brain activity underlying the TP450 results.
To estimate sources for neural activity during the time of the late frontal slow-wave component, we used Curry's S-LORETA with Lp Norm equal to 2, with sources constrained to the gray matter. Three sources were observed for the 600–800 ms time window of the late frontal slow-wave component difference. The first of these was a source in the right anterior portion of the inferior frontal gyrus and middle frontal gyrus (Talairach coordinates: 37, 55, −5) that differed for the Consistent versus Inconsistent trials for both the Self and Other conditions. The second was a medial orbital frontal source that was present in all four conditions (Talairach coordinates: −1, 37, −21). Finally, a left temporal pole source was observed in all conditions except Self-Inconsistent (Talairach coordinates: −24, 8, −60) (Fig. 5).
Discussion
The current study combined a ToM task that required rapid processing of information about perspectives together with EEG recording. It is commonly supposed that ToM judgments require multiple functional and neural processes, perhaps including some that are truly specific to ToM, and most likely also including generic processes for executive control (Van Overwalle, 2009). However, most studies of ToM have been unable to distinguish between these component processes and study their relative time course. The ToM task in the current study is unusually well suited to this purpose because existing behavioral data indicate a distinction between an initial process of perspective calculation followed by a process of selecting the appropriate (Self or Other) perspective to respond on a given trial. Perspective calculation is not disrupted when participants perform a dual task that taxes inhibitory control (suggesting that it does not require general cognitive processes for inhibition), whereas the same dual task does disrupt perspective selection (Qureshi et al., 2010). By combining this task with EEG recording that allowed neural responses to the task to be monitored with high temporal precision, we found several distinct neural processes, indexed as ERP components. We discuss these components in turn.
Two early components were observed over the lateral frontal and central occipital cortices peaking at 190 and 200 ms, respectively, after the presentation of the test picture. Both components showed larger amplitudes for Self-Inconsistent trials in comparison with the other three trial types. A distinctive characteristic of this condition is that it is the only trial type in which participants must attend to discs appearing on the wall behind the character. In contrast, on both Self-Consistent and Other-Consistent trials, the discs all appear in front of the character, whereas on Other-Inconsistent trials, participants must attend to discs appearing in front of the character while ignoring those behind the character. Thus, we believe that both the early frontal and occipital components are likely to be an artifact of strategic visual attention, and so, these effects are unlikely to be informative about ToM processing per se.
The ERP component recorded from leads over the bilateral temporoparietal scalp (TP450) was the first component to reflect the processing costs of calculating the Other's perspective, reflected in longer latencies for Other-perspective judgments that were longest of all when Other was inconsistent with Self. Recall from the introduction that there is much debate about the relative importance of mPFC and TPJ regions in ToM (Frith and Frith, 2003; Saxe and Kanwisher, 2003). The confirmatory equivalent current dipole source analyses are consistent with a primacy for the TP cortex versus the medial frontal cortex in the initial calculation of simple visual perspectives. This interpretation converges with the view that the mPFC is mainly necessary for ToM tasks that entail more complex or uncertain judgments than the simple visual perspectives used in the current study (Aichhorn et al., 2006; Mitchell et al., 2006). It is also notable that all four conditions appeared to have sources in right posterior cortex at the time of the TP450, whereas there was more variability in the recruitment of left posterior cortex. This observation is consistent with neuroimaging evidence for functional differentiation between right and left TPJ, and indeed fits with the view that right TP cortex may represent any psychological perspective (Self, Other, Consistent, Inconsistent), whereas left TP cortex may be indexing differences in perspective and ownership of perspective (Perner et al., 2006; Aichhorn et al., 2009).
Finally, our findings extend existing studies in identifying the functions served by frontal processes activated during this simple ToM task. First, the late frontal component effect was right lateralized, and was sensitive to inconsistency between perspectives rather than whether participants judged Self or Other perspective. This converges with evidence that right lateral PFC in particular may be involved in managing interference between perspectives (Samson et al., 2005; Saxe et al., 2006). Second, the frontal component was the final component observed, and overlapped in time with participants' responses, suggesting that they must have already completed the calculation of perspectives. Given that behavioral data indicate that this process of perspective selection may be selectively disrupted when participants undertake a dual task that taxes inhibitory control (Qureshi et al., 2010), it seems plausible that differences in this late slow-wave component reflect the differential recruitment of executive processes for inhibitory control, which are strongly associated with right lateral PFC (Vendrell et al., 1995). This component was also associated with sources in medial orbital frontal cortex and left temporal pole, regions that have been implicated in some fMRI studies of ToM (Frith and Frith, 2003; Van Overwalle, 2009). However, unlike the source in right lateral PFC, these sources were not reliably sensitive to perspective consistency or Self versus Other perspective, and so the variables of our perspective-taking task cast no new light on the function of these regions.
In sum, our findings reveal both functional and temporal differentiation in the neural processes recruited for a type of ToM ability, visual perspective taking. Following initial visual analysis of the stimuli, posterior regions of cortex, perhaps the left and right temporoparietal cortex, are involved in calculating and representing Self versus Other perspectives. A later right frontal component reflects the exercise of executive functions for cognitive management of perspective selection. These findings reveal the stages of processing involved in ToM judgments for visual perspective taking, and also open the way for asking how ToM processing develops, as well as how it may be impaired in disorders of social cognitive functioning (Senju et al., 2009).
Footnotes
We thank Dr. Sandra Utz and Consuelo del Grande for assistance with data collection, and Emma Cross for assistance with data collection and analysis.
The authors declare no competing financial interests.
Correspondence should be addressed to Ian A. Apperly, School of Psychology, University of Birmingham, Edgbaston, Birmingham B15 2TT, West Midlands, United Kingdom. i.a.apperly@bham.ac.uk