Abstract
In human lateral temporal cortex, some regions show specific sensitivity to human motion. Here we examine whether such effects reflect a general biological–nonbiological organizational principle or a process specific to human–agent processing by comparing processing of human, animal, and tool motion in a functional magnetic resonance imaging (fMRI) experiment with healthy participants and a voxel-based lesion-symptom mapping (VLSM) study of patients with brain damage (77 stroke patients). The fMRI experiment revealed that in the lateral temporal cortex, the posterior superior temporal sulcus shows a preference for human and animal motion, whereas the middle part of the right superior temporal sulcus/gyrus (mSTS/STG) shows a preference for human and functional tool motion. VLSM analyses also revealed that damage to this right mSTS/STG region led to more severe impairment in the recognition of human and functional tool motion relative to animal motion, indicating the causal role of this brain area in human–agent motion processing. The findings for the right mSTS/STG cannot be reduced to a preference for articulated motion or processing of social variables since neither factor is involved in functional tool motion recognition. We conclude that a unidimensional biological–nonbiological distinction cannot fully explain the visual motion effects in lateral temporal cortex. Instead, the results suggest the existence of distinct components in right posterior temporal cortex and mSTS/STG that are associated, respectively, with biological motion and human–agent motion processing.
Introduction
Compelling findings in cognitive neuroscience show that the animate–inanimate distinction plays a fundamental role in the neural organization of perceptual and cognitive processes in both humans and monkeys (Martin et al., 1996; Caramazza and Shelton, 1998; Kriegeskorte et al., 2008; Haxby et al., 2011). In the visual motion processing stream in human lateral temporal cortex, some regions have been found to be differentially sensitive to motion of biological entities relative to other types of motion, with the posterior superior temporal sulcus (pSTS) seemingly specifically involved in the recognition of (articulated) biological motion (Beauchamp et al., 2002, 2003; Grossman et al., 2005; Saxe et al., 2004; for review, see Blake and Shiffrar, 2007; Grosbras et al., 2012).
The precise role of the pSTS and nearby regions in the recognition of biological motion remains unclear. One common caveat to the biological motion research is that the critical evidence is typically based on stronger activation for human motion relative to other stimuli, most commonly, scrambled motion (Grossman and Blake, 2002; but see Pelphrey et al., 2003; Gobbini et al., 2007). Thus, it is not clear whether the observed effects reflect a human–nonhuman or a general biological–nonbiological distinction along which the visual motion processing stream is organized (Chouchourelou et al., 2013). A further debate is whether the potential human–nonhuman distinction is to be explained by the effect of socially relevant motion (Kaiser et al., 2012). Human motion is intrinsically volitional and is naturally interpreted as being intentional (Lahnakoski et al., 2012). However, agency need not be social: solitary, instrument-directed (e.g., hammering), and intransitive (e.g., walking) acts need not have a social dimension. Thus, the question remains whether there are brain regions that are specifically involved in human (conspecific) motion recognition, as distinct from more general biological motion recognition, and whether such regions also show specialization for nonsocial human agency.
To further clarify the organization of visual motion processing regions, we distinguish among three categories of object motion: human motion, animal motion, and functional tool motion. The latter type of motion depicts a moving tool in the manner typical of its use by humans. Such motion is not articulated biological motion and does not have apparent social valence, but it implies a human agent: typical tools do not move by themselves, and their patterned, systematic motion is usually the result of human manipulation. Thus, a brain region's preferential response to both human and functional tool motion compared with animal motion would reflect a human agent effect that is not to be attributed to articulated motion or social human agency. In contrast, a preference for human and animal motion would indicate a more general biological effect. We tested neural responses to these three types of motion in a functional magnetic resonance imaging (fMRI) experiment with healthy participants and further examined the causal role of specific brain regions in the processing of the three types of motion stimuli using the voxel-based lesion-symptom mapping (VLSM) approach (Bates et al., 2003) in a group of stroke patients.
Materials and Methods
Experiment 1: fMRI experiment with healthy participants
Participants
Sixteen college students (six males) from Beijing Normal University participated (with pay) in the study. They were all native Mandarin Chinese speakers and right handed [Edinburgh Handedness Inventory (Oldfield, 1971)], with a mean age of 20.7 years (range, 18–23) and 14.4 (range, 13–16) mean years of education. They had normal or corrected-to-normal vision. None suffered from psychiatric or neurological disorders, had ever sustained a head injury, or were on any psychoactive medication. All participants gave written informed consent approved by the Institutional Review Board of the Beijing Normal University (BNU) Imaging Center for Brain Research.
Materials and procedures
Participants were asked to perform a delayed matching-to-sample task on motion stimuli. The motion stimuli included point-light animations of human (e.g., running), animal (e.g., jumping), and tool (e.g., hammering) movements. We created the point-light animations of human motion following the procedure described by Johansson (1973). Thirteen small light-sensitive dots were attached to the major joints of an actor (head, shoulders, elbows, wrists, hips, knees, and ankles). The actor performed the actions. A 3D motion analysis-capture system (Eagle-4 model; Motion Analysis Corporation, www.motionanalysis.com) with eight cameras captured the actions and transformed them into point-light stimuli. We created the point-light animations for tool motion using a similar method: light-sensitive dots were attached to a set of tools (e.g., a pair of scissors had five dots: two on the blades, two on the handles, and one on the joint; mean number of dots, 8.1), and thus only the motion of the tools was captured. The actor manipulated the tools in their typical manner. The point-light animal motion was created by scanning the Muybridge photography collection of animals in motion, sequencing the scans, and obtaining the main joint positions, and it was then scripted in Matlab (e.g., an elephant had 12 dots: head, nose, neck, shoulders, hip, knees, forefeet, hindfeet; mean number of dots, 9.9; courtesy of Emily Grossman, University of California Irvine, Irvine, CA). Ten items of each category were selected, and mirror-reversed stimuli were constructed to increase item numbers. We also included a nonobject point-light “global motion” condition as a baseline, in which all but one point-light in each item moved in the same direction (mean number of dots, 10). The exception point-light was included to encourage participants to attend closely to the display. There were 48 global motion stimuli, and they were run on Psychtoolbox version 3.0.9 (Brainard, 1997; Pelli, 1997) in Matlab 2009b (Mathworks).
The participants were shown the entire list of the stimuli before entering the scanner for familiarization.
In the scanner, participants viewed the stimuli binocularly through a mirror attached to the head coil adjusted to allow foveal viewing of a back-projected monitor (refresh rate, 60 Hz; spatial resolution, 1024 × 768). The distance between the screen and participants was 110 cm. The width and height of the point-light stimuli subtended ∼13.5 × 10.1° on the screen. The size of the dots was around 0.16°. The stimuli were presented in blocks of four items from the same condition (human, animal, tool, or global motion). For each block, participants were instructed to judge whether the last item was identical to any of the first three in terms of item identity (human, animal, and tool conditions) or overall movement direction (for global motion condition). Note that for the human condition, the judgment was whether the items were the same action; for tools and animals, the judgment was whether the items were the same object. We used this task to prevent judgment on the mere basis of low-level perceptual features. Within a block, each of the four stimuli lasted for 2.5 s with a 1 s fixation cross appearing between stimuli. The fixation cross before and after the last trial was colored green to cue the participant about the beginning and end of each block; the last fixation stimulus lasted for an additional 1 s to allow for the response. Participants responded “yes” or “no” by pressing a button with the thumb of the right or left hand after seeing the last fixation cross. Thus, each block was 15 s long. A fixation cross of 6 s occurred between blocks, as well as before the first block and after the last block. Each run included 16 blocks, with 4 blocks of each category (human, tool, animal, global motion), and lasted for 5 min, 42 s. There were three runs, for a total of 17 min, 6 s. For each condition, there were 12 blocks (48 trials) in total. Each motion item of the three critical conditions was repeated four to five times. 
The block order across all runs was assigned in a Latin-square fashion; the order of runs was pseudo-randomized across participants.
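The block and run durations reported above follow directly from the stated trial timings. The short Python sketch below reproduces that arithmetic; it assumes one 1 s fixation after each of the four stimuli (with the last one extended by 1 s), and the variable names are illustrative rather than taken from the original stimulus scripts.

```python
# Timing arithmetic for the block design, using only the values reported above.
STIM_S = 2.5            # duration of each point-light stimulus
N_STIM = 4              # stimuli per block
FIX_S = 1.0             # fixation cross following each stimulus (assumed layout)
RESPONSE_EXTRA_S = 1.0  # extra time on the last fixation for the response
REST_S = 6.0            # fixation between blocks, and before/after the run
BLOCKS_PER_RUN = 16
N_RUNS = 3

block_s = N_STIM * (STIM_S + FIX_S) + RESPONSE_EXTRA_S
# One rest period before each block plus one after the final block.
run_s = BLOCKS_PER_RUN * block_s + (BLOCKS_PER_RUN + 1) * REST_S
total_s = N_RUNS * run_s


def as_min_s(seconds):
    """Format a duration in seconds as 'M min, S s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} min, {secs} s"


print(block_s)            # 15.0
print(as_min_s(run_s))    # 5 min, 42 s
print(as_min_s(total_s))  # 17 min, 6 s
```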
MRI data acquisition
Structural and functional MRI data were collected with a 3T Siemens Trio Tim scanner at the BNU imaging center. A high-resolution 3D structural data set was acquired with a 3D magnetization-prepared rapid gradient echo (MPRAGE) sequence in the sagittal plane [repetition time (TR), 2530 ms; echo time (TE), 3.39 ms; inversion time (TI), 1100 ms; field of view (FOV), 200 × 200 mm2; flip angle (FA), 7°; matrix size, 256 × 256; voxel size, 1 × 1 × 1.33 mm3; slice number, 144 slices; slice thickness, 1.33 mm]. BOLD signals were measured with an EPI sequence (TR, 2000 ms; TE, 30 ms; FOV, 200 × 200 mm2; FA, 90°; matrix size, 64 × 64; voxel size, 3.125 × 3.125 × 4 mm3; slice number, 33 slices; slice thickness, 4 mm; slice orientation, axial). E-prime 2.0 was used for stimulus presentation and response recording. The whole scanning time for each participant was about 30 min.
fMRI data analysis
fMRI data were analyzed using SPM8 (Wellcome Trust Centre for Neuroimaging, http://www.fil.ion.ucl.ac.uk/spm/) and Matlab 7.9 (Mathworks, http://www.mathworks.com). The first 6 s (3 volumes) in each functional run were discarded so that only data collected after the scanner had reached magnetic steady state were included. Preprocessing of the functional data included 3D motion correction with respect to the mean functional image, coregistration of the 3D structural image to the mean functional image, normalization of the functional images to Montreal Neurological Institute (MNI) standard space using unified segmentation of the structural image, and spatial smoothing (Gaussian filter, 6 mm full-width half-maximum). During the normalization to MNI space, all the functional images were resampled to 3 × 3 × 3 mm3 resolution.
All functional data were then analyzed using the general linear model (GLM). We included four regressors of interest corresponding to the four conditions (three critical categories and one global motion condition) and six head motion parameters as regressors of no interest.
In the whole-brain analyses, random-effect GLM analyses were conducted to analyze the group data. We examined the three effects of interest: human motion, human–agent, and biological motion effects. Regions showing a human motion effect were derived from the contrast of human motion versus global motion, with threshold set at p < 0.05 corrected with the false discovery rate (FDR) and the cluster size (k) at >20 voxels (540 mm3); those showing a human–agent effect were computed from the conjunction of human motion > animal motion and functional tool motion > animal motion; those showing a biological motion effect were computed from two kinds of conjunctions: (1) the conjunction of human motion > tool motion and animal motion > tool motion and (2) the conjunction of human motion > global motion and animal motion > global motion. We used these two kinds of baseline for examination of biological motion effects because pSTS has been shown to be activated by implied motion stimuli (Beauchamp et al., 2002; Peuskens et al., 2005) and functional tool motion may imply biological motion (human hand motion) to some extent. The thresholds for the individual contrasts in the conjunction analyses were set at FDR p < 0.05 and k > 20 voxels. All results were shown in the MNI templates and projected onto the MNI brain surface using the BrainNet viewer (http://www.nitrc.org/projects/bnv/) (Xia et al., 2013).
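The thresholding-and-conjunction logic can be illustrated with a minimal Python sketch. It is not the SPM8 implementation: it assumes the Benjamini-Hochberg step-up procedure for FDR control (the text does not name the variant), treats a conjunction as the set of voxels that survive the threshold in both contrasts, and omits the cluster-extent criterion (k > 20). The voxel labels and p values are made up for illustration.

```python
def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg cutoff: largest p(i) with p(i) <= q * i / m, else None."""
    ordered = sorted(p_values)
    m = len(ordered)
    cutoff = None
    for rank, p in enumerate(ordered, start=1):
        if p <= q * rank / m:
            cutoff = p
    return cutoff


def significant_voxels(p_map, q=0.05):
    """Return the set of voxels whose p value passes the FDR cutoff."""
    cutoff = fdr_threshold(list(p_map.values()), q)
    if cutoff is None:
        return set()
    return {voxel for voxel, p in p_map.items() if p <= cutoff}


def conjunction(p_map_a, p_map_b, q=0.05):
    """Voxels that are significant in contrast A and in contrast B."""
    return significant_voxels(p_map_a, q) & significant_voxels(p_map_b, q)


# Hypothetical p values for four voxels in two contrasts.
human_gt_animal = {"v1": 0.001, "v2": 0.010, "v3": 0.400, "v4": 0.030}
tool_gt_animal = {"v1": 0.002, "v2": 0.600, "v3": 0.500, "v4": 0.020}

print(sorted(conjunction(human_gt_animal, tool_gt_animal)))  # ['v1', 'v4']
```

In this toy example, voxel v2 passes the threshold for human > animal only and is therefore excluded from the human–agent conjunction.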
Experiment 2: VLSM experiment with patients
Participants
The stroke patients and healthy controls who participated in this behavioral and imaging study were all Mandarin Chinese native speakers with normal vision and hearing (with or without correction), and all provided written informed consent. The study was approved by the Institutional Review Board of the BNU Imaging Center for Brain Research. More detailed information about the two groups of participants is presented below.
Patients.
Seventy-seven individuals (63 males) who had suffered a stroke were recruited from the China Rehabilitation Research Center. The mean age was 48 years (SD, 12; range, 20–76), and the mean years of formal education was 13 years (SD, 3; range, 2–19). They had no previous neurological disorders. The behavioral and imaging data were collected no earlier than 1 month after onset. They were free of other neurological or psychiatric conditions, such as alcohol abuse or severe depression. All could understand oral and/or written task instructions. The Edinburgh Handedness Inventory was used to assess their handedness (74 right handed, 3 left handed).
Healthy participants.
Fifty healthy participants (26 males) took part in the present study. They had a mean age of 50 years (SD, 11; range, 26–72) and mean education of 13 years (SD, 4; range, 6–22). All but two participants were right handed. The difference between healthy participants and the patients was not significant in age (t(125) < 1), education level (t(125) < 1), and handedness (χ2(1) < 1). The groups differed in gender distribution (χ2(1) = 12.85, p < 0.001).
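The reported gender statistic can be reproduced from the counts given above (63 of 77 patients and 26 of 50 controls were male). The sketch below assumes a Pearson chi-square test on the 2 × 2 table without continuity correction; that choice is an assumption, although it matches the reported value.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: patients, controls. Columns: male, female (counts from the text).
chi2 = chi_square_2x2(63, 77 - 63, 26, 50 - 26)
print(round(chi2, 2))  # 12.85
```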
Behavioral tasks
Motion–picture verification.
A motion–picture verification task was developed for three types of stimuli: humans (20 motion items constructed into 40 trials, half “yes” responses, half “no” responses), animals (15 motion items, 30 trials), and tools (22 tool motion items, 44 trials). Each trial consisted of a point-light motion animation (e.g., human, a person walking; animal, a bird flying; tool, a hammer hammering) presented above a black-and-white picture (e.g., human, a picture depicting a person kicking; animal, a bird; tool, a hammer). The point-light animations were created using the same procedure as in Experiment 1. For the human motion items, subjects were instructed to judge whether the animated action matched the action depicted by the picture by pressing the “yes” or “no” button on the touch screen; for the animal and tool motion conditions, the instruction was to judge whether the animated action was a typical action associated with the object in the picture. Animations lasted about 1–2 s, and the picture was displayed until the subject responded or until a 6 s deadline had passed. Responses were scored 1 if correct and 0 if wrong. If no response was given within the deadline, a score of 0.5 was assigned because random guessing would have a 0.5 chance of being correct. Assigning 0 to these items could bias the scores against patients who tended to be more cautious compared with patients whose strategy relied more on guessing. Such cases occurred in <1% of trials (6 of 8778, 0.07%).
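The scoring rule described above can be written out as a short Python sketch. The function names and the encoding of a missed deadline as `None` are illustrative assumptions, not details of the original testing software.

```python
def score_trial(response, correct_answer):
    """1 for correct, 0 for wrong, 0.5 (chance level) if no response before the deadline."""
    if response is None:
        return 0.5
    return 1.0 if response == correct_answer else 0.0


def accuracy(responses, answers):
    """Mean trial score across a task."""
    scores = [score_trial(r, a) for r, a in zip(responses, answers)]
    return sum(scores) / len(scores)


# Made-up example: one correct, one wrong, one missed deadline, one correct.
responses = ["yes", "no", None, "yes"]
answers = ["yes", "yes", "no", "yes"]
print(accuracy(responses, answers))  # 0.625
```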
Picture–word verification.
A picture–word verification task was performed to control for any effects in the motion–picture verification task that might originate from the picture rather than the motion identification process. This task had the same trial structure as the motion–picture verification task except that a visual word rather than a point-light animation was presented along with a picture. The participants judged whether the word and picture matched. There were 20 human action trials (e.g., the word “kick” with the picture of a person kicking), 20 tool trials (e.g., the word “hammer” with the picture of a hammer), and 20 animal trials (e.g., the word “dog” with the picture of a dog). There were no “no responses” across all patients.
Imaging data acquisition
Each patient was scanned at the Imaging Center of China Rehabilitation Research Center (Signa Excite 1.5T; GE Healthcare). We obtained two types of whole-brain structural images: 3D T1 and FLAIR T2 images. The 3D T1 image is a dedicated high-resolution T1-weighted, three-dimensional MPRAGE image on a sagittal plane with the following parameters: TR, 12.26 ms; TE, 4.2 ms; TI, 400 ms; FOV, 250 × 250 mm2; FA, 15°; matrix size, 512 × 512; voxel size, 0.49 × 0.49 × 0.70 mm3; slice number, 248 slices. The FLAIR T2 image is a fluid-attenuated inversion recovery T2-weighted image on an axial plane with the following parameters: TR, 8002 ms; TE, 127.57 ms; TI, 2000 ms; FOV, 250 × 250 mm2; FA, 90°; matrix size, 512 × 512; voxel size, 0.49 × 0.49 × 5 mm3; slice number, 28 slices. Two identical sequences of the 3D T1 image were collected and averaged to improve the signal-to-noise ratio in analysis. We performed all analyses on the 3D T1 image and used the FLAIR T2 image for visual reference when manually drawing patients' lesions on the 3D T1 image (Rorden et al., 2007).
Imaging data preprocessing
The two sequences of 3D T1 images of each patient were first coregistered in their native space and then averaged using SPM5. The FLAIR T2 images were coregistered and resliced to the native-space averaged 3D images with SPM5. Using MRIcroN software (Rorden et al., 2007), two experimenters manually drew each patient's lesion contour on the native-space averaged 3D T1 image slice by slice, visually referring to FLAIR T2 images. The procedure was supervised by an experienced radiologist. The inter-rater reliability of the two experimenters, calculated on the same four patients [mean percentage volume difference, 9 ± 8 and 4 ± 3; mean percentage discrepant voxels, 7 ± 4 and 6 ± 2 (discrepant was defined as 2 voxels from the other manually drawn lesion volume)], was comparable with the same measures of inter-rater reliability reported previously (Fiez et al., 2000). The structural images of each patient were resliced into 1 × 1 × 1 mm3 voxel size and registered into Talairach space via BrainVoyager QX version 2.0 (www.brainvoyager.com). We used the ANTS software package (www.ants.com) to extract the affine transformation matrix between native and Talairach spaces, which was used to register the lesion description in Talairach space.
Behavioral performance standardization
Compared with case studies (Battelli et al., 2003) or lesion overlap analysis (Heberlein et al., 2004), which displays the lesion (overlap) maps of the behaviorally impaired patients with descriptive statistics, the VLSM analysis performs inferential statistical comparisons across voxels, making use of continuous behavioral and lesion information (Bates et al., 2003). Previous studies on biological motion using the VLSM approach (Saygin, 2007), however, tended to directly use behavioral scores of patients without considering the distribution of performance of healthy controls. The “raw” behavioral scores in those studies may be contaminated by demographic factors (e.g., age, education, gender) and may not accurately reflect the severity of the impairment (Crawford and Garthwaite, 2006). An ideal behavioral measure should consider the performance distribution in the reference healthy population. We therefore adopted the method developed by Crawford and Garthwaite (2006), which takes into account such information, and transformed raw accuracies into standard t scores on the basis of the distribution in the healthy population for each behavioral task in each patient. For each task, we first established a regression model on the basis of the 50 healthy control subjects (dependent variable was accuracy; predictors were age, education, and gender). A predicted value for each patient was then obtained by introducing his or her demographic information into the regression model. A discrepancy value (Discrepancypatient) was calculated as the difference value between the observed value and the predicted value. 
Then we computed the corrected SE of estimate for the patient (SEpatient) using the following formula:

$$SE_{patient} = S_{yx}\sqrt{1 + \frac{1}{N} + \frac{1}{N-1}\sum_{i} r^{ii} z_{i0}^{2} + \frac{2}{N-1}\sum_{i<j} r^{ij} z_{i0} z_{j0}}$$

where Syx and N are the SE and number of subjects for the control group, respectively; rii and rij are main diagonal and off-diagonal elements of the inverted correlation matrix for the k predictor variables (k = 3; i.e., age, education, gender), respectively; and z0 (z10,…,zk0) identifies the patient's scores on the predictor variables in z score form. The patient's t score was then calculated: t-scorepatient = Discrepancypatient/SEpatient (see details by Crawford and Garthwaite, 2006). This way, each patient had a t score on each task, which was used in the subsequent analyses as the behavioral performance index. Note that we also performed VLSM analyses using the raw accuracies of the patients directly as behavioral measures for comparison purposes. The results were highly consistent with those using the normalized t scores and are not presented here for simplicity.
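A minimal Python sketch of this standardization step follows. It assumes that the corrected standard error takes the form given by Crawford and Garthwaite (2006), namely Syx multiplied by the square root of 1 + 1/N + (1/(N − 1)) Σ rii zi0² + (2/(N − 1)) Σ rij zi0 zj0, with the second sum taken over the off-diagonal pairs i < j. The function names are hypothetical, the inverted correlation matrix is passed in rather than computed, and the numbers in the example are made up.

```python
import math


def se_patient(s_yx, n_controls, inv_corr, z0):
    """Standard error for a new case's predicted score (assumed Crawford-Garthwaite form).

    s_yx: standard error of estimate of the control-group regression
    inv_corr: inverse of the predictors' correlation matrix (k x k nested lists)
    z0: the patient's predictor values as z scores relative to the controls
    """
    k = len(z0)
    diag = sum(inv_corr[i][i] * z0[i] ** 2 for i in range(k))
    # Off-diagonal elements above the diagonal only (i < j).
    off = sum(inv_corr[i][j] * z0[i] * z0[j]
              for i in range(k) for j in range(i + 1, k))
    n = n_controls
    return s_yx * math.sqrt(1 + 1 / n + diag / (n - 1) + 2 * off / (n - 1))


def patient_t(observed, predicted, s_yx, n_controls, inv_corr, z0):
    """t score = (observed - predicted) / SE_patient."""
    return (observed - predicted) / se_patient(s_yx, n_controls, inv_corr, z0)


# Made-up example: 50 controls, uncorrelated predictors, patient at the control means.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = patient_t(0.60, 0.80, s_yx=0.10, n_controls=50,
              inv_corr=identity, z0=[0.0, 0.0, 0.0])
print(round(t, 2))  # -1.98
```

When the patient's demographics sit exactly at the control means, the correction reduces to Syx multiplied by the square root of (1 + 1/N); the further the patient is from the control means, the larger the standard error and the smaller the absolute t score for the same discrepancy.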
Lesion-symptom mapping
A VLSM analysis (Bates et al., 2003; Rorden et al., 2007) was conducted on the data of the 77 patients using the NPM (nonparametric mapping) program in MRIcroN and the Voxbo brain-imaging package (www.voxbo.org). Voxels in which fewer than five patients had lesions were excluded from the analysis. For each voxel entered in the analysis, the patients were divided into the lesion group and the intact group. The behavioral performance index (t scores described above) separately on each of the three motion–picture verification tasks (human motion, animal motion, and tool motion) was compared between these two groups, while controlling for picture recognition effects by either regressing out the performance indices (t scores) on the corresponding control tasks (picture–word verification) or excluding patients who were impaired in the picture–word verification task. Given that the behavioral index scores for lesioned and intact groups across voxels may not comply with assumptions of the t test, a nonparametric Brunner-Munzel (BM) test (Brunner and Munzel, 2000) was performed for the statistical comparison in VLSM (Rorden et al., 2007; Medina et al., 2010). An independent-samples t test was also performed in the main analyses for comparison purposes. To correct for multiple comparisons, the significance threshold was set at FDR corrected p < 0.005 for all analyses unless otherwise noted. A whole-brain VLSM z-map (BM test) or t-map (t test) was then obtained for each task of interest. To further consolidate our findings while excluding the potential confounding influence of multiple lesions, the same VLSM analysis was also performed on the 38 stroke patients with unilateral, single, focal lesions. The overall data pattern was highly similar for the entire group and for the subset of patients with unilateral, single, focal lesions.
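The voxel-wise comparison can be sketched in a few lines of Python. This is an illustration rather than the NPM or Voxbo implementation: it uses the pooled-variance independent-samples t statistic (the comparison analysis mentioned above) as a stand-in for the Brunner-Munzel test, omits covariate adjustment and FDR correction, and runs on made-up data structures (a dictionary from voxel label to per-patient lesion flags).

```python
import math
from statistics import mean, variance

MIN_LESIONED = 5  # voxels lesioned in fewer patients are excluded


def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def vlsm_t_map(lesion_masks, scores, min_lesioned=MIN_LESIONED):
    """lesion_masks: voxel -> list of 0/1 flags per patient; scores: one value per patient."""
    t_map = {}
    for voxel, mask in lesion_masks.items():
        lesioned = [s for s, hit in zip(scores, mask) if hit]
        intact = [s for s, hit in zip(scores, mask) if not hit]
        if len(lesioned) < min_lesioned or len(intact) < 2:
            continue  # too few patients in one group for this voxel
        # Positive t: intact patients score higher than lesioned patients.
        t_map[voxel] = pooled_t(intact, lesioned)
    return t_map


# Made-up behavioral t scores for 12 patients and two voxels.
scores = [-2.0, -1.5, -1.8, -2.2, -1.6, 0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2]
lesion_masks = {
    "A": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # lesioned in 5 patients: tested
    "B": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # lesioned in 3 patients: skipped
}
t_map = vlsm_t_map(lesion_masks, scores)
print(sorted(t_map))  # ['A']
```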
Human motion, human–agent, and biological motion effects.
Separate VLSM maps were obtained for human, animal, and tool motion. These maps were transformed into binary maps individually, in which each significant voxel (FDR, p < 0.005) was scored as 1 and others as 0. Conjunction maps were obtained using a similar rationale to that used for the fMRI experiment. For the human–agent map, we first created a human > animal map by removing the binary animal motion map from the binary human motion map and a tool > animal map by removing the animal map from the tool map. Then the human > animal map and the tool > animal map were overlaid, indicating the regions that were significant both in the human and the tool maps but not in the animal map. Similarly, two biological motion maps were obtained. A first map was obtained by subtracting the binary tool map from the binary human map and the binary animal map, respectively, and then overlaying the human > tool and the animal > tool maps; the second was obtained by overlaying the binary human map and the binary animal maps directly without comparison with the tool map.
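These map operations amount to set differences and intersections over the significant voxels. The Python sketch below represents each binary map as a set of voxel coordinates; the coordinates are toy values and the function names are illustrative.

```python
def human_agent_map(human, animal, tool):
    """Voxels significant for human and tool motion but not for animal motion."""
    human_gt_animal = human - animal
    tool_gt_animal = tool - animal
    return human_gt_animal & tool_gt_animal


def biological_map_stringent(human, animal, tool):
    """Voxels significant for human and animal motion but not for tool motion."""
    return (human - tool) & (animal - tool)


def biological_map_conservative(human, animal):
    """Voxels significant for both human and animal motion, ignoring the tool map."""
    return human & animal


# Toy binary maps: each set holds the coordinates of significant voxels.
human = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)}
animal = {(3, 0, 0), (4, 0, 0)}
tool = {(1, 0, 0), (2, 0, 0), (4, 0, 0), (5, 0, 0)}

print(sorted(human_agent_map(human, animal, tool)))           # [(1, 0, 0), (2, 0, 0)]
print(sorted(biological_map_stringent(human, animal, tool)))  # [(3, 0, 0)]
print(sorted(biological_map_conservative(human, animal)))     # [(3, 0, 0), (4, 0, 0)]
```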
Results
Experiment 1
One run of one participant was discarded from the behavioral and imaging data analyses because of incomplete collection of this run caused by an unexpected pause of stimuli presentation by the E-prime program.
Behavioral results
The mean accuracy of 16 participants in the motion judgment task for each condition was the following: humans, 0.95 ± 0.09 (SD); animals, 0.95 ± 0.06; tools, 0.93 ± 0.09; global motion, 0.94 ± 0.10. There was no significant difference among the four conditions (F(3,45) = 0.21, p = 0.89). Because the subjects were asked to respond only after seeing the fixation cue, response times were not meaningful and were not analyzed.
fMRI results
Given our focus on the lateral temporal cortex, below we present and discuss the results in right and left lateral temporal cortex and adjacent regions, including the following regions on the anatomical automatic labeling template (Tzourio-Mazoyer et al., 2002): right superior temporal gyrus (STG), middle temporal gyrus (MTG), inferior temporal gyrus (ITG), superior temporal pole, middle temporal pole, and angular gyrus. Whole-brain results are listed in Table 1.
Human motion effect.
To replicate previous findings on human motion recognition, we compared the human motion and global motion conditions. The contrast of human motion > global motion revealed highly significant effects in bilateral occipital and posterior temporal cortex (Fig. 1A), encompassing the posterior STG, MTG, and ITG (peak-point MNI coordinates in the lateral temporal cortex: 57, −63, 3). A significant cluster was also observed in the right middle STS/STG (mSTS/STG) (coordinates: 54, −3, −9). These results are in line with the biological motion effects in the literature, where posterior and middle clusters of temporal cortex have been reported (Allison et al., 2000; Grosbras et al., 2012).
Human–agent motion effect.
To explore the regions showing responses to human–agent motion stimuli, we computed whole-brain conjunction analyses of human motion > animal motion and tool motion > animal motion. The results (Fig. 1B) revealed significant clusters in the middle portion of the right STS/STG (center coordinates: 51, −25, 0) and a small cluster in left MTG (center coordinates: −62, −51, 5).
Biological motion effect.
To identify the brain regions showing responses to biological motion stimuli, we calculated whole-brain conjunction analyses of human motion > functional tool motion and animal motion > functional tool motion. We observed a highly significant cluster in the right occipital-temporal cortex (OTC) (center coordinates: 48, −68, 8), close to the well-documented extrastriate body area (EBA) (Fig. 1C). This activation also extended into right pSTS when global motion was used as the baseline, i.e., in the conjunction analysis of human motion > global motion and animal motion > global motion (Fig. 1D). This latter contrast also revealed a significant cluster in left OTC, encompassing left MT.
Summary.
We replicated the classical finding that bilateral occipital-temporal cortex, pSTS, and mSTS/STG are sensitive to human motion stimuli. Furthermore, in two areas of the right temporal cortex, different patterns of results were obtained: right mSTS/STG showed a human–agent motion effect and, more posteriorly, the right OTC showed a biological motion effect.
Experiment 2
Behavioral results
Mean accuracies and SDs in the motion–picture and picture–word verification tasks for the 77 patients and 50 healthy participants are as follows. In the patient group, for the motion–picture verification task, tool motion (0.75 ± 0.12) was recognized more accurately than human (0.67 ± 0.13) and animal (0.68 ± 0.14) motion (ps < 0.001), with no significant difference between human and animal motion (t < 1); for the picture–word task, there was no significant difference among the three categories (human, 0.95 ± 0.08; animal, 0.94 ± 0.08; tool, 0.94 ± 0.09) (ps > 0.05). Similar patterns were found in healthy participants (motion–picture verification: human motion, 0.81 ± 0.11; animal motion, 0.79 ± 0.13; tool motion, 0.87 ± 0.08; picture–word verification: human, 0.99 ± 0.02; animal, 0.98 ± 0.03; tool, 0.98 ± 0.03) except that the performance on the human stimuli in the picture–word verification task was better than that for the other two categories (ps < 0.001). The patient group's performance in all categories of the two tasks was significantly worse than that of the healthy controls (F(1,125) = 23, p < 0.001). The patient group's performance was also more variable, with greater coefficients of variation (SD/mean) than the control group even in the picture–word verification task, where both groups showed high mean accuracies (patients: human, 8%; animal, 8%; tool, 10%; controls: human, 2%; animal, 3%; tool, 3%). We further observed that subjects' performance on the three categories of motion stimuli in the motion–picture verification task was correlated in both the patient group (rhuman – animal = 0.65, rhuman – tool = 0.65, ranimal – tool = 0.60; ps < 0.001) and the healthy participant group (rhuman – animal = 0.61, rhuman – tool = 0.69, ranimal – tool = 0.55; ps < 0.001). These between-category r values did not differ significantly for either group (ps > 0.05).
VLSM results
Among the 77 stroke patients, 26 had a left-hemisphere lesion, 15 had a right-hemisphere lesion, and 36 had bilateral lesions. Of the patients, 38 had unilateral, single, focal lesions (22 left and 16 right) and will be addressed as the 38 single-lesion group below. VLSM analyses were performed separately for all 77 stroke patients and for the 38 single-lesion patients (Fig. 2). The lesion distribution patterns for the whole group (n = 77) and for single-lesion patients (n = 38) are presented in Figure 2E. In the analyses, we included voxels that were lesioned in at least five patients. This resulted in coverage of a substantial portion of bilateral temporal lobes, frontal lobes, and some portion of the parietal and occipital lobes and many subcortical and cerebellar regions. In Figure 2F, we show power maps reflecting the probability of each voxel reaching statistical significance with α set to p < 0.05 (Cohen, 1977). Given the variation in power across brain regions, negative results should be interpreted in the context of such variation.
As was done for the fMRI study, here we present and discuss the results for lateral temporal cortex and adjacent regions. Whole-brain results are listed in Table 1. Figure 2 displays the VLSM results for the human motion effect (Fig. 2A), human–agent motion effect (Fig. 2B), and biological motion effect (Fig. 2C,D). For all types of motion effects, results of four analyses are shown: the first three columns show results with the control task performance (picture–word verification task) regressed out. Columns 1 and 2 show the VLSM results with all 77 patients with the BM and the t test, respectively. Column 3 shows the results of the BM test with the 38 single-lesion patients. The last column shows a different way of controlling for the picture recognition effect. In this analysis, we excluded those patients from the whole patient group whose performance for the picture–word verification task was 2 SDs below that of controls in any of the three motion categories. With this procedure, 39 patients were excluded, leaving 38 patients for the analyses. The resulting maps with the BM test are shown in column 4. As can be seen in Figure 2, highly consistent patterns were obtained across these different analyses. For simplicity, only the detailed coordinate and cluster size information for column 1 (BM test with all patients) and column 3 (BM test with 38 single-lesion patients) in Figure 2 are presented below (and in Table 1).
The VLSM results on the human motion recognition task are presented in Figure 2A. For all analyses, a strong right lateralization was apparent, covering a large portion of the right mSTS/STG (extending to MTG). No significant clusters were obtained in the left temporal cortex in any of the analyses. A large cluster was obtained at the FDR p < 0.005 threshold in the whole group (77 patients) analysis (center coordinates: 51, −11, −9; 8452 mm3) and at the FDR p < 0.05 threshold in the 38 single-lesion patients (center coordinates: 48, −16, −3; 20,696 mm3).
A human–agent motion effect was found in the right mSTS/STG in all analyses (Fig. 2B). No significant cluster was observed in the left temporal cortex. A large cluster was obtained at the FDR p < 0.005 threshold in the whole group analysis (center coordinates: 47, −11, −9; 4697 mm3) and at the FDR p < 0.05 threshold in the subgroup of 38 single-lesion patients (center coordinates: 51, −15, −4; 17,438 mm3).
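For readers unfamiliar with FDR thresholding, the following sketch shows how a voxelwise p-value cutoff can be derived for a chosen FDR level. It assumes the Benjamini-Hochberg step-up procedure, which is not specified in the text here; the function name and the p values are illustrative only.

```python
def fdr_threshold(p_values, q):
    """Benjamini-Hochberg cutoff (assumed procedure): the largest p value
    p_(k) with p_(k) <= q * k / m, or None if no voxel survives."""
    ranked = sorted(p_values)
    m = len(ranked)
    cutoff = None
    for k, p in enumerate(ranked, start=1):
        if p <= q * k / m:
            cutoff = p  # keep the largest p that satisfies the criterion
    return cutoff


# Hypothetical voxelwise p values
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(fdr_threshold(p_vals, q=0.05))
```

Voxels with p values at or below the returned cutoff would be declared significant; a stricter level such as 0.005 yields a smaller or empty set.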
For the biological motion contrasts, a small cluster in the right STG was observed only in the conservative contrast (human and animal; Fig. 2D) for the whole group analysis (center coordinates: 45, −12, −10; 108 mm3). No significant cluster in the right temporal lobe was found in other analyses, including the stringent contrast (human > tool and animal > tool; Fig. 2C) for the whole group or the conservative and stringent contrasts for the 38 single-lesion patient group (Fig. 2C,D).
Results across Experiments 1 and 2
In both the fMRI experiment and the patient VLSM experiment, we found a region in right mSTS/STG that is sensitive to human–agent motion (stronger effects for human motion and tool motion relative to animal motion), whereas the regions showing the biological motion effect differed across the two experiments.
To quantify the convergence between the two experiments, we first assessed the degree of their overlap by calculating an overlap index (Bracci et al., 2012). We treated the significant regions in right temporal cortex for human–agent motion obtained in the fMRI map (Fig. 1B) and the VLSM map (Fig. 2B) as regions of interest (ROIs) and divided the volume common to the two ROIs by the volume of the smaller of the two ROIs. The left temporal cortex was not included in this analysis because no voxels were obtained in the VLSM map. When the whole group of 77 patients was considered, the common volume between fMRI human–agent ROI (1620 mm3) and VLSM human–agent ROI (4697 mm3) was 343 mm3 (center coordinates: 49, −23, 0) and the overlap index was 21%. When the 38 single-lesion patients were considered, the common volume was 1155 mm3 (center coordinates: 48, −14, −5) and the overlap index was 71%. In contrast, there was zero overlap between the fMRI biological motion ROI and the VLSM biological motion ROI for both the stringent contrast (biological motion 1; Figs. 1C, 2C) and the conservative contrast (biological motion 2; Figs. 1D, 2D) in both kinds of patient group analyses. Thus, the overlap index of biological motion was 0%.
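The overlap index described above is the common volume divided by the volume of the smaller ROI. The sketch below computes it both from voxel sets and from the volumes reported in the text; the function names and the set-of-coordinates representation are assumptions made for illustration.

```python
def overlap_index_from_volumes(common, volume_a, volume_b):
    """Common volume divided by the volume of the smaller ROI."""
    smaller = min(volume_a, volume_b)
    if smaller == 0:
        raise ValueError("overlap index is undefined for an empty ROI")
    return common / smaller


def overlap_index(roi_a, roi_b):
    """Same index for ROIs given as sets of voxel coordinates
    (equal voxel size assumed, so voxel counts stand in for volumes)."""
    return overlap_index_from_volumes(len(roi_a & roi_b), len(roi_a), len(roi_b))


# Volumes (mm^3) reported in the text for the human-agent ROIs
whole_group = overlap_index_from_volumes(343, 1620, 4697)
single_lesion = overlap_index_from_volumes(1155, 1620, 17438)
print(f"{whole_group:.0%}, {single_lesion:.0%}")  # prints "21%, 71%"
```

In both patient analyses the fMRI ROI (1620 mm3) is the smaller of the two, so it serves as the denominator. The guard for an empty ROI reflects why the left temporal cortex, which yielded no VLSM voxels, could not be included.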
We also assessed the effects of each experiment in the ROIs from the other experiment. We considered first the bilateral STS/STG clusters showing human–agent motion effects and biological motion effects defined by the fMRI experiment (Fig. 1B,D) and extracted the number of lesioned voxels in each ROI as the lesion volume index for each patient. We then correlated the lesion volume in each ROI with the behavioral performance index on the motion–picture verification task in each category (human, animal, tool) across patients, while regressing out the scores of the picture–word verification task in the corresponding category and whole-brain lesion volume. In this way, we obtained a correlation coefficient (r) for each category in each ROI. The reverse analyses were also performed: for the VLSM-defined human–agent effect ROI and biological motion ROI obtained in Experiment 2 (Fig. 2B,D, BM test), we extracted the mean BOLD β values of each motion category for each healthy participant in Experiment 1 and plotted the effects of the three critical categories, as indexed by the β differences between each category and global motion. Figure 3 presents the results with all 77 patients. The results with 38 single-lesion patients showed similar overall patterns, especially for the human–agent effects, and are described below.
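The ROI-based lesion analysis amounts to a partial correlation between ROI lesion volume and motion-task performance, controlling for the control-task score and whole-brain lesion volume. A minimal sketch follows, assuming ordinary least-squares residualization of both variables on the covariates; the function names and the data are hypothetical and do not reproduce the software used in the study.

```python
from math import sqrt


def _center(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def residualize(values, covariates):
    """Residuals of `values` after regression on the covariates plus an
    intercept (Gram-Schmidt on centered data; covariates must not be collinear)."""
    basis = []
    for cov in covariates:
        c = _center(cov)
        for b in basis:  # orthogonalize against earlier covariates
            coef = _dot(c, b) / _dot(b, b)
            c = [ci - coef * bi for ci, bi in zip(c, b)]
        basis.append(c)
    resid = _center(values)
    for b in basis:  # remove the part explained by each covariate
        coef = _dot(resid, b) / _dot(b, b)
        resid = [ri - coef * bi for ri, bi in zip(resid, b)]
    return resid


def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after regressing out the covariates."""
    rx, ry = residualize(x, covariates), residualize(y, covariates)
    return _dot(rx, ry) / sqrt(_dot(rx, rx) * _dot(ry, ry))


# Hypothetical data for 8 patients
control_score = [10, 12, 9, 14, 11, 13, 8, 15]  # picture-word verification
total_lesion = [5, 3, 8, 2, 6, 4, 9, 1]         # whole-brain lesion volume
roi_lesion = [2, 1, 4, 0, 3, 1, 5, 0]           # lesioned voxels in the ROI
motion_score = [8, 11, 6, 14, 9, 12, 5, 15]     # motion-picture verification
print(round(partial_corr(roi_lesion, motion_score, [control_score, total_lesion]), 3))
```

A negative coefficient indicates that larger lesions in the ROI go with poorer motion recognition beyond what general picture recognition and overall lesion size explain.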
For the human–agent ROIs, the results showed that the effects converged well across the two experiments. Specifically, when all 77 patients were considered (Fig. 3A), in the ROIs defined in the healthy participant fMRI experiment (top row), for the patient data there was a significant correlation of the lesion volume in the rmSTS/STG ROI with human motion (r = −0.35, p < 0.01) and tool motion (r = −0.24, p < 0.05) performance but not with animal motion performance (r = −0.15, p = 0.21). The difference between human motion and animal motion was significant (p < 0.05), but not the difference between tool motion and animal motion (p = 0.21). The left pMTG ROI showed significant effects for all three types of motion stimuli (ps < 0.05), with no statistically significant differences among them (ps > 0.10). In the VLSM-defined human–agent ROI, the effects of human motion and tool motion in the fMRI experiment were significantly or marginally significantly stronger than animal motion (t(15) = 3.15, p < 0.01 and t(15) = 1.87, p = 0.08, respectively).
When the patient data for 38 single-lesion patients were considered, in the ROIs defined in the fMRI experiment, there was a significant correlation of the lesion volume in the rmSTS/STG ROI with human motion (r = −0.50, p < 0.01) and tool motion (r = −0.39, p < 0.05) performance but not with animal motion performance (r = −0.22, p = 0.20). The difference between human motion and animal motion was significant (p < 0.05), but not the difference between tool motion and animal motion (p = 0.20). For the left pMTG ROI, there was a significant correlation of the lesion volume in this ROI with animal motion performance (r = −0.38, p < 0.05) but not with human motion (r = −0.31, p = 0.07) or tool motion (r = −0.21, p = 0.21) performance. There was no significant difference among the three categories (ps > 0.05). In the VLSM-defined human–agent ROI, the effects of human motion and tool motion in the fMRI experiment were both significantly stronger than animal motion (t(15) = 3.50, p < 0.01 and t(15) = 2.96, p < 0.01, respectively).
For the biological motion ROIs, the results were rather divergent across the two experiments (Fig. 3B). When all 77 patients were considered, for the ROI obtained in the conservative biological motion contrast (human > baseline and animal > baseline) in the fMRI experiment, lesion volume tended to be more strongly correlated with human motion and animal motion performance than tool motion. The right-hemisphere ROI reached marginal significance for animal motion but not for the other two motion categories (Fig. 3B, top row; human motion: r = −0.17, p = 0.14; animal motion: r = −0.20, p = 0.09; tool motion: r = −0.16, p = 0.18). No significant effect was observed for any category in the left temporal ROI (rHuman = 0.19; rAnimal = 0.17; rTool = 0.08; ps > 0.10). For the VLSM-defined biological motion ROI (conservative contrast), the effects in the healthy participant fMRI experiment did not reach significance for any category either (rHuman = 0.16; rAnimal = 0.10; rTool = 0.07; ps > 0.20).
The pattern with the subgroup of the 38 single-lesion patients was again similar to that with the whole patient group analysis. For the ROI obtained in the conservative biological motion contrast in the fMRI experiment, lesion volume tended to be more strongly correlated with human motion and animal motion performance than tool motion. The right-hemisphere ROI reached significance for human motion (r = −0.35, p < 0.05) but not for the other two motion categories (animal motion: r = −0.30, p = 0.08; tool motion: r = −0.26, p = 0.12). For the left temporal ROI, no significant effect was obtained for any category (human motion, r = −0.07; animal motion, r = −0.24; tool motion, r = −0.18; ps > 0.14). No significant clusters were obtained in the VLSM analyses to allow for the reverse analysis.
Discussion
We examined whether human (conspecific) motion recognition and biological motion recognition are associated with distinct brain regions and, if so, what characterizes them. We compared the effects of three types of motion, human, animal, and functional tool motion, in a neuroimaging experiment and in a VLSM experiment. In the fMRI experiment with healthy participants, we observed two regions in the temporal lobe that are involved in different aspects of motion recognition: the middle portion of the right superior temporal gyrus/sulcus is more important for processing motion produced by human agents (human motion and functional tool motion) relative to animal motion, and the more posterior region in the bilateral occipital-temporal cortex responded more strongly to biological motion (human motion and animal motion) relative to nonbiological object motion (tool motion). The finding of a region seemingly specialized for the processing of human agency was confirmed in our VLSM study with brain-damaged patients, where we found that damage to right mSTG/STS is associated with impairment in the recognition of both human motion and functional tool motion, relative to animal motion. Highly consistent results were obtained across analyses when considering all stroke patients and only stroke patients with single, unilateral lesions. Note that one important caveat to consider is that the low-level motion properties of our motion stimuli and task requirements were not perfectly matched across conditions. However, the common effects of human and tool motion over animal motion could not be readily explained by such differences, as human and animal motion were more similar in motion properties (e.g., number of dots presented or articulated manner) and animal and tool motion had more similar task requirements (i.e., identification of object-specific actions).
Furthermore, behavioral accuracies in Experiment 1 were comparable across the three categories, and in Experiment 2, they were more similar between human and animal motion conditions. Together, these results show that right mSTG/STS is not only activated during human–agent motion recognition, but is also necessary for processing such motion.
The fMRI and VLSM finding that right mSTS/STG shows a preference for human and tool motion stimuli relative to animal motion demonstrates that this region is not tuned to biological motion properties, but rather is selectively involved in processing the movement of human agents. Tools, as inanimate objects, do not have intrinsic, self-initiated motion, and their motion is usually induced by human agents. These considerations encourage the view that it is the processing of the implied agent of functional tool motion that is responsible for the observed association of human and tool motion in right mSTG/STS.
Previous functional imaging studies have reported that right mSTS/STG is more strongly activated in processing human motion (Howard et al., 1996; Allison et al., 2000; Iacoboni et al., 2004; Redcay, 2008; Grosbras et al., 2012) compared with scrambled motion and with the same types of motion produced by nonhuman agents such as robots/cartoon figures (Mar et al., 2007; Gobbini et al., 2011). Such human-related effects have generally been interpreted as the effects of a social information processing system. However, our results showing that right mSTS/STG is both more sensitive (fMRI) and necessary (VLSM) for tool motion processing, whereas it is insensitive and not necessary for animal motion processing, suggest that social property engagement is not necessary in driving such human agency effects. Assuming that social-related processing is defined by the interaction between two or more humans (and, by extension, between two or more animals), the motion of tools does not automatically involve such processing. That is, we have shown a human (agency) effect that is not to be reduced to general social or biological effects. Although it is undoubtedly the case that the human motion recognition system is at the service of social cognition (Lahnakoski et al., 2012; Pavlova, 2012; Simmons and Martin, 2012), it is not dependent on it and is not fully subsumed within it. The results reported here suggest that the right mSTS is involved in computing human agency in its most general form, including object-directed agency, and independently of social valence.
Our finding that in the posterior portion of the temporal lobe (the occipital-temporal cortex) human motion and animal motion elicited stronger activation relative to global motion and tool motion provides direct evidence for the common assumption that this region is sensitive to biological entities (see also Kaiser et al., 2012). This finding is consistent with the hypothesis that the human motion effect observed here is attributable to this region's sensitivity to a more general type of motion property, articulated motion, that is associated with biological entities (Beauchamp et al., 2002, 2003; Pelphrey et al., 2003). However, such findings were not supported by the VLSM study, perhaps because of low statistical power resulting from the small number of patients with lesions in the occipital-temporal cortex in our patient group. This issue remains to be explored.
The region showing biological motion selectivity in our fMRI study included a large cluster in the OTC that is inferior to the pSTS region commonly indicated in human motion research and seems close to the well-documented EBA (Grosbras et al., 2012), which has been shown to be more responsive to human and animal bodies (Haxby et al., 2000; Downing et al., 2001; Peelen and Downing, 2007). Previous studies have found that point-light displays of human movement activate EBA along with right pSTS, and such effects might be driven by the body form information derived from the point-light display (Peelen et al., 2006). Our results might reflect effects from either or both of the two regions, the right pSTS for biological motion and the EBA for biological form. It is worth noting that the right OTC cluster observed in our study was obtained in the contrast human motion > tool motion and animal motion > tool motion, whereas the more classical right pSTS was obtained for the contrast human motion > global motion and animal motion > global motion. Although tool motion is not articulated, it shares other visual properties with biological motion, in that it can be seen as the extension of the effector causally involved in the object's motion (i.e., hand; Bracci et al., 2012). If such were the case, contrasting human and animal motion to functional tool motion might have the effect of subtracting out the biological “motion” component and leaving behind the shape dimension of the biological entities.
Compared with previous neuropsychological investigations, our results revealing the critical role of right mSTS/STG in human–agent motion recognition are better aligned with fMRI studies with healthy participants. Although most fMRI studies have reported right pSTS to be the peak of the human motion effects, right mSTS/STG has also been indicated (Grosbras et al., 2012). In contrast, previous patient studies have reported effects in left STS, premotor area, right superior parietal lobe, V5/MT, inferior temporal gyrus, medial frontal lobe, and right anterior temporal lobe, with little evidence for right middle/posterior STS/STG. There are several possibilities why the previous patient studies have not found right STS/STG effects in human motion processing. The brain regions of interest in those studies were limited to focal areas, such as the left hemisphere (Saygin, 2007), anterior temporal lobe (Vaina and Gross, 2004), and parietal lobes (Battelli et al., 2003); the sample size was relatively small with the exception of the study by Saygin (2007), which included a larger number of patients but could not examine the role of right STS/STG since only left-hemisphere lesion patients were included. Our study was performed on a much larger sample of patients (n = 77) with lesions covering a wide range of bilateral regions, allowing for greater power to detect the contribution of right STS/STG in human motion recognition.
In conclusion, we have shown the existence of distinct functional components in the motion recognition stream in right lateral temporal cortex. One component, in the right pSTS and bilateral OTC, is most likely driven by bottom-up visual motion and shape properties that are shared by biological entities. This component provides the initial interpretation of biological motion, which, together with the contribution of nearby areas, may then serve as the basis for the interpretation of observed actions. More importantly in the context of the present study, a second component, lying more anteriorly in the middle part of right superior temporal region, is involved in the recognition of human–agent motion. This component provides a more abstract interpretation of the agent, explicit or implicit, that is performing the motion. This region shows selectivity to the motion of a human agent, even when the stimulus itself is not biological and does not contain articulated motion properties or social valence. These findings suggest that the organization of lateral temporal cortex is not guided by a unidimensional biological–nonbiological principle but is hierarchically organized from undifferentiated biological motion processing to more complex (agency) and specific (human) dimensions. The careful distinction among different components of motion perception and interpretation is a necessary step in understanding the neural basis of human motion recognition at the service of both social and nonsocial cognition.
Footnotes
This work was supported by the 973 Program (Grant 2013CB837300), the Major Project of the National Social Science Foundation (Grant 11&ZD186), NSFC (Grants 31171073, 31222024, 31271115, 81030028, and 31221003), NCET (Grants 12-0055 and 12-0065), the National Science Fund for Distinguished Young Scholars (Grant 81225012 to Y.H.), and BJNSF (Grant 7122089). A.C. was supported by the Fondazione Cassa di Risparmio di Trento e Rovereto. We thank Emily D. Grossman for sharing the animal motion stimuli; Myrna Schwartz, Daniel Kimberg, and Grant Walker for help with Voxbo software; Xueming Lu for fMRI data analyses; and Alex Martin, Marius Peelen, and Lorella Battelli for comments on a previous version of this manuscript. We are also grateful to all research participants.
The authors declare no competing financial interests.
Correspondence should be addressed to Yanchao Bi, State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China. ybi@bnu.edu.cn