Abstract
Face recognition mechanisms need to extract information from static and dynamic faces. It has been hypothesized that the analysis of dynamic face attributes is performed by different face areas than the analysis of static facial attributes. To date, there is no evidence for such a division of labor in macaque monkeys. We used fMRI to determine specializations of macaque face areas for motion. Face areas in the fundus of the superior temporal sulcus responded to general object motion; face areas outside of the superior temporal sulcus fundus responded more to facial motion than general object motion. Thus, the macaque face-processing system exhibits regional specialization for facial motion. Human face areas, processing the same stimuli, exhibited specializations for facial motion as well. Yet the spatial patterns of facial motion selectivity differed across species, suggesting that facial dynamics are analyzed differently in humans and macaques.
Introduction
Faces provide a rich source of social information. Some information, such as individual identity, is transmitted by the structure of the face. Other information, such as mood, involves dynamic transformations (Darwin, 1872). Thus, face recognition requires motion to be factored out for identification while simultaneously being extracted to perceive changes in expression, head orientation, or gaze. The mechanisms for performing these very different computations have been suggested to reside in different parts of the human brain (Bruce and Young, 1986; Haxby et al., 2000; O'Toole et al., 2002). In particular, it has been suggested that the occipital face area (OFA) and the fusiform face area (FFA) represent invariant properties of faces (Kanwisher et al., 1997; McCarthy et al., 1997; Yovel and Kanwisher, 2004), whereas the superior temporal sulcus face area (STS-FA) is sensitive to dynamic face properties (Allison et al., 2000; Gobbini et al., 2011). Recently, Pitcher et al. (2011) found a clear functional dissociation, with the STS-FA selective for dynamic information and OFA and FFA insensitive to facial motion.
In macaque monkeys, a network of face-selective areas has been identified (Tsao et al., 2003, 2008; Pinsk et al., 2009; Rajimehr et al., 2009), but specializations for facial motion have not been investigated yet. To better understand how facial motion is processed across species, we probed the face-processing networks of both macaques and humans to address two questions: is the processing of dynamic information functionally separated within face-processing networks? If so, how does this separation inform putative homologies of face areas across the two primate species?
Materials and Methods
All animal procedures complied with the National Institutes of Health Guide for Care and Use of Laboratory Animals and with regulations for the welfare of experimental animals issued by the California Institute of Technology, where all macaque experiments were conducted. All human subject procedures were approved by the Institutional Review Board of The Rockefeller University, and informed consent was obtained from all human subjects.
Surgery.
Implantation of MR-compatible headpost (Ultem; General Electric Plastics), MR-compatible ceramic screws (Thomas Recording), and acrylic cement (Grip Cement, Caulk; Dentsply International) followed standard anesthetic, aseptic, and postoperative treatment protocols (Wegener et al., 2004).
Monkey fMRI.
Scanning was performed on a 3T MR scanner (TIM Trio with AC88 gradient insert; Siemens). For each monkey, we acquired 16 anatomical volumes at high spatial resolution (0.5 mm isometric) with a T1-weighted inversion recovery sequence (MPRAGE) under anesthesia (ketamine and medetomidine, 8 mg/kg and 0.04 mg/kg). For functional imaging, the contrast agent ferumoxytol (8 mg of Fe per kg body weight) was injected into the femoral vein before the scan session to increase the signal-to-noise ratio. Like MION (Vanduffel et al., 2001), ferumoxytol reduces signals in activated voxels, and we thus inverted signals for display of functional data to facilitate comparison with BOLD data.
All functional data were acquired in horizontal slices with a multiecho EPI sequence (TR 2 or 3 s, TE 30 ms, 1.5 or 1.0 mm³ voxel size) and a custom-made 1-channel or 8-channel surface coil as described previously (Tsao et al., 2008). The use of smaller voxel sizes in macaques reduces the effect of ear-canal-related susceptibility artifacts compared with humans (Devlin et al., 2000; Kriegeskorte et al., 2007). Three male rhesus monkeys (Macaca mulatta) were scanned while foveating a fixation dot at the center of the screen. Monkeys sat in sphinx position with their heads fixed (Vanduffel et al., 2001; Tsao et al., 2003). Juice reward was delivered after variable periods of time (2–4 s) during which the monkeys maintained fixation within 2 degrees of the fixation dot. Eye position was measured at 100 Hz using a commercial eye monitoring system (ISCAN).
Human fMRI.
All scanning was performed on a 3T MR scanner (TIM Trio; Siemens). Human functional data were acquired in horizontal slices, approximately aligned to the AC-PC line with a standard EPI sequence (TR 2 s, TE 32 ms, 64 × 64 matrix, 3.43 mm × 3.43 mm in-plane resolution, 3.4 mm slice thickness, flip angle 90°) and a 32-channel head coil. On each scan session, we obtained a high-resolution anatomical volume of the entire brain (MPRAGE, 1 mm isometric).
Six human subjects (3 females, 3 males; age 25–35 years) participated in the experiment. Subjects were instructed to maintain fixation on a central dot and indicate with a button press (right index finger) when the identity of a stimulus was repeated within a visual stimulation block. Eye position was measured at 100 Hz using a commercial system (ISCAN) to ensure that subjects were following fixation instructions within a 2 degree window.
Visual stimulation.
The same visual stimuli were presented to humans and macaque monkeys. Two different experiments were performed, both presented in block designs in separate runs.
The first experiment was a standard face localizer (Moeller et al., 2008), used to define face-selective ROIs. The duration of each block was set to equal 8 times the TR of the imaging sequence. Each image block contained pictures of one of the following categories: human faces (F), monkey faces (M), human hands (H), gadgets (G), fruits and vegetables (V), and human headless bodies (B). Each image block was preceded by a scramble block (S) with spatially scrambled versions of the pictures of the subsequent block. Runs concluded with a final block showing a gray random dot pattern (R). Thus, the sequence of blocks presented in each run was S_F_S_H_S_M_S_G_S_F_S_V_S_M_S_B_R. Images subtended 5.9° of visual angle (10.4 cm diameter at 100 cm distance) and were each presented for 0.5 s.
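The block structure of the localizer run can be written out as a small schedule. The following Python sketch is illustrative only and is not the stimulus code used in the study; the function names are hypothetical, and it assumes that the scramble blocks and the final random dot block have the same duration (8 TRs) as the image blocks.

```python
# Block order for the localizer run, copied from the text.
SEQUENCE = "S_F_S_H_S_M_S_G_S_F_S_V_S_M_S_B_R"

# Category codes as defined in the text.
LABELS = {
    "F": "human faces",
    "M": "monkey faces",
    "H": "human hands",
    "G": "gadgets",
    "V": "fruits and vegetables",
    "B": "human headless bodies",
    "S": "scrambled images",
    "R": "gray random dot pattern",
}

# Each block lasts 8 TRs (assumed here to hold for S and R blocks as well).
BLOCK_LENGTH_IN_TRS = 8


def block_schedule(sequence, tr_seconds):
    """Return a list of (onset_s, duration_s, label), one entry per block."""
    duration = BLOCK_LENGTH_IN_TRS * tr_seconds
    codes = sequence.split("_")
    return [(i * duration, duration, LABELS[code]) for i, code in enumerate(codes)]


def run_duration(sequence, tr_seconds):
    """Total run length in seconds under the equal-duration assumption."""
    return len(sequence.split("_")) * BLOCK_LENGTH_IN_TRS * tr_seconds


schedule = block_schedule(SEQUENCE, tr_seconds=2.0)
for onset, duration, label in schedule[:4]:
    print(f"{onset:6.1f} s  {duration:4.1f} s  {label}")
print("blocks:", len(schedule), "run length (s):", run_duration(SEQUENCE, 2.0))
```

Under these assumptions a run contains 17 blocks, so its length scales directly with the TR of the imaging sequence.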
The second experiment was performed to test for selectivity for stimulus dynamics. Blocks lasted 32 and 30 s for all scans. Stimulus conditions consisted of blocks of moving faces, static faces, moving objects, and static objects. Motion blocks were composed of short movies (0.5–2.5 s long), whereas static blocks included pictures shown for the same amount of time. Face movies showed macaques or humans vocalizing and generating facial expressions. Macaque facial expressions included coo calls, lip smacking, aggressive teeth displays, and grunts. Human expressions included smiling, nodding, and simple vocalizations (similar to monkey calls). Object movies showed artificial and natural objects (computer mouse, shoe, canned food, toothbrush, comb, flowers, leaves, fruits) subject to naturalistic motions (such as falling or being shaken as if moved by the wind or sliding down a slope).
To minimize low-level differences across stimuli, images and movies were achromatic, objects and faces were placed in the picture center and on identical backgrounds of salt-and-pepper noise, and movies and pictures were manually adjusted to have an overall matched distribution of pixel intensities and similar object/face sizes. Because motion energy is hard to measure in naturalistic movies, we instead compared activations in general motion-sensitive brain areas (see below) to estimate differences across motion blocks. Static control conditions were generated directly from the corresponding movie clip by extracting frames maximizing the social information conveyed. For instance, if the original clip showed a monkey with an aggressive expression, we used a frame with the teeth most visible. Image categories comprised static human faces (FHS), static objects (two sets, OS and OSbis), moving human faces (FHM), moving objects (two sets, OM and OMbis), static monkey faces (FMS), moving monkey faces (FMM), and scrambled images (S). The sequence of blocks used was as follows: S_FHS_S_OS_S_FHM_S_OM_S_FMS_S_OSbis_S_FMM_S_OMbis_S. Stimuli subtended 7.4° visual angle (13 cm diameter at 100 cm distance).
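The relation between stimulus diameter, viewing distance, and visual angle can be checked with a few lines of Python. This is a minimal sketch using the standard formula for an object centered on the line of sight; the function name is hypothetical, and the formula is an assumption, since the text does not state how the reported angles were computed.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Stimulus sizes and viewing distance quoted in the text.
localizer_angle = visual_angle_deg(10.4, 100.0)  # Experiment 1 images
dynamics_angle = visual_angle_deg(13.0, 100.0)   # Experiment 2 stimuli

print(f"Experiment 1: {localizer_angle:.2f} deg")
print(f"Experiment 2: {dynamics_angle:.2f} deg")
```

With this formula both values agree with the reported 5.9° and 7.4° to within about 0.1°.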
Visual stimulation was controlled by custom MATLAB (MathWorks) code using the Psychophysics Toolbox (Brainard, 1997). Stimuli were projected with a video projector (JVC DLA-G15E) at 30 Hz with 720 × 480 pixel resolution on a back-projection screen.
fMRI data analysis.
FreeSurfer and FSFAST (http://surfer.nmr.mgh.harvard.edu/) were used to reconstruct cortical surfaces and perform functional data analysis, following procedures detailed previously (Tsao et al., 2003). The same procedure was used to define face-selective areas in monkeys and humans. We used data from Experiment 1 and calculated the contrast of static faces versus all whole objects. Face-selective regions were identified by anatomical location and relative position. The identity of face-selective regions was then determined by comparison with published coordinates (Tsao et al., 2008; Pitcher et al., 2011), and established naming conventions were used. Voxels within ROIs were pooled together for subsequent analysis of data from Experiment 2. We used high thresholds (at least p < 10⁻⁷) to define macaque ROIs to minimize partial volume effects. An exception is the left middle face patch in the STS fundus (MF) in Monkey M1 (p < 10⁻²), where selectivity was confirmed in Experiment 2.
In macaques the number of runs used depended on individual performance and varied slightly: 22, 21, and 28 runs in Experiment 1 and 28, 21, and 16 in Experiment 2 for Monkeys M1, M2, and M3, respectively. In humans, we had 4 runs per subject and experiment.
For group analysis, a general linear model was fit to the β values obtained from every single run of Experiment 2 for each ROI. Because of the small sample size (3 subjects), typical for monkey fMRI studies, a fixed effects group analysis was used as in previous studies (Jastorff et al., 2012). Contrasts were computed with a two-way ANOVA, with single-run β values as repeated measures. Bonferroni corrections for multiple ROIs were used to adjust significance thresholds.
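The logic of the two-way analysis can be illustrated with a balanced 2 × 2 ANOVA on per-run β values. The Python sketch below is a simplification and not the analysis pipeline used here: it treats runs as independent replicates within each cell (an ordinary fixed effects ANOVA, with no repeated-measures structure and no Bonferroni step), the function name is hypothetical, and the numbers in the usage example are invented for illustration.

```python
from statistics import mean


def two_way_anova_2x2(cells):
    """Balanced 2 x 2 ANOVA on single-run beta values.

    cells maps (shape, motion) -> list of beta values, one per run.
    All four cells must contain the same number of runs.
    Returns the F statistics (each with 1 numerator df) and the error df.
    """
    shapes = sorted({s for s, _ in cells})
    motions = sorted({m for _, m in cells})
    n = len(next(iter(cells.values())))
    if any(len(values) != n for values in cells.values()):
        raise ValueError("balanced design required")

    grand = mean(x for values in cells.values() for x in values)
    cell_mean = {key: mean(values) for key, values in cells.items()}
    shape_mean = {s: mean(cell_mean[(s, m)] for m in motions) for s in shapes}
    motion_mean = {m: mean(cell_mean[(s, m)] for s in shapes) for m in motions}

    # Sums of squares for the two main effects, the interaction, and the error.
    ss_shape = n * len(motions) * sum((shape_mean[s] - grand) ** 2 for s in shapes)
    ss_motion = n * len(shapes) * sum((motion_mean[m] - grand) ** 2 for m in motions)
    ss_inter = n * sum(
        (cell_mean[(s, m)] - shape_mean[s] - motion_mean[m] + grand) ** 2
        for s in shapes
        for m in motions
    )
    ss_error = sum((x - cell_mean[key]) ** 2 for key, values in cells.items() for x in values)

    df_error = len(shapes) * len(motions) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "F_shape": ss_shape / ms_error,
        "F_motion": ss_motion / ms_error,
        "F_interaction": ss_inter / ms_error,
        "df_error": df_error,
    }


# Invented beta values (three runs per condition) with purely additive effects.
example = {
    ("face", "static"): [2.0, 2.2, 1.8],
    ("face", "moving"): [3.0, 3.2, 2.8],
    ("object", "static"): [1.0, 1.2, 0.8],
    ("object", "moving"): [2.0, 2.2, 1.8],
}
result = two_way_anova_2x2(example)
print(result)
```

In this invented example the two main effects are large while the interaction F is essentially zero, which is the pattern expected when face selectivity and motion sensitivity simply add.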
Motion energy controls.
Possible differences in motion energy between face and non-face object stimuli were assessed via activation differences in motion-sensitive areas in the STS of macaques. We identified MT/MST/FST by contrasting all moving stimuli with all static ones on even runs, setting a high significance threshold, and verified results by registration to the macaque F99 atlas (Van Essen et al., 2011). For these ROIs, we contrasted moving versus static faces and moving versus static objects on odd runs to measure modulation by face motion (7.02 ± 0.43%, mean ± SEM; F(1,429) = 272.34, p < 10⁻⁴⁷) and by object motion (9.76 ± 0.60%; F(1,429) = 263.19, p < 10⁻⁴⁵). This suggests that motion energy in non-face movies was higher than in face movies. Thus, larger activations in a face area for facial versus object motion cannot be explained by differences in motion energy.
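The split between ROI definition (even runs) and measurement (odd runs) keeps the two steps statistically independent. The Python sketch below illustrates this idea under simplifying assumptions: voxels are selected with a fixed threshold on the mean moving-minus-static difference rather than with a significance threshold, the function name is hypothetical, and the input values are invented.

```python
from math import sqrt
from statistics import mean, stdev


def split_half_modulation(diff_by_run, threshold):
    """Select voxels on even runs, then measure them on odd runs.

    diff_by_run[r][v] is the moving-minus-static response (percent signal
    change) of voxel v in run r + 1 (runs are numbered from 1).
    Returns (roi voxel indices, mean modulation, SEM across odd runs).
    """
    numbered = list(enumerate(diff_by_run, start=1))
    even_runs = [values for number, values in numbered if number % 2 == 0]
    odd_runs = [values for number, values in numbered if number % 2 == 1]

    n_voxels = len(diff_by_run[0])
    roi = [v for v in range(n_voxels) if mean(run[v] for run in even_runs) > threshold]
    if not roi or len(odd_runs) < 2:
        raise ValueError("need at least one ROI voxel and two odd runs")

    # One ROI-average value per odd run, then mean and SEM across those runs.
    per_run = [mean(run[v] for v in roi) for run in odd_runs]
    sem = stdev(per_run) / sqrt(len(per_run))
    return roi, mean(per_run), sem


# Invented data: four runs, three voxels (voxel 1 is not motion sensitive).
runs = [
    [5.0, 0.1, 4.0],
    [6.0, 0.2, 5.0],
    [7.0, -0.1, 6.0],
    [8.0, 0.0, 7.0],
]
roi, modulation, sem = split_half_modulation(runs, threshold=1.0)
print(roi, modulation, sem)
```

Because the voxels are chosen without reference to the odd runs, the modulation measured there is not inflated by the selection step.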
Eye movement controls.
To assess whether eye movements might be different across conditions, we calculated the number of saccades during the task. Eye traces were low-pass-filtered (15 Hz cutoff frequency) and underwent edge-preserving smoothing for noise removal (Santella and DeCarlo, 2004), after which a velocity threshold was applied. For each monkey, we performed a one-way ANOVA on the number of saccades during each stimulation block using different runs as repeated measures. Only Monkey M2 showed a significant effect of condition on saccade number (F(3,132) = 4.36, p < 0.01), whereas Monkeys M1 (F(3,172) = 2.57, p = 0.06) and M3 (F(3,116) = 1.51, p = 0.22) did not. A post hoc analysis in Monkey M2 showed no significant difference within motion conditions.
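Counting saccades with a velocity threshold can be sketched as follows. This Python example is illustrative only: it omits the low-pass filtering and edge-preserving smoothing steps described above, the threshold of 30 deg/s is an assumed value that the text does not specify, and the function name is hypothetical.

```python
import math

SAMPLING_HZ = 100.0  # eye position sampling rate given in the text


def count_saccades(x_deg, y_deg, velocity_threshold=30.0, sampling_hz=SAMPLING_HZ):
    """Count saccades as upward crossings of a speed threshold (deg/s).

    x_deg and y_deg are equally long sequences of eye position in degrees.
    The default threshold is an assumption made for illustration.
    """
    if len(x_deg) != len(y_deg):
        raise ValueError("x and y traces must have the same length")

    count = 0
    above = False
    for i in range(1, len(x_deg)):
        # Sample-to-sample displacement converted to angular speed.
        step = math.hypot(x_deg[i] - x_deg[i - 1], y_deg[i] - y_deg[i - 1])
        speed = step * sampling_hz
        if speed > velocity_threshold and not above:
            count += 1  # a new saccade begins at this sample
        above = speed > velocity_threshold
    return count


# Synthetic trace: fixation, a 3 degree rightward saccade, fixation, return saccade.
x_trace = [0.0] * 10 + [1.0, 2.0, 3.0] + [3.0] * 10 + [2.0, 1.0, 0.0] + [0.0] * 5
y_trace = [0.0] * len(x_trace)
n_saccades = count_saccades(x_trace, y_trace)
print(n_saccades)
```

Counting only upward threshold crossings ensures that a saccade spanning several samples is counted once. On real data the traces would first be filtered as described in the text, since sample noise at 100 Hz can otherwise produce spurious crossings.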
Results
We scanned 3 macaque monkeys and 6 human subjects. We first localized face patches by contrasting responses to static faces with responses to static non-face objects (Experiment 1, Fig. 1a). To determine the position of face areas relative to motion-selective cortex, we derived maps of general motion sensitivity from Experiment 2 by contrasting responses to moving and static objects (Fig. 1b). Motion selectivity extended throughout the fundus of the superior temporal sulcus (STS), so that the two face patches in the STS fundus, MF and AF, were embedded in motion-responsive cortex (Fig. 1b), whereas the middle face patch on the STS lip (ML), the anterior face patch on the STS lip (AL), and the anterior face patch on the ventral surface of inferior temporal cortex (AM) were not. A contrast of moving faces versus moving objects reproduced the known face patches (Fig. 1c).
Figure 1. Face and motion selectivity in the macaque temporal lobe. a, Left, Face-selective regions in one representative hemisphere (M1, right), on a flattened cortical surface. The color bar indicates the negative common logarithm of the probability of error. Sulci: sts, superior temporal; sf, sylvian fissure. Right, Time courses of fMRI signal for one representative region (ML). Colored epochs distinguish stimulation blocks. Block types are indicated with symbols below the time axis for clarity. b, Map of the strength of motion responses. Face-selective regions are represented by black outlines. The color bar indicates the magnitude of the response to motion (difference between the response to moving and static non-face objects) in units of percentage signal change. c, Face selectivity map, similar to a, but comparing moving faces to moving objects in Experiment 2. Black outlines as in b. AM falls partially outside the functional volume.
We calculated the responses to static and moving faces and objects in a group analysis (Fig. 2a) and identified the separate contributions of the factors shape category (face vs object) and motion (moving vs static), and of their interaction (Fig. 2b), in a two-way ANOVA. The main effect of shape category confirmed face selectivity in all face patches (Fig. 2b; Table 1). The main effect of motion was strongest in MF and AF, weaker in the posterior face patch (PL), ML, and AL, and not significant in AM (Fig. 2b; Table 1). A subset of face patches, AL and AM, exhibited a significant interaction of motion with shape category (Fig. 2b; Table 1). Thus, in face patches in the fundus of the STS (MF, AF), responses can be understood as a linear superposition of face selectivity and general motion sensitivity, whereas in patches furthest away from the fundus of the STS (AL and AM), the impact of stimulus motion is weaker and partially selective for facial motion.
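The distinction between a linear superposition and a selective response to facial motion can be made concrete with the three contrasts of a 2 × 2 design. The Python sketch below is illustrative: the function name is hypothetical and the response values are invented, not taken from Figure 2.

```python
def effects_2x2(face_static, face_moving, object_static, object_moving):
    """Return (shape effect, motion effect, interaction) from four mean responses.

    The interaction is the motion effect for faces minus the motion effect
    for objects; it is zero when shape and motion contributions simply add.
    """
    shape = ((face_static + face_moving) - (object_static + object_moving)) / 2.0
    motion = ((face_moving + object_moving) - (face_static + object_static)) / 2.0
    interaction = (face_moving - face_static) - (object_moving - object_static)
    return shape, motion, interaction


# Invented pattern 1: motion adds the same amount to faces and objects
# (linear superposition, as described for the fundus patches).
additive = effects_2x2(face_static=2.0, face_moving=3.0,
                       object_static=1.0, object_moving=2.0)

# Invented pattern 2: motion enhances the response to faces only
# (selectivity for facial motion, as described for AL and AM).
face_specific = effects_2x2(face_static=2.0, face_moving=3.0,
                            object_static=1.0, object_moving=1.0)

print("additive:", additive)
print("face-specific:", face_specific)
```

In the first pattern the interaction term vanishes although both main effects are present; in the second, the interaction is positive and the motion main effect is correspondingly smaller.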
Figure 2. Responses to moving and static faces and objects in the macaque face patch system. a, Group analysis responses to (from left to right) static faces, static non-face objects, moving faces, and moving non-face objects in percentage signal change from scrambled stimuli baseline in temporal lobe face patches. *p < 0.05, significant differences from 0 (Bonferroni-corrected for comparisons on multiple ROIs). b, Group analysis of (from left to right) the main effects of shape category, motion condition, and their interaction. Error bars indicate SE. *p < 0.05, significant differences from 0 (Bonferroni-corrected for comparisons on multiple ROIs).
Table 1. Statistics of F tests for main effects and interaction in macaque ROIs
The pattern of motion sensitivity along the fundus of the STS suggests a functional specialization of face patches by anatomical location in the STS. We tested this for two pairs of patches (ML vs MF, and AL vs AF) that differ in their location with respect to the fundus of the STS but lie at similar anterior–posterior positions along the STS. We performed a three-way ANOVA with the factors ROI, motion, and shape category. For the ML versus MF comparison, significant two-way interactions between shape category and ROI (F(1,1765) = 47.7, p < 10⁻¹¹) and between motion and ROI (F(1,1765) = 7.74, p < 10⁻²) indicated that the middle face patches differ in the strength of their selectivity for shape category (stronger in ML) and motion (stronger in MF). However, neither the two-way interaction between motion and shape category (F(1,1765) = 3.73, p = 0.053) nor the three-way interaction of ROI, motion, and shape category (F(1,1765) = 2.42, p = 0.12) reached significance. For the AL versus AF comparison, the two-way interaction between shape category and ROI was not significant (F(1,1765) = 0.89, p = 0.35), but the motion by ROI interaction was (F(1,1765) = 21.0, p < 10⁻⁵), with AF being more strongly selective for motion. For this pair, there was a significant two-way interaction between motion and shape category (F(1,1765) = 7.18, p < 10⁻²) as well as a three-way interaction of ROI, shape category, and motion (F(1,1765) = 4.27, p < 0.05). This suggests a specialization for facial motion in AL that is absent in AF. We also tested for a motion condition by shape category interaction in individual monkeys, focusing on the areas on the lip of the STS (AL and ML) and in the fundus of the STS (AF and MF). The effect was significant in the lip patches of Monkey M1 (F(1,446) = 5.1, p < 0.05) and in both the lip (F(1,172) = 5.1, p < 10⁻³) and fundus (F(1,172) = 5.1, p < 10⁻³) patches of Monkey M3. This indicates that the specialization for facial motion is subtle and not always apparent in single individuals. Together, these analyses show that the effects of shape category and motion condition vary with the position of face patches with respect to the STS.
In humans, we identified the face-selective regions (Fig. 3a) FFA (left in 5 of 6 subjects, right in 6 of 6 subjects), OFA (left in 5 of 6 subjects, right in 6 of 6 subjects), and STS-FA (left in 1 of 6 subjects, right in 6 of 6 subjects). In Experiment 2, we calculated separate responses to all stimulus conditions (Fig. 3b) and calculated the main effects of shape category and motion and their interaction (Fig. 3c). OFA and FFA activations were not significantly modulated by general motion (Fig. 3c; Table 2). In contrast, the STS-FA exhibited significant modulation by motion and an interaction between shape category and motion (Fig. 3b; Table 2). Underscoring the impact of face-specific motion on STS-FA responses, in 5 of 6 subjects the left STS-FA was found when contrasting responses to moving faces versus moving non-face objects, but not when contrasting static faces versus static objects.
Figure 3. Responses to moving and static faces and objects in human face areas. a, Face-selective regions on the flattened surface of the posterior right hemisphere of a representative human subject. Color bar as in Figure 1a. Sulci: los, lateral occipital; sts, superior temporal; lots, lateral occipitotemporal; cos, collateral. b, Group analysis of responses. Conventions as in Figure 2a. c, Group analysis of main effects and interaction. Conventions as in Figure 2b.
Table 2. Statistics of F tests for main effects and interaction in human ROIs
Discussion
The present results show that dynamic stimulus information is processed differentially in the face-processing networks of two primate species. In the macaque, face patches in the STS fundus were part of motion-selective cortex and exhibited enhanced responses to moving faces, which could be explained as a linear superposition of object preference and general motion sensitivity. This is consistent with two broad scenarios. Neurons within MF and AF might respond to similar motion patterns as neurons outside MF and AF yet differ from these in their preference for faces. Alternatively, MF and AF could consist of two different populations: one motion-selective, the other face-selective. Interestingly, the location and selectivity of MF raise the possibility that it might overlap with area LST (Nelissen et al., 2006), which exhibits both object and motion selectivity.
In contrast, face patches AL and AM responded selectively to facial motion. The interaction of shape and motion selectivity suggests that the two stimulus domains converge on the same population of neurons, perhaps receiving both face-selective and motion-selective inputs from AF and MF, with which they form a closed network (Moeller et al., 2008).
In humans, only the STS-FA showed a main effect of shape and motion, and a specific modulation by facial motion. This finding is consistent with results from Pitcher et al. (2011). In addition, only a contrast of moving faces versus moving objects reliably revealed the left hemisphere STS-FA, consistent with Fox et al. (2009). These results support the idea that the human STS-FA is specialized for the processing of dynamic facial information.
One motivation for the present study was to use specialization for motion to shed light on putative homologies of face areas across species. Several interpretations of our results are possible; and, contrary to our expectation, they do not lend themselves to a straightforward equating of face areas across species. The most striking specialization is that of the human STS-FA for facial motion, unmatched by any of the other face areas, human or macaque. In macaques, specializations for facial motion are less prominent. It thus seems plausible that the human STS-FA might be a specialization of human or hominoid brains that Old World monkeys lack. The spatial separation of the STS-FA and its lack of connectivity with the other face areas (Gschwind et al., 2012) are compatible with this interpretation, but the positioning of macaque face areas inside or close to the STS has been suggested to imply the opposite, namely that macaque face areas might correspond to human STS-FA (Ku et al., 2011). However, considering the overall pattern of results in both humans and macaque monkeys of generally larger motion selectivity in more dorsal face areas (STS-FA in humans and fundus STS areas in macaque monkeys) than in more ventral areas, a homology of STS-FA with MF and AF suggests itself. This interpretation would be consistent with conclusions drawn from the processing of dynamic body shapes in the human and macaque brain, where stronger motion selectivity was also found in dorsomedial than in ventrolateral STS areas (Jastorff and Orban, 2009; Jastorff et al., 2012).
For making cross-species comparisons, one needs to consider several potentially complicating and limiting factors. First, although motion processing is important, multiple functional dimensions should be considered for establishing homologies (Durand et al., 2009). Second, we focused our analysis on the face-patch system while dynamic stimuli are also processed outside of it (Nelissen et al., 2006; Furl et al., 2012; Jastorff et al., 2012). Third, contrast agents are typically used in macaques only. However, this difference is unlikely to obscure the comparison because MION and BOLD responses both ultimately tap into the same physiological mechanisms of neurovascular coupling. Fourth, the attentional state of macaques and humans is typically not fully controlled. We tried to reduce this source of variability. The use of a simple task in humans encouraged them to distribute attention evenly across stimulation blocks, whereas in macaques extensive fixation training with the same stimuli presumably minimized fluctuations in internal state.
Our results underscore the importance of using naturalistic stimuli for studying functional areas. Dynamic faces elicited enhanced responses across all face-selective areas of the temporal lobe, and the left STS face area in humans was reliably active only for moving faces. Crucially, the impact of facial dynamics differs across face areas and thus helps to reveal functional differences within macaque and human face-processing networks. The functional differentiation we found in macaque face patches is, to our knowledge, the first revealed with fMRI in this network.
Notes
Supplemental figures can be found online at http://lab.rockefeller.edu/freiwald/supplemental.
Footnotes
This work was supported by the Irma T. Hirschl/Monique Weill-Caulier Trusts, The Esther A. and Joseph Klingenstein Fund, McKnight Endowment Fund for Neuroscience, Pew Scholars Program in the Biomedical Sciences, and Alexandrine and Alexander Sinsheimer Fund to W.A.F., and by NIH (Grant 1R01EY019702) to D.Y.T. We thank Shay Ohayon for providing his stimulus presentation program and Sara Steenrod for proofreading the manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Winrich A. Freiwald, Laboratory of Neural Systems, Rockefeller University, 1230 York Avenue, New York, NY 10065. wfreiwald{at}rockefeller.edu