Abstract
The cerebral cortex of humans and macaques has specialized regions for processing faces and other visual stimulus categories. It is unknown whether a similar functional organization exists in New World monkeys, such as the common marmoset (Callithrix jacchus), a species of growing interest as a primate model in neuroscience. To address this question, we measured selective neural responses in the brain of four awake marmosets trained to fix their gaze upon images of faces, bodies, objects, and control patterns. In two of the subjects, we measured high gamma-range field potentials from electrocorticography arrays implanted over a large portion of the occipital and inferotemporal cortex. In the other two subjects, we measured BOLD fMRI responses across the entire brain. Both techniques revealed robust, regionally specific patterns of category-selective neural responses. We report that at least six face-selective patches mark the occipitotemporal pathway of the marmoset, with the most anterior patches showing the strongest preference for faces over other stimuli. The similar appearance of these patches to previous findings in macaques and humans, including their apparent arrangement in two parallel pathways, suggests that core elements of the face processing network were present in the common anthropoid primate ancestor living ∼35 million years ago. The findings also identify the marmoset as a viable animal model system for studying specialized neural mechanisms related to high-level social visual perception in humans.
Introduction
The cerebral cortex of humans and macaques is specialized for the visual processing of faces and other social stimuli, which likely reflects the fact that primates, unlike most mammals, depend primarily on vision for social signaling (Leopold and Rhodes, 2010). Evidence for such specialization comes largely from fMRI experiments, which have identified cortical regions in both humans and macaques that respond more strongly to faces than to other structured objects. In humans, faces selectively activate the mid fusiform gyrus (Kanwisher et al., 1997), the lateral occipital cortex (Puce et al., 1996), the anterior and posterior regions of the superior temporal sulcus (Puce et al., 1998; Pitcher et al., 2011), and the anterior portion of the inferotemporal cortex (Kriegeskorte et al., 2007). In macaques, a similar diversity of face “patches” exists in the ventral visual stream, particularly in and around the superior temporal sulcus (Tsao et al., 2003; Pinsk et al., 2005; Bell et al., 2009; Ku et al., 2011). Electrophysiological studies performed both in macaques (Tsao et al., 2006; Gross, 2008; Tsao and Livingstone, 2008) and in humans (Bentin et al., 1996; Parvizi et al., 2012) revealed that these patches contain a functional clustering of neurons that respond selectively to the visual presentation of faces. Establishing a more precise correspondence between face patches in the two species is an active area of research (Tsao et al., 2008; Ku et al., 2011; Yovel and Freiwald, 2013).
The evolutionary origins of face networks in the brain are not well understood, in part because comparative data are limited. Among primates, face-selective responses have been measured only in macaques and humans, with few exceptions (Zangenehpour and Chaudhuri, 2005; Parr et al., 2009). Outside of primates, only one other species, the sheep, has been tested for such responses. Electrophysiological recordings from neurons in the sheep's temporal cortex revealed selective responses to faces of both sheep and humans, although the spatial organization of such responses across the cortex was not studied in detail (Kendrick and Baldwin, 1987).
The common marmoset (Callithrix jacchus) is a New World monkey of growing interest as an experimental model for systems neuroscience (Okano et al., 2012; Kishi et al., 2014; Solomon and Rosa, 2014). The organization of retinotopic cortical areas and their basic homology to areas in the macaque are well established (Rosa and Tweedale, 2005). However, to date, no study has investigated patterns of visual selectivity for faces or other complex objects in the marmoset or any New World primate species. Whether face-selective regions exist and how they are organized are important questions for at least two reasons. First, such information may shed light on the evolution of face processing and the homological relationships between face patches in macaques and humans. Second, if a similar network of face patches exists in the marmoset, researchers can use this species to study aspects of high-level social visual perception, taking advantage of its lissencephalic brain for electrophysiology (Mitchell et al., 2014), fMRI (Liu et al., 2013), and optical imaging experiments.
Materials and Methods
Subjects.
Four healthy adult male common marmosets (C. jacchus) were used in this study: two were used in the electrocorticography (ECoG) experiments, whereas the other two were used in the fMRI experiments. All experiments were approved by the Animal Care and Use Committee of the National Institute of Neurological Disorders and Stroke.
Implantation of ECoG arrays.
Each of the two marmosets was implanted with two 32 channel micro-ECoG arrays (NeuroNexus) in the right hemisphere. During surgery, the animals were positioned in a stereotaxic holder and ventilated with a mixture of air/oxygen and isoflurane anesthesia (1.5%–2%). Implantation of the ECoG arrays involved the removal of a bone flap (∼1.2 cm in length) over the occipitotemporal cortex. The anti-inflammatory corticosteroid dexamethasone (2 mg/ml, 0.04 ml) was administered intramuscularly before and after the surgery, and 3 ml of 20% mannitol solution was given intravenously during the surgery, to prevent brain swelling. The position of the arrays was determined relative to the opening of the lateral sulcus, which was visible through the dura mater. For each array, a slit was made in the dura, and the array was advanced through the slit to its end position. After positioning the electrodes, the connectors were attached to the bone using dental acrylic and the reference and ground wires were placed over the dura through a burr hole in the opposite side of the skull. The bone flap was then sutured back in place. In the same surgery, a low profile threaded polyether ether ketone headpost base was implanted over the frontal region and held in place by nylon screws (size 0–80 × 3/32”, Plastics One) and dental acrylic. The animals were given antibiotics and analgesics daily for 5 d after surgery. Following the animals' recovery, the position of each ECoG electrode in the arrays was precisely determined based on a high-resolution (0.15 mm isotropic) T2-weighted anatomical scan. This determination was facilitated by the growth of tissue through small windows in the electrode array, which was visible as a regular matrix of small bright dots in the image.
The electrodes were placed in slightly different positions in the two animals and together covered the ventral visual pathway from −10 mm to +5 mm from the interaural line in the posteroanterior direction with 1 mm interelectrode distance.
Behavioral training.
Animals were gradually acclimated to body and head restraint in the sphinx position (see Fig. 1A) over a period of 3 weeks (Silva et al., 2011). For both ECoG and fMRI testing, animals wore a jacket that permitted stabilization to the testing cradle. After the head was fixed, testing began with an eye calibration procedure. As the techniques for fixation and video-based eye movement recording differed slightly between the ECoG and fMRI settings, these are described in more detail below. For eye calibration, the animals directed their gaze to a sequence of small dots (1° of visual angle) presented in random order at five different positions on the screen. Once the eyes were calibrated, testing began as described in the subsections below. During the experiments, the animals were rewarded with a drop of sugary liquid (∼0.01–0.02 ml) for maintaining gaze within a 5° radius window around the center of the screen. Hereafter, we use the term “fixation” to refer to the periods when gaze was maintained within this relatively large window. The integration of stimulus presentation, reward delivery, and eye tracking was controlled by MonkeyLogic software (Asaad et al., 2013).
Visual stimuli.
We used five different image stimulus categories: conspecific faces, conspecific body parts, manmade objects, and two types of unstructured controls (see Fig. 1B). Both types of controls consisted of scrambled versions of the face stimuli. In the spatially scrambled version, control images were generated by first dividing the face images into 15 × 25 square tiles and then randomly shuffling the position of the tiles. In the phase-scrambled version, control images were created by permuting the phase information while preserving the amplitude information of the spectrum of the face images. All images were first histogram equalized to increase the image contrast. To minimize luminance differences across the stimulus categories, we then normalized all the images so that their total intensity matched that of one arbitrarily selected face image. Twenty exemplars from each of the five categories were used in both the fMRI and ECoG experiments. For ECoG experiments, the stimuli subtended 5°, 7°, or 10°. For the analyses described here, we observed no clear differences with stimulus size and therefore combined data from all image sizes. For the main fMRI experiments, the stimuli subtended 5° visual angle. We also conducted additional control fMRI experiments in which we varied the stimulus size between 3° and 7°.
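The two scrambling procedures can be sketched as follows. This is a minimal NumPy illustration, not the code used in the study: the function names, the assumption of a 2-D grayscale image, and the random generator are hypothetical. The phase-scrambled variant shown here replaces the phase spectrum with that of a white-noise image, a common way to randomize phase while keeping the output real-valued, rather than literally permuting the phase values.

```python
import numpy as np

def spatial_scramble(img, grid, rng):
    """Cut a 2-D image into grid=(rows, cols) tiles and shuffle their positions."""
    rows, cols = grid
    h, w = img.shape
    th, tw = h // rows, w // cols
    if (rows * th, cols * tw) != (h, w):
        raise ValueError("image size must be a multiple of the tile grid")
    # (rows, th, cols, tw) -> (rows*cols, th, tw): one entry per tile
    tiles = img.reshape(rows, th, cols, tw).swapaxes(1, 2).reshape(-1, th, tw)
    tiles = tiles[rng.permutation(rows * cols)]
    # Reassemble the shuffled tiles into an image of the original size
    return tiles.reshape(rows, cols, th, tw).swapaxes(1, 2).reshape(h, w)

def phase_scramble(img, rng):
    """Randomize the Fourier phase of a 2-D image, keeping its amplitude spectrum."""
    spectrum = np.fft.fft2(img)
    amplitude = np.abs(spectrum)
    # Phase of a real noise image is conjugate-symmetric, so the result stays real
    # (assumption: noise-phase substitution stands in for "permuting" the phase).
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    noise_phase[0, 0] = np.angle(spectrum[0, 0])  # keep the mean intensity
    return np.fft.ifft2(amplitude * np.exp(1j * noise_phase)).real
```

Calling `spatial_scramble(face, (15, 25), np.random.default_rng())` would shuffle a 15 × 25 tile grid, provided the image dimensions are multiples of the grid; which of the two numbers refers to rows is an assumption here.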
ECoG testing procedures.
For ECoG testing, the animals' heads were restrained using chronically implanted headposts. The implanted ECoG electrodes were connected to a BrainAmp amplifier (Brain Products) for data recording. Local field potentials were sampled at 1000 Hz. Eye position was tracked using the EyeLink II video-based tracking system (SR Research). Single images of faces, bodies, objects, and control patterns were presented briefly (350, 500, or 650 ms), with presentations separated by blank periods of variable duration (350, 500, or 650 ms). Stimuli from each of the five categories were randomly interleaved during blocks of trials consisting of 12–22 stimulus presentations. The marmoset received a reward each time it successfully maintained fixation for two consecutive stimulus presentations. Trials in which the marmoset failed to fixate were discarded. Each session comprised ∼1000–1500 stimulus presentations over the course of 40–50 min, during which time the marmoset received ∼500–750 drops of reward.
fMRI testing procedures.
All MRI was performed in a horizontal 7T/30 cm MRI spectrometer (Bruker-Biospin). For fMRI testing, the animals had no implants and were restrained during the fMRI experiments using noninvasive, custom-built helmets designed to fit the contours of the individual animals' heads (Silva et al., 2011). Each of the two individualized helmets contained an embedded eight channel surface coil array to achieve whole brain coverage with high signal-to-noise ratio (Papoti et al., 2013a, b). BOLD functional images were acquired using a gradient-recalled EPI sequence with 18 axial slices (TE/TR = 26/2000 ms; FOV = 32 × 32 mm; slice thickness = 1 mm; matrix = 64 × 64). A total of 512 volumes were acquired during each run. Eye position was tracked using the iView video-based tracking system (SensoMotoric Instruments), and the face was monitored with an MR-compatible camera and infrared light source (MRC Systems). Visual stimuli from the same category were presented in 16 s blocks, with blocks of different categories randomly interleaved and separated by a fixed interval of 20 s in which the screen was uniformly gray. Within each block, individual stimuli from the same category were randomly selected and displayed for 500 ms, with no gap between stimuli. Before the beginning of each block, a fixation dot appeared, to which the marmoset was required to direct its gaze within 1.5 s, thus initiating the block sequence. If the animal failed to acquire or maintain fixation during this initial period, the block terminated and the next block began immediately. The fixation dot remained on the screen throughout each block, during which time the animal received a drop of reward every 1.5 s as long as it successfully maintained its gaze in the fixation window. Each run lasted 17 min 4 s (512 EPI volumes with TR = 2 s) and consisted of up to 28 valid blocks, depending on the marmoset's performance.
Coplanar RARE T2-weighted anatomical images (TEeffective/TR = 64/4000 ms; FOV = 32 × 32 mm; slice thickness = 1 mm; matrix = 128 × 128) were collected in each session for image registration. To visualize the highly myelinated cortical areas, T1-weighted anatomical images (MPRAGE, TE/TR/TI/TD = 3.5/12.5/1200/6000 ms) with 0.2 mm isotropic resolution were collected for each animal under anesthesia after finishing all fMRI sessions. The T1-weighted images were registered to the myelination MRI atlas (Bock et al., 2011) and transformed to the atlas space for visualization purposes.
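The timing and resolution figures in this paragraph can be checked with a short calculation. The sketch below is only an arithmetic illustration; it assumes that each valid block occupies one 16 s stimulus period plus one 20 s gray interval, which the text implies but does not state explicitly, and the variable names are hypothetical.

```python
TR_S = 2.0          # repetition time in seconds, from the text
N_VOLUMES = 512     # EPI volumes per run, from the text
BLOCK_S = 16.0      # stimulus block duration in seconds
GAP_S = 20.0        # gray-screen interval between blocks in seconds
FOV_MM = 32.0       # in-plane field of view in mm
MATRIX = 64         # in-plane acquisition matrix

run_s = N_VOLUMES * TR_S                       # total run duration in seconds
minutes, seconds = divmod(run_s, 60)           # 17 min 4 s
# Assumption: one block cycle = stimulus block + following gray interval
max_blocks = int(run_s // (BLOCK_S + GAP_S))
voxel_mm = FOV_MM / MATRIX                     # in-plane voxel size in mm

print(f"run: {int(minutes)} min {int(seconds)} s, "
      f"max blocks: {max_blocks}, in-plane voxel: {voxel_mm} mm")
```

Under these assumptions the run length, the upper bound of 28 blocks, and the 0.5 mm in-plane resolution all follow directly from the acquisition parameters.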
ECoG data analysis.
The ECoG data were analyzed using EEGLAB (Delorme and Makeig, 2004) as well as custom code written in MATLAB (MathWorks). We acquired 24 and 11 sessions from Marmoset S and Marmoset F, respectively. Local field potential signals were referenced locally using a bipolar reference method. The bipolar pairs were assigned as neighboring electrodes along the anterior–posterior direction, generating 55 rereferenced sites from the 64 original electrodes in each animal. The signals were bandpass filtered using a fifth-order Butterworth filter with a passband from 2 to 250 Hz. A notch filter was used to remove 60 Hz line noise. There were infrequent periods of broadband noise; these periods were automatically censored by detecting high-power artifacts in the range of 70 to 200 Hz. Epochs of 1200 ms for each stimulus presentation (200 ms before and 1000 ms after stimulus onset) were extracted for further analysis. As there were no clear differences in response selectivity due to the size of the stimuli, the data were collapsed across stimulus size. Spectrograms were created using fast Fourier transforms with a running window of 200 ms throughout the 1200 ms epochs.
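The referencing, filtering, and epoching steps can be sketched in Python with NumPy and SciPy. This is an illustration under stated assumptions rather than the MATLAB/EEGLAB pipeline used in the study: the function names and the channel pairing are hypothetical, zero-phase filtering (`sosfiltfilt`) is an assumption because the text does not say how the filter was applied, and the notch filter and artifact censoring are omitted.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1000  # sampling rate in Hz, as stated in the text

def bipolar_reference(lfp, pairs):
    """lfp: (n_channels, n_samples); pairs: list of (channel_a, channel_b) neighbors."""
    a, b = np.array(pairs).T
    return lfp[a] - lfp[b]

def bandpass(x, low=2.0, high=250.0, order=5, fs=FS):
    """Fifth-order Butterworth band-pass, applied along the last axis."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Assumption: forward-backward filtering (zero phase); this doubles the
    # effective filter order relative to a single pass.
    return sosfiltfilt(sos, x, axis=-1)

def extract_epochs(x, onsets, pre=200, post=1000):
    """Cut (n_sites, pre + post) windows around each onset sample index."""
    return np.stack([x[:, t - pre:t + post] for t in onsets])
```

With a 1000 Hz sampling rate one sample corresponds to 1 ms, so `pre + post = 1200` samples gives the 1200 ms epoch length used in the analysis.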
We focused on the mean time course of high-gamma local field potential power as a measure of neural activity. This measure was computed by first filtering the signals on each trial between 50 and 150 Hz, and then calculating their mean analytic amplitude over time (Freeman, 2004). The mean signals were then smoothed for presentation purposes using a 15 ms Gaussian kernel with SD of 6 ms (see Fig. 3). To analyze and map the high-gamma responses to different stimuli at different electrodes, we quantified both the amplitude and latency of the neural response. The average amplitude was computed within a time window from 50 to 400 ms following stimulus onset (used in Fig. 8A). The response latency was defined as the moment when the unsmoothed high-gamma amplitude time course first showed a positive or negative deflection that (1) exceeded five median absolute deviations (i.e., the median of the absolute deviations from the data's median, a robust measure of the variability of a univariate sample) from the median of the 100 ms prestimulus baseline, and (2) continued to exceed this baseline threshold for at least 30 ms. For Marmoset F, there were multiple sites in the dorsal row of the anterior array in which a very short latency (∼20 ms) appeared to be superimposed on a significantly longer latency visual response. These signals were not category selective and were presumably from the LGN, which lay directly underneath some of the contacts. To eliminate the contribution of the putative LGN response in the calculation of latency for these nine sites, we modeled the time courses with a two-peak Gaussian function and then subtracted the first Gaussian component from the raw time course. The latency was then measured in the modified time course. In 3 of the 110 sites recorded from both animals, the responses were too noisy for reliable latency measurement; thus, no latencies were reported.
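The amplitude and latency measures can be illustrated with a short NumPy sketch. It is not the analysis code used in the study. The band-limited analytic amplitude is computed here with an FFT mask, which stands in for whatever band-pass filter was actually applied; the latency function treats positive and negative deflections together through the absolute deviation from the baseline median. Function names and default arguments are hypothetical.

```python
import numpy as np

def high_gamma_amplitude(x, fs=1000.0, band=(50.0, 150.0)):
    """Analytic amplitude of x (samples on the last axis) restricted to `band` in Hz."""
    n = x.shape[-1]
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    spectrum = np.fft.fft(x, axis=-1)
    # Keep only positive in-band frequencies and double them: the inverse
    # transform is the analytic signal of the band-limited trace.
    keep = (freqs >= band[0]) & (freqs <= band[1])
    analytic = np.fft.ifft(np.where(keep, 2.0 * spectrum, 0.0), axis=-1)
    return np.abs(analytic)

def response_latency(amp, onset, fs=1000.0, base_ms=100, n_mad=5.0, min_ms=30):
    """Latency in ms after `onset` (sample index), or None if no response is found."""
    base = amp[onset - int(round(base_ms * fs / 1000)):onset]
    med = np.median(base)
    mad = np.median(np.abs(base - med))          # median absolute deviation
    above = np.abs(amp[onset:] - med) > n_mad * mad
    need = int(round(min_ms * fs / 1000))        # samples the deflection must last
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run == need:
            return (i - need + 1) * 1000.0 / fs  # start of the sustained run
    return None
```

A brief threshold crossing shorter than `min_ms` resets the counter, so only sustained deflections are reported as response onsets.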
fMRI data analysis.
Motion correction and cross-session alignment were performed using AFNI (Cox and Hyde, 1997). We collected 12 and 9 sessions from Marmoset E and Marmoset B, respectively. Each session consisted of 3–6 runs depending on the animal's behavioral performance on that day. Analysis was restricted to those blocks in which the animals' gaze remained within a 5° radius window for >80% (12.8 s of 16 s) of the block duration. Details about the behavioral performance of each animal can be found in Tables 1 and 2. To further minimize the contribution of movements, we excluded volumes based on combined translations and rotations, with a threshold of 0.02 mm for Marmoset E and 0.1 mm for Marmoset B. With these thresholds, we censored 5.1% and 2.0% of all the collected volumes in Marmoset E and Marmoset B, respectively. Visual responses, including visual selectivity, were based on analysis of volumes collected between 4 and 18 s after stimulus onset. Statistical tests for stimulus selectivity during this period were based on two-sample t tests with correction for multiple comparisons (false discovery rate = 0.05). We calculated the responses in functional regions of interest by averaging the responses of voxels in coherent patches. Patches were operationally defined for this purpose as clusters of voxels in which responses to faces exceeded those to objects with t > 5. The 3D brain surface was created by semimanually masking the cortical areas shown in the T1-weighted image using ITK-snap (Yushkevich et al., 2006) and Caret (Van Essen, 2012). The functional data were projected onto this surface using the Caret algorithm, which takes a weighted average of the functional data from voxels near each surface vertex (Van Essen, 2012). The final surfaces and the functional maps were rendered using custom code written in MATLAB.
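Two of the selection steps described above, the gaze criterion for valid blocks and the false discovery rate correction, can be sketched as follows. This is an illustration only: the text does not name the FDR procedure, so the Benjamini-Hochberg step-up rule is assumed here, and the function names and the gaze data layout are hypothetical.

```python
import numpy as np

def block_is_valid(gaze_deg, radius_deg=5.0, min_fraction=0.8):
    """gaze_deg: (n_samples, 2) gaze offsets from the screen center, in degrees."""
    inside = np.hypot(gaze_deg[:, 0], gaze_deg[:, 1]) <= radius_deg
    # Keep the block only if gaze stayed in the window for more than 80% of samples
    return bool(inside.mean() > min_fraction)

def fdr_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule (assumed); returns a boolean rejection mask."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p value with q * k / m
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()   # largest rank that passes
        reject[order[:k + 1]] = True      # reject everything up to that rank
    return reject
```

In a voxelwise analysis, `p_values` would hold one p value per voxel from the faces-versus-objects t test, and the returned mask would mark the voxels that survive correction.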
Mapping foveal representation with fMRI.
In one of the fMRI animals (Marmoset B), we mapped the cortical foveal representation. For this experiment, we employed ultrasmall superparamagnetic iron oxide nanoparticles (USPIOs) (20 mg/kg IV, fabricated by the NIH Imaging Probe Development Center) to serve as an intravascular contrast agent. For mapping the foveal representation, the animal was presented with a block-design stimulus paradigm consisting of four 16 s blocks of different flashing checkerboard patterns displayed at the screen's periphery (3°–10° visual angle) (see Fig. 9A). To encourage the animal to restrict its gaze to the center of the screen, the fixation dot was replaced by a small (3° visual angle) movie of marmoset scenes. To estimate the central (“foveal”) cortical representation, we identified voxels with a percentage signal change >1.5% and a coefficient of variation <3.5 in all four stimulation blocks. In addition, only voxels with a maximum response difference of 30% across all four blocks were considered. The color coding of the foveal representation (see Fig. 9B) shows the consistency (range 70%–100%) of the responses across all four blocks.
Color coding the category selectivity of the fMRI and ECoG data.
Colors to indicate selectivity among faces (green), body parts (red), and objects (blue) were assigned for each voxel in the fMRI data or each electrode site in the ECoG data. We adopted the HSL (hue, saturation, and lightness) cylindrical color coordinates. The hue, the angle on the cylindrical color coordinates, was determined by stimulus preference. The saturation and lightness were used to represent the strength of the selectivity weighted by the relative responses to scrambled images.
Functional responses to faces, body parts, objects, and the two scrambled sets were first normalized to the largest response for each voxel or electrode site. The categorical selectivity strengths to faces (f), bodies (b), and objects (o) were then calculated as the difference between the response to the target stimuli and the response to whichever scrambled set evoked the higher response. If the response to the scrambled set was higher than that to the target stimuli, the categorical selectivity strength was set to 0. The point on the color wheel (p⃗) was then calculated using a vector sum as follows: p⃗ = f·G⃗ + b·R⃗ + o·B⃗, where G⃗, R⃗, and B⃗ are unit vectors pointing to pure green, red, and blue, respectively, at the rim of the color wheel. In the resulting functional maps, regions with brightly saturated colors indicate strong category selectivity and relatively low responses to the scrambled images, whereas dim, unsaturated areas indicate low category selectivity or relatively high responses to the scrambled images.
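The color assignment can be sketched with the Python standard library. This is a hedged illustration, not the rendering code used for the figures: the hue angles follow the standard HSL wheel (red 0°, green 120°, blue 240°), the dictionary keys and function names are hypothetical, and because the exact mapping from selectivity strength to saturation and lightness is not given in the text, the sketch simply uses the vector length as saturation with lightness fixed at 0.5.

```python
import colorsys
import math

# Hue angles of the unit vectors on a standard HSL wheel (assumption)
HUE_DEG = {"body": 0.0, "face": 120.0, "object": 240.0}  # red, green, blue

def category_vector(resp):
    """resp: dict with keys face, body, object, scr_spatial, scr_phase.

    Returns (hue in degrees, vector length) of the point on the color wheel.
    """
    peak = max(resp.values())
    if peak <= 0:
        return 0.0, 0.0
    norm = {k: v / peak for k, v in resp.items()}       # normalize to largest response
    scr = max(norm["scr_spatial"], norm["scr_phase"])   # stronger scrambled control
    x = y = 0.0
    for cat, hue in HUE_DEG.items():
        strength = max(0.0, norm[cat] - scr)            # f, b, or o, floored at 0
        x += strength * math.cos(math.radians(hue))
        y += strength * math.sin(math.radians(hue))
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)

def to_rgb(hue_deg, strength, lightness=0.5):
    """Assumed mapping: saturation = vector length (clipped to 1), fixed lightness."""
    return colorsys.hls_to_rgb(hue_deg / 360.0, lightness, min(1.0, strength))
```

Under this mapping a site that responds only to faces lands on pure green, a site that responds equally to faces and bodies lands on yellow (hue 60°), and a site with no selectivity above the scrambled controls is rendered gray.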
Results
We measured cortical responses to faces and other object categories in four marmosets trained to direct their gaze toward images presented on a color display. The animals' heads were held firmly using implanted headposts in the electrophysiology experiments and noninvasive customized helmets in the fMRI experiments. The animals' eye position was tracked throughout all experiments using a video-based eye-tracking system. During testing, the animals were rewarded for maintaining their gaze upon the stimuli (Fig. 1A). The same stimulus set was used in the electrophysiology and fMRI experiments. The stimuli consisted of three main categories and two controls. The three main categories included 20 exemplars of conspecific faces, conspecific body parts, and manmade objects familiar to the animals. Control stimulus categories consisted of spatial- and phase-scrambled images of the faces (Fig. 1B). The following sections describe the responses in the marmoset brain to the different stimulus categories, beginning with the neuronal activity recorded using ECoG arrays.
Face-selective neuronal responses in the ventral pathway
We measured field potentials across a large swath of the occipitotemporal cortex of two marmosets (Marmoset F and Marmoset S) using implanted pairs of 32 channel ECoG arrays (Fig. 2A). The lissencephalic nature of the marmoset cortex allowed for a spatially continuous sampling of the ventral visual pathway from V1 to TE. Briefly presented images of faces, bodies, objects, and control patterns were interspersed with blank periods and randomly interleaved across trials (Fig. 2B).
We focused on category-specific changes in band-limited ECoG power, using bipolar referencing of each electrode to optimize measurement of local neural activity (see Materials and Methods). Most ECoG sites showed robust visual responses in the form of sustained power increases in the high-gamma band (50–150 Hz) accompanied by decreases in the beta band (15–30 Hz). An example broadband response is shown in Figure 2C, in which a site in area TEO responded most strongly to faces. As the high-gamma frequency range has previously been linked to neural spiking in the vicinity of the electrode (Crone et al., 2006; Ray and Maunsell, 2011), we focus the remainder of our analysis on this signal as the basis for determining local visual selectivity.
The locations of all 110 sites from the two implanted marmosets are illustrated in Figure 3. Time courses of the high-gamma band power are shown across all stimulus categories for eight representative sites. Site 1 in V1 responded most strongly to finely scrambled stimuli. This characteristic was typical of posterior recording locations. Somewhat more anterior sites, such as Site 2 in V4, showed less of a preference for the scrambled images, especially in the second plateau phase of the response (after 200 ms). Sites 3–6 are representative face-selective sites positioned in distinct regions of temporal cortex, in this case from four different face patches. The more dorsal face-selective sites near the superior temporal sulcus (Sites 3 and 4) showed transient responses, whereas the more lateral and ventral face-selective sites (Site 5 in TEO and 6 in TE) were more sustained in their face-selective responses. The enhanced response for faces cannot be ascribed to a simple attention effect because other simultaneously recorded sites at nearby locations in TE responded with different patterns of category selectivity (e.g., Sites 7 and 8).
The spatial layout of the arrays allowed for characterization of visual response properties along a nearly continuous span of the ventral visual stream. In one analysis, we found there to be a gradual transition in the preference for stimulus structure, with responses at more posterior sites showing preference for finely scrambled stimuli and more anterior sites showing responses for structured stimuli (r = 0.61, p < 0.001; Fig. 4A). In another analysis, we found that the response latencies exhibited a coarse posterior-to-anterior gradient (r = 0.68, p < 0.001; Fig. 4B) that presumably reflected the stepwise propagation of visual signals along the occipitotemporal pathway. A few sites in anterior positions violated this trend and had very short latencies (∼50 ms; Fig. 4B). These short-latency sites were within the superior temporal sulcus (STS) and were primarily selective for faces. These sites, together with other face-selective anterior sites that had a much longer latency, are suggestive of two parallel pathways for ventral stream face processing, as has been suggested in humans and macaques (Calder and Young, 2005; Pinsk et al., 2009; Yovel and Freiwald, 2013). The coloring of the dots in Figure 4 indicates higher category selectivity at more anterior sites. The basis for coloration is described in more detail in the section “Spatial organization of visual category selectivity” below.
Face-selective areas revealed by fMRI BOLD contrast
To gain a broader picture of face selectivity across the brain, we performed fMRI experiments in the other two marmosets (Marmoset E and Marmoset B). We trained the animals to perform the task in a 7T horizontal scanner (Bruker-Biospin) while their heads were comfortably restrained using individualized, custom-built helmets that contained embedded eight channel radiofrequency receiving coils (Fig. 5A) (Silva et al., 2011; Papoti et al., 2013a, b). We presented the same sets of category-specific stimuli as before. However, now the stimuli were shown in a block design paradigm, in which each 16-s-long block consisted of a sequence of images selected randomly from a single category and presented every 0.5 s (see Materials and Methods; Fig. 5B). During the behavioral task, we acquired BOLD responses at a spatial resolution of 0.5 × 0.5 × 1.0 mm. Visual responses were observed throughout the occipitotemporal visual cortex for all categories of visual stimuli, typically reaching 0.5%–3% BOLD signal changes in individual voxels. The time courses of the signals revealed a clear and sustained, block-driven hemodynamic response that reached a maximum at ∼4 s after block onset.
We asked whether these fMRI responses were selective for individual stimulus categories. To address this question, we contrasted the fMRI response magnitude between faces and objects, a common contrast used in fMRI studies in humans (Kanwisher et al., 1997) and macaques (Tsao et al., 2003). The resulting fMRI maps identified at least five circumscribed cortical regions that responded more strongly to faces than to objects. These face-selective patches, together with regions responding less to faces than to objects, are shown on a surface activity map of the right hemisphere of Marmoset E (t test, p < 0.05, corrected for multiple comparisons; Fig. 6A). This activity was also visible directly on parasagittal slices (Fig. 6B).
We focused on the temporal lobe regions in which faces elicited stronger responses (“face patches,” Fig. 6C). We labeled these areas based on their positions within the occipitotemporal cortex, approximating a naming convention applied previously in the macaque (Moeller et al., 2008). For each patch, the position within known extrastriate areas was determined based on registration with a recently published atlas of the marmoset brain (Paxinos et al., 2011). From anterior to posterior, these consisted of AD (anterior dorsal) and MD (middle dorsal) patches along the STS, a PD (posterior dorsal) patch in area V4t/FST, a PV (posterior ventral) patch at the V4/TEO border, and an O (occipital) patch at the V2/V3 border. For reference in Figure 6C, we also mark the position of face-selective area MV (middle ventral) more ventrally in TE, which was observed in the ECoG recordings but was not visible in fMRI because of basal susceptibility-induced artifacts.
In addition to the atlas registration, we further confirmed the anatomical location of the face patches by comparing the functional maps to a high-resolution cortical myelogram obtained previously from five other marmosets. The level of cortical myelination was determined with T1-weighted images (Bock et al., 2009, 2011). Primary sensory areas and visual areas MT and DM show higher myelination than other cortical areas (Fig. 6C, inset). Overlapping the face patches with the boundary drawn from cortical myelination (Fig. 6C, white dashed lines), we found that face area PD is outside and ventral to area MT, in a location consistent with the location of areas V4t and FST in the atlas (Paxinos et al., 2011) and that face area O is primarily within areas V2/V3.
Figure 7 shows BOLD fMRI signal time courses for each of the face patches. Within each patch, faces elicited higher fMRI responses, with a gradation across the other categories that differed between the patches. The fMRI time courses suggest a progression of face selectivity within the occipitotemporal pathway. Specifically, AD and MD responded almost exclusively to faces, PV and PD showed intermediate responses to bodies and objects as well, and O responded strongly to all three categories, with a small but highly significant preference for faces.
Spatial organization of visual category selectivity
To summarize the category selectivity observed in both the ECoG and fMRI data, we created spatial maps of relative responses to faces, body parts, and objects along the occipitotemporal pathway. We assigned a color to each single electrode site or voxel (see Materials and Methods) based on the category selectivity. As shown in the color wheel (Fig. 8), pure green, red, and blue indicated that a site responded exclusively to faces, body parts, or objects, respectively. Other “intermediate” colors, such as orange, cyan, and purple, indicated prominent responses to more than one category. Low category selectivity, or stronger responses to scrambled than to structured stimuli, are reflected in less saturated, grayish colors. The selectivity map of the electrophysiology data (Fig. 8A) revealed four face-selective electrode clusters in face patches PV, MV, PD, and MD, interleaved with zones of sites in V4 and TE responsive to body parts and, to some extent, objects. This category selectivity stood in sharp contrast to the stronger responses to scrambled stimuli found in the posterior sites, which appear gray. The same color scheme was used to indicate the category selectivity of the dots in Figure 4.
The functional ECoG map can be compared directly with the corresponding map of the fMRI selectivity computed using the same measure, shown in Figure 8B. In the fMRI map, it is evident that the more anterior patches also have more saturated colors, particularly the green face-selective regions. Similar to the ECoG results, these face patches were interleaved with regions in V4 and TE that responded more strongly to body parts and objects. We did not find any locations in our fMRI experiments that responded selectively only to body parts or objects. However, some areas on the margins of the face patches responded to both body parts and faces (yellow-colored voxels). This finding is broadly consistent with previous findings in macaques and humans of body-selective areas located directly adjacent to face patches (Tsao et al., 2003; Peelen and Downing, 2007; Weiner and Grill-Spector, 2013).
Finally, we investigated the relationship between the face patches and the retinotopic foveal representation in the extrastriate cortex of Marmoset B (Fig. 9). Using a technique in which the marmoset maintained its gaze on a small 3° diameter movie presented at the center of the screen, we mapped the stimulated voxels throughout the visual pathway (see Materials and Methods). The resulting map revealed a large cortical area extending from V1 to V4 that contained the expected pattern of foveal representation, with a few additional smaller islands of positive activity, including area MT (indicated by the white arrow). Superimposition of the outlines of the face patches obtained from this same animal (Fig. 10B) onto this map (dashed outlines) revealed no obvious relationship. The most posterior face patch O shared some overlap with the ventral portion of the foveal representation, but there was minimal overlap with the other face patches. This finding is similar to recent observations in the macaque (Janssens et al., 2014, their Fig. 6C) and is generally inconsistent with the idea that face patches are specializations of foveally biased extrastriate regions (Hasson et al., 2002), although clearly more work is needed on this important topic.
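The comparison above is qualitative, based on superimposed outlines. As an illustration only, one hypothetical way to express such overlap as a number is the fraction of a patch's voxels that also fall within the foveal map. The function name and the representation of voxels as coordinate tuples are assumptions, not part of the reported analysis.

```python
def overlap_fraction(patch_voxels, foveal_voxels):
    """Fraction of face-patch voxels that lie inside the foveal map.

    Both arguments are iterables of hashable voxel coordinates,
    for example (x, y, z) tuples. Hypothetical helper for illustration.
    """
    patch = set(patch_voxels)
    if not patch:
        raise ValueError("patch_voxels must contain at least one voxel")
    shared = patch & set(foveal_voxels)
    return len(shared) / len(patch)


if __name__ == "__main__":
    # Toy example: two of four patch voxels fall inside the foveal map.
    patch = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (2, 2, 0)]
    fovea = [(0, 1, 0), (1, 1, 0), (5, 5, 0)]
    print(overlap_fraction(patch, fovea))
```

A value near zero would correspond to the minimal overlap described for most patches, whereas a larger value would correspond to the partial overlap described for patch O.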
Consistency across animals, stimulus scale, and sessions
We conducted additional analyses and experiments to evaluate the robustness of the face-selective responses. First, we compared the fMRI maps of category-selective responses across hemispheres and across animals (Fig. 10). The face patches were more pronounced in the right hemisphere in both animals. In one of the animals, we also observed a face-selective region in the prefrontal cortex (Fig. 10B). Importantly, all five fMRI-identified patches were observed in both hemispheres of each animal. Second, we separately analyzed the odd and even sessions, acquired on different days, for both the fMRI and the ECoG data and showed that in each case the spatial pattern of selectivity was reproducible (fMRI in Fig. 11A; ECoG in Fig. 11B). Finally, control experiments from two fMRI sessions demonstrated that the face patches were robust over more than a twofold range of stimulus sizes (3°–7°). All five face patches found in the main fMRI experiment were still visible when faces of various sizes were contrasted with body parts (Fig. 11C).
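The odd/even session comparison can be sketched as a split-half analysis. The source reports only that the spatial pattern was reproducible and does not state a summary statistic, so the Pearson correlation used below is an assumption chosen for illustration, as are the function names and the representation of each session as a flat list of per-site selectivity values.

```python
from math import sqrt


def mean_map(maps):
    """Average a list of equally sized selectivity maps, site by site."""
    n = len(maps)
    return [sum(values) / n for values in zip(*maps)]


def pearson(x, y):
    """Pearson correlation between two equally sized lists of numbers."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / sqrt(vx * vy)


def split_half_correlation(session_maps):
    """Correlate the mean map of odd sessions with that of even sessions.

    session_maps is ordered by session; sessions 1, 3, 5, ... form the
    odd half and sessions 2, 4, 6, ... form the even half.
    """
    if len(session_maps) < 2:
        raise ValueError("need at least two sessions")
    odd = session_maps[0::2]
    even = session_maps[1::2]
    return pearson(mean_map(odd), mean_map(even))


if __name__ == "__main__":
    # Toy data: four sessions, three sites each.
    sessions = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0],
                [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
    print(split_half_correlation(sessions))
```

A correlation close to 1 between the two halves would indicate that the same sites show the same selectivity in independent sets of sessions.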
Discussion
Comparing face-selective regions in marmosets, macaques, and humans
This study reveals multiple visual cortical regions specialized for processing faces in the common marmoset. To the best of our knowledge, the present study is the first to systematically map high-level visual selectivity in awake, behaving animals other than macaques and humans. Although further comparative work is necessary to determine the precise areal homology between primate species, the overall arrangement of face patches in the marmoset brain, and their approximate distribution in the occipitotemporal cortex, appear very similar to those of the macaque and human (Tsao et al., 2008; Lafer-Sousa and Conway, 2013).
The two most anterior face patches along the marmoset STS (MD and AD) are similar in position to the macaque middle (MF/ML) and anterior (AF/AL) face patches, respectively. Their positions also bear resemblance to two face-selective patches in anterior and posterior STS of the human (Carlin et al., 2011; Pitcher et al., 2011), although the correspondence between the macaque and human is presently a matter of speculation (Yovel and Freiwald, 2013). The marmoset PV patch is also coarsely similar in position to macaque PL, although it extends more posteriorly than in the macaque, into V4. The PD patch, which is located in V4t/FST, is similar in position to a recently described face-selective patch pPL in macaque V4t (Janssens et al., 2014). Based on its topological position, the most ventral face patch MV, which we discovered using ECoG recordings, is a putative homolog of the fusiform face area in the human (Kanwisher et al., 1997), although clearly more study is needed. The face patch O, whose correspondence to the macaque face system is not obvious, was a consistent feature of the fMRI data but was curiously absent in the ECoG results. Although the basis of this difference is unknown, it is interesting to speculate that it may be due to an inherent difference between the methods, with the BOLD signal better reflecting certain types of top-down modulation in the retinotopically organized visual areas (Maier et al., 2008). Direct comparison of this and other face patches with an activity-based estimation of the foveal extent did not suggest a clear retinotopic bias of face patches.
Future work will aim to gain information about potential homology of specific face patches in the different species. One important approach will be to compare the neuroanatomical projections of the different face patches in the marmoset with those of the macaque, which are presently being investigated (Grimaldi et al., 2013). Another avenue will be to compare aspects of face-selective single-unit responses. In the macaque, response properties, such as the sensitivity to face identity, viewing angle, and other attributes, differ among face patches (Freiwald and Tsao, 2010), and it is possible that analogous differences will exist among face patches in the marmoset. Finally, neural responses during free viewing paradigms can be used to infer area correspondences between species. One study applied this approach recently to study the regional similarity in the fMRI response time courses of macaques and humans watching the same video clips (Mantini et al., 2012). Although strict homology is notoriously difficult to establish, these types of comparative studies can shed light on which elements of face processing are conserved among primates, or more broadly among mammals, and which may have evolved more recently along the macaque or human lines (Leopold and Rhodes, 2010).
The dorsal and ventral streams of face processing
One similarity between our data and previous reports of face-selective responses in macaques and humans is the apparent division between dorsal and ventral face patches (Calder and Young, 2005; Pinsk et al., 2009; Yovel and Freiwald, 2013). The marmoset dorsal areas PD, MD, and AD, which run along the shallow STS, may constitute a pathway for face processing that is distinct from the more ventral areas PV and MV. Although the coverage of our ECoG only permitted us to measure responses from four of these patches, the data were consistent with the possibility of two distinct streams. Specifically, the high-gamma responses to faces in the dorsal areas PD and MD were transient, which might imply that they receive a predominance of magnocellular input from a pathway leading through MT and FST. By contrast, PV and MV showed more sustained responses with a longer delay, which may indicate a larger contribution from the parvocellular visual pathway (Ferrera et al., 1992; Schmolesky et al., 1998). Anatomical tracer studies in the macaque are consistent with the view that multiple channels pass complex visual information into the temporal cortex, including to the face-selective regions (Kravitz et al., 2013). Such a division of labor in face processing may reflect different types of information extracted from faces. In the macaque, the more dorsal pathway into the STS is commonly associated with processing dynamic and animated stimuli (Oram and Perrett, 1994, 1996; Pitcher et al., 2011; Polosecki et al., 2013), whereas face-selective responses in the ventral visual pathway are more closely associated with visual recognition (Leopold et al., 2006; Freiwald and Tsao, 2010). Although further studies are needed, the apparently similar parallelism in the face patches of marmosets, macaques, and humans suggests that this feature of the ventral stream specialization for faces emerged before the split between Old and New World primates.
The marmoset as a primate model for visual neuroscience
Our findings of robust functional brain mapping in the awake, behaving marmoset underscore the great potential of this species as an experimental model for visual and systems neuroscience (Okano et al., 2012; Kishi et al., 2014; Solomon and Rosa, 2014). Previous work has shown that the basic layout of the marmoset brain, and in particular its visual cortex, shares primate-specific features with macaques (Rosa and Tweedale, 2005; Preuss, 2007; Cheong et al., 2013; Lui et al., 2013; McDonald et al., 2014; Yu and Rosa, 2014). Our study, together with recent work showing that marmosets can be trained to perform psychophysical tasks (Mitchell et al., 2014), suggests that the marmoset can complement the macaque as a model species for probing the neurobiology of cognition.
Importantly, the awake marmoset model opens the door to areas of investigation that are either impossible or impractical in the macaque. For example, it is possible to map the uninterrupted cortex along occipitotemporal and occipitoparietal pathways not only with fMRI and electrophysiological methods, but also with optical methods. At present, molecular and viral technologies developed in the mouse are increasingly available in the marmoset (Okada et al., 2013; Susaki et al., 2014; Watakabe et al., 2014), including the production of transgenic animals (Sasaki et al., 2009). Future experiments are likely to exploit these advances to study cognition, for example, by combining optogenetic circuit manipulation and large field optical imaging in the awake marmoset. The potential to functionally dissect circuits in this way can provide important new perspectives on the relationship between cortical activity and cognition, including aspects of high-level visual social perception supported by similar circuitry in the human brain.
Footnotes
This work was supported by the Intramural Research Programs of the National Institute of Neurological Disorders and Stroke and National Institute of Mental Health. We thank Dr. R. Saunders for implanting the ECoG arrays in two animals; J. Day Cooney, L. Zhang, J. Mackel, T. Talbot, and R. Villadiego for technical assistance; and Dr. R. Berman for comments on the manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Afonso C. Silva, Cerebral Microcirculation Section, Laboratory of Functional and Molecular Imaging, National Institute of Neurological Disorders and Stroke, National Institutes of Health, 49 Convent Drive, MSC 1065, Building 49, Room 3A72, Bethesda, MD 20892-1065. SilvaA@ninds.nih.gov