Abstract
The representation of object identity is fundamental to human vision. Using fMRI and multivoxel pattern analysis, here we report the representation of highly abstract object identity information in human parietal cortex. Specifically, in superior intraparietal sulcus (IPS), a region previously shown to track visual short-term memory capacity, we found object identity representations for famous faces varying freely in viewpoint, hairstyle, facial expression, and age; and for well known cars embedded in different scenes and shown from different viewpoints and sizes. Critically, these parietal identity representations were behaviorally relevant, as they closely tracked the perceived face-identity similarity obtained in a behavioral task. Meanwhile, the task-activated regions in prefrontal and parietal cortices (excluding superior IPS) did not exhibit such abstract object identity representations. Unlike previous studies, however, we did not observe identity representations in posterior ventral and lateral visual object-processing regions, likely because our stimulus manipulations demanded a greater degree of identity abstraction. Our MRI slice coverage precluded us from examining identity representation in the anterior temporal lobe, a likely site for the computation of identity information in the ventral pathway. Overall, we show that human parietal cortex, part of the dorsal visual processing pathway, is capable of holding abstract and complex visual representations that are behaviorally relevant. These results argue against a “content-poor” view of the role of parietal cortex in attention. Instead, the human parietal cortex seems to be “content rich” and capable of directly participating in goal-driven visual information representation in the brain.
SIGNIFICANCE STATEMENT The representation of object identity (including faces) is fundamental to human vision and shapes how we interact with the world. Although object representation has traditionally been associated with human occipital and temporal cortices, here we show, by measuring fMRI response patterns, that a region in the human parietal cortex can robustly represent task-relevant object identities. These representations are invariant to changes in a host of visual features, such as viewpoint, and reflect an abstract level of representation that has not previously been reported in the human parietal cortex. Critically, these neural representations are behaviorally relevant as they closely track the perceived object identities. Human parietal cortex thus participates in the moment-to-moment goal-directed visual information representation in the brain.
Introduction
Revising the traditional view that nonspatial visual representation is exclusively the function of the ventral processing stream (Ungerleider and Mishkin, 1982), dorsal areas along the primate intraparietal sulcus (IPS) have also been shown to participate in such representation (Sereno and Maunsell, 1998; Konen and Kastner, 2008; Bettencourt and Xu, 2015). One important aspect of parietal visual feature representation is its ability to track task and intention (Snyder et al., 2000; Toth and Assad, 2002). For example, fMRI response amplitude in an area encompassing the human superior IPS (henceforward referred to as superior IPS) tracks behavioral visual short-term memory (VSTM) capacity for a variety of task-relevant object features such as color and shape (Todd and Marois, 2004, 2005; Xu and Chun, 2006; Xu, 2010; Jeong and Xu, 2013; see also Xu and Jeong, 2015 and related monkey neurophysiology studies, Freedman and Assad, 2006).
In everyday vision, task-relevant visual information varies drastically across tasks, ranging from simple features, such as color and shape, to high-level representations, such as abstract object identity, invariant to changes in viewpoint, size, and other nonessential features. Although abstract object identity representation is fundamental to primate vision, it is believed to reside in occipital and temporal cortices, and its existence in IPS regions has never been shown. If IPS regions are to play a critical role in task-driven visual representation, they must be capable of representing a great variety of visual information, including abstract object identity. Moreover, such representation must be directly linked to behavior. Here, we provide evidence for both in the human parietal cortex.
Among the many object identities we extract in everyday vision, face identity is perhaps the most challenging one to form, owing to the many changes a face can undergo without altering its identity, such as changes in viewpoint, expression, hairstyle, and age. Reflecting this computational challenge, a specialized brain network is dedicated to face processing, and damage to this network can lead to significant face-processing deficits (De Renzi et al., 1994; Haxby et al., 2000). Previous imaging studies have reported the decoding of face-identity representation in regions surrounding the right fusiform face area (FFA) and in anterior temporal cortex (ATL; Kriegeskorte et al., 2007; Nestor et al., 2011; Anzellotti et al., 2014). Because the face stimuli used in these decoding studies tended to covary in low-level features, the decoding of low-level features, rather than face identity per se, could account for some of these findings. Additionally, success in a simple binary decoding between two faces, as was done previously, could also be driven by high-level features unrelated to face-identity representation, such as familiarity, attractiveness, and trustworthiness. Last, none of these decoding studies directly correlated neural representations with behavior. Thus, despite evidence provided by neuropsychological studies, fMRI decoding studies have not firmly established the presence and functional significance of abstract face-identity representations in ventral regions.
Given the importance of face-identity representation in everyday life, in this study we tested the ability of parietal cortex to represent face identity, arguably the most challenging case of abstract object identity representation. To avoid the pitfalls of previous face-decoding studies, we used real-world face images varying freely in viewpoint, expression, hairstyle, and age. To generalize our findings, we also tested the representation of a nonface everyday object. Additionally, we tested whether parietal face representations track behaviorally perceived face-identity similarities. Using fMRI multivoxel pattern analysis (MVPA), we examined responses from superior IPS, a region previously shown to exhibit strong task-related visual processing; from face- and object-selective regions in ventral occipital cortex; and from task-activated prefrontal and parietal regions.
Materials and Methods
Participants
Thirteen (nine females; mean age, 28.6 ± 4.7 years), 13 (eight females; mean age, 28.2 ± 4.9 years), and 11 (eight females; mean age, 28.6 ± 3.5 years) observers participated in Experiments 1, 2, and 3, respectively. Of these observers, three (all female) participated in all three experiments, five (two females) participated in both Experiments 1 and 2, two (both female) participated in both Experiments 1 and 3, and two (one female) participated in both Experiments 2 and 3. Three additional observers in Experiment 1, two in Experiment 2, and two in Experiment 3 were tested but excluded from data analysis because of excessive head motion during the experiment (>3 mm), a failure to localize all regions of interest (ROIs), or a failure to stay awake during the experiment. All observers were right handed and had normal or corrected-to-normal visual acuity. They were recruited from the Harvard University community, gave informed consent before participation, and received payment. The experiments were approved by the Harvard University Committee on the Use of Human Subjects.
Experimental design
Main fMRI experiments.
In Experiment 1, face images of two well known actors, Leonardo DiCaprio and Matt Damon, were used as stimuli. We constructed two unique face sets for each actor, with each containing five frontal and five profile/three-quarter view faces of the actor. In addition to faces, we also constructed two name sets for each actor, each containing their last names printed in 10 unique fonts (for a total of 20 unique fonts). The face and name images subtended ∼11.5° × 8.5° and 10.5° × 3.0°, respectively.
The 10 images from a given set of faces or names were presented sequentially in an 8 s block, with each image appearing for 300 ms followed by a blank display for 500 ms. The presentation of the face and name blocks was randomly intermixed within each run. Eight-second-long fixation blocks were presented at the beginning and end of each run and following each face/name block. Observers viewed the face or name images and detected the presence of an oddball face or name drawn from one of eight other actors (James Dean, Daniel Day-Lewis, Robert DeNiro, Gerard Depardieu, Johnny Depp, Matt Dillon, Michael Douglas, and Robert Downey Jr.). Two face images and two name images were used for each of the oddball actors. All target and oddball actors' last names started with the letter “D” in an effort to discourage observers from attending only to the first letter of each last name instead of the entire last name in the oddball-name task.
Each run contained four face blocks and four name blocks with no oddballs, plus one or two face or name blocks, each containing a single oddball. Blocks containing an oddball were excluded from further data analysis to remove any effects of oddball detection. When only one oddball block was present in a run, a dummy block containing no oddball was added to ensure that all runs had the same length whether they contained one or two oddball blocks. The dummy block was randomly chosen from one of the face or name blocks and was removed from further analysis. Each observer completed 10 runs, each lasting 2 min and 45 s.
In Experiment 2, the same oddball detection paradigm was used with the images and names of two distinctive car models, BMW Mini and Volkswagen Beetle. Cars were shown in different sizes, from different viewpoints, and on different background scenes to more closely resemble how they would naturally appear in everyday visual perception. The car images and car names subtended ∼11.5° × 7.5° and 8.0° × 4.0°, respectively. The oddball stimuli were drawn from 16 other car models (Honda Accord, Nissan Altima, Toyota Camry, Honda Civic, Toyota Corolla, Chevrolet Cruze, Nissan Cube, Volkswagen Golf, Chevrolet Malibu, Ford Mustang, Honda Odyssey, Nissan Pathfinder, Toyota Prius, Land Rover Range Rover, Mercedes-Benz Roadster, and Hyundai Sonata). One car image and one name image were used for each of the oddball cars.
In Experiment 3, face images of eight famous actors (Leonardo DiCaprio, Matt Damon, Brad Pitt, George Clooney, Tom Cruise, Tom Hanks, Nicolas Cage, and Russell Crowe) were used as stimuli. These actors were the top 8 from among a total of 40 actors that an independent group of observers rated in a behavioral familiarity rating test. As in Experiment 1, two sets of unique face images were constructed for each actor, with five frontal and five three-quarter view faces in each face set. As no profile view images were used in this experiment, to ensure that face images from all actors were easily recognizable, the face images of both DiCaprio and Damon included some used in Experiment 1 and some new ones. The oddball stimuli were frontal and three-quarter view face images from 16 other famous actors (Christian Bale, Daniel Craig, Jude Law, Michael Douglas, Jack Nicholson, Colin Firth, Robert DeNiro, Bruce Willis, Orlando Bloom, Richard Gere, Mel Gibson, Ashton Kutcher, Ben Stiller, Joseph Gordon-Levitt, Benedict Cumberbatch, and Robert Downey Jr.). One frontal view and one three-quarter-view image were used for each of the oddball actors. The name blocks were not included in this experiment. Each run contained 16 stimulus blocks, 2 for each of the target actors, 1 or 2 oddball blocks, and 1 dummy block when there was only 1 oddball block present in the run. Each observer completed 16 runs, each lasting 5 min and 4 s. All other aspects of the design were identical to those of Experiment 1.
Superior IPS localizer.
Superior IPS was localized following the methods used by Todd and Marois (2004) and Xu and Chun (2006). In an event-related object VSTM experiment, observers viewed, in the sample display, a brief presentation of one to four everyday objects and, after a short delay, judged whether the probe object (a new object) shown in the test display matched the category of the object that appeared in the same location in the sample display. A match occurred in 50% of the trials. Gray-scaled photographs of objects from four categories (shoes, bikes, guitars, and couches) were used. Objects could appear above, below, to the left, or to the right of the central fixation. The locations of the objects were marked by white rectangular placeholders that were always visible during the trial. The placeholders subtended 4.5° × 3.6° and were 4.0° away from the fixation (center to center). The entire display subtended 12.5° × 11.8°. Each trial contained the following: fixation (1000 ms), sample display (200 ms), delay (1000 ms), test display/response (2500 ms), and feedback (1300 ms). With a counterbalanced trial history design (Todd and Marois, 2004; Xu and Chun, 2006), each run contained 15 trials for each set size and 15 fixation trials in which only the fixation dot was present for 6 s. Two filler trials were added at the beginning and one at the end of each run for practice and trial history balancing purposes. They were excluded from data analysis. Each observer was tested with two runs, each lasting 8 min.
Lateral occipital region/FFA/parahippocampal place area localizer.
Lateral occipital (LO) region, FFA, and parahippocampal place area (PPA) were localized following the methods used by Kourtzi and Kanwisher (2000), Kanwisher et al. (1997), and Epstein and Kanwisher (1998), respectively. Observers viewed blocks of sequentially presented face, scene, object, and scrambled object images (each subtending ∼12.0° × 12.0°). The images were grayscale photographs of male and female faces, common objects (e.g., cars, tools, and chairs), indoor and outdoor scenes, and phase-scrambled versions of the common objects. Observers monitored the images for a slight spatial jitter, which occurred randomly once in every 10 images. Each run contained four blocks each of scenes, faces, objects, and phase-scrambled objects. Each block lasted 16 s and contained 20 unique images, each appearing for 750 ms followed by a 50 ms blank display. Eight-second-long fixation blocks were included at the beginning, middle, and end of each run. Each observer completed two runs, each lasting 4 min and 40 s.
Behavioral visual search experiment.
In the visual search experiment performed outside the scanner, all the observers from Experiment 3 searched for a target actor face embedded among the faces of a distractor actor. Each observer completed eight blocks of trials, with each of the eight actors in Experiment 3 serving as the target actor for one block and the remaining seven actors serving as the distractor actors for that block. Each block began with an instruction showing the name of the target actor for that block. Observers then viewed six faces, three in frontal view and three in three-quarter view, appearing in a circular array around the fixation (see Fig. 4C) and made a speeded target present/absent judgment. A target actor face was present in 50% of the trials and was shown equally often in each of the six possible locations across the different trials. In the target-present trials, the target image was randomly chosen from the six possible face images of the target actor, and the five distractor images were chosen from the six possible face images of the same distractor actor. The distractor actor for each trial was chosen from the seven possible distractor actors with equal probability. In the target-absent trials, all the face images were from the same distractor actor. In a given block, because observers were only told to search for the target actor identity, and not a specific face image of that actor, they had to form an abstract identity representation of that actor to perform the search efficiently, similar to what was done during the oddball detection task in the fMRI part of the experiment.
Each block of trials contained 28 practice and 84 experimental trials (seven distractor actors × six locations × two target appearances). When observers made an incorrect response, a red unhappy face flickered at fixation for 5 s. Incorrect trials were repeated at the end of each block until correct responses were obtained for all the trials in that block. Search speed was calculated from all the correct trials, and only search speed, and not accuracy, was included in further analysis.
fMRI methods
fMRI data were acquired from a Siemens Tim Trio 3 T scanner at the Harvard Center for Brain Science (Cambridge, MA). Observers viewed images back-projected onto a screen at the rear of the scanner bore through an angled mirror mounted on the head coil. All experiments were controlled by an Apple MacBook Pro laptop running MATLAB with Psychtoolbox extensions (Brainard, 1997). For the anatomical images, high-resolution T1-weighted images were acquired (repetition time, 2200 ms; echo time, 1.54 ms; flip angle, 7°; 144 slices; matrix size, 256 × 256; and voxel size, 1 × 1 × 1 mm). Functional data in the main experiments and in the LO/FFA/PPA localizers were acquired using gradient-echo echoplanar T2*-weighted images (repetition time, 2000 ms; echo time, 30 ms; flip angle, 90°; 31 slices; matrix size, 72 × 72; voxel size, 3 × 3 × 3 mm; 88 volumes for Experiments 1 and 2, 152 volumes for Experiment 3, and 140 volumes for the LO/FFA/PPA localizer). Functional data in the superior IPS localizer were acquired using gradient-echo echoplanar T2*-weighted images with slightly different parameters than in the main experiments (repetition time, 1500 ms; echo time, 29 ms; flip angle, 90°; 24 slices; matrix size, 72 × 72; voxel size, 3 × 3 × 5 mm; 320 volumes). All functional slices were oriented near horizontal to optimally cover parietal, occipital, and ventral cortices. This resulted in the partial exclusion of anterior temporal and orbitofrontal cortices.
Data analyses
fMRI data were analyzed in each observer's native space with BrainVoyager QX 2.0 (http://www.brainvoyager.com). Data preprocessing included 3D motion correction, slice acquisition time correction, and linear trend removal. No spatial smoothing or any additional preprocessing was applied.
ROI definitions.
fMRI data from the localizer runs were analyzed using general linear models (GLMs). The superior IPS ROI was defined, following Todd and Marois (2004) and Xu and Chun (2006), as the collection of voxels that tracked each observer's behavioral VSTM capacity. To localize these voxels, we first obtained each observer's behavioral VSTM capacity using the K formula (Cowan, 2001). We then performed a multiple regression analysis on the fMRI data for each individual observer with the regression coefficient for each set size weighted by that observer's behavioral VSTM capacity for that set size. The superior IPS ROI was defined as voxels in bilateral parietal cortex showing significant activations in the regression analysis [false discovery rate (FDR), q < 0.05, corrected for serial correlation]. More details of this analysis can be found in the studies of Todd and Marois (2004) and Xu and Chun (2006). We did not find in any of the experiments a significant main effect of hemisphere (F values <1.3, p values >0.26) or an interaction between identity decoding and hemisphere (F values <2.3, p values >0.16). We therefore combined the left and right superior IPS into one single ROI.
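For illustration, the logic of this capacity-weighted localizer can be sketched as follows. This is a minimal Python/NumPy sketch, not the actual pipeline (the analysis reported here was a multiple regression run in BrainVoyager QX); the behavioral values, design matrix, and voxel data below are all hypothetical placeholders.

```python
# Minimal sketch of the VSTM-capacity-weighted localizer logic.
# All data and names here are hypothetical; the actual analysis was a
# multiple regression (GLM) run in BrainVoyager QX.
import numpy as np

def cowan_k(hit_rate, false_alarm_rate, set_size):
    """Cowan's K (Cowan, 2001): estimated number of items held in VSTM."""
    return set_size * (hit_rate - false_alarm_rate)

# Hypothetical behavioral performance for one observer at set sizes 1-4.
set_sizes = np.array([1, 2, 3, 4])
k = cowan_k(np.array([0.98, 0.95, 0.82, 0.70]),
            np.array([0.02, 0.05, 0.14, 0.20]), set_sizes)

# One simplified implementation: weight each set size's predictor by the
# observer's K for that set size, then find voxels whose time courses load
# significantly on the combined capacity regressor.
n_timepoints, n_voxels = 320, 1000
rng = np.random.default_rng(0)
design = rng.integers(0, 2, size=(n_timepoints, 4)).astype(float)  # set-size predictors
capacity_regressor = design @ k                                    # K-weighted combination
X = np.column_stack([capacity_regressor, np.ones(n_timepoints)])   # add intercept
voxel_data = rng.standard_normal((n_timepoints, n_voxels))
betas, *_ = np.linalg.lstsq(X, voxel_data, rcond=None)             # one beta per voxel
# Voxels with significant capacity betas (FDR corrected) would define superior IPS.
```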
As in the studies by Grill-Spector et al. (1998) and Kourtzi and Kanwisher (2001), LO was defined as the collection of contiguous voxels in the bilateral lateral occipital cortex showing higher activation in response to objects than to noise (FDR, q < 0.05, corrected for serial correlation). The right FFA was defined, as in the study by Kanwisher et al. (1997), as the collection of contiguous voxels in the right fusiform gyrus showing higher activation for faces than for scenes and objects (FDR, q < 0.05, corrected for serial correlation). We included only the right FFA in our main analyses because face processing has been shown to be right lateralized (De Renzi et al., 1994). As such, including only the right FFA would provide us with a better chance of obtaining a significant decoding result [we were able to localize the left FFA in the majority of our participants, 11 of 13 in Experiment 1, and 10 of 11 in Experiment 3; but, just like the right FFA, the left FFA did not show face-identity decoding in either Experiment 1 (within-identity vs between-identity correlation, t < 1, p > 0.38) or Experiment 3 (t < 1, p > 0.60)]. Following Epstein and Kanwisher (1998), PPA was defined as the collection of contiguous voxels in the bilateral collateral sulcus and parahippocampal gyrus showing higher activations for scenes than for faces and objects (FDR, q < 0.05, corrected for serial correlation).
Similar to Cohen et al. (2002), the visual word form area (VWFA) was localized using the data from the oddball detection task. It was defined as the collection of contiguous voxels in the left middle fusiform gyrus that showed higher activation for names than for images (face names/images in Experiment 1, car names/images in Experiment 2; FDR, q < 0.05, corrected for serial correlation).
In Experiment 3, to define the regions in lateral prefrontal cortex (LPFC), and additional regions in posterior parietal cortex (PPC) and ventral occipital/temporal cortices (VOTC), activated during the main task, we first selected the contiguous set of voxels showing a higher response to the face stimuli than to fixation in the main task (FDR, q < 0.05) in the respective regions of the brain. We then excluded the frontal eye fields, insula, and anterior cingulate cortex from LPFC; superior IPS from PPC; and LO region, the right FFA, and early visual areas (localized by the contrast of scrambled objects greater than intact objects in the LO localizer task) from VOTC. To further remove the contribution of superior IPS from the PPC ROI and to examine whether parietal regions aside from superior IPS would show a response profile similar to that of superior IPS, we constructed a second PPC ROI in which we removed superior IPS plus all the voxels surrounding it that were approximately the size of superior IPS. This was done by first relaxing the threshold used to define superior IPS until an ROI twice the size of the original one, and encompassing it, was obtained. We then subtracted this larger superior IPS ROI from the original PPC ROI to define a parietal ROI without superior IPS.
MVPA.
MVPA was performed with custom-written MATLAB code. In each observer, we overlaid the ROIs onto the data from the main experiments, applied a GLM, and extracted β-weights for each stimulus set in each voxel of each ROI. To decode the identity representation in each ROI, we compared the correlation coefficients between voxel response patterns from stimulus sets that shared the same identity (within-identity correlation) with those from stimulus sets that did not share an identity (between-identity correlation; Figs. 1A, 2A). Thus, for face-identity representation, the within-identity correlation would be the correlation between Damon face set 1 and Damon face set 2, and between DiCaprio face set 1 and DiCaprio face set 2; and the between-identity correlation would be the correlation between Damon face set 1 and DiCaprio face set 1, between Damon face set 1 and DiCaprio face set 2, between Damon face set 2 and DiCaprio face set 1, and between Damon face set 2 and DiCaprio face set 2. If the average of all the within-identity correlations was higher than the average of all the between-identity correlations in an ROI, we would infer that abstract identity information could be decoded, and thus represented, in that brain region. Correlation coefficients were Fisher transformed to ensure normal distribution of the values before statistical comparisons were made. All t tests were two-tailed. When an ANOVA was performed at the group level, the Greenhouse–Geisser correction was applied when the sphericity assumption was violated.
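To make the pattern-correlation logic concrete, the sketch below illustrates the within- versus between-identity comparison. It is a minimal Python approximation of the analysis (which was implemented in custom MATLAB code); the data layout, pattern values, and identity labels are hypothetical.

```python
# Minimal sketch of the correlation-based identity decoding.
# Patterns, names, and data are hypothetical; the actual analysis used
# custom MATLAB code on GLM beta weights from each ROI.
import numpy as np
from scipy import stats

def identity_decoding(patterns):
    """patterns: dict mapping (identity, set_index) -> 1D voxel beta pattern.
    Returns mean Fisher-transformed within- and between-identity correlations."""
    keys = list(patterns)
    within, between = [], []
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            z = np.arctanh(np.corrcoef(patterns[a], patterns[b])[0, 1])
            (within if a[0] == b[0] else between).append(z)
    return np.mean(within), np.mean(between)

# Hypothetical example: 2 actors x 2 face sets, 50 voxels per pattern.
# This yields 2 within-identity and 4 between-identity pairs, as in the text.
rng = np.random.default_rng(1)
patterns = {(actor, s): rng.standard_normal(50)
            for actor in ("Damon", "DiCaprio") for s in (1, 2)}
w, b = identity_decoding(patterns)
# Across observers, the within and between z values would then be compared
# with a two-tailed paired t test, e.g., stats.ttest_rel(within_all, between_all).
```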
In additional analyses, to ensure that voxel number differences among the different ROIs could not account for the differences in the results, we limited the total number of voxels in each ROI by selecting the 50 most active voxels based on their average response amplitudes across all the stimulus conditions. In the larger ROIs, including superior IPS, LO region, and PPA, we were able to obtain 50 voxels in each observer in each ROI that fulfilled this selection criterion. In the smaller ROIs, including the right FFA and VWFA, we were able to select 50 voxels in the majority of the observers in each ROI that fulfilled this selection criterion (Table 1).
Table 1. Number of voxels in each ROI
Behavioral and neural similarity measures of face identity.
To construct the behavioral similarity measure of face identity across the eight actors in Experiment 3, we obtained the search speed for each actor paired with each of the other seven actors. We then averaged all trials containing the pairing of the same two actors, regardless of which actor was the target and which was the distractor, as search speeds from both types of trials reflected the face-identity similarity between the two actors. This left us with a total of 28 actor pairs. Additionally, as search speeds for the target-present and target-absent trials were highly correlated with each other in each observer (p values <0.026), these two types of trials were also combined (the behavioral face-similarity measure did not differ if only the target-present or the target-absent trials were included). Search speeds were extracted separately for each observer and then averaged across observers to form the group-level behavioral similarity measure of face identity across the eight actors.
To construct the neural similarity measure of face identity across the eight actors in each ROI, we computed the neural response pattern correlation for each of the 28 pairs of actors and Fisher transformed the resulting correlation coefficients. These correlations were performed separately for each observer and then averaged across observers to form the group-level neural similarity measure of face identity across the eight actors, separately for each ROI.
The behavioral and neural similarity measures of face identity across the eight actors were then directly correlated in each ROI. If neural response patterns in a brain region reflected perception, then we should see a high correlation between the behavioral and neural measures of similarity (Kriegeskorte et al., 2008). The significance of such a correlation was evaluated using a permutation test in which the values within the behavioral and neural measures of similarity were randomly shuffled and then correlated, over 10,000 iterations, to derive the mean and the SD of the baseline correlation value distribution. We then compared the real correlation value with this baseline distribution to assess whether it was significantly above chance (i.e., fell at the tail end of the distribution).
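A minimal sketch of this permutation procedure is shown below. The two input vectors (one value per actor pair, 28 in total) are hypothetical placeholders, and both the z-score summary described above and an empirical tail probability are computed for illustration.

```python
# Minimal sketch of the permutation test on the 28 actor-pair similarities.
# Input vectors here are hypothetical placeholders.
import numpy as np

def permutation_test(behavioral, neural, n_iter=10000, seed=0):
    """Correlate behavioral and neural similarity vectors (one value per
    actor pair) and compare against a shuffled-baseline distribution."""
    rng = np.random.default_rng(seed)
    real_r = np.corrcoef(behavioral, neural)[0, 1]
    null = np.empty(n_iter)
    for i in range(n_iter):
        null[i] = np.corrcoef(rng.permutation(behavioral), neural)[0, 1]
    # Two ways to summarize: a z score against the null mean/SD (as in the
    # text) or an empirical tail probability.
    z = (real_r - null.mean()) / null.std()
    p = np.mean(null >= real_r)
    return real_r, z, p

# Hypothetical similarity vectors of length 28 (one entry per actor pair).
rng = np.random.default_rng(2)
behavioral = rng.standard_normal(28)
neural = 0.5 * behavioral + rng.standard_normal(28)
print(permutation_test(behavioral, neural))
```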
Comparing representational structures using multidimensional scaling.
To compare the representational structures of the different brain regions examined, we performed classical multidimensional scaling (MDS) on the data from Experiment 3 (Shepard, 1980; Edelman, 1998; see also Kriegeskorte et al., 2008). To do so, we first extracted the representational structure within each ROI by constructing a correlation matrix of the neural response patterns among the 16 face sets (two sets for each of the eight actors) for each of the five ROIs (i.e., superior IPS, LO region, the right FFA, LPFC, and PPC). Using the full 16 sets of faces provided us with a richer dataset from each ROI that included both between- and within-identity correlations. We then constructed a correlation matrix by calculating the pairwise correlations among the representational structures from the different ROIs and performed MDS on the resulting correlation matrix. The distance between the ROIs in the resulting 2D MDS plot reflected how similar (or dissimilar) the representational structures of the different ROIs were when projected onto a 2D plane.
It was possible that PPC voxels near superior IPS were functionally similar to those of superior IPS. To test this, we performed MDS with two types of PPC ROIs. In the first analysis, we only excluded superior IPS voxels from PPC. In the second analysis, in addition to the voxels from superior IPS, we also excluded the voxels near superior IPS (for more details, see ROI definitions).
To rule out any contribution from a difference in the ROI sizes, we also conducted the same MDS analysis with the top 50 most responsive voxels from each ROI.
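For reference, the classical MDS step can be sketched as follows. This Python sketch uses the standard Torgerson double-centering procedure (equivalent in spirit to MATLAB's cmdscale); the ROI representational structures here are hypothetical random placeholders.

```python
# Minimal sketch of classical MDS on the pairwise ROI-structure correlations.
# ROI data are hypothetical placeholders.
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) MDS: double-center the squared dissimilarities
    and take the top eigenvectors, scaled by the root eigenvalues."""
    n = dissim.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (dissim ** 2) @ J             # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)             # ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_dims]     # top n_dims dimensions
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0))

# Hypothetical: 5 ROIs, each represented by its flattened 16 x 16 face-set
# correlation matrix (lower triangle: 16 * 15 / 2 = 120 pairs).
rng = np.random.default_rng(3)
structures = rng.standard_normal((5, 120))
roi_corr = np.corrcoef(structures)               # 5 x 5 structure similarity
coords = classical_mds(1 - roi_corr)             # correlation -> dissimilarity
# Each row of `coords` gives one ROI's position in the 2D MDS plot.
```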
Results
Abstract face-identity representation in the human superior IPS, and ventral and lateral occipital regions
In Experiment 1, we used face images of Leonardo DiCaprio and Matt Damon, two well known actors matched in overall appearance. To encourage the formation of real-world abstract face-identity representations, we varied the viewpoint, hairstyle, facial expression, and age of the faces, and constructed two unique face sets for each actor. While lying in an MRI scanner, observers viewed the sequential presentation of the images in each face set multiple times and detected the occasional presence of an oddball face drawn from one of eight other male actors (Fig. 1A; responses from the oddball blocks were excluded from further analyses, see Materials and Methods). We thus relied on participants' knowledge of a face to evoke an abstract identity representation when they viewed different face images associated with the same identity. The formation of abstract face-identity representations for the two actors was therefore necessary to ensure successful task performance.
Experiment 1 stimuli, trial structure, and ROIs. A, Example images from a block of face trials. Face images from two well known actors, Leonardo DiCaprio and Matt Damon, were used. Within a block of trials, observers viewed a sequential presentation of 10 face images sharing the same identity but differing in viewpoint, hairstyle, facial expression, and age. Observers were asked to detect the occasional presence of an oddball face chosen from one of eight other actors. James Dean's face is shown here as the oddball face among Leonardo DiCaprio's faces. B, Example images from a block of name trials. Actor names were written in different fonts, and observers were again asked to detect an oddball. James Dean's name is shown here as the oddball name among Leonardo DiCaprio's names. An oddball occurred rarely, and blocks containing the oddball were removed from further analyses. C, Example ROIs from one representative observer.
We obtained averaged fMRI response patterns in each observer for each face set for each actor in independently defined brain ROIs (Fig. 1C; see Materials and Methods). We targeted our investigation in parietal cortex to superior IPS due to its role in the encoding and storage of simple visual features in VSTM in a task-dependent manner (Todd and Marois, 2004; Xu and Chun, 2006), making it a promising parietal candidate where task-driven abstract face-identity representations may reside. In addition to superior IPS, we also examined representations in the following three lateral and ventral occipital regions: the LO region, the right FFA, and the VWFA. These brain regions have been shown to participate in the processing of object shapes, faces, and letter strings, respectively (Malach et al., 1995; Kanwisher et al., 1997; Cohen et al., 2000). The inclusion of LO region and the right FFA allowed us to examine the existence of abstract face-identity representation in higher-level visual processing regions; and the inclusion of VWFA enabled us to examine the representation of face names coactivated with the presentation of the face images. Here the term “visual” refers to information, representations, and attributes extracted from visual inputs that may not be described in conceptual or semantic terms, and the term “representation” only refers to the presence of certain information in a brain region and does not imply that this region necessarily contributes to the initial formation of this information.
To decode face-identity representations in a brain region, similar to the approach used by Haxby et al. (2001), we correlated fMRI response patterns obtained from different face sets in each ROI and Fisher transformed the resulting correlation coefficients (see Materials and Methods). In superior IPS, two face sets from the same actor elicited a higher correlation than two sets from different actors (Fig. 2A, paired-samples t test, two-tailed, t(12) = 2.86, p = 0.014; this applies to all subsequent analyses except where noted). Thus, despite large variations in face appearance, two distinctive face sets sharing an identity were represented more similarly than two that differed in identity, indicating the decoding and representation of abstract face identities in superior IPS. Such representation, however, was not found in LO region, the right FFA, or VWFA (t values <1.13, p values >0.27; Fig. 2B). Further pairwise comparisons revealed that superior IPS differed significantly from the other ROIs in abstract face-identity representation (region-by-identity interaction, F values >8.71, p values <0.012). The differences among the brain regions could not be attributed to voxel number differences, as both superior IPS and LO region contained a similar number of voxels (Table 1). When the number of voxels in each ROI was limited to no more than the 50 most responsive ones, the same results were obtained (superior IPS, t(12) = 3.55, p = 0.004; other regions, t values <1, p values >0.438; region-by-identity interaction, F values >9.09, p values <0.011).
Experiment 1 design and results. A, Schematic illustration of the key comparisons made in the experiment. To evaluate the existence of abstract face-identity representation in each ROI, we examined whether the within-identity correlation was greater than the between-identity correlation. The within-identity correlation referred to the correlation of the fMRI voxel response patterns between two face sets (or two name sets) from the same actor, whereas the between-identity correlation referred to the pattern correlation between two face sets (or two name sets), each from a different actor. B, Fisher-transformed correlation coefficients (z) between the face sets in superior IPS, LO region, the right FFA, and VWFA ROIs. Only superior IPS showed a higher within-identity than between-identity correlation, indicating the presence of abstract face-identity representation in this brain region despite large variations in viewpoint, hairstyle, facial expression, and age of the face images used. C, Fisher-transformed correlation coefficients between the name sets in the same brain regions. None of the regions showed a higher within-identity than between-identity correlation, indicating the absence of identity representation in these brain regions when name stimuli were used. Gray and white bars indicate within-identity and between-identity correlations, respectively. Error bars indicate the within-subject SEMs. *p < 0.05.
The face-identity representation in superior IPS reflected an abstract visual code rather than a phonological code generated by the rehearsal of the actors' names while the observers viewed the face images. Although phonological codes are automatically activated during word reading (Van Orden, 1991), when we showed the written names of each actor in different fonts in the same oddball task (Fig. 1B), no name-identity decoding was found in any of the ROIs examined (i.e., there was no difference in correlation between two sets that shared an identity and two that did not; t(12) < 1, p = 0.56 in superior IPS; t values <1, p values >0.84 in LO region and VWFA; and t(12) = −2.06, p = 0.066 in the right FFA, in the opposite direction; Fig. 2C). Furthermore, in superior IPS, identity representation was greater for faces than for names (task-by-identity interaction, F(1,12) = 4.06, p = 0.067 for all the voxels; F(1,12) = 7.92, p = 0.016 for the top 50 voxels included in the ROI).
Thus, among the brain regions examined, superior IPS was the only one that showed sensitivity to face identity across large variations in face appearance. To our knowledge, this is the first time that real-world abstract face-identity representations have been decoded in the human parietal cortex.
Abstract object identity representation in the human superior IPS, and ventral and lateral occipital regions
Abstract object identity representations exist not only for faces but also for visual objects in general. To replicate and generalize our findings, in Experiment 2 photographs of two familiar car models (BMW Mini and Volkswagen Beetle) were used (Fig. 3A). Images of these car models were shown from different viewpoints, in different sizes, and against different background scenes, similar to the way they would naturally appear in everyday visual perception. As with the faces in Experiment 1, the written names of the cars were also shown in different fonts. Using the same oddball detection task, the existence of abstract car identity representations was examined in superior IPS, LO region, and VWFA. As the cars were shown embedded in background scenes, responses were also examined in the PPA, a brain region that has been shown to be specialized for scene processing (Epstein and Kanwisher, 1998).
Experiment 2 design and results. A, Schematic illustration of the key comparisons made in the experiment. Similar to the faces and the names in Experiment 1, we examined whether the within-identity correlations were greater than the between-identity correlations for car images and car names. B, Fisher-transformed correlation coefficients (z) between the car image sets in superior IPS, LO region, PPA, and VWFA ROIs. Replicating the results for faces in Experiment 1, only superior IPS showed a higher within-identity than between-identity correlation for car images, indicating the presence of abstract car identity representation in this brain region despite large variations in viewpoint, size, and the background scene in which the cars appeared. Note that in both LO and VWFA, the between-identity correlation was greater than the within-identity correlation, possibly reflecting a greater between-set than within-set similarity, which would have worked against the finding of identity decoding in superior IPS. C, Fisher-transformed correlation coefficients for the car name sets in the same brain regions. As with the face names in Experiment 1, none of the regions showed a higher within-identity than between-identity correlation, indicating the absence of identity representation in these brain regions for the name stimuli. Gray and white bars indicate within-identity and between-identity correlations, respectively. Error bars indicate the within-subject SEMs. *p < 0.05.
Replicating the results for faces in Experiment 1, only superior IPS showed abstract car identity representations (Fig. 3B), with a higher correlation in fMRI response patterns between two sets of car images with the same identity than between two sets with different identities (t(12) = 2.26, p = 0.043). Such real-world abstract car identity representations, however, could not be decoded in the other ROIs examined (t(12) < 1, p > 0.62 in PPA; t(12) = −1.96, p = 0.073 in LO region; and t(12) = −2.8, p = 0.016 in VWFA; the latter two were both in the opposite direction, possibly indicating a greater between-set than within-set similarity, which would have worked against the finding of identity decoding in superior IPS). Further pairwise comparisons revealed that superior IPS differed significantly from all the other ROIs in decoding car identity representations (region-by-identity interaction, F values >7.87, p values <0.016). As with the faces in Experiment 1, differences among the brain regions could not be attributed to voxel number differences, as voxel numbers were similar in superior IPS and LO region (Table 1), and the same results were obtained when the number of voxels in each ROI was limited to the top 50 most responsive ones (superior IPS, t(12) = 2.4, p = 0.033; LO region and PPA, t values <1, p values >0.426; VWFA, t(12) = −2.5, p = 0.027, but in the opposite direction; region-by-identity interaction, F values >3.83, p values <0.074).
As in Experiment 1, no car name decoding could be found in any of the ROIs examined (Fig. 3C; t values <1, p values >0.13 in superior IPS, LO region, and VWFA; t(12) = −1.96, p = 0.073 in PPA in the opposite direction). Comparison between tasks revealed that identity decoding was greater for the car images than for the car names in superior IPS (task-by-identity interaction, F(1,12) = 9.23, p = 0.01). These results thus replicated those from Experiment 1 and showed that abstract identity representations exist in superior IPS for both faces and nonface objects such as cars.
Multiple abstract face-identity representations in human ventral, parietal, and prefrontal regions
In Experiment 1, the decoding of face identity was examined between two individuals. To generalize our findings beyond the classification of just two specific individuals, in Experiment 3, the same oddball detection paradigm was applied to the face images of eight famous actors (Fig. 4A). Replicating the results from Experiment 1, when all the pairwise comparisons between the eight actors were averaged, face-identity decoding was again observed in superior IPS (t(10) = 2.58, p = 0.027, Fig. 4B), but not in LO region or the right FFA (t values <1.17, p values >0.266). Moreover, face-identity decoding was stronger in superior IPS than in LO region (region-by-identity interaction, F(1,10) = 8.13, p = 0.017) and was marginally stronger in superior IPS than in the right FFA when the number of voxels in a region was limited to no more than the 50 most responsive ones (region-by-identity interaction, F(1,10) = 4.07, p = 0.071). Face-identity decoding in superior IPS was not driven by the decoding of the best face pairs, as removing the two best face pairs for each observer did not change the results (t(10) = 2.231, p = 0.049). Similarly, removing the two worst pairs for each observer did not improve decoding in either LO region or the right FFA; decoding remained at chance in both (t values <1.324, p values >0.215).
Experiment 3 stimuli and results. A, Example images of the eight actors used. B, Fisher-transformed correlation coefficients (z) between the face image sets in superior IPS, LO region, and the right FFA ROIs. Superior IPS again showed a higher within-identity than between-identity correlation, whereas LO region and the right FFA did not. C, An example face visual search display. Observers performed a speeded visual search for the presence of the face of a target actor among the faces of a distractor actor. A target face appeared in 50% of the trials. In the example shown, Leonardo DiCaprio is the target actor and Russell Crowe is the distractor actor. D, The correlation between the behavioral (as measured by visual search speeds) and the neural (as measured by fMRI pattern correlations) similarity measures of face identity in each ROI. This correlation reached significance only in superior IPS, indicating that the face representations formed there closely tracked the behaviorally perceived face identities. Gray and white bars in B indicate within-identity and between-identity correlations, respectively. Error bars in B indicate the within-subject SEMs. Error bars in D indicate the SDs of the baseline correlation value distributions from the permutation tests (see Materials and Methods). *p < 0.05.
In addition to our functionally defined ROIs, the face oddball detection task also activated extensive regions in LPFC, PPC, and VOTC. LPFC, in particular, has previously been shown to be involved in working memory and object categorization tasks (Miller et al., 2003) and thus could potentially hold object identity representations as well. Nevertheless, none of these additional brain regions showed above-chance face-identity decoding (t(10) = 1.27, p = 0.23 in LPFC; t(10) = 1.61, p = 0.137 in PPC excluding superior IPS; and t(10) = 1.01, p = 0.336 in VOTC excluding LO region, the right FFA, and early visual areas). Although PPC did show a weak trend toward significant decoding of face identity, this was likely due to the inclusion of voxels near superior IPS. When a larger superior IPS ROI (approximately twice the number of voxels of the original ROI) was excluded from PPC, the weak trend seen in PPC became less significant (t(10) = 1.38, p = 0.197). These analyses indicated that the task-activated LPFC and VOTC voxels did not contain robust abstract face-identity representations. Furthermore, not all task-activated voxels in PPC carried face-identity representations, highlighting the unique role of superior IPS in representing object identities.
Behavioral relevance of face-identity representation in the human superior IPS
In addition to identity, two faces may differ in other abstract properties, such as familiarity, attractiveness, and trustworthiness. To test whether the decoding in superior IPS reflected face-identity representations rather than any of these other abstract properties associated with face perception, in Experiment 3 we also compared the neural measure of face similarity in superior IPS with a behavioral measure of face similarity.
To obtain the behavioral measure of face-identity similarity, we asked the same observers who participated in the fMRI study of Experiment 3 to perform a speeded visual search task outside the scanner. Specifically, the observers searched for a target actor face among the faces of a distractor actor (Fig. 4C). All the possible pairings among the eight actors were used, with one actor serving as the target and the other as the distractor, or vice versa. As target–distractor similarity has been shown to govern visual search efficiency, such that the greater the similarity between the target and distractors, the slower the search time (Duncan and Humphreys, 1989), search speed can serve as a good behavioral measure of face-identity similarity among the different actors.
From all the possible pairings of the eight actor face identities, we constructed a behavioral face-identity similarity matrix. We then constructed a neural face-identity similarity matrix using the fMRI correlation coefficient values from the eight actor faces obtained in Experiment 3 for superior IPS, LO region, and the right FFA, and calculated the correlations between the behavioral and neural measures of face-identity similarity in each region (Kriegeskorte et al., 2008). This analysis revealed a significant correlation between the behavioral and neural face-identity similarity measures in superior IPS (p < 0.013, permutation test), but not in LO region or the right FFA (p = 0.112 and p = 0.317, respectively; Fig. 4D). Moreover, this correlation was greater in the superior IPS than in the right FFA (p = 0.028, permutation test; this correlation did not differ between superior IPS and LO, p = 0.254). These results remained the same when only up to 50 of the most responsive voxels were included in each ROI and when the search data were truncated to remove outliers 3 SDs away from the mean (permutation test, p = 0.028 in superior IPS; p values >0.293 in LO region and the right FFA).
Thus, the perceived face-identity similarity in our speeded visual search task was reflected in the neural response patterns of superior IPS, but not of LO region or the right FFA. These results provided strong, unambiguous evidence that goal-driven face-identity information can be directly represented in the human superior IPS.
Comparing representational structures of parietal, prefrontal, and ventral regions using MDS
Among the ROIs examined, if superior IPS was the only region that contained robust abstract object identity representations, then the representational structure of superior IPS should be distinct from those of the other brain regions. To directly visualize the representational structures of the different ROIs, in Experiment 3 classical MDS (Shepard, 1980; Edelman, 1998; see also Kriegeskorte et al., 2008) was applied to the pairwise neural correlations of face-identity representation similarity among superior IPS, LO region, the right FFA, LPFC, and PPC (excluding superior IPS). The resulting MDS plot showed that the representational structure of superior IPS was quite distinct from those of the other ROIs (Fig. 5A). The distance between superior IPS and PPC further increased when voxels near superior IPS were excluded from PPC (Fig. 5B), suggesting that superior IPS was functionally dissociable from the rest of PPC. As before, differences in ROI size could not explain these findings, as virtually the same results were obtained when the number of voxels in each ROI was limited to the top 50 most responsive voxels. These results further indicated that the neural representation contained in superior IPS was qualitatively distinct from those in the other brain regions examined in the present study.
Comparing representational structures of brain regions using MDS in Experiment 3. The distance between the different brain regions reflected face-identity representation dissimilarity among these regions. A, MDS results for the five ROIs with superior IPS voxels excluded from PPC. Superior IPS was separated from both the ventral visual regions (LO region and the right FFA) and the frontoparietal regions (LPFC and PPC), suggesting that face-identity representation in superior IPS differed from those in the other brain regions. B, MDS results for the five ROIs with both superior IPS and its nearby voxels excluded from PPC. Superior IPS became more separated from PPC in this analysis, showing distinct face-identity representations in these two brain regions. SIPS, Superior IPS.
The impact of perceptual differences among the sets
As our goal was to present face and car images as they appeared in the real world, minimal image processing was applied. Each of our face sets contained a similar range of variations in viewpoint, hairstyle, facial expression, and age; and each of our car sets contained a similar range of variations in viewpoint, size, and background. However, lower-level perceptual differences, such as luminance and image spatial frequency distribution, were not controlled for among the sets. Below, we present three analyses showing that perceptual differences among the sets could not account for the response patterns seen in superior IPS for abstract object identity representation across all three experiments.
Luminance difference among the sets
In this analysis, we calculated the average luminance for each image in a set and compared whether the sets differed in overall luminance. In Experiment 1, for the face images, while sets that shared an identity differed from each other (t values >2.26, p values <0.036, independent-samples t test, two-tailed), sets that did not share an identity were not significantly different from each other (t values <1.62, p values >0.12), except for one pair (Damon set 2 vs DiCaprio set 2, t(18) = 3.51, p = 0.002). Thus, the luminance difference seemed to be greater within than between face identities, and face images were more similar when they did not share an identity than when they did, working against the identity effect found in superior IPS. For face names, no significant difference in luminance was found for any of the possible comparisons (t values <1, p values >0.72). In Experiment 2, for both car images and car names, there was no luminance difference between the sets, regardless of whether they shared an identity (t values <1.67, p values >0.11). In Experiment 3, no luminance difference was found between any of the face pairs (for all possible pairwise comparisons, t values <1.25, p values >0.22).
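This comparison amounts to a two-tailed independent-samples t test on per-image mean luminance, as in the following minimal sketch (the image arrays are hypothetical placeholders):

```python
# Minimal sketch of the luminance comparison between two image sets.
# Images here are hypothetical grayscale arrays; the reported analyses
# used the actual stimulus images.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
set1 = [rng.uniform(0, 255, (256, 256)) for _ in range(10)]
set2 = [rng.uniform(0, 255, (256, 256)) for _ in range(10)]

lum1 = [img.mean() for img in set1]  # average luminance per image
lum2 = [img.mean() for img in set2]
t, p = stats.ttest_ind(lum1, lum2)   # independent samples, two-tailed by default
```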
Spatial frequency distribution differences among the sets
In this analysis, we calculated the spatial frequency distribution profile (i.e., the power at each spatial frequency) for each image in a set. We then used a support vector machine, a linear classifier, to classify the images between the different sets based on this information. In Experiments 1 and 2, across all the comparisons made, the following yielded above-chance classification performance (t values >2.44, p values <0.037, one-sample t test, two-tailed): for face images, two of the four between-identity comparisons; for face names, three of the four between-identity comparisons; for car images, none; and for car names, three of the four between-identity comparisons. These results showed that the spatial frequency distribution envelope differed somewhat among the sets. Critically, although the name sets were more similar in spatial frequency distribution when they shared an identity than when they did not, this similarity was not reflected in the response patterns of superior IPS for either the face names or the car names (Figs. 2, 3). Additionally, although no difference was found in spatial frequency distribution between the car image sets, the superior IPS response pattern still tracked car identity. Thus, differences in the spatial frequency distribution envelope did not seem to contribute to the decoding of identity representations in superior IPS.
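A minimal sketch of this classification analysis is given below. The radial binning of the power spectrum and the cross-validation scheme are illustrative assumptions, as the exact implementation is not specified above; the images are hypothetical placeholders.

```python
# Minimal sketch of the spatial-frequency classification control.
# Images and bin count are hypothetical; the analysis computes the power at
# each spatial frequency per image and classifies set membership with a
# linear support vector machine.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def radial_power_profile(img, n_bins=20):
    """Radially averaged 2D power spectrum (power at each spatial frequency)."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    cy, cx = np.array(power.shape) // 2
    y, x = np.indices(power.shape)
    r = np.hypot(y - cy, x - cx)                         # radial frequency
    bins = np.minimum((r / (r.max() + 1e-9) * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)                  # mean power per bin

# Hypothetical data: 10 grayscale images (64 x 64 pixels) per set.
rng = np.random.default_rng(3)
X = np.array([radial_power_profile(rng.standard_normal((64, 64)))
              for _ in range(20)])
y = np.array([0] * 10 + [1] * 10)                        # set labels
acc = cross_val_score(LinearSVC(dual=False), X, y, cv=5).mean()
# Cross-validated accuracies would then be tested against chance (0.5)
# across set pairs with a one-sample t test.
```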
In Experiment 3, we performed the same spatial frequency distribution analysis with the face image sets from the eight identities. For within-identity comparisons, we compared image set 1 of one actor with image set 2 of the same actor, resulting in a total of eight comparisons. For between-identity comparisons, we compared image set 1 or 2 of one actor with image set 1 or 2 of another actor, resulting in a total of 112 comparisons. We obtained above-chance classification performance in 1 of the 8 within-identity comparisons (t(9) = 3, p = 0.015) and in 41 of the 112 between-identity comparisons (t values >2.44, p values <0.037). Although these analyses seemed to suggest that spatial frequency distribution differed more between sets with different identities than between sets sharing the same identity, these differences were not registered by sensory regions, as both LO region and the right FFA showed similar correlations for within-identity and between-identity set comparisons. Given that these sensory regions showed sensitivity to other perceptual differences among the image sets (see the analyses below in the section Comparing sets sharing an identity), their insensitivity to spatial frequency distribution differences between the sets suggests that these differences were unlikely to have contributed to the decoding of face-identity representations in superior IPS.
Together, although there were some spatial frequency distribution differences among the images in the different sets, these differences by themselves could not consistently account for the decoding of object identity representations found in superior IPS across all three experiments.
Comparing sets sharing an identity
One way to evaluate whether a brain region is sensitive to perceptual differences among the sets is to compare its response pattern correlation between the same set of images across odd and even runs with its correlation between two different sets sharing the same identity across odd and even runs. In other words, when set identity was held constant, because unique images were used in each set, if perceptual differences among the images were encoded by a brain region, then the response pattern of that region should be more similar for the same set of images than for different sets of images across odd and even runs, even though they all shared an identity. Across the three experiments, as shown in Figure 6, the following ROIs showed a significantly higher correlation between identical sets than between different sets sharing an identity: for the face images in Experiment 1, none; for the face names in Experiment 1, the right FFA (t(12) = 2.45, p = 0.03); for the car images in Experiment 2, both PPA and VWFA (t values >2.25, p values <0.044); for the car names in Experiment 2, none; and for the face images in Experiment 3, both LO region and the right FFA (t values >2.61, p values <0.026).
Figure 6. The effect of perceptual/image differences on neural response patterns across sets sharing an identity in the three experiments. A, B, Results for face images and face names in Experiment 1. C, D, Results for car images and car names in Experiment 2. E, Results for face images in Experiment 3. While holding identity constant, the correlation between odd and even runs of the same set was compared with the correlation between two different sets sharing an identity across odd and even runs. A higher within-set than between-set correlation indicates the encoding of perceptual or image differences between the sets in a brain region. Depending on the experiment and the stimuli used, different ventral object-processing regions showed different degrees of sensitivity to the perceptual/image differences between the sets. Importantly, when identity was held constant, superior IPS never differentiated identical sets from different sets, providing further support that the perceptual/image differences among the sets did not modulate the response pattern in this region. Error bars indicate within-subject SEMs. *p < 0.05.
Thus, depending on the stimulus used, different ventral object-processing regions showed different amounts of sensitivity to perceptual/image differences between the sets. Importantly, when identity was held constant, superior IPS never differentiated between sets of images that were identical and those that were different. This provided further support that perceptual differences among the sets did not modulate response patterns in superior IPS.
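For illustration, the split-half logic described above can be sketched as follows. The data shapes, variable names, and the use of Pearson correlation over voxels are assumptions consistent with standard MVPA practice, not the authors' actual pipeline.

```python
# Sketch of the split-half (odd/even run) within- vs between-set comparison.
# Each argument is the mean voxel response pattern (1D array) for one image
# set, estimated from the odd or even runs; both sets share the same identity.
import numpy as np

def pattern_corr(p1, p2):
    """Pearson correlation between two voxel response patterns."""
    return np.corrcoef(p1, p2)[0, 1]

def within_vs_between(set1_odd, set1_even, set2_odd, set2_even):
    """Average within-set and between-set split-half correlations."""
    within = np.mean([pattern_corr(set1_odd, set1_even),
                      pattern_corr(set2_odd, set2_even)])
    between = np.mean([pattern_corr(set1_odd, set2_even),
                       pattern_corr(set2_odd, set1_even)])
    return within, between

# A within > between difference indicates that the region encodes the image
# differences between the sets; across participants this could be tested with
# a paired t test on Fisher-transformed (np.arctanh) correlation values.
```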
Discussion
In the present study, using photographs of famous actors' faces and well-known car models, we showed in three experiments that real-world face and car identity information invariant to large changes in appearance could be robustly represented in the human superior IPS, a region previously shown to dynamically track the capacity of VSTM. The same results were obtained whether two or eight object identities were compared against each other. These results could not be attributed to observers rehearsing the names of the faces or cars, as no such decoding was obtained when face or car names, rather than images, were shown. Our results demonstrate the existence of abstract object identity representations in the human parietal cortex along the IPS. Critically, using representational similarity measures, we further showed that the face-identity representations formed in superior IPS reflected the behaviorally perceived similarity among the face identities. Meanwhile, neither the task-activated regions in prefrontal and parietal cortices (excluding superior IPS) nor the ventral and lateral visual object-processing regions examined exhibited such representations. Our MDS results further confirmed that the representations contained in superior IPS were distinct from those of the other regions examined.
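The representational similarity and MDS analyses summarized above follow a standard logic, sketched below. The distance metric (1 minus Pearson correlation), the Spearman rank correlation between neural and behavioral dissimilarities, and the MDS settings are our assumptions, not necessarily the authors' exact choices.

```python
# Hedged sketch of the brain-behavior representational similarity comparison
# and the MDS visualization; details are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr
from sklearn.manifold import MDS

def neural_rdm(patterns):
    """patterns: (n_identities, n_voxels) array of mean responses per
    identity. Returns the condensed representational dissimilarity matrix
    (1 - Pearson correlation between identity patterns)."""
    return pdist(patterns, metric="correlation")

def brain_behavior_match(patterns, behavioral_rdm):
    """Spearman correlation between a region's neural RDM and the RDM
    derived from behaviorally perceived identity similarity."""
    return spearmanr(neural_rdm(patterns), behavioral_rdm)

def mds_embedding(condensed_rdm, n_dims=2):
    """Low-dimensional MDS embedding of an RDM, for visualizing how a
    region's identity representations relate to one another."""
    mds = MDS(n_components=n_dims, dissimilarity="precomputed",
              random_state=0)
    return mds.fit_transform(squareform(condensed_rdm))
```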
Monkey LIP neurons exhibit categorical responses to motion or object categories after training (Freedman and Assad, 2006; Fitzgerald et al., 2011). A similar process is likely at work here in the human parietal cortex. Through learning and experience, an observer may associate different images of the same face as containing the same identity. Alternatively, an observer may mentally rotate different images of a face into the same template. However, mental rotation cannot be a general mechanism mediating category responses in parietal cortex, as it cannot explain responses to shape categories containing arbitrary shapes, as reported by Fitzgerald et al. (2011).
Although names can evoke identity representations, in Experiments 1 and 2 we failed to observe successful decoding in superior IPS with names. This was likely because our name task could be performed by activating just the orthography and phonology associated with the names. This manipulation was intentional: it allowed the names, but not necessarily the identities, to be fully activated. The failure of name decoding thus indicated that image decoding in superior IPS reflected the representation of abstract object identity and not the names associated with, and possibly coactivated with, those identities.
Although the decoding of low-level visual features, such as color, shape, orientation, and motion, has previously been reported in the human parietal cortex (Liu et al., 2011; Christophel et al., 2012; Bettencourt and Xu, 2015; Xu and Jeong, 2015), the abstract object identity decoding reported here is unlikely to have been driven by such features. By varying the viewpoint, hairstyle, age, and expression of each face, we made low-level features unreliable cues for linking different face images to a unique identity; the same applies to the car images used. Additionally, because participants could not know a priori which of the oddball objects would be shown (some of which shared similar low-level features), a strategy based on low-level features would have been ineffective. Meanwhile, the low-level perceptual differences that did exist between the image sets could not account for the decoding results. If common low-level visual features indeed contributed to face-identity decoding, then we would expect to find identity decoding in perceptual regions such as the right FFA and LO, but none was found. Moreover, we would expect greater within-identity–within-set correlations than within-identity–between-set correlations across odd and even runs (since the former compared exactly the same set of images). While some of the perceptual regions examined showed this result, superior IPS never did (Fig. 6). Thus, what drove decoding in superior IPS could not have been low-level feature differences, but rather differences in abstract object identity.
Previous imaging studies have reported the decoding of face-identity representation in regions surrounding the right FFA and in ATL (Kriegeskorte et al., 2007; Nestor et al., 2011; Gratton et al., 2013; Anzellotti et al., 2014). In these studies, however, face identity tended to covary with low-level image features. For example, Nestor et al. (2011) used faces from four male Caucasian individuals, each shown with four different expressions but always in the same view. Thus, view-specific low-level feature differences among the faces could have contributed to identity decoding in the fusiform area. Gratton et al. (2013) morphed between two Caucasian front-view faces that differed in age, identity, and gender; again, information other than identity could have contributed to the decoding success. Anzellotti et al. (2014) used faces from five male Caucasian individuals, each appearing in five different views (including two pairs of mirror-image views). Although this represented a significant improvement over the other studies, the face image variations included were still limited compared with those of real-world faces. Additionally, success in a simple binary decoding between two faces, as was done previously, could also be driven by high-level features unrelated to face-identity representation, such as familiarity, attractiveness, and trustworthiness. Importantly, none of these previous studies directly correlated neural and behaviorally perceived face-identity representations. As such, previous decoding studies have not firmly established the presence and functional significance of abstract face-identity representations in ventral regions.
To overcome these problems, in the present study we examined representations of abstract face identities extracted from real-world face images varying freely in viewpoint, expression, hairstyle, and age. Critically, we also tested whether neural face representations track behaviorally perceived face-identity similarities. With these more stringent measures, we found significant face-identity representations in superior IPS and sensitivity to perceptual features in ventral regions, but no face-identity representations in the fusiform region. Anzellotti et al. (2014) and Nestor et al. (2011) both used a sophisticated feature-selection decoding algorithm to reveal face-identity representation in the fusiform area. Although the application of such an algorithm may improve decoding, the fact that our simple correlation procedure revealed strong face-identity representations in superior IPS but not in the right FFA demonstrates the robustness of these representations in the human parietal cortex. Overall, our results show that face-identity representations in the fusiform region may not be as robust as those in superior IPS under the current task context and may be driven more by the perceptual features associated with the face images than by abstract face identities. Future studies are needed to thoroughly evaluate the role of the fusiform region in face-identity representation.
Our MRI slice coverage in the present study precluded us from evaluating the role of ATL in face-identity representation, another ventral brain region implicated by previous face-decoding studies (Kriegeskorte et al., 2007; Nestor et al., 2011; Gratton et al., 2013; Anzellotti et al., 2014). Given that others have reported abstract object representations in ATL (Peelen and Caramazza, 2012), it is likely that real-world object identities are initially computed in more anterior ventral regions such as ATL, and are uploaded into superior IPS when they become task relevant. Superior IPS thus may not be involved in the direct computation of abstract object identities, but rather may be a downstream region that uses such representations in a goal-directed manner.
Because the human parietal cortex also participates in attention-related processing (Culham and Kanwisher, 2001; Corbetta and Shulman, 2002), it has been unclear whether VSTM activities in superior IPS reflected the representation of visual information or simply attention-related processing with no representational content. Here, using fMRI MVPA, we showed that robust, abstract, and behaviorally relevant visual representations could be directly decoded from superior IPS, indicating a direct involvement of this brain region in the on-line representation of task-driven visual information (Xu and Jeong, 2015; Bettencourt and Xu, 2016).
Our results have significant implications for the neural mechanisms supporting object representation and visual attention. First, by showing that a region in the dorsal pathway can hold abstract and complex object identity information, our results provide strong evidence that the ability to represent visual objects is not the exclusive privilege of the ventral visual pathway. Second, by showing the existence of behaviorally relevant object representations in human parietal cortex, our results argue against a “content-poor” view of the role of the parietal cortex in attention. Rather, in addition to directing attentional resources and assigning priorities to representations formed in posterior regions (Colby and Goldberg, 1999; Corbetta and Shulman, 2002; Gottlieb, 2007), the primate parietal cortex is “content rich” and capable of directly representing a great variety of task-relevant visual stimuli, ranging from simple features such as color and shape (Bettencourt and Xu, 2015; Xu and Jeong, 2015) to complex ones such as the abstract object identities reported here. This could constitute an alternative, yet equally adaptive and efficient, way for parietal cortex to guide attention and support moment-to-moment, goal-directed visual information processing in the brain.
Notes
Supplemental material for this article is available at http://visionlab.harvard.edu/Members/Yaoda/Supplementary_Information. This material contains all the stimuli used in the study. This material has not been peer reviewed.
Footnotes
This research was supported by National Institutes of Health Grant 1R01-EY-022355 to Y.X. We thank members of the Harvard Vision Laboratory for their support on this research, Katherine Bettencourt for her comments on an earlier draft of this article, and Sarah Cohan for proofreading the final draft of the article.
The authors declare no competing financial interests.
Correspondence should be addressed to Yaoda Xu, 33 Kirkland Street, Room 780, Cambridge, MA 02138. yaodaxu@fas.harvard.edu