Abstract
Humans' ability to recognize objects is remarkably robust across a variety of views unless faces are presented upside-down. Whether this face inversion effect (FIE) results from qualitative (distinct mechanisms) or quantitative processing differences (a matter of degree within common mechanisms) between upright and inverted faces has been intensely debated. Studies have focused on preferential responses to faces in face-specific brain areas, although face recognition also involves nonpreferential responses in non–face-specific brain areas. By using dynamic causal modeling with Bayesian model selection, here we show that dissociable cortical pathways are responsible for qualitative and quantitative mechanisms in the FIE in the distributed network for face recognition. When faces were upright, the early visual cortex (VC) and occipital and fusiform face areas (OFA, FFA) suppressed couplings to the lateral occipital cortex (LO), a primary locus of object processing. In contrast, they did not inhibit the LO when faces were inverted but increased couplings to the intraparietal sulcus, which has been associated with visual working memory. Furthermore, we found that upright and inverted face processing together involved the face network consisting of the VC, OFA, FFA, and inferior frontal gyrus. Specifically, modulatory connectivity within the common pathways (VC-OFA), implicated in the parts-based processing of faces, strongly correlated with behavioral FIE performance. The orientation-dependent dynamic reorganization of effective connectivity indicates that the FIE is mediated by both qualitative and quantitative differences in upright and inverted face processing, helping to resolve a central debate over the mechanisms of the FIE.
- dynamic causal modeling
- face inversion effect
- fMRI
- inhibition
- object recognition
- visual working memory
Introduction
Inverted faces are not recognized and memorized as effectively as upright faces, even though they are identical visual objects (Yin, 1969). The face inversion effect (FIE) has been assumed to result from qualitative differences in shape processing; upright and inverted faces are processed by distinct mechanisms (e.g., holistic vs piecemeal processing) (Tanaka and Farah, 1993). However, the mechanisms underlying the FIE remain controversial. Recent behavioral and computational evidence has demonstrated that upright and inverted faces are processed by common mechanisms and that quantitative differences in these mechanisms result in the FIE (Sekuler et al., 2004; Jiang et al., 2006; Gold et al., 2012).
Face processing involves a distributed cortical network, the so-called face network, which includes the occipital face area (OFA), fusiform face area (FFA), superior temporal sulcus (STS), amygdala (AMG), and inferior frontal gyrus (IFG) (Haxby et al., 2000; Fairhall and Ishai, 2007). Although neuroimaging studies have reported that the FFA is a key neural locus for face identification (Kanwisher et al., 1997), its involvement in the FIE is unclear. One study found a greater response to upright faces than to inverted faces (Yovel and Kanwisher, 2005); however, most other studies found little or no change in the magnitude of activity in the FFA (Aguirre et al., 1999; Haxby et al., 1999; Epstein et al., 2006). This discrepancy raises the possibility that upright and inverted faces are processed similarly in the FFA, and that the FIE is not sufficiently explained by FFA activation alone, but is also mediated by other areas (Steeves et al., 2006). Regardless, these activation studies do not address the fundamental question of whether differences in processing upright and inverted faces are qualitative or quantitative. Hence, rather than focusing only on regional activations, it is crucial to characterize the functional architecture of the multiple regions involved in face processing tasks, and evaluate effective connectivity in the networks that mediate the FIE.
Apart from the face network, object-selective areas, such as the lateral occipital cortex (LO), show greater responses to inverted faces than to upright faces, which is thought to reflect the recruitment of object processing systems during inverted face recognition (Haxby et al., 1999; Epstein et al., 2006). In addition, activation during face processing tasks has been found in the intraparietal sulcus (IPS), which is attributed to the involvement of visual working memory (VWM) and/or attention in these tasks (Davies-Thompson and Andrews, 2012). The mechanisms responsible for these activations remain unclear, but these non–face-specific areas likely mediate the FIE in a manner that is different from face-specific areas.
Here, we examined the hypothesis that the FIE is mediated by both qualitative and quantitative connectivity differences in networks consisting of face-specific and non–face-specific areas. Specifically, the activations of non–face-specific areas during face processing tasks led us to hypothesize that qualitative mechanisms involve interactions between the face and nonface networks, such that upright or inverted faces recruit and/or affect non–face-specific processing. Furthermore, given that face-specific areas show similar responses to upright and inverted faces, we hypothesized that quantitative mechanisms involve fine-tuned regulation of effective connectivity within the face network. Using dynamic causal modeling (DCM) with Bayesian model selection (BMS), a biophysically validated neuronal modeling approach that identifies the models with the strongest evidence (Friston et al., 2003; Penny et al., 2004), we captured how processing mechanisms differ as a function of face orientation.
Materials and Methods
Participants.
Twenty-two young adults participated in this study. All had normal or corrected-to-normal visual acuity and normal color vision. All participants received information on fMRI and reported no history of psychiatric or neurological disorders. Each participant gave written informed consent after being apprised of the procedure, which had been approved by the Committee of Ethics of the National Institute for Physiological Sciences, Japan. Data from two participants, one with excessive head motion during the scan and another who failed to perform the task correctly, were excluded from the analysis. Data from the remaining 20 participants (10 females, 10 males; mean age 27.7 years, range 20–39 years) were analyzed.
Stimuli.
Stimuli consisted of grayscale images of upright, inverted, and 9 × 7 grid-scrambled faces (Fig. 1A), provided by the Max Planck Institute for Biological Cybernetics (Tuebingen, Germany) (Troje and Bülthoff, 1996). Images subtended ∼4 × 5° of visual angle, and the mean brightness was normalized across images. The sex of the face shown was kept constant within each trial. Each stimulus was used only twice as a sample stimulus across runs (once during the first three runs and once during the final three runs), which minimized the influence of long-term memory on face recognition.
Design and procedure.
For each trial, a central fixation point was displayed for 0 to 1500 ms, followed by a sample display containing a sequence of one, two, three, or four faces, each presented for 500 ms, then a 1200 ms delay interval, and finally a probe face for 2000 ms (Fig. 1B). The combined duration of the fixation and sample displays was constant (2000 ms). A probe face matched one of the sample faces for half of the trials and did not match for the other half. Stimulus types (upright, inverted, and scrambled) were identical between the sample and test displays. Participants were required to indicate whether a probe image matched any images presented during the sample displays. We controlled overall task difficulty by varying (rather than fixing) the number of sample faces across trials, so that the behavioral FIE could be detected reliably while participants were discouraged from ignoring scrambled face images. Small numbers (one or two) of upright and inverted faces are so easy to remember that the behavioral FIE is often masked by a ceiling effect, whereas large numbers (three or four) of scrambled faces are so difficult to remember that participants may abandon encoding scrambled faces in favor of upright and inverted faces. The across-trial fluctuations in the number of sample faces allowed us to capture the FIE while encouraging participants to encode all stimulus types. If a probe image matched any sample image, half of the participants were required to press a button with their left thumb; if a probe image did not match any sample image, they pressed a button with their right thumb. Button mapping was reversed for the other half of participants. Participants completed six functional runs, each including 16 trials per face image condition.
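The timing constraint described above can be illustrated with a minimal sketch. The function name trial_timeline and the constant names are hypothetical and are not taken from the authors' stimulus code; the sketch assumes only that each sample face occupies 500 ms and that fixation fills the remainder of the fixed 2000 ms window.

```python
# Hypothetical sketch of the trial timing; all names are illustrative only.
FACE_MS = 500      # duration of each sample face
WINDOW_MS = 2000   # fixation + sample display, constant across trials
DELAY_MS = 1200    # delay interval
PROBE_MS = 2000    # probe display


def trial_timeline(n_faces: int) -> dict:
    """Return the duration (ms) of each trial phase for 1 to 4 sample faces."""
    if n_faces not in (1, 2, 3, 4):
        raise ValueError("n_faces must be 1, 2, 3, or 4")
    sample_ms = n_faces * FACE_MS
    fixation_ms = WINDOW_MS - sample_ms  # fixation shrinks as faces are added
    return {
        "fixation": fixation_ms,
        "sample": sample_ms,
        "delay": DELAY_MS,
        "probe": PROBE_MS,
        "total": WINDOW_MS + DELAY_MS + PROBE_MS,
    }


for n in range(1, 5):
    print(n, trial_timeline(n))
```

Under these assumptions, one sample face leaves 1500 ms of fixation and four sample faces leave none, which matches the stated fixation range of 0 to 1500 ms.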
Trial order and intertrial intervals (from 2300 to 6800 ms) were optimized in terms of the efficiency of the design matrix (Dale and Buckner, 1997; Friston et al., 1999). We maximized the efficiency of activation detection for the three face conditions and their differential effects (upright vs inverted, upright vs scrambled, inverted vs scrambled).
MRI acquisition.
A Siemens Allegra 3T scanner (Siemens), equipped with a single-channel head coil, was used to measure blood oxygenation level-dependent cortical activity. Functional images were taken with a gradient-echo EPI pulse sequence [repetition time (TR) = 1.5 s, echo time (TE) = 30 ms; flip angle = 70°]. Twenty-six 5-mm-thick oblique slices (3 mm × 3 mm in-plane resolution) were acquired for 289 volumes in each run (1734 volumes per participant for six runs). Following the acquisition of functional images, anatomical 3D T1-weighted images (Magnetization-Prepared Rapid-Acquisition Gradient-Echo sequence, TR = 2.5 s; TE = 4.38 ms; flip angle = 8°; field of view = 230 mm; matrix size 256 × 256; slice thickness = 1 mm; total 192 transaxial images) were collected.
Preprocessing.
Image data were analyzed with SPM8 (DCM8) software (Wellcome Department of Imaging Neuroscience, London) implemented in MATLAB (MathWorks). Preprocessing of functional images consisted of slice acquisition time correction, 3D head motion correction (realignment), spatial normalization to the EPI template defined by the MNI, and spatial smoothing (3D 8 mm full-width at half maximum Gaussian kernel).
Statistical parametric mapping (SPM) and ROI selection.
SPM analysis was performed as follows: individual task-related activation was evaluated and task-unrelated activation was modeled as nuisance covariates; data from each individual, obtained from first level analysis, were incorporated into a group-level analysis using a random-effects model.
The design matrix for the first-level analysis consisted of three regressors representing the upright, inverted, and scrambled face conditions, respectively, with sample display duration convolved with a canonical hemodynamic response function. Left hand response, right hand response, and six parameters of head movements were included in the design matrix as effects of no interest. The time series were high-pass filtered (cutoff 128 s) and adjusted for serial correlations by using a first-order autoregressive model. The signal was grand mean scaled by setting the whole-brain mean value to 100 arbitrary units. A random-effects analysis (RFX) on the three face conditions was performed to make inferences at the population level. Statistical threshold was set at p < 0.001 (peak level) with a cluster threshold of 10 voxels. This relatively liberal threshold was used to include ROIs with lesser significance, such as the AMG and STS (Ishai et al., 2005).
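How a condition regressor of this kind is built can be sketched as follows: a boxcar marking the sample display periods is convolved with a hemodynamic response function and sampled once per TR. This is an illustration, not the SPM8 implementation. The double-gamma parameters (response shape 6, undershoot shape 16, ratio 1/6, 32 s kernel) are assumed here as commonly used defaults, and the function names and the example onset are hypothetical.

```python
import math


def gamma_pdf(t: float, shape: float) -> float:
    """Gamma density with unit scale; zero for non-positive t."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def canonical_hrf(t: float) -> float:
    """Double-gamma HRF (assumed parameters: peak 6, undershoot 16, ratio 1/6)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def make_regressor(onsets_s, durations_s, n_scans, tr=1.5, dt=0.1):
    """Convolve a boxcar with the HRF on a fine grid, then sample every TR."""
    n_fine = int(round(n_scans * tr / dt))
    box = [0.0] * n_fine
    for onset, dur in zip(onsets_s, durations_s):
        start = int(round(onset / dt))
        stop = int(round((onset + dur) / dt))
        for i in range(start, min(stop, n_fine)):
            box[i] = 1.0
    hrf = [canonical_hrf(k * dt) for k in range(int(round(32.0 / dt)))]
    conv = [0.0] * n_fine
    for i, b in enumerate(box):
        if b:
            for k, h in enumerate(hrf):
                if i + k < n_fine:
                    conv[i + k] += b * h * dt  # discrete convolution
    step = int(round(tr / dt))
    return conv[::step][:n_scans]


# Hypothetical example: one trial with four sample faces (2 s) starting at 3 s.
regressor = make_regressor([3.0], [2.0], n_scans=40)
print([round(v, 3) for v in regressor[:12]])
```

The predicted response is zero before stimulus onset, rises to a peak several seconds after the sample display, and then shows a small undershoot.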
We aimed to define eight ROIs for DCM, each implicated in face processing, visual object recognition, and VWM/attention, from the group-level SPM analysis (Friston and Henson, 2006; Friston et al., 2006). ROIs included the following: the early visual cortex (VC), for early visual processing; the LO, for visual object processing (Grill-Spector et al., 1998; Larsson and Heeger, 2006); the IPS, for VWM and/or attention (Culham and Kanwisher, 2001; Todd and Marois, 2004; Xu and Chun, 2006; Matsuyoshi et al., 2012); and five areas within the face network (Haxby et al., 2000; Ishai et al., 2005; Fairhall and Ishai, 2007). Face-related ROIs were as follows: the OFA, which is involved in featural (parts-based) face processing (Rotshtein et al., 2005; Pitcher et al., 2011); the FFA, which is involved in the identification of individual faces (Kanwisher et al., 1997; Kanwisher and Yovel, 2006; Schiltz and Rossion, 2006); the STS, which processes gaze and social aspects, such as facial expression (Calvert et al., 1997; Hoffman and Haxby, 2000; Hooker et al., 2003); the AMG, where emotional facial expressions are processed (Breiter et al., 1996; Vuilleumier et al., 2004); and the IFG, which processes social and semantic aspects of faces, such as familiarity (Leveroni et al., 2000; Hadjikhani et al., 2007).
These eight ROIs were defined from contrasts between the three face conditions, with reference to previous studies that reported activation of these areas for the same contrasts. Areas specific to upright face processing (the AMG and STS) should be derived from an upright face > inverted face and/or upright face > scrambled face contrast (Epstein et al., 2006), and the area specific to inverted face processing (the LO) should be derived from an inverted face > upright face and/or inverted face > scrambled face contrast (Yovel and Kanwisher, 2005; Epstein et al., 2006). In addition, conjunction analyses using a conjunction-null hypothesis (Nichols et al., 2005) were performed to identify areas (the IPS and VC) involved in the delayed recognition task (by extracting common areas of activation across the upright, inverted, and scrambled face conditions) (Todd and Marois, 2004; Davies-Thompson and Andrews, 2012), and areas (the OFA, FFA, IFG) involved in the processing of faces (by extracting common areas of activation across the upright face > scrambled face and inverted face > scrambled face contrasts) (Kanwisher et al., 1997; Yovel and Kanwisher, 2005; Bookheimer et al., 2008). ROIs were only selected from the right hemisphere, as previous studies have consistently shown that responses to faces are stronger on this side (Sergent et al., 1992; Kanwisher et al., 1997; Haxby et al., 1999; Ishai et al., 2005) and that the right hemisphere plays a critical role in the perception of faces such that developmentally early input to it is necessary to develop face processing expertise (Le Grand et al., 2003). The coordinates of these ROIs were based on local maxima in the group-level analysis of SPM. The first eigenvariate extracted from a 4 mm sphere, centered on the coordinates and adjusted for effects of interest, was used as the ROI time series data for DCM.
DCM.
We used DCM, a biophysically validated model which includes transformation of neuronal activity into hemodynamic responses (Friston et al., 2003), to investigate alterations in effective connectivity induced by upright and inverted faces. DCM is a dynamic input-state-output model, which evaluates effective connectivity between brain regions in terms of hidden neuronal dynamics. It models causal interactions at the neuronal level by combining: (1) intrinsic coupling between brain regions; (2) context-dependent modulatory coupling by experimental manipulations; and (3) inputs that drive the network.
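In the bilinear formulation of DCM (Friston et al., 2003), these three components correspond to the matrices A, B, and C of the neuronal state equation. The notation below follows the common convention and is given as a reminder, not as the exact parameterization used in this study:

```latex
\dot{z}(t) = \Bigl( A + \sum_{j=1}^{m} u_j(t)\, B^{(j)} \Bigr)\, z(t) + C\, u(t)
```

Here z(t) is the vector of neuronal states (one per region), u(t) is the vector of experimental inputs, A holds the intrinsic couplings, B^{(j)} holds the change in coupling induced by input u_j, and C holds the strength of the driving inputs.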
We created a new design matrix, which was different from that of the SPM, for DCM. The first regressor, the main effect of faces, included the upright, inverted, and scrambled face conditions as the driving input to the VC. We assumed that faces directly activated the VC, which then propagated this information to other connected areas. The second and third regressors, representing the bilinear modulatory effects of face orientation, included the upright face and inverted face conditions, respectively. Each regressor convolved boxcar functions representing sample display duration with a canonical hemodynamic response function. In addition to the effects of interest, the effects of no interest (left hand response, right hand response, and six parameters of head movements) were included in the design matrix, as in the SPM analysis.
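The roles of the driving and modulatory inputs can be illustrated with a toy two-region bilinear system (VC and LO) integrated by the Euler method. This is a sketch under strong simplifying assumptions: all coupling values are invented, the hemodynamic model that DCM uses to map neuronal states onto BOLD signals is omitted, and the names simulate, A, B, and C are illustrative only.

```python
def simulate(a, b, c, u_drive, u_mod, dt=0.01):
    """Euler-integrate dz/dt = (A + u_mod * B) z + C * u_drive for two regions."""
    z = [0.0, 0.0]  # neuronal states: [VC, LO]
    trace = []
    for ud, um in zip(u_drive, u_mod):
        dz = []
        for i in range(2):
            total = c[i] * ud  # driving input
            for j in range(2):
                total += (a[i][j] + um * b[i][j]) * z[j]  # intrinsic + modulatory
            dz.append(total)
        z = [z[i] + dt * dz[i] for i in range(2)]
        trace.append(tuple(z))
    return trace


# Hypothetical parameters (not estimates from the study).
A = [[-1.0, 0.0], [0.5, -1.0]]  # self-decay of -1; VC -> LO coupling of 0.5
B = [[0.0, 0.0], [-0.4, 0.0]]   # modulatory input weakens VC -> LO
C = [1.0, 0.0]                  # driving input enters the VC only

n_steps = 500                   # 5 s at dt = 0.01
drive = [1.0] * n_steps
lo_without = simulate(A, B, C, drive, [0.0] * n_steps)[-1][1]
lo_with = simulate(A, B, C, drive, [1.0] * n_steps)[-1][1]
print(f"LO state without modulation: {lo_without:.3f}")
print(f"LO state with modulation:    {lo_with:.3f}")
```

With a negative modulatory parameter on the VC to LO connection, the same driving input produces a smaller LO response, which is the sense in which a modulatory input suppresses a coupling.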
Intrinsic connectivity was defined as follows (Fig. 2A): information received by the VC is directly forwarded to all ROIs except the IFG (Fairhall and Ishai, 2007; Rossion, 2008; Dima et al., 2011; Greenberg et al., 2012). Reciprocal connections were assumed within the face network (OFA, FFA, STS, AMG, and IFG) based on the models proposed by Haxby et al. (2000) and Ishai (2008) (see also Kim et al., 2006; Fairhall and Ishai, 2007; Rossion, 2008; Herrington et al., 2011), between the OFA/FFA and LO (Haxby et al., 2000; Ishai et al., 2005; Saygin et al., 2011; Nagy et al., 2012; Yeatman et al., 2013, 2014), between the OFA/FFA and IPS (Uddin et al., 2010; Davies-Thompson and Andrews, 2012; Yeatman et al., 2013, 2014), and between the LO and IPS (Konen and Kastner, 2008; Uddin et al., 2010; Bray et al., 2013; Yeatman et al., 2013, 2014).
BMS and effective connectivity.
BMS allowed us to determine qualitative connectivity differences in the FIE by comparing a set of competing DCMs (Penny et al., 2004; Stephan et al., 2009); then, we examined correlation between effective connectivity parameters of the best model and behavioral performance to confirm quantitative connectivity differences in the FIE.
To examine qualitative differences in cortical pathways responsible for processing upright and inverted faces, we set three families of bilinear modulatory connectivity (VC to LO, OFA/FFA to LO, and OFA/FFA to IPS) in three face conditions (upright face, inverted face, or both upright and inverted face) and tested the resulting 27 (3³) combinations/models for BMS (Fig. 2B) (Penny et al., 2004; Friston et al., 2007; Stephan et al., 2009). If processing differences between upright and inverted faces are qualitative, effective connectivity of the best model should be modulated by either upright or inverted faces, but not by both upright and inverted faces. Given that the OFA and FFA are reported to show similar response profiles to upright and inverted faces (Haxby et al., 1999), and to be closely intertwined (Rossion, 2008), we grouped them together for BMS to avoid a combinatorial explosion in the number of model comparisons. In addition, we did not test the modulatory connectivity from the VC to IPS in BMS because we aimed to examine how face representations, not low-level visual representations, were translated into VWM representations.
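The model space can be enumerated as the Cartesian product of three modulation options over three connection families. The sketch below is illustrative: the labels are hypothetical, and the ordering produced by itertools.product is arbitrary and does not reproduce the model numbering used in Fig. 2B.

```python
from itertools import product

# Connection families and modulation options, labeled for illustration.
FAMILIES = ("VC->LO", "OFA/FFA->LO", "OFA/FFA->IPS")
CONDITIONS = ("upright", "inverted", "both")

# Each model assigns one modulation option to each connection family.
models = [
    dict(zip(FAMILIES, combo))
    for combo in product(CONDITIONS, repeat=len(FAMILIES))
]

# Models in which every tested connection is modulated by only one orientation.
qualitative_only = [m for m in models if "both" not in m.values()]

print(len(models))            # 27
print(len(qualitative_only))  # 8
```

Of the 27 models, 8 modulate every tested connection by a single orientation only; these are the candidates that a purely qualitative account would favor.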
We specifically tested which face condition/s resulted in the frequently reported activation of the LO by investigating modulatory connectivity (1) from the VC to LO, and (2) from the OFA/FFA to the LO. This was to determine whether connectivity to the LO increased during inverted face processing and/or decreased during upright face processing, and examine the level of interaction (i.e., cortical route). The LO has consistently been shown to be active during inverted face perception (Aguirre et al., 1999; Haxby et al., 1999; Yovel and Kanwisher, 2005; Epstein et al., 2006). Although this consistency is likely to reflect qualitative processing differences between upright and inverted faces, the contribution of object processing to the FIE still remains unclear. If connectivity to the LO increases under the inverted face condition, this may indicate that inverted faces are actively processed like objects in the LO (Haxby et al., 1999). By contrast, if connectivity to the LO decreases under the upright face condition, this may indicate that accurate upright face recognition is mediated by suppression of object processing that is irrelevant to face processing. Thus, we examined whether modulatory connectivity to the LO differs between the face conditions. Modulatory connectivity from the VC to LO likely reflects bottom-up object processing, and connectivity from the OFA/FFA to the LO indicates an interaction between face and object processing.
In addition, we examined differences in processing load between upright and inverted faces by examining modulatory couplings from the OFA/FFA to the IPS. The IPS has been known to reflect VWM and/or attentional load (Culham and Kanwisher, 2001; Todd and Marois, 2004). However, whether upright faces require less processing and/or inverted faces lead to a greater processing load is largely unknown. The holistic processing view typically posits that, if upright faces are processed holistically and represented compactly (Curby and Gauthier, 2007), the visual information load of face representation sent from the OFA/FFA to the IPS should decrease when faces are upright. On the other hand, if inverted faces are processed inefficiently (Gold et al., 2012), the information load sent from the OFA/FFA to the IPS should increase when faces are inverted because their representations are redundant.
Because the behavioral FIE has consistently been shown to be robust in the normal population (Maurer et al., 2002), we assumed that the FIE is hard-wired in the human brain such that inversion leads to qualitative changes in neural face processing. Thus, we performed BMS analysis using fixed-effects analysis (FFX), assuming that the optimal model, which comprises distinct pathways for upright and inverted face processing, is the same across participants. However, because FFX can be affected by outliers (Stephan et al., 2009), we also performed BMS using RFX to address this concern (see also Rowe et al., 2010; Ewbank et al., 2011). Other than the effective connectivity tested by the BMS, connections (1) from the VC to all areas except the LO and IFG, (2) within the face network (the OFA, FFA, STS, AMG, and IFG), and (3) between the LO and IPS were conservatively assumed to be modulated by both upright and inverted face conditions for all models, based on previous findings (Haxby et al., 1999; Epstein et al., 2006), whereas those from the LO or IPS to the OFA or FFA were not assumed (Fig. 2B). This is because projections of regions for object or VWM processing to face processing regions were not thought to be modulated by upright or inverted faces.
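The logic of FFX BMS can be sketched as follows: because the same model is assumed for every participant, log evidences are summed across participants and converted into posterior model probabilities under a uniform prior over models. The function name ffx_posterior and the log-evidence values are hypothetical. The RFX scheme of Stephan et al. (2009), which treats the model as a random variable across participants, is not shown.

```python
import math


def ffx_posterior(log_evidence):
    """Posterior model probabilities from log_evidence[subject][model].

    Assumes a uniform prior over models and one model shared by all subjects.
    """
    n_models = len(log_evidence[0])
    # Group log evidence: sum over subjects for each model.
    group = [sum(subj[m] for subj in log_evidence) for m in range(n_models)]
    peak = max(group)  # subtract the maximum for numerical stability
    weights = [math.exp(g - peak) for g in group]
    total = sum(weights)
    return [w / total for w in weights]


# Invented log evidences for three subjects and three models.
log_ev = [
    [-100.0, -98.0, -101.0],
    [-110.0, -107.0, -109.0],
    [-95.0, -96.0, -97.0],
]
posterior = ffx_posterior(log_ev)
best = posterior.index(max(posterior))
print(best, [round(p, 4) for p in posterior])
```

Because evidence is pooled by summation, a single participant with an extreme log-evidence difference can dominate the group result, which is why an additional RFX analysis is a useful check.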
Because of the large number of models included in the analysis, we also compared the three conditions (upright, inverted, and both upright and inverted) within each connectivity family (VC to LO, OFA/FFA to LO, and OFA/FFA to IPS) of models to confirm whether the best model (i.e., the model with the strongest evidence) was also appropriate at the family level (Penny et al., 2010). If upright and inverted face processing is mediated by qualitatively different pathways, the best model should include modulatory connections that are significant under upright or inverted face condition (distinct pathways), rather than under both conditions (common pathways).
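Family-level inference can be sketched by partitioning the 27 models according to the modulation option assigned to one connection and summing the posterior model probabilities within each subset. The posterior values below are invented for illustration, the names are hypothetical, and the sketch assumes equally sized families with uniform priors, in the spirit of Penny et al. (2010).

```python
from itertools import product

CONNECTIONS = ("VC->LO", "OFA/FFA->LO", "OFA/FFA->IPS")
CONDITIONS = ("upright", "inverted", "both")
MODELS = list(product(CONDITIONS, repeat=len(CONNECTIONS)))  # 27 models


def family_posteriors(model_post, connection):
    """Sum posterior model probabilities by the option used for one connection."""
    idx = CONNECTIONS.index(connection)
    family = {cond: 0.0 for cond in CONDITIONS}
    for combo, p in zip(MODELS, model_post):
        family[combo[idx]] += p
    return family


# Invented posterior: most mass on one model, the rest spread evenly.
winner = ("upright", "upright", "inverted")
post = [0.74 if m == winner else 0.01 for m in MODELS]

for conn in CONNECTIONS:
    fam = family_posteriors(post, conn)
    print(conn, {k: round(v, 2) for k, v in fam.items()})
```

Each option within a connection family pools nine models, so a family can receive strong support even when no single model within it is clearly best.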
After a winning model was determined using BMS, the connectivity parameters of the best model were evaluated to examine quantitative differences in cortical pathways that process upright and inverted faces. To examine the quantitative relationship between behavior and connectivity, we performed a correlation analysis across participants between the behavioral FIE performance (accuracy for upright − inverted) and the modulatory connectivity FIE (the modulatory connectivity parameter for upright − inverted). If upright and inverted face processing is mediated by quantitatively different pathways, the modulatory connectivity FIE should correlate with the behavioral FIE performance.
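The brain-behavior analysis amounts to a Pearson correlation, across participants, between two difference scores. The sketch below uses invented values for five hypothetical participants; the function names fie and pearson_r are illustrative only.

```python
import math


def fie(upright, inverted):
    """Per-participant inversion effect: upright minus inverted."""
    return [u - i for u, i in zip(upright, inverted)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented data for five hypothetical participants.
acc_upright = [0.82, 0.78, 0.85, 0.75, 0.80]
acc_inverted = [0.70, 0.74, 0.72, 0.71, 0.69]
conn_upright = [0.10, 0.22, 0.08, 0.25, 0.12]   # modulatory parameter, upright
conn_inverted = [0.20, 0.21, 0.19, 0.23, 0.22]  # modulatory parameter, inverted

behavioral_fie = fie(acc_upright, acc_inverted)
connectivity_fie = fie(conn_upright, conn_inverted)
print(round(pearson_r(connectivity_fie, behavioral_fie), 3))
```

A negative coefficient in this setup would mean that participants whose coupling is relatively weaker for upright than for inverted faces show a larger behavioral advantage for upright faces.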
Results
Behavior
A repeated-measures ANOVA of accuracy revealed the main effects of the number of sample faces (F(3,57) = 61.626, p < 0.0001, ηp² = 0.764) and face condition (F(2,38) = 17.639, p < 0.0001, ηp² = 0.481), but no interaction between these two factors (F(6,114) < 1). Consistent with previous studies (Yin, 1969; Curby and Gauthier, 2007), there was a strong behavioral FIE (Fig. 1C): overall accuracy was significantly higher in the upright (79.7%) compared with the inverted (71.8%) and scrambled face conditions (68.9%) (t(19) = 5.804, p < 0.0001, d = 1.298; and t(19) = 5.053, p < 0.0001, d = 1.130, respectively). No significant difference was observed between the inverted and scrambled face conditions (t(19) = 1.425, not significant). It is not surprising that there was no significant behavioral difference between the inverted and scrambled face conditions because images were changed in an all-or-none manner (almost all pixels changed or exactly identical) between the sample and test displays in the present study (Olsson and Poom, 2005; Awh et al., 2007).
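The reported effect sizes are consistent with the standard conversion for paired comparisons, d = t/√n, with n = 20 participants. This is an inference from the reported values and not a statement of the procedure the authors used:

```latex
d = \frac{t}{\sqrt{n}}, \qquad
d_{\text{upright vs. inverted}} = \frac{5.804}{\sqrt{20}} \approx 1.298, \qquad
d_{\text{upright vs. scrambled}} = \frac{5.053}{\sqrt{20}} \approx 1.130
```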
SPM and ROIs for DCM
SPM successfully revealed regions implicated in face processing, visual object processing, and VWM/attention during the face recognition task, and eight ROIs were defined from the contrasts between the three face conditions (Tables 1 and 2; Fig. 3). A conjunction analysis using a conjunction-null hypothesis (Nichols et al., 2005) was performed to identify common areas of activation (Fig. 3A) across the upright face > scrambled face contrast (Fig. 3B) and inverted face > scrambled face contrast (Fig. 3C). This analysis yielded face-related OFA, FFA, and IFG ROIs. Other face-related (STS and AMG) and object-related (LO) ROIs were defined from contrasts between the upright, inverted, and scrambled face conditions. The STS ROI was defined from the upright face > scrambled face contrast (Fig. 3B). The facial expression-related AMG ROI was defined from the upright face > inverted face contrast (Fig. 3D). The object-related LO ROI was defined from the inverted face > upright face contrast (Fig. 3E). A conjunction analysis of the upright, inverted, and scrambled face conditions was performed to identify regions involved in the face recognition task in general. This analysis defined the VC, which was assumed to receive visual inputs and propagate these inputs to other connected regions, and the VWM- and/or attention-related IPS ROI (Fig. 3F).
Figure 4 shows mean BOLD responses in the ROIs as a function of face orientation. Although the IFG, FFA, OFA, STS, and VC showed comparable responses to upright and inverted faces, the AMG showed a stronger response to upright faces than to inverted faces (uncorrected peak level p < 0.001), and the LO and IPS showed stronger responses to inverted faces than to upright faces (family-wise error-corrected peak level p < 0.05, and uncorrected peak level p < 0.001, respectively).
Qualitative connectivity differences: DCM and BMS
We found that model 23 outperformed all other models both in posterior and exceedance probabilities (Fig. 5). The winning model consisted of modulatory connections from the VC to LO under the upright face condition, from the OFA/FFA to LO under the upright face condition, and from the OFA/FFA to IPS under the inverted face condition. The same model won BMS at subject-specific posterior probabilities as well (model 23 outperformed the others in 18 of 20 participants). In addition, the appropriateness of this model was confirmed by the family-level comparison (Penny et al., 2010) (Fig. 6) of three face conditions (upright face, inverted face, or both upright and inverted face) within three connectivity families (modulatory connections from the VC to LO, from the OFA/FFA to LO, and from OFA/FFA to IPS). Thus, BMS confirmed that the best model included connections that were modulated exclusively by either upright or inverted faces. However, although family-level BMS showed very strong evidence for connectivity from the VC to LO under the upright face condition in both FFX and RFX, it showed moderate evidence in RFX for connectivity from the OFA/FFA to the LO under the upright face condition. This may indicate that influences over the LO are primarily mediated by bottom-up modulatory connectivity.
Effective connectivity of the winning model
Analysis of the effective connectivity parameters of the winning model (model 23) showed that the LO and IPS are intrinsically connected with the OFA and FFA (Fig. 7A). Modulatory connectivity within the face network (OFA, FFA, STS, AMG, and IFG) was almost completely common to the upright and inverted face conditions (Fig. 7B,C), and differences were not significant (Fig. 7D). Connections to the STS and AMG were significantly modulated under the upright (p values <0.05), but not the inverted, face condition; however, differences between the upright and inverted face conditions were not significant. In addition, although modulatory connectivity from the STS to IFG was significant under the inverted face condition (p < 0.05), no significant difference was observed between the upright and inverted face conditions.
Significant differences between the upright and inverted face conditions were found in the modulatory connections between face and nonface processing areas (i.e., the OFA/FFA and the LO, and the OFA/FFA and the IPS; Fig. 7D). Connectivity to the LO from the OFA, FFA, IPS, and VC was significantly decreased under the upright face condition (p values <0.01; Fig. 7B). In addition, although evidence for connectivity from the OFA/FFA to IPS in the inverted face condition was moderate in RFX BMS, connectivities to the IPS from the OFA, FFA, LO, and VC were significantly increased in the inverted face condition (p values <0.01; Fig. 7C).
Quantitative connectivity differences: correlation between effective connectivity and behavior
To examine whether differences in cortical pathways that process upright and inverted faces are quantitative, we performed a correlation analysis between the modulatory connectivity FIE (the connectivity parameter for upright − inverted) and the behavioral FIE (accuracy for upright − inverted). This analysis showed that there was a strong negative correlation between the modulatory connectivity FIE from the VC to OFA and the behavioral FIE (r = −0.692, p < 0.001; Fig. 8). This correlation remained significant after Bonferroni correction for multiple comparisons (p = 0.023, corrected for all 32 modulatory connections). The modulatory connectivity from the VC to OFA was significant under both upright and inverted face conditions (i.e., common between upright and inverted face processing; Fig. 7B,C). No significant correlations were found in the other modulatory connections (all uncorrected p values >0.05).
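As a consistency check on these values, the correlation can be converted to a t statistic with n − 2 = 18 degrees of freedom, and the Bonferroni correction multiplies the uncorrected p value by the number of tests. The arithmetic below assumes the standard formulas and n = 20 participants:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{-0.692\,\sqrt{18}}{\sqrt{1-0.692^{2}}} \approx -4.07 \quad (18\ \text{df}),
\qquad
p_{\text{Bonferroni}} = \min\bigl(1,\; 32\, p_{\text{uncorrected}}\bigr)
```

A corrected value of p = 0.023 across 32 connections corresponds to an uncorrected p of about 0.023/32 ≈ 0.0007, which agrees with the reported p < 0.001.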
Discussion
The present study indicated that the cortical pathways responsible for qualitative and quantitative mechanisms in the FIE are dissociable within the distributed cortical network for face recognition. Instead of supporting the hypothesis that the FIE reflects either qualitative or quantitative processing differences, our results suggest that both forms of differences exist. That is, there are qualitative differences in cortical pathways for upright and inverted face processing between the face and nonface networks, and there are quantitative differences in cortical pathways for upright and inverted face processing within the face network (for schematic illustrations, see Fig. 9).
To investigate the mechanisms underlying the FIE, we identified the distributed network that included regions that are common (OFA, FFA, IPS, and IFG) and specific (STS and AMG for upright faces, and LO for inverted faces) to upright and inverted face processing. We then examined the effective connectivity between the regions using DCM with BMS (Friston et al., 2003; Stephan et al., 2009). We found distinct pathways for upright and inverted face processing, indicating that the FIE is mediated by qualitative differences in effective connectivity. Upright face recognition induced isolated activation of the OFA/FFA with concurrent lateral inhibition of the LO, which is involved in object processing (Malach et al., 1995; Grill-Spector, 2003). Inverted face recognition failed to inhibit the LO and increased couplings to the IPS, which has been attributed to VWM (Todd and Marois, 2004; Xu and Chun, 2006; Matsuyoshi et al., 2012). Furthermore, we found that upright and inverted faces were also processed by common cortical pathways in a quantitatively different manner. DCM showed that upright and inverted face processing involve almost completely common effective connectivity within the face network (Haxby et al., 2000; Fairhall and Ishai, 2007) and that the degree of effective connectivity within the common pathways (from the VC to OFA) predicted individual differences in behavioral FIE. Thus, both qualitative connectivity differences in distinct pathways and quantitative connectivity differences in common pathways mediate the FIE.
Qualitative versus quantitative views of the FIE
The FIE has long been considered the result of holistic processing of upright faces and not an effect that can be explained by the sum of facial parts processing (Maurer et al., 2002; Rossion and Gauthier, 2002; Tsao and Livingstone, 2008). However, recent findings have indicated that upright face perception is not necessarily a special perceptual process. Studies have demonstrated that whole faces are not recognized better than the sum of their parts (predicted by an optimal Bayesian integrator) (Gold et al., 2012) and that inversion induces quantitative, rather than qualitative, changes in face processing (Sekuler et al., 2004). These findings have challenged the assumption that upright faces involve holistic processing by suggesting that upright faces are simply processed in the same way as inverted faces, but just more efficiently (Konar et al., 2010).
Our findings provide a clue to explain this apparent discrepancy by dissociating qualitative and quantitative differences in cortical pathways for upright and inverted face processing. We found qualitative connectivity differences between cortical pathways for upright and inverted face processing, and suggest that each pathway independently contributes to the FIE. Upright faces elicited isolated activation of the OFA/FFA with concurrent lateral inhibition of irrelevant processing (the LO) that could otherwise disrupt intact face recognition (Amedi et al., 2005; Munakata et al., 2011). On the other hand, inverted face processing failed to suppress the LO and had modulatory connectivity to the IPS. The finding that couplings to the IPS are increased under the inverted face condition, but not decreased under the upright face condition, may indicate that inverted faces form redundant representations in the OFA/FFA (Gold et al., 2012), rather than that upright faces form compact representations, for further VWM processing (Curby and Gauthier, 2007). These findings suggest that an absence of object-processing suppression and an increased load on VWM underlie deteriorations in inverted face recognition. In other words, whether the face network is isolated from the nonface network may be a crucial factor for accurate upright face recognition.
Furthermore, our results showed a significant quantitative relationship between the FIE in VC-OFA modulatory connectivity and the behavioral FIE (i.e., individuals with a stronger FIE in VC-OFA modulatory connectivity showed a smaller behavioral FIE, and vice versa). Given that the OFA has been shown to be involved in the perception of facial parts (Rotshtein et al., 2005; Pitcher et al., 2011), our results are consistent with a previous study showing that the behavioral FIE could be reduced by inducing participants to encode faces in terms of parts (Farah et al., 1995). Farah et al. (1995) demonstrated that the FIE was present only when participants were instructed to encode faces as a whole, not when encoding them piecemeal. It is likely that the bottom-up modulatory connectivity from the VC to OFA contributes to the behavioral FIE by regulating parts-based encoding of faces. The VC to OFA modulatory connectivity may reflect individual differences in focus on facial parts; thus, in a quantitative manner, stronger or weaker connectivity may result in a smaller or larger behavioral FIE, respectively.
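The brain-behavior relationship described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than the analysis actually performed: the connectivity FIE is assumed here to be the upright-minus-inverted difference in the VC-to-OFA modulatory parameter, the relationship is assumed to be summarized by a Pearson correlation, and all numbers are invented placeholders, not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-participant VC-to-OFA modulatory parameters
# (placeholder values, NOT data from the study).
upright_vc_ofa = [0.5, 0.6, 0.7, 0.8, 0.9]
inverted_vc_ofa = [0.4, 0.4, 0.4, 0.4, 0.4]

# Assumed definition: connectivity FIE = upright minus inverted.
connectivity_fie = [u - i for u, i in zip(upright_vc_ofa, inverted_vc_ofa)]

# Hypothetical behavioral FIE per participant (e.g., an accuracy difference).
behavioral_fie = [0.30, 0.25, 0.22, 0.15, 0.10]

r = pearson_r(connectivity_fie, behavioral_fie)
print(f"r = {r:.2f}")  # negative r: stronger connectivity FIE, smaller behavioral FIE
```

With these placeholder values the coefficient is negative, which is the direction of the relationship reported above; the magnitude carries no meaning.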
As shown in Figure 7D, differences in modulatory connectivity between the upright and inverted face conditions were significant between the VC/OFA/FFA and the LO/IPS, although these differences were comparable within the VC/OFA/FFA. This may indicate that a significant portion of the FIE is mediated by interactions in the former distinct pathways for processing upright and inverted faces, and that, based on these interactions, effective connectivity within the latter common pathways, such as from the VC to OFA, modulates the magnitude of the behavioral FIE. The common cortical pathways for processing upright and inverted faces, together with the distinct pathways, may each contribute to the FIE.
FIE in the face and object areas
Yovel and Kanwisher (2005) showed a larger response in the FFA to upright faces compared with inverted faces; however, most previous studies found weak, or no, changes in FFA response (Kanwisher et al., 1998; Aguirre et al., 1999; Haxby et al., 1999; Maurer et al., 2002; Leube et al., 2003; Epstein et al., 2006). Although Yovel and Kanwisher (2005) suggested that this inconsistency was caused by a strong behavioral FIE in their study, we did not find a larger response to upright faces in the FFA (Figs. 3D and 4), even though we observed a strong behavioral FIE (Fig. 1C). This finding suggests that a strong behavioral FIE does not necessarily induce the FIE in the FFA, and that FFA activation alone is not sufficient to explain the behavioral FIE. Although the causes of this discrepancy are not yet clear, the area or connectivity in the face network that has a significant impact on behavioral performance may depend on the particular task demands (Atkinson and Adolphs, 2011).
On the other hand, consistent significant increases in LO activity have been observed when faces are inverted (Aguirre et al., 1999; Haxby et al., 1999; Yovel and Kanwisher, 2005; Epstein et al., 2006). These studies found strong effects of stimulus inversion on the LO; however, there were minimal effects on the OFA/FFA. Haxby et al. (1999) suggested that the failure of face processing systems with inverted faces results in the recruitment of object processing systems. Our results expand on previous research by suggesting that object processing areas are not just activated when faces are inverted but are actively suppressed when faces are upright. The inhibitory connectivity to the LO may help sharpen the activity profiles of the OFA/FFA and isolate them from disruptive object processing areas (Desimone and Duncan, 1995; Gazzaley et al., 2005; Reddy et al., 2009; Munakata et al., 2011; Franconeri et al., 2013). This would resolve the competition between regions with different visual preferences and contribute to more efficient processing of upright than inverted faces (Sekuler et al., 2004; Gold et al., 2012).
In conclusion, the present study showed that distinct and common cortical pathways for upright and inverted face processing mediate the FIE. Upright and inverted faces are processed by distinct couplings between face, object, and VWM-related areas in qualitatively different ways. In addition, upright and inverted face processing involves common cortical pathways within the so-called face network (Haxby et al., 2000; Fairhall and Ishai, 2007), and differences in the degree of connectivity within this network mediate the FIE in a quantitative manner. These results suggest that the FIE is not solely due to the face network but is mediated by multiple networks, including areas that are not necessarily specific to faces. Our findings not only clarify the dynamic cortical interactions that underlie the FIE but also help to resolve the debate over the underlying mechanisms. As inversion effects are multifaceted phenomena (Maurer et al., 2002), a mechanistic understanding of the range of inversion effects (e.g., Thatcher illusion; Thompson, 1980) will require further investigation. Nevertheless, we suggest that effective connectivity is dynamically coordinated among distinct and common networks for processing upright and inverted faces and results in the variety of inversion effects.
Footnotes
- Received September 23, 2014.
- Revision received January 30, 2015.
- Accepted February 5, 2015.
This study was supported in part by the Japan Society for the Promotion of Science Grant-in-Aid for Scientific Research 26540061 to D.M., 21220005 to N.S., and 20119001 to R.K., Core Research for Evolutional Science and Technology, Japan Science and Technology Agency to D.M., the Osaka University Global Center of Excellence Program Center of Human-Friendly Robotics Based on Cognitive Neuroscience to D.M., and the Ministry of Education, Culture, Sports and Technology of Japan Strategic Research Program for Brain Sciences Development of Biomarker Candidates for Social Behavior to N.S. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The authors declare no competing financial interests.
- Correspondence should be addressed to Dr. Daisuke Matsuyoshi, Research Center for Advanced Science and Technology, University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan. matsuyoshi{at}fennel.rcast.u-tokyo.ac.jp
- Copyright © 2015 the authors 0270-6474/15/354268-12$15.00/0