NeuroImage

Volume 54, Issue 2, 15 January 2011, Pages 1715-1734

High-resolution imaging of the fusiform face area (FFA) using multivariate non-linear classifiers shows diagnosticity for non-face categories

https://doi.org/10.1016/j.neuroimage.2010.08.028

Abstract

Does the “fusiform face area” (FFA) code only for faces? This question continues to elude the neuroimaging field due to at least two kinds of problems: first, the relatively low spatial resolution of the fMRI in which the FFA was defined, and second, the potential bias inherent in prevailing statistical methods for analyzing the actual diagnosticity of cortical tissue. Using high-resolution (1 mm × 1 mm × 1 mm) imaging data of the fusiform face area (FFA) from 4 subjects who had categorized images as ‘animal’, ‘car’, ‘face’, or ‘sculpture’, we used multivariate linear and non-linear classifiers to decode the resultant voxel patterns. Prior to identifying the appropriate classifier we performed exploratory analysis to determine the nature of the distributions over classes and the voxel intensity pattern structure between classes. The FFA was visualized using non-metric multidimensional scaling, revealing “string-like” sequences of voxels that appeared in small non-contiguous clusters per category, intertwined with those of other categories. Since this analysis suggested that the feature space was highly non-linear, we trained various statistical classifiers on the labelled class-conditional distributions, separated the four categories with 100% reliability (over replications), and generalized to out-of-sample cases with high significance (up to 50%; p < .000001, chance = 25%). The increased noise inherent in high-resolution neuroimaging data relative to standard resolution resisted any further gains in category performance above ~ 60% (with the FACE category often showing the highest per-category bias), even when coupled with various feature extraction/selection methods. A sensitivity/diagnosticity analysis for each classifier per voxel showed (1) reliable sensitivity (S.E. < 3%) present throughout the FFA for all 4 categories, and (2) multi-selectivity; that is, many voxels were selective for more than one category, some with high diagnosticity but at submaximal intensity.
This work is clearly consistent with the characterization of the FFA as a distributed, object-heterogeneous similarity structure and bolsters the view that the FFA response to “FACE” stimuli in standard resolution may be primarily due to a linear bias, which has resulted from an averaging artefact.

Research Highlights

► The FFA shows a heterogeneous and distributed selectivity code for 4 categories.
► Faces do not have an advantage according to non-linear classification analysis.
► Multivariate non-linear classifiers detect information that standard analyses do not.
► A linear bias for faces can explain past contradictions about the nature of the FFA.

Introduction

The debate over the existence of modules in the human visual system that are specialized for a particular category continues to evolve, with a multitude of studies supporting each side of the dispute. One side states that specialized modules perform exclusive processing of one type of object category (the most famous example being a face-processing module), and the other side argues that object representation is a widespread overlapping or combinatorial code, and that there are no modules exclusively devoted to a particular category. Early fMRI studies investigating face and object recognition (Kanwisher et al., 1997) showed that the fusiform face area (FFA) appears to exclusively respond to faces. The body of literature supporting the existence of specialized modules also extends beyond faces; modules for other objects have also been reported, including bodies (Downing et al., 2001, Downing et al., 2006) and the parahippocampal place area (PPA) for places (Epstein and Kanwisher, 1998). In direct contrast to this modular hypothesis, Haxby et al. (2001) and Hanson et al. (2004) showed that the voxel response code across large sections of inferior temporal cortex (including FFA, as well as surrounding areas) appears to be distributed rather than modular; and that submaximal voxel responses are an essential part of this code.

A possible methodological reason for the conflicting results is the type of analysis performed. The original modular studies used an analysis approach based on the general linear model (GLM) (Kanwisher et al., 1997, Downing et al., 2006), whereas those supporting a distributed representation have used multivariate classifiers (Haxby et al., 2001, Hanson et al., 2004, Hanson and Halchenko, 2008). Subsequent literature using the multivariate correlation method seems to have converged on the view that category representation is distributed to some extent, and that purported modules do contain information about non-preferred object categories (O'Toole et al., 2005, Reddy and Kanwisher, 2007). One early exception, the first study to repeat this method (Spiridon and Kanwisher, 2002), was unable to discriminate between two non-preferred categories in a selective module, but this negative result was likely due to low statistical power. Other studies using even more powerful multivariate classifiers were able to detect additional reliable, diagnostic information about multiple non-preferred object categories in purported modules (Hanson et al., 2004, Hanson and Halchenko, 2008).

If different methods of analysis yield entirely different conclusions about the nature of face and object representation in the human visual system, then using the best available analysis is of utmost importance. Some methods may simply be inappropriate for addressing questions about cortical representation with fMRI. The standard GLM has several limitations: it treats each voxel as independent of the others; it can detect only linear relationships; and it is typically not cross-validated, which leaves open the possibility that the model merely reflects sample bias rather than patterns that generalize and are truly diagnostic of a category. Haxby's correlation method (Haxby et al., 2001) is an improvement in that it can detect submaximal activity that the GLM cannot; however, it too has shortcomings, which have been discussed elsewhere (Hanson et al., 2004) and are summarized again in the Discussion. More powerful classifiers offer further improvement, both in specifying individual voxels' contributions to category classification and in yielding more generalizable and stable results, as long as over-fitting is concurrently controlled.
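The danger of an un-cross-validated fit can be illustrated with a minimal sketch (using scikit-learn as a stand-in classification toolkit; the paper's own analyses used other multivariate tools, and the trial and voxel counts here are arbitrary): on pure noise with far more voxels than trials, an in-sample fit looks almost perfectly diagnostic, while cross-validated accuracy falls back toward the 25% chance level.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Pure-noise "voxels": 40 trials x 500 voxels, balanced labels from 4 categories.
X = rng.standard_normal((40, 500))
y = np.repeat(np.arange(4), 10)
rng.shuffle(y)

clf = LogisticRegression(C=10.0, max_iter=5000)

# In-sample fit: with more voxels than trials, noise alone is linearly
# separable, so the un-validated model looks (spuriously) diagnostic.
in_sample = clf.fit(X, y).score(X, y)

# Cross-validated accuracy on the same noise drops back toward chance (0.25).
cv = cross_val_score(clf, X, y, cv=5).mean()
print(f"in-sample: {in_sample:.2f}  cross-validated: {cv:.2f}")
```

The gap between the two numbers is exactly the "sample bias" a non-cross-validated analysis can mistake for diagnostic signal.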

Using these tools, the question of the FFA's specificity can be thoroughly explored to address the remaining disagreement on whether faces are “special” in the FFA (Reddy and Kanwisher, 2007, Hanson and Halchenko, 2008). It is possible that subtle information exists in the FFA for multiple object categories, for example in voxels that are submaximally responsive and in patterns of activation across voxels. Even though this information is difficult or impossible to detect with standard methods, it is nevertheless informative if the BOLD signal is consistently differentially active across different categories; non-linear and submaximal information is information nevertheless, and is a crucial component of a combinatorial code. Using non-linear and multivariate classifiers to explore the nature of voxel sensitivity in the FFA to multiple categories is therefore crucial for our understanding of the cortical representation of faces and objects.
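The claim that non-linear and submaximal information "is information nevertheless" can be made concrete with a toy sketch (scikit-learn; the two simulated voxels and their XOR-like category code are hypothetical, not drawn from the data): each voxel alone, and any linear combination of the two, carries no category information, yet a non-linear classifier decodes the joint pattern almost perfectly.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Two synthetic voxels with an XOR-like code: the category is determined
# only by the CONJUNCTION of the two responses, not by either alone.
n = 200
X = rng.uniform(-1, 1, size=(n, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

# A linear classifier stays near chance; an RBF-kernel SVM recovers the code.
linear = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
nonlin = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
print(f"linear: {linear:.2f}  non-linear (RBF SVM): {nonlin:.2f}")
```

Any analysis restricted to linear, voxel-wise effects would conclude that such voxels contain no category information at all.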

The debate continues today on another methodological issue as well: the resolution of fMRI. At standard resolution (3 mm × 3 mm × 4 mm) and with standard analyses, the FFA appears to be broadly responsive across all object categories but maximally responsive to the category “face”; however, in sharper focus (high-resolution 1 mm3 voxels), the selectivity may look quite different. If the modular hypothesis is true, the high-resolution selectivity may mirror the low-resolution response as one large cluster exclusively and uniformly responsive to “face”: what Kanwisher (2006) has called the “blueberry sized ‘face’ module” within the inferior temporal lobe. At the opposite end of the modular–distributed spectrum, however, it could consist of smaller clusters of sharply tuned neurons, each highly selective for its preferred category, but with “face”-selective clusters being larger or more numerous.

Grill-Spector et al. (2006) investigated this question by first defining the FFA at standard resolution, then re-imaging the response to 4 categories (full-body animals, cars, faces, and abstract sculptures) at high resolution, and defining a selectivity index to measure the category preference of each voxel. Their original analysis found that the FFA is highly distributed and heterogeneous, consisting of many small voxel clusters that each exhibit strong selectivity for one preferred category, with clusters selective for each of the 4 tested object categories.
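To make the idea of a per-voxel selectivity index concrete, here is a minimal sketch. The formula used, (R_pref − R_nonpref)/(R_pref + R_nonpref), is one common form of such an index and is an illustrative assumption, not necessarily the exact definition used by Grill-Spector et al. (2006); the response values are invented.

```python
import numpy as np

def selectivity_index(responses):
    """Contrast of the preferred-category response against the mean
    response to the remaining categories:
    (R_pref - R_nonpref) / (R_pref + R_nonpref).
    Near 1 = sharply tuned to one category; near 0 = broadly tuned."""
    responses = np.asarray(responses, dtype=float)
    pref = responses.max()
    nonpref = np.delete(responses, responses.argmax()).mean()
    return (pref - nonpref) / (pref + nonpref)

# Hypothetical voxel responses to 'animal', 'car', 'face', 'sculpture':
print(selectivity_index([0.4, 0.3, 1.2, 0.5]))  # → 0.5, strongly face-preferring
print(selectivity_index([0.9, 0.8, 1.0, 0.9]))  # near 0, broadly tuned
```

Note that such an index summarizes only maximal tuning; a voxel with a low index can still carry diagnostic submaximal information of the kind multivariate classifiers exploit.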

However, the results of that paper were called into question (Baker et al., 2007) due to an error in the original analysis. Essentially, the authors used the same voxels (and TRs) for localization of the category types that they used for their selectivity measure. Consequently, they created an upward bias, in selecting voxels (and TRs) that already responded to a given category and then testing those same voxels (and TRs) again for selectivity of those same categories (this unfortunate error has recently fallen under the category of so-called “Voodoo Statistics”, which can include many types of cross-validation error; see Vul et al., 2009). Hence, they were forced to retract their original claim that FFA voxels are highly selective for both face and non-face objects (Grill-Spector et al., 2006).
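The bias created by selecting and testing on the same data can be reproduced on pure noise, as a minimal NumPy sketch (the trial and voxel counts are arbitrary; the "face"/"non-face" labels are purely illustrative): circular selection manufactures an apparent category preference where none exists, while an independent split does not.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pure noise: 100 "trials" x 1000 "voxels" per condition, no real signal.
face = rng.standard_normal((100, 1000))   # trials labelled 'face'
other = rng.standard_normal((100, 1000))  # trials labelled 'non-face'

# Circular analysis: pick the 20 voxels with the largest face > non-face
# difference, then measure the face advantage on those SAME trials.
diff = face.mean(axis=0) - other.mean(axis=0)
top = np.argsort(diff)[-20:]
biased = face[:, top].mean() - other[:, top].mean()

# Independent split: select voxels on one half of the trials,
# measure the face advantage on the held-out half.
sel = face[:50].mean(axis=0) - other[:50].mean(axis=0)
top_ind = np.argsort(sel)[-20:]
unbiased = face[50:, top_ind].mean() - other[50:, top_ind].mean()

print(f"circular estimate: {biased:.3f}  independent estimate: {unbiased:.3f}")
```

The circular estimate is reliably positive despite the data being noise, which is precisely the artefact that forced the retraction.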

This turn of events has left the nature of the FFA and its object specificity ambiguous at best. Is the FFA actually composed of patches that are only strongly selective for “face” stimuli? Or is it more heterogeneous, involving strongly selective patches for both “face” and non-“face” stimuli? Our goal in this study is twofold. First, we attempt to resolve this controversy by combining, for the first time, high-resolution imaging with cutting-edge statistical learning methods specifically developed for fMRI (Hanson and Halchenko, 2008, Hanke et al., 2009), which offer dramatic improvements over more standard methods of analysis. Second, rather than simply adding another study to the many that already bear on this debate, we also endeavor to explain why the controversy exists at all. Specifically, we will examine the information in the FFA that is utilized by various classifiers, and whether the amount of relative information across classes differs between types of classifiers, i.e. whether “face” is special at some or any of these levels. Through a detailed look at the analytical methods used and the characteristics of the data, we investigate how two apparently contradictory results can arise.

In order to examine these questions, we will perform a series of specific analyses. First, we will visually explore the data, both to check for obvious evidence of “face”-modularity in the high-resolution data, and to determine appropriate methods of subsequent analysis based on the data characteristics (for example, a clear visual separation between categories would indicate that linear classification methods would be appropriate). Next, in order to test the amount of diagnostic information available in the FFA for each category, we train and cross-validate several types of classifiers, and compare the best linear case with the best non-linear case for each subject. In drawing conclusions from the results, we will look for the overall level of classification accuracy, as well as the relative performance of the classifiers by category: for example, if “face” has a consistent and large advantage over the other categories, this would indicate that faces are truly special in the FFA. Subsequently, we illustrate the high-resolution spatial selectivity in each subject's FFA as determined by the most accurate classifiers and show the voxels' selectivities per category, in order to conclusively address the question of FFA selectivity in high-resolution fMRI. To test whether these spatial selectivity maps are stable, we also present maps showing a measure of reliability of these maps over many iterations of the classifiers. Finally, we investigate the possibility of a linear bias for face: if relatively more information is available for “face” at a linear level (leading to results of face selectivity by standard linear methods), and no such advantage exists at a non-linear level (leading to multivariate studies supporting a distributed representation), this could explain the contradictory results in the literature regarding whether or not the FFA is truly “face”-selective.
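Judging whether cross-validated accuracy genuinely exceeds the 25% chance level is a recurring step in the analyses above; one standard way to do so is a label-permutation test, sketched below on synthetic data (scikit-learn; the four-category structure matches the study, but the embedded signal strength and classifier settings are illustrative assumptions, not the paper's actual pipeline).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

# Synthetic 4-category "voxel pattern" data with an embedded class signal:
# 30 trials per category, 50 voxels, category means offset from one another.
n_per, n_vox = 30, 50
means = 0.5 * rng.standard_normal((4, n_vox))
X = np.vstack([means[c] + rng.standard_normal((n_per, n_vox)) for c in range(4)])
y = np.repeat(np.arange(4), n_per)

# Observed cross-validated accuracy of a non-linear (RBF SVM) classifier.
observed = cross_val_score(SVC(), X, y, cv=5).mean()

# Permutation null: shuffle the labels and re-run the identical analysis.
null = [cross_val_score(SVC(), X, rng.permutation(y), cv=5).mean()
        for _ in range(99)]
p = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
print(f"accuracy: {observed:.2f}  chance = 0.25  p = {p:.3f}")
```

Because the entire cross-validation loop is repeated under each relabeling, the null distribution automatically accounts for any optimistic quirks of the classifier and fold structure.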

Section snippets

Methods

In the present study we use high-resolution imaging collected with the same stimuli and similar imaging parameters as Grill-Spector et al. (2006).

Exploratory raw data visualization

An increase in the high-end tail of the raw BOLD distribution for “face” was necessarily present in the standard-resolution data in order to have localized the FFA; yet the exploratory raw data visualization shows no obvious visual evidence of a modular nature for either “face” or any other category. Fig. S1 (Supplementary materials) shows raw BOLD activity that is extremely similar across categories; however, it is possible that a more complex mixture of voxels in this distribution could reflect a

Summary

The FFA localized independently using standard resolution (3 mm) shows non-face discriminative responses at higher resolution (1 mm) fMRI. This observation appears to be in conflict with past observations that the FFA is homogeneous in object selectivity. In fact, it is consistent with past research in that (a) we are using a more sensitive method for determination of pattern diagnosticity and (b) higher resolution voxels afford another potential increase in pattern sensitivity of voxel response.

Conclusions

We hypothesize that the overall finding of heterogeneity at high-resolution fMRI, with respect to object selectivity in the FFA, will also extend to other regions in the “object-selective” cortex of the visual stream. Of all the apparently object-selective visual areas, the FFA has always shown the strongest selectivity for its preferred category when analyzed at standard resolution with a GLM; therefore heterogeneity would have been the most difficult to find with this area. Since we have

Acknowledgments

We would like to thank K. Grill-Spector and R. Sayres for collecting the high-resolution BOLD data, and Catherine Hanson and other members of the RUMBA Lab for providing comments and discussion that have improved our analysis and this paper. We thank the NSF and the McDonnell Foundation for support.

References (52)

  • C.I. Baker et al., Does the fusiform face area contain subregions highly selective for nonfaces? Nat. Neurosci. (2007)
  • A.H. Bell, L.G. Ungerleider, submitted for...
  • A.H. Bell et al., Object representations in the temporal cortex of monkeys and humans as revealed by functional magnetic resonance imaging, J. Neurophysiol. (2008)
  • C. Bruce et al., Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque, J. Neurophysiol. (1981)
  • X. Chen et al., Exploring predictive and reproducible modeling with the single-subject FIAC data set, Hum. Brain Mapp. (2006)
  • R. Desimone et al., Stimulus-selective properties of inferior temporal neurons in the macaque, J. Neurosci. (1984)
  • P.E. Downing et al., A cortical area selective for visual processing of the human body, Science (2001)
  • P.E. Downing et al., Domain specificity in visual cortex, Cereb. Cortex (2006)
  • R.O. Duda et al., Pattern Classification and Scene Analysis (1973)
  • R. Epstein et al., A cortical representation of the local visual environment, Nature (1998)
  • I. Gauthier et al., Activation of the middle fusiform ‘face area’ increases with expertise in recognizing novel objects, Nat. Neurosci. (1999)
  • K. Grill-Spector et al., High-resolution imaging reveals highly selective nonface clusters in the fusiform face area, Nat. Neurosci. (2006)
  • C.G. Gross et al., Visual properties of neurons in inferotemporal cortex of the macaque, J. Neurophysiol. (1972)
  • M. Hanke et al., PyMVPA: a Python toolbox for multivariate pattern analysis of fMRI data, Neuroinformatics (2009)
  • S.J. Hanson et al., What connectionist models learn: learning and representation in connectionist networks, Behav. Brain Sci. (1990)
  • S.J. Hanson et al., Brain reading using full brain support vector machines for object recognition: there is no “face” identification area, Neural Comput. (2008)