Abstract
Neuroimaging studies have revealed strong selectivity for object categories in high-level regions of the human visual system. However, it is unknown whether this selectivity is truly based on object category, or whether it reflects tuning for low-level features that are common to images from a particular category. To address this issue, we measured the neural response to different object categories across the ventral visual pathway. Each object category elicited a distinct neural pattern of response. Next, we compared the patterns of neural response between object categories. We found a strong positive correlation between the neural patterns and the underlying low-level image properties. Importantly, this correlation was still evident when the within-category correlations were removed from the analysis. Next, we asked whether basic image properties could also explain variation in the pattern of response to different exemplars from one object category (faces). A significant correlation was also evident between the similarity of neural patterns of response and the low-level properties of different faces, particularly in regions associated with face processing. These results suggest that the appearance of category-selective regions at this coarse scale of representation may be explained by the systematic convergence of responses to low-level features that are characteristic of each category.
Introduction
Neuroimaging studies have shown that discrete regions of the ventral visual pathway are specialized for different categories of objects. For example, some regions are more responsive to faces than to images of nonface objects (Kanwisher et al., 1997), whereas other regions are selective for images of places (Epstein and Kanwisher, 1998), body parts (Downing et al., 2001), visually presented words (Cohen et al., 2000), and inanimate objects (Malach et al., 1995). This selectivity has been regarded as characteristic of a modular organization in which distinct areas are responsible for processing functionally distinct categories of the visual stimulus (Kanwisher, 2010). Despite the evidence for category selectivity in the ventral visual pathway, specialized regions have only been reported for a limited number of object categories (Downing et al., 2006; Op de Beeck et al., 2008). Other studies support a more distributed categorical organization. For example, the spatial pattern of response across the entire ventral stream can distinguish a much greater range of object categories (Haxby et al., 2001; Spiridon and Kanwisher, 2002; Cox and Savoy, 2003; Kriegeskorte et al., 2008). Indeed, the ability of the pattern to discriminate particular object categories remains evident even when the most category-selective voxels are removed from the analysis.
Although these studies show robust and reliable patterns of response to different object categories in the ventral visual pathway, it is not clear whether more basic principles underpin these patterns of response. Evidence for larger-scale patterns of response across the ventral visual pathway has been found for the animacy (Chao et al., 1999), real-world size (Konkle and Oliva, 2012), and visual field location (Levy et al., 2001; Brewer et al., 2005; Arcaro et al., 2009) of visual objects. However, it is unclear how these factors explain the stronger category-selective patterns of response (Op de Beeck et al., 2008). A potential answer to this question was proposed by O'Toole et al. (2005), who showed that the capacity to discriminate neural responses to images in one category from those of another could be predicted by corresponding discriminations based on image properties. These results suggest that a purely category-based account of these patterns is incomplete. However, the ability to discriminate one category from another can depend on local features of the image or of the brain activity rather than on the pattern as a whole. To understand the relationship between low-level image properties and patterns of neural response, it is necessary to compare the similarity of entire patterns. Moreover, this relationship should still be evident in comparisons between images that do not share a category.
The aim of the current study was to compare global low-level image properties with patterns of neural response across the whole ventral visual pathway. If patterns of response are based on categorical properties of the stimulus, any similarities in responses to different objects should only reflect high-level relationships between categories to which they belong. On the other hand, if image properties play an important role, the response to different objects should be better explained in terms of their low-level characteristics.
Materials and Methods
Participants
Data were collected from 40 participants (30 females; mean age, 23 years). Twenty participants took part in Experiment 1 and 20 participants took part in Experiment 2. All observers had normal or corrected-to-normal vision. Written consent was obtained from all participants, and the study was approved by the York Neuroimaging Centre Ethics Committee. All images (approximately 8° × 8°) were presented in grayscale and were back-projected onto a screen located inside the bore of the scanner, ∼57 cm from the participants' eyes.
Experiment 1 had five different object categories: bottle, chair, face, house, and shoe. Examples of images from each condition are shown in Figure 1. Images were presented on a mid-gray background. Face images in Experiment 1 were taken from the Radboud Face Database. Stimuli were presented in a blocked design. Each block contained six images from one stimulus condition. Each image was presented for 800 ms and followed by a 200 ms blank screen, giving a total block length of 6 s. Stimulus blocks were separated by a 9 s gray screen with a central fixation cross. There were eight runs in the scan. In each run, each of the five categories was shown in a pseudorandomized order to ensure that each condition was presented evenly across the scan. Different exemplars of each stimulus condition were used in all blocks. The stimulus conditions for Experiment 2 consisted of exemplars from one category (faces). A face image was taken from each of the following familiar identities: Brad Pitt, David Beckham, Gary Lineker, Rowan Atkinson, Tom Cruise, and Tom Hanks. In all other respects, the design and procedure were identical to those of the first experiment.
Examples of images from the different object categories: bottles, chairs, faces, houses, and shoes.
fMRI data acquisition and analysis
The experiment was performed using a GE 3 T HD Excite MRI scanner at the York Neuroimaging Centre at the University of York. An 8-channel, phased-array head coil (GE) tuned to 127.4 MHz was used to acquire MRI data. A gradient-echo EPI sequence was used to collect data from 38 contiguous axial slices (TR = 3 s, TE = 25 ms, FOV = 28 × 28 cm, matrix size = 128 × 128, slice thickness = 3 mm). These images were coregistered onto a T1-weighted anatomical image (1 × 1 × 1 mm) from each participant. To improve registration, an additional T1-weighted image was acquired in the same plane as the EPI slices. Finally, individual participant data were registered to the standard brain (MNI 152).
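As a worked check, the nominal voxel dimensions follow from the stated acquisition parameters, assuming no interslice gap (consistent with the slices being described as contiguous). These derived values are not reported in the original text:

```latex
\Delta x = \Delta y = \frac{280\ \text{mm}}{128} \approx 2.19\ \text{mm}, \qquad
\Delta z = 3\ \text{mm}, \qquad
38 \times 3\ \text{mm} = 114\ \text{mm of axial coverage}
```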
Statistical analysis of the fMRI data was performed using FEAT in the FSL toolbox (http://www.fmrib.ox.ac.uk/fsl). The first three volumes (9 s) of each scan were removed to minimize the effects of magnetic saturation, and slice-timing correction was applied. Motion correction was followed by temporal high-pass filtering (cutoff, 0.01 Hz) and spatial smoothing (Gaussian, 6 mm FWHM). Regressors for each condition in the general linear model were convolved with a gamma hemodynamic response function. This generated parameter estimates for each condition in each voxel across the entire scan.
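The block-design GLM described above can be illustrated with a minimal NumPy/SciPy sketch. It is not FSL FEAT: it assumes a gamma HRF with a 6 s mean lag and 3 s standard deviation, a hypothetical schedule of eight shuffled repetitions of the five conditions, and a simulated voxel, and it omits preprocessing (slice timing, motion correction, filtering, smoothing). The helper names are illustrative only.

```python
import numpy as np
from scipy.stats import gamma

TR = 3.0                    # repetition time (s), from the acquisition parameters
BLOCK_S, REST_S = 6.0, 9.0  # 6 images x (800 + 200 ms), then a 9 s fixation screen
N_COND = 5                  # bottle, chair, face, house, shoe


def block_regressor(onsets, duration, n_vols, tr=TR, dt=0.1):
    """Boxcar for one condition, convolved with a gamma HRF and sampled once per TR.

    Assumed HRF: gamma density with mean lag 6 s and SD 3 s (shape 4, scale 1.5 s).
    """
    n_fine = int(round(n_vols * tr / dt))
    t = np.arange(n_fine) * dt
    box = np.zeros(n_fine)
    for onset in onsets:
        box[(t >= onset) & (t < onset + duration)] = 1.0
    hrf = gamma.pdf(np.arange(0.0, 30.0, dt), a=4.0, scale=1.5)
    hrf /= hrf.sum()
    conv = np.convolve(box, hrf)[:n_fine]   # causal convolution, trimmed to the run length
    return conv[:: int(round(tr / dt))]     # one sample at the start of each volume


def design_matrix(order, n_vols):
    """One column per condition plus a constant; `order` gives the condition of each block."""
    onsets = [[] for _ in range(N_COND)]
    for i, c in enumerate(order):
        onsets[c].append(i * (BLOCK_S + REST_S))
    cols = [block_regressor(onsets[c], BLOCK_S, n_vols) for c in range(N_COND)]
    return np.column_stack(cols + [np.ones(n_vols)])


def fit_glm(X, y):
    """Ordinary least-squares parameter estimates, one per column of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical schedule: 8 repetitions of the 5 conditions in shuffled order.
rng = np.random.default_rng(0)
order = np.concatenate([rng.permutation(N_COND) for _ in range(8)])
n_vols = int(len(order) * (BLOCK_S + REST_S) / TR)      # 40 blocks x 15 s / 3 s
X = design_matrix(order, n_vols)
true_beta = np.array([1.0, 0.5, 2.0, 1.5, 0.8, 10.0])   # simulated voxel; last = baseline
y = X @ true_beta + rng.normal(0.0, 0.05, n_vols)
beta_hat = fit_glm(X, y)
```

Convolving at a fine time step and then sampling once per TR avoids the coarse HRF that would result from sampling the gamma function directly at 3 s intervals.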
The reliability of the neural patterns of response was determined using the correlation-based multivoxel pattern analysis (MVPA) method devised by Haxby et al. (2001). To determine whether the pattern of response to different object categories generalized across individuals, we adapted the method so that the pattern of response to each condition in an individual participant was compared with the pattern of response from the group (Fig. 2). The parameter estimate for each regressor was taken as a measure of response relative to baseline in each voxel across the entire scan. The mean response in each voxel across all conditions was subtracted from the response to each condition. The group pattern was derived by entering the remaining 19 participants' data into a higher level group analysis (mixed effects, FLAME; http://www.fmrib.ox.ac.uk/fsl). The group response was then compared with the data from the individual who was omitted from the group. For each comparison, this “leave one participant out” (LOPO) method was repeated 20 times with a different participant being compared with the group each time. If a given stimulus category evoked a distinct pattern of activity, then independent observations of the response to that category should be more similar to each other than to responses to different categories. Pearson correlation was used to determine the similarity of the patterns across all combinations of object categories.
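The LOPO correlation procedure can be sketched as follows. This is an illustrative simplification: the group pattern is a plain average of the remaining participants' mean-centred parameter estimates, standing in for the FLAME mixed-effects analysis used in the study, and the data are synthetic.

```python
import numpy as np


def lopo_correlations(betas):
    """Leave-one-participant-out pattern correlations.

    betas: array (n_participants, n_conditions, n_voxels) of parameter estimates.
    Returns (n_participants, n_conditions, n_conditions): entry [s, i, j] is the
    Pearson r between participant s's pattern for condition i and the group pattern
    for condition j, where the group excludes participant s.
    """
    # Subtract each voxel's mean response across conditions, separately per participant.
    centred = betas - betas.mean(axis=1, keepdims=True)
    n_sub, n_cond, _ = centred.shape
    out = np.empty((n_sub, n_cond, n_cond))
    for s in range(n_sub):
        # Simplification: a plain average of the remaining participants stands in for
        # the FLAME mixed-effects group analysis used in the study.
        group = np.delete(centred, s, axis=0).mean(axis=0)
        for i in range(n_cond):
            for j in range(n_cond):
                out[s, i, j] = np.corrcoef(centred[s, i], group[j])[0, 1]
    return out


# Synthetic example: 20 participants share a pattern per condition plus individual noise.
rng = np.random.default_rng(1)
shared = rng.normal(size=(5, 500))
betas = shared[None, :, :] + rng.normal(size=(20, 5, 500))
mean_r = lopo_correlations(betas).mean(axis=0)
within = np.diag(mean_r).mean()
between = mean_r[~np.eye(5, dtype=bool)].mean()
```

Because each voxel's mean across conditions is removed, between-condition correlations tend to be negative even for unrelated patterns, so the informative quantity is the within-minus-between difference rather than the sign of the between values.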
Schematic diagram of pattern analysis procedure. A LOPO method was used to measure patterns of response to different stimulus conditions. In this analysis, the pattern of response elicited by one participant is compared with the pattern generated by a group analysis of all remaining participants. This procedure is repeated for all combinations of stimulus conditions and participants. This example shows the response to faces from an individual participant and the group. This cross-validation analysis was used to ask whether the patterns of response to different object categories are consistent across participants.
To avoid any assumptions about the functional organization of the ventral stream, we used anatomical masks defined by the Harvard-Oxford Atlas. These masks included: lateral occipital cortex–inferior, middle temporal–temporal occipital, inferior temporal–temporal occipital, fusiform–occipital, fusiform–temporal occipital, fusiform–posterior, parahippocampal–posterior, lingual, superior temporal–posterior, superior temporal–anterior, middle temporal–posterior, middle temporal–anterior, inferior temporal–posterior, inferior temporal–anterior, fusiform–anterior, parahippocampal–anterior, and the temporal pole. The location of the individual masks is shown in Figure 3. The ventral visual pathway mask was a concatenation of these individual anatomical masks.
Location of regional masks in the ventral visual pathway. The ventral stream mask was based on a concatenation of the individual anatomical masks.
The ability to discriminate patterns of response to specific object categories or to specific exemplars of an object category was calculated by determining whether the within-condition correlations were greater than the between-condition correlations. An arcsin square root transformation was performed on correlation values before they were entered into repeated-measures ANOVAs.
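The within- versus between-condition comparison can be sketched with paired t-tests across participants, one per condition, mirroring the within-between contrasts reported in the Results. Two assumptions are made here: "between" is taken as the mean of the off-diagonal values in the same row (individual condition vs. group patterns for the other conditions), and the arcsine square-root transformation and the full repeated-measures ANOVA are omitted. The input correlations are synthetic.

```python
import numpy as np
from scipy.stats import ttest_rel


def within_between(r):
    """Split LOPO correlations into within- and between-condition values.

    r: (n_participants, n_conditions, n_conditions), rows = individual, columns = group.
    Returns two (n_participants, n_conditions) arrays: the diagonal value and the mean
    of the off-diagonal values in the same row.
    """
    n_cond = r.shape[1]
    within = np.diagonal(r, axis1=1, axis2=2)
    off = ~np.eye(n_cond, dtype=bool)
    between = np.stack([r[:, i, off[i]].mean(axis=1) for i in range(n_cond)], axis=1)
    return within, between


def per_condition_tests(r):
    """Paired t-test (within vs. between) for each condition, across participants."""
    within, between = within_between(r)
    return [ttest_rel(within[:, c], between[:, c]) for c in range(r.shape[1])]


# Synthetic correlations: 20 participants x 5 conditions, with a boosted diagonal.
rng = np.random.default_rng(2)
r = rng.normal(0.0, 0.1, size=(20, 5, 5))
r[:, np.arange(5), np.arange(5)] += 0.5
tests = per_condition_tests(r)
```

With 20 participants each test has 19 degrees of freedom, matching the t(19) values reported in the Results.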
Correlation between patterns of neural response and image properties.
Finally, we asked whether the patterns of neural response in Experiments 1 and 2 could be explained by the image statistics of the visual objects. The image statistics of each object were computed using the GIST descriptor (http://people.csail.mit.edu/torralba/code/spatialenvelope/). For each image, a vector of 512 values was obtained by passing the image through a series of Gabor filters across eight orientations and four spatial frequencies, and windowing the filtered images along a 4 × 4 grid (Fig. 4). Each vector represents the image in terms of the spatial frequencies and orientations present at different positions across the image. A cross-validation procedure was used to determine how similar individual objects were to the average of each object category. GIST descriptors were averaged across all but one of the images within each category of object. These average descriptors were then compared with each unique image creating within- and between-category correlations for each combination of object category.
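The GIST computation described above can be sketched in NumPy. This is a simplified, GIST-like descriptor written for illustration, not Oliva and Torralba's published code: the filter shape, centre frequencies, and bandwidths are assumptions, and the prefiltering step of the original implementation is omitted. It keeps the structure described in the text: 8 orientations × 4 spatial frequencies give 32 filtered images, each averaged over a 4 × 4 grid, for 512 values. Square images whose side is divisible by 4 are assumed.

```python
import numpy as np


def gabor_bank(size, n_scales=4, n_orient=8):
    """Frequency-domain Gabor filters: Gaussian bumps on one side of the Fourier plane.

    Centre frequencies (0.25, 0.125, ... cycles/pixel) and bandwidths are
    illustrative assumptions, not the parameters of the published GIST code.
    """
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    bank = []
    for s in range(n_scales):
        f0 = 0.25 / 2 ** s
        sigma = f0 / 2.0
        for o in range(n_orient):
            theta = np.pi * o / n_orient
            u0, v0 = f0 * np.cos(theta), f0 * np.sin(theta)
            bank.append(np.exp(-((fx - u0) ** 2 + (fy - v0) ** 2) / (2 * sigma ** 2)))
    return np.stack(bank)


def gist_descriptor(img, n_scales=4, n_orient=8, grid=4):
    """Mean filter-response magnitude in each cell of a grid x grid layout, per filter."""
    size = img.shape[0]
    if img.shape != (size, size) or size % grid:
        raise ValueError("expected a square image whose side is divisible by grid")
    spectrum = np.fft.fft2(img - img.mean())
    # One-sided filters give complex (quadrature) outputs; their magnitude is local energy.
    energy = np.abs(np.fft.ifft2(spectrum[None] * gabor_bank(size, n_scales, n_orient)))
    cell = size // grid
    pooled = energy.reshape(len(energy), grid, cell, grid, cell).mean(axis=(2, 4))
    return pooled.ravel()   # n_scales * n_orient * grid**2 values (512 with the defaults)


# Example: a vertical grating at 0.25 cycles/pixel (luminance varies along x).
x = np.arange(128)
grating = np.tile(np.cos(2 * np.pi * 0.25 * x), (128, 1))
d = gist_descriptor(grating)
```

For the grating, the first filter (finest scale, orientation matching the grating's frequency axis) responds far more strongly than the orthogonal filter at the same scale, which is the orientation selectivity the descriptor relies on.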
Schematic illustration of the calculation of a GIST descriptor for an example image. A series of Gabor filters across eight orientations and four spatial frequencies are applied to the image. Each of the resulting 32 filtered images is then windowed along a 4 × 4 grid to give a final GIST descriptor of 512 values (right).
The correlation values for the GIST descriptor across different object categories were then compared with the corresponding correlation values in the fMRI patterns of response to different object categories, yielding a group r value. The significance of this correlation was assessed by determining the reliability of the r value across all participants. To determine whether the correlation depended on the difference between within-category and between-category values, we removed the within-category correlations and repeated the analysis. This analysis provided a stronger test of whether low-level image properties influence the pattern of response to different objects, because the similarity of responses to objects from different categories cannot be predicted by category membership alone.
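The leave-one-out image similarity matrix and its comparison with the fMRI similarity matrix, with and without the within-category cells, can be sketched as below. The descriptors and the "neural" matrix are synthetic and hypothetical, and the test of reliability across participants is not shown; one option would be to compute the second-order r separately for each participant.

```python
import numpy as np


def loo_template_matrix(desc, labels):
    """Leave-one-out image-similarity matrix.

    desc: (n_images, n_features) descriptors; labels: category index of each image.
    Entry [a, b] is the mean Pearson r between images of category a and the average
    descriptor of category b, with the test image left out of its own category's mean.
    """
    labels = np.asarray(labels)
    cats = np.unique(labels)
    out = np.zeros((len(cats), len(cats)))
    for a, ca in enumerate(cats):
        for b, cb in enumerate(cats):
            rs = []
            for i in np.flatnonzero(labels == ca):
                members = labels == cb
                members[i] = False          # no effect unless image i belongs to cb
                template = desc[members].mean(axis=0)
                rs.append(np.corrcoef(desc[i], template)[0, 1])
            out[a, b] = np.mean(rs)
    return out


def second_order_r(neural, image, drop_within=False):
    """Pearson r between matching cells of two condition-by-condition matrices.

    drop_within=True excludes the diagonal (within-category) cells, so that only
    between-category similarities contribute.
    """
    mask = np.ones(neural.shape, dtype=bool)
    if drop_within:
        mask &= ~np.eye(neural.shape[0], dtype=bool)
    return np.corrcoef(neural[mask], image[mask])[0, 1]


# Synthetic example: 5 categories x 6 images of 512-value descriptors.
rng = np.random.default_rng(3)
labels = np.repeat(np.arange(5), 6)
desc = rng.normal(size=(5, 512))[labels] + rng.normal(scale=0.5, size=(30, 512))
image_sim = loo_template_matrix(desc, labels)
neural_sim = 0.5 * image_sim + rng.normal(scale=0.02, size=(5, 5))  # hypothetical fMRI matrix
r_all = second_order_r(neural_sim, image_sim)
r_between = second_order_r(neural_sim, image_sim, drop_within=True)
```

When the diagonal is included, a large r can arise simply because within-category cells are high in both matrices; the `drop_within=True` variant asks whether the between-category cells alone covary, which is the stronger test described in the text.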
Results
Experiment 1
In Experiment 1, we determined the relationship between low-level properties of objects and the patterns they elicit in the ventral visual pathway. First, we measured the patterns of response to different object categories across the ventral visual pathway. Figure 5 shows distinct topographic patterns of response to each object category across the ventral visual pathway. We then compared the similarity of these topographic patterns across participants using MVPA. Figure 6A shows the correlation matrix for all combinations of object category. To quantify the reliability of these patterns of activation, a two-way repeated-measures ANOVA with Comparison (within-category, between-category) and Category (bottle, chair, face, house, and shoe) as factors was run on the correlations from the ventral visual pathway. Within-category (e.g., face–face) patterns of fMRI response were significantly more positively correlated than between-category (e.g., face–house) patterns of response (F(1,19) = 283.3, p < 0.001). We also found a significant Comparison × Category interaction (F(4,76) = 18.6, p < 0.001), reflecting differences across object categories in the size of the within-category versus between-category difference (within-between: Bottle: t(19) = 7.83, p < 0.001; Chair: t(19) = 7.04, p < 0.001; Face: t(19) = 15.05, p < 0.001; House: t(19) = 16.43, p < 0.001; Shoe: t(19) = 9.20, p < 0.001). This suggests that there was systematic variation in the magnitude of the within-category and between-category correlations across different object categories.
Topographic patterns of response to different object categories (left) in the ventral visual pathway. Red/yellow and blue/light blue colors represent positive and negative fMRI responses relative to the mean response across all objects. The patterns of response are restricted to the combined ventral visual pathway mask (see Fig. 3). Average image properties from each object category are described by contour plots of the Fourier power spectra across different spatial locations in the image.
Relationship between fMRI response and low-level image properties. Correlation matrices showing the within-category and between-category correlations in (A) fMRI response across the ventral visual pathway and (B) the image properties of different object categories. C, Scatter plot showing a strong positive correlation (r = 0.79) between the correlation matrices in A and B, demonstrating that patterns of fMRI response are closely linked to low-level image properties.
To determine whether the variance in the patterns of neural response could be explained by differences in the image statistics of different object categories, we measured the low-level properties of all images in the fMRI experiment. The image statistics of each object were computed using the GIST descriptor (Oliva and Torralba, 2001). Using the GIST descriptor, we calculated the average orientation energy at different spatial frequencies and spatial positions within each object category. We then determined the within- and between-category correlations in GIST values across images. Figure 6B shows that there were higher correlations in the image properties for within-category compared with between-category images, but the magnitude of this difference appears to vary for different object categories. We then compared the within-category and between-category correlations for the fMRI response with the corresponding correlations in the image properties. We found a strong positive correlation across the ventral visual pathway (r = 0.79, p < 0.001; Fig. 6C). Importantly, the correlation between low-level image properties and the neural pattern of response was still evident when the within-category comparisons were removed from the correlation (r = 0.53, p < 0.001). This finding shows that low-level image properties can explain systematic variation in the patterns of fMRI response regardless of category label.
Next, we asked whether this relationship between image properties and fMRI response varied across different anatomical regions within the ventral stream. Figure 7 shows that there was a significant positive relationship between the patterns of response and the image properties in the Temporal Pole (r = 0.54, p < 0.008), Middle Temporal Gyrus–Temporal Occipital (r = 0.67, p < 0.001), Inferior Temporal Gyrus–Posterior (r = 0.58, p < 0.006), Inferior Temporal Gyrus–Temporal Occipital (r = 0.63, p < 0.001), Lateral Occipital (r = 0.66, p < 0.001), Parahippocampal Gyrus–Posterior (r = 0.61, p < 0.001), Lingual (r = 0.72, p < 0.001), Fusiform Gyrus–Posterior (r = 0.59, p < 0.001), Fusiform Gyrus–Temporal Occipital (r = 0.66, p < 0.001), and Fusiform Gyrus–Occipital (r = 0.74, p < 0.001) regions. When the within-category comparisons were removed from the analysis, the Inferior Temporal Gyrus–Temporal Occipital (r = 0.64, p < 0.001), Lateral Occipital (r = 0.57, p < 0.001), Lingual (r = 0.66, p < 0.001), Fusiform Gyrus–Temporal Occipital (r = 0.35, p < 0.01), and Fusiform Gyrus–Occipital (r = 0.63, p < 0.001) regions showed significant positive correlations.
Correlation between fMRI response to different object categories and GIST description across subregions of the ventral visual pathway. The scatter plots show variation in the way that low-level image properties described by the GIST can explain the pattern of response across the ventral visual pathway.
Finally, we asked whether the neural patterns of response to complex images in early visual cortex can also be predicted by the low-level properties of the image. Significant correlations were evident in the intracalcarine (r = 0.69, p < 0.001), supracalcarine (r = 0.37, p < 0.01), and occipital pole (r = 0.64, p < 0.001) regions.
Experiment 2
In Experiment 2, we asked whether low-level image properties could also explain variation in the pattern of response to individual exemplars from one object category (faces). We measured fMRI patterns of response to six different face images. Figure 8A shows the correlation matrix for all combinations of face images. To quantify the reliability of the patterns of fMRI response to each face image, a two-way ANOVA with Comparison and Face as factors was used. We found a main effect of Comparison, due to larger within-exemplar than between-exemplar correlations (F(1,19) = 21.01, p < 0.001). This shows that the pattern of response across the ventral visual pathway can discriminate different facial identities (Kriegeskorte et al., 2007; Natu et al., 2010; Nestor et al., 2011). We also found a significant Comparison × Face interaction (F(5,95) = 13.83, p < 0.001). This reflected variation across facial identities in the difference between within-exemplar and between-exemplar correlations (within-between: Brad Pitt: t(19) = 0.28, p = 0.782; David Beckham: t(19) = 1.81, p = 0.086; Gary Lineker: t(19) = 2.09, p = 0.050; Rowan Atkinson: t(19) = 8.75, p < 0.001; Tom Cruise: t(19) = 2.79, p = 0.012; Tom Hanks: t(19) = −1.33, p = 0.199).
Relationship between fMRI response and low-level image properties of exemplars from one category (faces). Correlation matrices showing the within-exemplar and between-exemplar correlations in (A) the fMRI response across the ventral visual pathway and (B) the image properties of different faces. C, Scatter plot showing the correlation between the correlation matrices in A and B. These results show that low-level image properties are able to predict the topographic pattern of response to exemplars of a single object category (r = 0.44), although the relationship is weaker than for exemplars from different object categories (Fig. 6).
Next, we determined whether low-level differences between faces, as described by the GIST descriptor, could account for the differences in the neural patterns of response. Figure 8B shows the differences in the image properties between the different face exemplars. Finally, we correlated the fMRI response correlations with the corresponding image-property correlations. We found significant correlations between the similarity of neural patterns of response and the image properties for different face images in the ventral pathway (r = 0.44, p < 0.001; Fig. 8C). However, no correlation was found between image properties and neural responses when the within-exemplar correlations were removed from the analysis (r = −0.03, p = 0.95). These results suggest that a finer scale of cortical organization may be necessary to discriminate exemplars of a single category such as faces.
To determine whether there were regional differences, we measured the relationship between fMRI response and image properties across different regions of the ventral visual pathway (Fig. 9). Significant correlations were evident in the Lateral Occipital (r = 0.27, p < 0.005), Lingual Gyrus (r = 0.53, p < 0.001), Fusiform Gyrus–Temporal Occipital (r = 0.49, p < 0.001), and Fusiform Gyrus–Occipital (r = 0.62, p < 0.001) regions. However, when the within-exemplar correlations were removed from the analysis, only the Fusiform Gyrus–Occipital region showed a significant correlation between image properties and fMRI response (r = 0.24, p < 0.05).
Correlation between fMRI response to different faces and GIST description across subregions of the ventral visual pathway. The scatter plots show variation in the way that low-level image properties described by the GIST can explain the pattern of response to different faces across the ventral visual pathway. Note that the strongest positive correlations were evident in the Fusiform (Temporal–Occipital, Occipital) and Lingual gyri.
Finally, we asked whether the neural patterns of response to different faces in early visual cortex can also be predicted by the low-level properties of the image. Significant correlations were evident only in the intracalcarine (r = 0.53, p < 0.001) and occipital pole (r = 0.47, p < 0.001) regions.
Discussion
The aim of this study was to determine whether more basic principles and dimensions than category could underlie the topographic organization of the ventral visual pathway. We found that the patterns of response to images from the same object category were more similar than the patterns of response to objects from different categories. However, there were differences in the magnitude of both the within-category and between-category correlations. Next, we investigated the extent to which this variation in the neural response to different objects could be explained by systematic differences in low-level image properties. We found a strong, linear relationship between the pattern of neural response in the ventral visual pathway and the image statistics of different object categories.
These results have important implications for understanding how the ventral visual cortex is organized. A dominant perspective on the organization of this region is that it encodes information based on object category (Kanwisher, 2010). This organization contrasts with the continuous, topographic maps found in early stages of visual processing, which are tightly linked to the properties of the visual image (Hubel and Wiesel, 1968; Wandell et al., 2007). Until now, it has proved difficult to explain how selectivity for object categories suddenly emerges from these low-level representations (Op de Beeck et al., 2008). The strong linear relationship we report between low-level image properties and large-scale, distributed patterns of neural response suggests that no such explanation may be necessary. Patterns of response at this coarse scale of representation can be explained by low-level properties of the image. These findings are consistent with previous studies that used principal component analysis to show that neural responses to different object categories in inferior temporal cortex can be predicted by variance in the principal components of the images (O'Toole et al., 2005; Baldassi et al., 2013). However, such a relationship need not contradict a categorical representation, given that a category typically contains objects that are visually similar. The key finding from this study is that the correlation between the pattern of neural response and the low-level properties was still evident when the within-category correlations were removed from the analysis. If the organization of the ventral visual cortex depended solely on categorical principles, then the linear relationship between neural and image properties should not extend to the between-category correlations once the within-category correlations are removed.
Our results provide a novel framework in which to consider the topographic organization of the ventral visual pathway. Previous studies have shown that category-selective patterns of response are robust and reliable within and between individuals (Kanwisher, 2010; Haxby et al., 2011). However, it has remained a substantial challenge to describe the strong categorical specificity that exists in the ventral visual pathway in terms of simpler properties that can be continuously mapped across the cortical surface. A fundamental problem in this endeavor is that category membership is a discontinuous variable. It is possible to change continuously between exemplars from one category (morphing between two faces, for example), because the corresponding features are obvious. However, the corresponding features of exemplars from different categories (e.g., face and house) are less clear. The low-level image properties of the GIST descriptor reflect variation in spatial position, orientation, and spatial frequency across the image. Within this lower-level framework of stimulus representation, it is more straightforward to determine how a continuous map could emerge (Op de Beeck et al., 2008). It is important to note that these representations need not be similar to the way information is represented in lower-level regions. Indeed, it seems likely that low-level image properties that are more common in natural images will be over-represented (Op de Beeck et al., 2001; Kayaert et al., 2003). The appearance of discrete regional selectivity may thus emerge from the characteristic combinations of low-level image properties that co-occur in natural stimuli. This conclusion is consistent with other work showing low-level biases in the responses of high-level visual regions. For example, spatial frequency (Rajimehr et al., 2011), orientation (Nasr and Tootell, 2012), and shape (Wilkinson et al., 2000) biases, along with visual field representations (Levy et al., 2001; Brewer et al., 2005; Arcaro et al., 2009), have been reported in these regions.
We found that neural patterns of response to different objects in lower level visual areas can also be predicted by the low-level properties of the image. This finding is consistent with previous work that has shown that activity in lower level visual areas (V1, V2, V3) could be decoded to correctly identify natural images, using a model derived from estimates of each voxel's selectivity for location, spatial frequency, and orientation (Kay et al., 2008). However, previous attempts to characterize the topographic properties of visual areas beyond this early stage of visual processing have needed to include categorical or semantic information about the images (Kriegeskorte et al., 2008; Naselaris et al., 2009). The key result from our study is that, even within these higher level regions, patterns of activity are parametrically related to low-level image properties. This applies even in between-category comparisons, which suggests that such image properties play a fundamental role in organizing the topography of the ventral stream.
Regional variation was evident in the correlation between image properties of different object categories and fMRI responses across different anatomical regions. Stronger correlations were found in posterior and inferior regions of the ventral visual pathway. Regional variation was also evident when we compared fMRI responses and image properties for different exemplars of one object category (faces). For example, the fusiform gyrus, a region associated with face processing, showed a significant positive correlation between image properties and fMRI responses even when the within-exemplar comparisons were removed from the analysis. This pattern of results was not evident in other anatomical masks, such as the parahippocampal gyrus. These findings are consistent with previous work showing that patterns of response across the ventral visual pathway are not equipotential in their ability to discriminate different object categories (Spiridon and Kanwisher, 2002). Rather, the ability of the pattern to discriminate a specific object category is higher in regions that are more responsive to that category.
Our results show that the patterns of fMRI response generalize across participants. Using a modified cross-validation analysis (Haxby et al., 2001), we compared the pattern of response in one participant with the pattern from a group analysis in which that participant was left out (Shinkareva et al., 2008; Poldrack et al., 2009; Haxby et al., 2011). This LOPO approach showed that the topographic patterns of response to different object categories were consistent across individuals. These observations are significant in that they suggest that our findings reflect the operation of consistent, large-scale topographical organizing principles, rather than an arbitrarily distributed representation in each individual.
In conclusion, previous neuroimaging studies have revealed strong selectivity for object categories, such as faces, in the human visual system. However, it has never been clear whether this regional selectivity is driven solely by tuning to discrete object categories or whether it reflects tuning for continuous low-level features that are common to images from a particular category. Here, we show a clear link between patterns of response in higher-level visual cortex and the image statistics characteristic of each category that cannot be explained solely in terms of discrete categorical organization.
Footnotes
This work was supported by a grant from the Wellcome Trust (WT087720MA). We thank Bryony James for her help on the early stages of this project. We also thank Alex Wade, Andy Young, Tony Morland, and two anonymous reviewers for helpful comments on this manuscript.
The authors declare no competing financial interests.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
Correspondence should be addressed to Timothy J. Andrews, Department of Psychology and York Neuroimaging Centre, University of York, York YO10 5DD, UK. timothy.andrews{at}york.ac.uk
This article is freely available online through the J Neurosci Author Open Choice option.