The visual system generates our perception of the world's three spatial dimensions using information from two-dimensional (2D) retinal images—a task that is essential for recognizing and interacting with objects and our environment. Computationally, this feat could be achieved through a series of intermediate representations that involve the extraction of 2D features (a primal sketch), the processing of depth cues, and finally an object-centered (view-invariant) 3D model (Marr, 1982). How does the functional organization of the visual system facilitate such computations? The 2D organization of the retinal input is most clearly preserved in occipital cortex, where retinotopically organized visual areas display a spatial correspondence between the visual field and its neural representation on the cortical surface. In contrast, cortical regions further along the dorsal and ventral pathways of the primate visual system transform visual information into egocentric and view-invariant reference frames that support action guidance and object recognition, respectively (Orban et al., 2014).
Spatial coding of the world affects nearly all stages of visual processing. Even in the ventral visual stream, where neural responses in localized regions show selectivity for specific visual categories of objects such as faces, bodies, and scenes (Lafer-Sousa and Conway, 2013), there is evidence of coarse retinotopic organization (for review, see Orban et al., 2014). Furthermore, the relationship between neural representations of an object's position in the visual field and its identity may be shaped by the natural statistics of visual experience, such as the tendency for primates to direct their gaze toward faces, and for scenes to occupy the entire visual field. Depth information, which is applied in different ways to foveal and peripheral stimuli, also shapes important aspects of our cognition, such as our attention to objects in the foreground or perceptual completion of occluded objects. While sensitivity to visual depth cues such as binocular disparity has been observed throughout the primate visual system, it remains unclear how, when, and where visual depth information contributes to object representations in the ventral stream.
A recent study by Verhoef, Bohon, et al. (2015) investigated the relationship between the processing of depth information and other visual features in inferior temporal (IT) cortex. The authors used fMRI to localize regions in the macaque visual cortex that are responsive to binocular disparity, a source of depth information that arises from the horizontal separation of the two eyes. The disparity stimuli were “checkerboards” whose individual tiles appeared either randomly staggered in depth relative to the fixation plane (both near and far, near only, or far only) or flat (see Fig. 1A). The stimuli were similar to those used in a previous macaque fMRI study that reported strong activations of dorsal visual areas V3A and the caudal intraparietal area, but a conspicuous absence of activation in regions of the ventral stream (Tsao et al., 2003). That result was somewhat surprising given prior neurophysiological evidence of disparity-tuned neurons in IT (Janssen et al., 2000), and could have reflected poor fMRI measurement sensitivity, a homogeneous distribution of disparity selectivity, or specificity for more complex disparity-defined shapes in IT. Verhoef, Bohon, et al. (2015) therefore specifically targeted IT using methods for improved signal sensitivity in this region.
Figure 1. A, Visual stimulus contrasts used to obtain fMRI maps of feature-related activity. For disparity stimuli, schematic diagrams illustrate the depth structure present in the random dot stereograms that were presented. B, A simplified schematic summary of regions along the lateral surface of macaque IT that were responsive to the stimulus contrasts listed in A. Face-selective patches (orange) and anatomical subdivisions are labeled (PL, posterior lateral; ML, middle lateral; MF, middle fundus; AL, anterior lateral; AF, anterior fundus; AM, anterior medial; FST, fundus of the STS).
The results revealed that, in addition to the expected activity in early visual areas, three discrete patches within IT responded more strongly to stimuli with disparity content (where the checkerboard tiles appeared staggered in depth relative to one another) than to those containing no disparity (which appeared as a flat surface). One patch was located anterior to ventral V4 on the inferior temporal gyrus and two patches were in the fundus and ventral bank of the superior temporal sulcus (STS). Four additional regions that were biased toward near disparities (where all tiles appeared closer to the observer than the fixation plane) compared with far disparities were also observed on the crest of the STS, while no voxels showed significant biases for far disparities over near (Fig. 1B).
The presence of multiple disparity-sensitive regions distributed throughout IT bears some resemblance to the organization of patches of cortex that are sensitive to faces and color in IT. Might there be a connection between the observed biases for near disparity and the link between category selectivity and retinal eccentricity mentioned above? To examine this prospect, the authors compared the locations of disparity-biased regions with those of regions that responded preferentially to color, faces, scenes, or particular retinotopic eccentricities, all of which had been previously measured in the same animals (Lafer-Sousa and Conway, 2013). The results revealed that all four identified face-selective regions showed biases toward near disparities, as well as toward the central visual field. This suggests that disparity information might contribute to face representations in IT, and the authors hypothesized that the bias for near disparities might reflect the tendency for primates to fixate on the eyes of others, bringing other facial features slightly closer to the observer than the point of fixation. In contrast to face-selective regions, scene-selective regions showed biases toward disparity stimuli containing mixed near and far disparities over those with no disparity, and the more ventral disparity-biased region showed a bias toward scenes.
Responses to different visual features, such as color and faces, have previously been observed in discrete, non-overlapping patches of IT cortex. Similarly, the results of Verhoef, Bohon, et al. (2015) revealed virtually no overlap between disparity-biased and color-biased regions. Such examples of independent (“modular”) processing of visual features typically raise the question of how these properties remain perceptually coupled to the objects from which they arise. The preservation of a common reference frame (e.g., retinotopy) in cortical regions that process specific features might play an important role in solving this feature-binding problem by facilitating spatially specific recursive feedback to earlier visual areas. More specifically, however, a previous study reported that for each of four face-selective regions along the macaque STS there is an adjacent color-biased region located ventrolaterally (Lafer-Sousa and Conway, 2013). The four regions shown by Verhoef, Bohon, et al. (2015) to be biased toward near disparities showed a similar distribution along IT in the caudal–rostral axis, approximately consistent with IT's anatomical subdivisions (Kravitz et al., 2013), raising the possibility that these correspond to distinct functional processing stages. In support of this speculation, Verhoef, Bohon, et al. (2015) noted that the degree of selectivity across these regions (for near disparities, faces, and color) followed an approximate gradient along the posterior–anterior axis of the lateral surface. While this is consistent with the classical hierarchical feedforward model of object processing, it also fits with refined models that propose a more parallel and recursive network structure based on convergent evidence from anatomical connectivity and functional studies in macaque IT (Kravitz et al., 2013).
The design of any experiment constrains the conclusions that can be drawn from it, and three such limitations are important to note. First, unlike a previous study using similar disparity stimuli (Tsao et al., 2003), the present study did not include control experiments to test whether the disparity-biased regions observed were actually responding to disparity content and not to the checkerboard texture and edges (which were present only in the disparity stimuli that contained depth and not in the flat stimuli). Evidence from both macaques and humans suggests that some cortical areas may respond to depth configurations defined by various depth cues, as well as to 2D textures (Liu et al., 2004; Orban, 2011). Second, as Verhoef, Bohon, et al. (2015) themselves note, several features of the stimuli used in their study are likely suboptimal for driving responses in object-selective cortical areas, including the frontoparallel geometry, the relatively large stimulus size (28 × 21°), and the rectangular 2D contour. Neurophysiological evidence suggests that neurons in the lower bank of the STS that are selective for curved and slanted disparity-defined surfaces are also modulated by 2D contour (Janssen et al., 2000; Liu et al., 2004; Yamane et al., 2008), and IT neurons also reflect active perceptual discrimination of finer disparities than those used here (Uka et al., 2005). However, the stimuli used by Verhoef, Bohon, et al. (2015) were specifically designed to test for disparity responses in the absence of volumetric depth or surface shape. The relationship between 3D surface shape and object category representations in IT presents an avenue for future research. Third, vergence eye movements, which are strongly driven by binocular disparity, were not controlled for. Recent fMRI evidence suggests that dorsomedial areas of the macaque STS respond to vergence eye movements (Ward et al., 2015). While this raises the possibility that the more posterior and dorsal disparity-biased region reported by Verhoef, Bohon, et al. (2015) might be related to changes in vergence, it also suggests that the other disparity-biased regions observed are unlikely to have resulted from eye movements.
Our understanding of the neural processing of depth information, and of the functional organization of the primate brain in general, has benefited greatly from neurophysiological recordings in macaques and noninvasive functional imaging in humans. However, to bridge the gap between these two sources of evidence, fMRI mapping of the macroscopic functional organization of the macaque brain is essential, and can serve to guide physiology and facilitate comparative study (Vanduffel et al., 2014). Recent work has demonstrated that high-field-strength fMRI even has the power to resolve the clustered topography of disparity-tuned populations within an area of human extrastriate cortex (Goncalves et al., 2015) that was not previously identifiable (Tsao et al., 2003). The study by Verhoef, Bohon, et al. (2015) adds to our understanding of functional topography in the ventral visual processing network and provides an important foundation for future mapping of its functional macrostructure at finer scales.
Footnotes
Editor's Note: These short, critical reviews of recent papers in the Journal, written exclusively by graduate students or postdoctoral fellows, are intended to summarize the important findings of the paper and provide additional insight and commentary. For more information on the format and purpose of the Journal Club, please see http://www.jneurosci.org/misc/ifa_features.shtml.
A.P.M. is funded by the National Institutes of Health intramural program. I thank David Leopold for helpful discussion and comments. The authors of the original study would like to note that the first and second authors contributed equally to that work.
Correspondence should be addressed to Aidan P. Murphy, Room B1C60, 49 Convent Drive, Bethesda, MD 20832. murphyap@mail.nih.gov