The past decade has seen a dramatic increase in our knowledge of the neural basis of stereopsis. New cortical areas have been found to represent binocular disparities, new representations of disparity information (e.g., relative disparity signals) have been uncovered, the first topographic maps of disparity have been measured, and the first causal links between neural activity and depth perception have been established. Equally exciting is the finding that training and experience affect how signals are channeled through different brain areas, a flexibility that may be crucial for learning, plasticity, and recovery of function. The collective efforts of several laboratories have established stereo vision as one of the most productive model systems for elucidating the neural basis of perception. Much remains to be learned about how the disparity signals that are initially encoded in primary visual cortex are routed to and processed by extrastriate areas to mediate the diverse capacities of three-dimensional vision that enhance our daily experience of the world.
Linking psychology and physiology in binocular vision
2007 is the 150th anniversary of the birth of Charles Sherrington, who spent most of his life as a physiologist studying the motor system. At one critical point, Sherrington engaged the study of vision through an experiment to probe how the brain puts together information from the left and right eyes. His experiments compared the effects of flickering lights presented to left and right eyes, when those lights were flickering in-phase (synchronously) in the two eyes and when they were out-of-phase (desynchronously) (Sherrington, 1902, 1904). Sherrington found that there was only a slight improvement in the detectability of binocular flicker when it was presented in-phase compared with out-of-phase.
He found this result surprising and somewhat disappointing. The idea had been that he could probe central mechanisms of physiology after the stages of input from each eye alone. In finding that there was only a slight difference between the two conditions, he concluded that each eye was independently capable of generating a fully elaborated sensation of flicker and brightness without input from the other eye. This led him to conclude that there was little of interest to physiologists in the binocular combination beyond what each eye could do alone, the “synthesis [lying] obviously more within the province of study of the psychologist than of the physiologist” (Sherrington, 1904).
Although Sherrington never returned to the study of binocular phenomena, his goals and aims have been the source of inspiration for many others. Perhaps most notably in modern times, Julesz took up the same themes with his identification of a central location at which purely binocular phenomena could be studied (Julesz, 1971). With his perfection of the random-dot stereogram (RDS), he was able to generate figures that were visible only after binocular fusion, and he then embarked on a large set of studies that investigated the phenomena of binocular vision after the stage of fusion of information from left and right eyes.
This overview of the current state of binocular vision is motivated by many of the same concerns, but one aim the four authors have in common is to bridge the gap between physiology and psychology that Sherrington found to be so challenging. We are primarily concerned with identifying the paths that are important for the conscious perception of visual depth. Equally, we aim to identify those areas of the brain that are involved in preliminary processing before the emergence of fully developed sensations of binocular depth and binocular shape and form. This chapter is a review of a symposium presented at the Annual Meeting of the Society for Neuroscience.
Creating depth perception in the brain
Binocular inputs are used by the human visual system to judge object depth in the three-dimensional (3D) world. This depth percept is created by the integration of two views of the world received by the two eyes. Stimuli nearer or farther from the fixation point produce disparities between the left and right eyes, which are dominated by a horizontal shift in the position of a feature in one eye with respect to the other. In the pathway from retina to thalamus to cortex, input from each eye remains segregated in the thalamus [the lateral geniculate nucleus (LGN)] and is first combined in primary visual cortex (V1).
The physiological and anatomical pathways responsible for analyzing disparity have been steadily revealed over the past few decades. It has been known for some time that the primary visual pathway into V1 is an important stage of binocular combination (Holmes, 1945; Hubel and Wiesel, 1962; Barlow et al., 1967). In V1, disparity-selective cells have been characterized by their response to bars presented in different locations in the two eyes [“tuned excitatory,” “tuned inhibitory,” “near,” and “far” cells (Poggio and Fischer, 1977)], response to differential phase of sinusoidal gratings (Ohzawa and Freeman, 1986), and response to absolute disparity of random-dot stereograms (Cumming and Parker, 1997).
Absolute versus relative
Early studies (Poggio and Fischer, 1977; Bishop and Pettigrew, 1986) described the tunings of V1 neurons in terms of absolute disparity. However, depth perception is clearly dominated by relative disparity, and the two kinds of tuning can be told apart by varying the disparity of the background. If the background disparity moves forward, a cell tuned to relative disparity will shift its tuning forward; if the background moves backward, the cell will shift its tuning backward. In contrast, the response of a cell tuned to absolute disparity would remain unchanged with changing background disparities.
Much of the excitement in the study of binocular mechanisms beyond V1 has been driven by the search for mechanisms sensitive to the relative depth between pairs of visual features. Suppose that the eyes look toward a pair of visible points at different distances from the observer: the disparity of the points can be specified in two ways. Consider first the projection of one of the points onto the retina: the angle between the projection line of the point and line of sight of the fovea is different for left and right eyes (Fig. 1). The absolute disparity of a point is defined as the difference in the angles subtended in the left and right eyes. This corresponds to the differences in the distance between each eye's fovea and the retinal image of the point. The relative disparity between two points is the difference between their absolute disparities, which eliminates the role of the fovea as a reference point. In other words, the relative disparity is the difference between the angles subtended by the two points viewed from the positions of the left and right eyes (Fig. 1).
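The two definitions can be made concrete with a small geometric sketch. The code below is illustrative only and rests on several assumptions that are not in the text: a top-down 2D scene, eyes treated as points separated by a hypothetical 6.5 cm, and a sign convention in which near disparities are negative. The function names are invented for this example. It shows that absolute disparity depends on where the eyes fixate, whereas relative disparity does not.

```python
import math

IPD = 0.065  # assumed interocular distance in metres (hypothetical value)

def eye_angle(point, eye_x, fixation):
    """Angle (radians) between one eye's line of sight to the fixation
    point and its projection line to `point`. Points are (x, z), z > 0."""
    px, pz = point
    fx, fz = fixation
    return math.atan2(px - eye_x, pz) - math.atan2(fx - eye_x, fz)

def absolute_disparity(point, fixation):
    """Difference between the angles measured in the two eyes.
    Assumed sign convention: near is negative, far is positive."""
    left = eye_angle(point, -IPD / 2, fixation)
    right = eye_angle(point, +IPD / 2, fixation)
    return right - left

def relative_disparity(p, q, fixation):
    """Difference between the absolute disparities of two points."""
    return absolute_disparity(p, fixation) - absolute_disparity(q, fixation)

near, far = (0.0, 0.5), (0.0, 1.0)
for fixation in [(0.0, 1.0), (0.1, 0.7)]:
    print(fixation,
          math.degrees(absolute_disparity(near, fixation)),
          math.degrees(relative_disparity(near, far, fixation)))
```

Changing the fixation point changes the absolute disparity of the near point but leaves the relative disparity between the two points the same, because the fixation terms cancel in the subtraction.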
The interest in how the brain processes relative disparity is driven primarily by the observation that human binocular depth judgments rely much more on relative disparity than on absolute disparity (Westheimer, 1979). If we are to discover brain sites at which the neuronal signals support perceptual processes, identification of sites that process relative disparity is an important component of this general aim. Furthermore, there is a well-founded computational theory of how to exploit the sources of this information in the visual inputs (Koenderink and Van Doorn, 1976). The more recent focus in visual neuroscience has been to identify and characterize the responses of neurons and brain areas responsible for processing this information and to develop neuron-based models of the visual computations required.
The energy model
What principles might determine the neural connections that lead to relative disparity? A mechanism selective for relative disparity must have different responses to a particular depth, depending on the context in which that depth is presented. This requirement can be fulfilled by adopting the energy model, as originally advanced for the detection of absolute disparity by cat V1 neurons (Ohzawa et al., 1990) or visual motion. The economy of this proposal is attractive: the energy model implies a certain organization of the cortical circuitry to implement it, so the same pattern of cortical circuitry that performs the useful work of detecting absolute disparity in the cortical area V1 can be exploited again at a second level (V2 or elsewhere) to extract relative disparity.
The essence of the energy model is that, if we have a single quantity L that depends linearly on two variables (x, y), then the energy model computes H = [L(x + y)]^2, with the result that H is sensitive to the interaction between x and y [i.e., the cross term L(x)L(y)]. For the motion energy model, x and y represent spatial position and time, whereas, for the original disparity energy model, x and y are horizontal image coordinates in the left and right eyes. Sensitivity to the interaction of x and y is optimized by processing through linear filters that are in a quadrature phase relationship with each other. Within the bandpass of the linear filter, all possible spatial distributions of the input are efficiently detected.
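A minimal one-dimensional sketch of a disparity energy unit may help make this concrete. It is not the implementation used in any of the cited studies. The Gabor receptive-field shape, the position-shift encoding of preferred disparity, the parameter values, and all function names are assumptions chosen for illustration. Because L is linear, [L(x) + L(y)]^2 expands to L(x)^2 + 2 L(x)L(y) + L(y)^2, and only the cross term depends on the disparity between the two eyes' images.

```python
import math
import random

N = 64        # samples in each 1D image (hypothetical size)
SIGMA = 6.0   # width of the Gaussian envelope, in samples (assumed)
FREQ = 0.08   # carrier frequency, in cycles per sample (assumed)

def gabor(center, phase):
    """1D Gabor receptive-field profile."""
    return [math.exp(-((i - center) ** 2) / (2 * SIGMA ** 2))
            * math.cos(2 * math.pi * FREQ * (i - center) + phase)
            for i in range(N)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def energy(left_img, right_img, preferred):
    """Energy of a unit whose right-eye fields are shifted by `preferred`."""
    total = 0.0
    for phase in (0.0, math.pi / 2):  # quadrature pair: even and odd filters
        left_drive = dot(gabor(N / 2, phase), left_img)
        right_drive = dot(gabor(N / 2 + preferred, phase), right_img)
        # Squaring the binocular sum creates the cross term
        # 2 * left_drive * right_drive, which carries the disparity signal.
        total += (left_drive + right_drive) ** 2
    return total

def mean_energy(stim_disparity, preferred, trials=300, seed=0):
    """Average energy over random-dot rows shifted by `stim_disparity`."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        left = [rng.choice((-1.0, 1.0)) for _ in range(N)]
        right = [left[(i - stim_disparity) % N] for i in range(N)]
        acc += energy(left, right, preferred)
    return acc / trials

# Tuning curve of a unit preferring a 3-sample disparity.
for d in range(-9, 10, 3):
    print(d, round(mean_energy(d, preferred=3), 2))
```

In this sketch the average response is largest when the stimulus disparity matches the shift between the left-eye and right-eye receptive fields, and it falls below the uncorrelated level about half a carrier period away, giving a Gabor-shaped tuning curve.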
The top panel of Figure 2 uses this scheme for computing relative disparity in V2 (Thomas et al., 2002). Pairs of matched receptive fields, each sensitive to absolute disparity, summate onto a complex cell, from which an estimate of the monocular inputs (M) is subtracted, before a squaring output nonlinearity. Relative disparity (R) is computed from the outputs of two disparity energy models (Ohzawa et al., 1990; Read et al., 2002), each of these individual models being sensitive to absolute disparity. Although these interactions are shown in stages, they could all be generated in a single step (Archie and Mel, 2000), as has been proposed for the organization of complex-cell receptive fields in V1.
The spatial structure underlying the model is shown in the bottom panel. This shows the interactions arising from two pairs of neurons sensitive to absolute disparity: one pair is even symmetric, and the other is odd symmetric. Summation of these two responses gives relative disparity, a constant difference between two absolute disparities indicated by the small white line on the far right plot. The interactions provided by the energy model provide for a consistent selectivity to relative disparity over a range of different absolute disparities.
Roles for extrastriate cortex: dorsal and ventral streams
Although the basic receptive field mechanisms for coding disparity in the early stages of visual cortex have been established, we now realize that this is only the beginning as far as the elaboration of the perceptual side of binocular depth is concerned (Cumming and DeAngelis, 2001; Parker, 2007). For example, neurons in V1 that are selective for binocular disparity also show selective binocular interactions for stimuli that fail to lead to a perception of binocular depth (Cumming and Parker, 1997). Neurons in this area are activated by simple forms of stimulus correspondence between features on the left and the right retinas: they respond reliably to binocular features, independently of the perceptual context in which those features appear (Cumming and Parker, 2000).
It has been suspected for a long time that the secondary visual area, V2, has a major influence on the processing of binocular disparity (Hubel and Wiesel, 1970). V2 neurons have been variously described as “obligatory binocular” (Hubel and Wiesel, 1970), selective for disparity-defined contours (Bakin et al., 2000; Qiu and von der Heydt, 2005), and tuned for relative disparity (Thomas et al., 2002). Interestingly, lesions of V2 are almost as disruptive as lesions of V1 in the macaque monkey when the animal is tested with a binocular depth task (Cowey and Wilkinson, 1991). V2 is of course a gateway to many areas of the extrastriate visual cortex (Shipp and Zeki, 1985; Rockland, 1995). From there, signals pass through to the dorsal areas, of which V5/middle temporal area (MT) has been most widely studied for its role in binocular depth perception (Maunsell and Van Essen, 1983; DeAngelis et al., 1998). Signals from V2 also form important parts of the ventral visual pathway passing via V4 (Ghose and Ts'o, 1997; Watanabe et al., 2002) and to the inferotemporal areas (Janssen et al., 1999; Uka et al., 2000; Tanaka et al., 2001).
Current evidence suggests that these two streams of visual processing may have distinct roles in binocular depth perception, as they do in several other aspects of visual perception (Neri et al., 2004; Parker, 2007). For example, the neurons in the dorsal visual pathway respond to the disparity of extended visual surfaces in V5/MT (Nguyenkim and DeAngelis, 2003), MST (medial superior temporal) (Eifuku and Wurtz, 1999), and parietal (Shikata et al., 1996) areas and rarely show responses to relative depth suitable for the analysis of 3D shape (Uka and DeAngelis, 2006). In contrast, in the ventral visual pathway, there are neurons specifically sensitive to the shape of three-dimensional stimuli defined by binocular disparity (Janssen et al., 2000). The binocularly anticorrelated signals that activate V1 neurons (Cumming and Parker, 1997) also activate many neurons in the dorsal visual pathway (Takemura et al., 2001; Krug et al., 2004), thus dissociating neuronal activation of dorsal structures from the perception of binocular depth. In the ventral visual pathways, these signals tend to be eliminated (Janssen et al., 2003; Tanabe et al., 2004), suggesting a closer link between ventral areas and the perceptual experience of depth. It is worth noting that disparity-selective neurons have also been reported in portions of parietal and frontal cortex that are thought to be involved in visuomotor control, including areas CIP (caudal intraparietal) (Sakata et al., 2005), LIP (lateral intraparietal) (Gnadt and Mays, 1995; Genovesio and Ferraina, 2004), VIP (ventral intraparietal) (Colby et al., 1993), and FEF (frontal eye field) (Ferraina et al., 2000). The respective roles of these areas in depth perception remain essentially unknown.
As a broad generalization, dorsal areas may be predominantly involved in processing extended visual surfaces and resolving depth structure during self-movement, whereas ventral visual areas process relative disparity to support the analysis of the three-dimensional shape of objects. The differences between the neural connectivity within the dorsal and ventral streams that bring about these different properties would arise primarily from differences in the way in which these streams read out the topographic maps generated at earlier stages in V1. A dorsal stream neuron sensitive to the tilt or slant of a surface would pool inputs from spatially broad, overlapping sets of receptive fields in V1, whereas a neuron in the ventral stream might gain inputs from spatially narrow, neighboring receptive fields to give sensitivity to relative disparity with a center-surround spatial organization. Exactly how developmental processes set up these different selectivities in different streams is at present unknown.
Anatomical pathways for disparity processing
One of the major concepts to emerge from studies of the mammalian cortex is that it is organized into an array of relatively discrete cortical areas (Fig. 3). This organization has been most thoroughly studied in monkey visual cortex, in which well over 30 different visual areas have been distinguished based on a combination of histological markers, retinotopic maps, neuronal response properties, anatomical connections, and behavioral effects of lesions or microstimulation. The pathways interconnecting these areas are numerous (>300) but orderly and, in fact, allow the areas to be organized into a well behaved hierarchy that has done much to shape how visual neuroscientists think about visual processing (Felleman and Van Essen, 1991). One of the most striking features of the connections that have been revealed is that each visual area receives inputs from many other areas and, in turn, sends its outputs to more than one area. What is the purpose of so much interconnectivity? What does a given area “tell” another area? And why is this useful for the function of seeing?
One possible answer, implicit in the motivation for this chapter, is that it allows for a certain “computational flexibility” in the way that basic visual information is combined for various specific purposes. Visual primitives (e.g., orientation of contours, color of surfaces, velocity of moving objects, and binocular disparity) represented in early visual areas can then be recombined toward greater, more complex ends such as identifying a three-dimensional object based on its shape. In many cases, the same basic information, such as binocular disparity, will be integrated with other modalities (e.g., motion) to construct ultimately different kinds of representations.
One example of such selective integration is that of motion and binocular disparity in the dorsal stream area MT. There are two principal routes of cortical input to MT, known as the “direct” and “indirect” pathways. Both of these pathways have their cortical origins in layer 4B of V1. Is the information carried by these pathways redundant? Recent studies have made it clear that the two pathways are at least anatomically distinct. The 4B neurons project to either MT or the thick stripes of V2, with only a small minority (∼5%) projecting to both areas (Sincich and Horton, 2003). They also have different morphologies (Levitt et al., 1994) and distinct patterns of inputs, with the direct pathway receiving projections solely from the magnocellular stream and the indirect receiving a mixture of magnocellular and parvocellular inputs (Yabuta et al., 2001; Nassi and Callaway, 2006) (Fig. 4).
Another reason to believe that the two different types of information might arrive by different routes is that, whereas nearly all of the velocity tuning properties of MT cells can be accounted for by those of their V1 inputs (Pack et al., 2003, 2006; Churchland et al., 2005), the disparity tuning measured in V1 does not appear to account for that found in MT. The first major difference is that most MT neurons have odd-symmetric disparity tuning curves (DeAngelis and Uka, 2003), whereas most disparity-selective V1 neurons have even-symmetric curves. In addition, many MT neurons respond to disparities that are much larger than those discovered so far in V1 (DeAngelis and Uka, 2003). As pointed out previously (Cumming and DeAngelis, 2001), this suggests not only that the binocular disparity tuning in MT cannot be directly inherited from its V1 disparity-selective inputs, but that it cannot even be constructed from them and thus must be computed de novo, either within MT or in some other area that also projects to MT. The indirect pathway is a likely suspect, given the high prevalence of disparity-tuned neurons in V3 (Felleman and Van Essen, 1987) and in the thick stripes of V2 (Hubel and Livingstone, 1987; Peterhans and von der Heydt, 1993; Roe and Ts'o, 1995), the largest source of indirect cortical input to MT (Maunsell and Van Essen, 1983).
The major relay stations of the indirect pathway to MT, areas V2 and V3, are located within the lunate sulcus on approximately opposite banks and some distance removed from V1. This arrangement allowed the Born laboratory to ask whether the two different input pathways are redundant or whether they carry different kinds of information. This was done by reversibly inactivating (by cooling) one set of inputs, those from the indirect pathways involving V2 and V3, while recording from MT neurons in alert monkeys. Transiently eliminating the V2/V3 input had a modest effect on the stimulus-evoked activity of MT neurons. In general, the reduction in visually evoked activity was approximately the same for all directions of stimulus motion, resulting in scaled-down direction tuning curves whose preferred direction and tuning bandwidth were rarely significantly affected. The same neurons showed much more dramatic changes in their tuning for binocular disparity, often losing significant tuning during cooling. Even those that remained significantly tuned to binocular disparity typically became much less modulated, resulting in a much greater change in the disparity discrimination index when compared with similar indices for direction and speed (Ponce et al., 2006). Moreover, the modest gain reduction measured from direction tuning curves cannot account for the loss of disparity tuning in the majority of neurons. It thus appears that at least one role of the indirect pathways is to endow MT neurons with sensitivity to binocular disparity. What perceptual role this integration might serve will be discussed below.
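The text does not give the formula for the discrimination index. As a sketch only, the code below assumes one commonly used signal-to-noise form, (Rmax − Rmin) / (Rmax − Rmin + 2 × RMS error), where Rmax and Rmin are the largest and smallest mean responses across the tuning curve and the RMS error is the trial-to-trial variability around those means. The function name, the input layout, and the firing rates are hypothetical.

```python
import math

def discrimination_index(responses_by_condition):
    """Tuning strength relative to response variability, between 0 and 1.

    responses_by_condition: one list of single-trial firing rates per
    stimulus value (for example, per disparity). Assumed definition:
    (Rmax - Rmin) / (Rmax - Rmin + 2 * RMS error).
    """
    means = [sum(r) / len(r) for r in responses_by_condition]
    # Residual sum of squares around each condition's own mean.
    sse = sum((x - m) ** 2
              for r, m in zip(responses_by_condition, means)
              for x in r)
    n_trials = sum(len(r) for r in responses_by_condition)
    n_conditions = len(responses_by_condition)
    rms_error = math.sqrt(sse / (n_trials - n_conditions))
    spread = max(means) - min(means)
    if spread + 2 * rms_error == 0:
        return 0.0  # flat, noiseless responses carry no tuning signal
    return spread / (spread + 2 * rms_error)

# Invented firing rates: a well-modulated curve versus a flattened one.
tuned = [[10, 12], [20, 22], [30, 32]]
flattened = [[19, 23], [20, 22], [21, 25]]
print(discrimination_index(tuned), discrimination_index(flattened))
```

Under this assumed definition, scaling the mean responses and their variability by the same gain factor leaves the index unchanged, whereas a flattened tuning curve lowers it. This is one way to see why a loss of disparity tuning is distinguishable from a simple reduction in gain.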
Areal specialization in stereopsis
This abundance of disparity signals outside V1 suggests that extrastriate areas play important roles in stereopsis. Moreover, several studies indicate that the representation of disparity in V1 cannot account for key aspects of depth perception (Cumming and Parker, 1997, 1999, 2000; Nienborg and Cumming, 2006b). Thus, there seem to be two broad possibilities for why disparity signals are found across such a broad expanse of visual cortex: (1) disparity processing is highly distributed such that most aspects of depth perception depend on simultaneous activation of many regions of cortex, or (2) different cortical areas have specialized representations of binocular disparity that are well suited to some tasks but not others. In the latter scenario, depth perception in a specific context may depend on only a small subset of visual areas or perhaps only on a subset of neurons within a single visual area.
As discussed above with regard to absolute versus relative disparity, for example, there is considerable emerging evidence for specialized representations of disparity among different visual areas. Moreover, it is likely that we only know about a small fraction of the differences between areas at this time. If different cortical areas are specialized to perform different tasks in 3D vision, then it should be possible to identify experimentally the areas and/or neurons that contribute to performance of a particular task. To do this, it is critical to perform experiments in alert, trained animals and to use techniques that allow us to link neural activity to perception. A few key approaches have been quite successful in this arena (Parker and Newsome, 1998).
In a psychophysical task performed around threshold, the same (weak) stimulus will give rise to different perceptual reports, as well as different neural responses, across repeated trials. By testing for a correlation between the trial-to-trial fluctuations in perceptual reports and neural responses [“choice probability” (CP)], it is possible to identify neurons that are functionally coupled to perceptual decisions (Britten et al., 1996). An advantage of this approach is that it affords single-cell resolution and allows one to relate the tuning properties of neurons to behavior. This can potentially allow one to identify a subset of neurons within a cortical area that contribute selectively to perception, as discussed further below. However, CP analysis only establishes a correlation with perception and does not necessarily imply a causal contribution. CP analysis has thus far suggested a link to depth perception for neurons in areas V2, MT, and inferotemporal (IT) (Bradley et al., 1998; Dodd et al., 2001; Nienborg and Cumming, 2006b; Uka and DeAngelis, 2004, 2006).
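Choice probability is conventionally quantified as the area under a receiver operating characteristic (ROC) curve that compares firing rates on trials sorted by the animal's choice. The text does not spell out the computation, so the sketch below is an assumption about the standard procedure rather than the exact analysis of any cited study. The function name and the firing rates are hypothetical. The ROC area equals the probability that a rate drawn from preferred-choice trials exceeds a rate drawn from null-choice trials, with ties counted as one half.

```python
def choice_probability(rates_pref_choice, rates_null_choice):
    """ROC area for firing rates grouped by perceptual report.

    Both lists hold responses to the SAME stimulus; they differ only in
    which choice the animal made. A value of 0.5 means no relationship,
    and values above 0.5 mean higher firing accompanies choices in
    favour of the neuron's preferred stimulus.
    """
    wins = 0.0
    for a in rates_pref_choice:
        for b in rates_null_choice:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(rates_pref_choice) * len(rates_null_choice))

# Invented spike rates from repeated presentations of one weak stimulus.
pref_trials = [12, 15, 11, 14]
null_trials = [10, 13, 9, 12]
print(choice_probability(pref_trials, null_trials))  # 0.78125
```

Because the stimulus is held fixed, a value above 0.5 reflects covariation with the report rather than with the stimulus, which is why the measure is correlational and not causal.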
By passing weak alternating current through a conventional extracellular electrode, one can activate a small group of neurons near the tip of the electrode, a technique known as microstimulation. Microstimulation has been used in a variety of perceptual and motor tasks and can establish a causal contribution of neurons to behavior. However, use of microstimulation requires that neurons are clustered according to their stimulus selectivity, such that the current activates a group of neurons with similar selectivity. It is possible that an area may contribute to perception without having clustered neurons and would therefore not be detected by this method. Microstimulation has thus far established a link between area MT and depth perception (DeAngelis et al., 1998; Uka and DeAngelis, 2006).
By locally cooling or applying an appropriate drug (such as the GABA-A receptor agonist muscimol), it is possible to temporarily inactivate a region of cortex. These reversible inactivation techniques generally affect a region of cortex much larger than a feature-selective column and, thus, lack the stimulus selectivity of microstimulation. However, in retinotopic cortex, inactivation allows one to essentially deactivate all of the neurons that represent a local region of visual space. Thus, it is possible to probe for causal links to perception even if neurons are not locally clustered for the stimulus feature of interest (e.g., disparity).
Currently, methods are evolving for imaging cortical activity in the awake, behaving monkey; these include intrinsic signal methods (Grinvald et al., 1991; Vnek et al., 1999; Siegel et al., 2003), voltage-sensitive dyes (Seidemann et al., 2002), and two-photon methods (Ohki et al., 2006). These approaches, in conjunction with the methods described above, provide an additional population view of cortical activations during stereopsis.
Each of these methods, on its own, is imperfect, but together they provide a powerful set of tools for relating neural activity to behavior. In the future, new optogenetic techniques may provide the ideal combination of single-cell specificity and rapid activation/deactivation of neural tissue (Zhang et al., 2007). Below we describe a selection of studies supporting functional specialization in areas such as V2 and MT. Specialization can be manifest in distinctions of functional organization as well as in the relationships between neuronal activity and behavior.
Specialization in topography of representation
A fundamental feature of brain organization is that neural representations of sensory information are highly ordered. Many studies have suggested the origins of such organization, some developmental, others functional, and yet others evolutionary in nature. Although there exists debate on the degree of organization and its precise layout within a cortical area, the ubiquity of columnar and modular organization cannot be denied (Malach et al., 1993, 1994; Kritzer and Goldman-Rakic, 1995; Tsunoda et al., 2001; Siegel et al., 2003). Functionally, clustered or columnar organization is important because we must think in terms of neural pools to relate neural activity to perceptual decisions (for a discussion of this and related issues, see Parker and Newsome, 1998). Moreover, these pools may be organized along more than single dimensions, and the neurons of a single cortical area may need to be flexibly organized into different pools in response to different perceptual tasks (Krug et al., 2004). Formation of a pool suited to a particular task concerns not just single neuron properties but how the properties of a given neuron are related to those of its neighbors. How different orderly maps are coregistered and what constraints each has on the other has been a topic of intense consideration by both biologists and computational modelers (Goodhill et al., 2002; Swindale, 2004).
Topographic organization in MT
An orderly organization for direction selectivity had already been hinted at by Dubner and Zeki and was subsequently confirmed and elaborated on by Albright et al. (1984). This “map” of direction selectivity was formally quite similar to that for orientation selectivity demonstrated previously in striate cortex of cats and monkeys: neurons sharing similar preferred directions were clustered within columns running perpendicular to the cortical surface, and, as one moved parallel to the cortical surface, the preferred direction tended to rotate smoothly, punctuated by occasional jumps in which the preferred direction flipped by 180°. This organization takes place at a relatively fine spatial scale, with the entire range of directions for a given region of the visual field represented over ∼500 μm of cortex.
Fifteen years later, the map for direction selectivity was shown to coexist with a columnar organization for binocular disparity (DeAngelis and Newsome, 1999). As for direction selectivity, neurons with the same preferred binocular disparity were clustered within columns, and there was a relatively orderly progression from near to far preferred disparities along the dimension tangential to the cortical surface (Fig. 5).
How such coregistration of two maps relates to integration of function is not known. However, it was intriguing to find a specific relationship between the two maps. In two different monkeys, there was a significant tendency for preferred direction to change in a counterclockwise direction as the preferred disparity changed from far to near (Fig. 5b). The fact that the two maps are in some kind of register would seem to greatly facilitate the routing of intrinsic connections that would be necessary for certain functional computations (e.g., the cylinder model of Bradley et al., 1998) (see below). A more thorough understanding of this issue awaits experiments in which the intrinsic connectivity can be revealed and superimposed on the local map structure, as has been done for the orientation map in striate cortex.
The orderly alignment of two different feature maps seems like a formidable task. The different inputs arrive via completely different routes and convey different modalities of information, and they must integrate in specific combinations but also maintain spatial register (i.e., the receptive fields of the different inputs must have the same size and retinotopic location). It is thus tempting to speculate that a final reason for using the convergence of parallel pathways for the integration of direction and binocular disparity is to allow for an earlier stage at which single-parameter maps are formed.
Topography for disparity in V2
From this perspective, the recent findings from the Roe laboratory (Chen et al., 2006) indicating a columnar organization for binocular disparity within the thick stripes of V2 are particularly intriguing. Despite the number of studies on disparity responses in the visual cortex, there has been little description of any systematic representation of disparity in either V1 or V2 of the monkey (cf. Burkitt and Ts'o, 1998; in the cat, LeVay and Voigt, 1988; Kara, 2006). At issue are questions of whether disparity-selective responses in V2 are organized in any manner and whether there is any systematic relationship of such responses to maps for orientation that exist in V2. Furthermore, relative disparity selectivity (rather than absolute disparity selectivity) has been reported to have greater prevalence in V2 than V1 (Cumming and Parker, 1999; Thomas et al., 2002). What is the implication of this V1/V2 hierarchy on functional organization? Answers to these issues have significant bearing on the origins of disparity maps in area MT, which is a primary target of V2 thick stripes.
Two recent studies have indicated the presence of topographic organization for disparity in V1 and V2. Using two-photon calcium imaging, Kara (2006) investigated whether cat visual cortex (area 17) contains any organization of responses to disparity induced by phase-offset gratings (cf. Ohzawa and Freeman, 1986a,b). These images revealed beautiful pinwheels of disparity response in cat V1, similar in appearance to those of orientation pinwheels. In monkey visual cortex, within single vertical penetrations disparity preferences of single units are similar, suggesting the presence of a columnar organization for disparity within V2 thick stripes (Ts'o et al., 2001; Chen et al., 2006). In fact, intrinsic signal imaging methods have revealed clearly organized maps for disparity response in V2 (Chen et al., 2006). In this study, RDSs (Julesz, 1971), which produce the percept of surfaces at different depths, ranging from −0.5° near to +0.5° far, activated disparity-specific domains that shifted in topographic position with depth (Fig. 6). Consistent with previous electrophysiological and imaging studies (Hubel and Wiesel, 1970; Peterhans and von der Heydt, 1993; Bakin et al., 2000; Ts'o et al., 2001), these maps were found only in the thick stripes of V2. Anticorrelated random dot stereograms (ARDSs) produced no such maps, suggesting that these responses were related to the depth percept. Characterization of single units in these domains revealed that cells in V2 exhibited tuning for specific horizontal disparities in correlated RDSs and untuned responses to ARDSs and binocularly uncorrelated stereograms. No such topographies were observed in V1. Thus, these data find parallels with topographic hue representation within thin stripes of V2 (Xiao et al., 2003) and suggest a possible common architecture across different stripes in V2.
How, then, do disparity maps relate to orientation in V2? As in MT, we approach the question of how different parameters are corepresented within the same neurons and/or the same local region of cortex. It is well established that disparity-tuned neurons are commonly tuned for orientation. It is also known that thick stripes in V2 contain maps for orientation. However, unlike V1, in which there have been a number of studies examining relationships between different functional maps (Goodhill and Cimponeriu, 2002; Swindale, 2004), there are few data to address this question in V2. Are we better at distinguishing disparities at certain orientations over others (cf. Coppola et al., 1998)? Is there a full range of orientations represented at each disparity and is there a full range of disparities represented at each orientation? In one study, Chen et al. (2006) found that disparity-activated regions contained approximately equal representation for different orientations and that on average each orientation domain overlay a range of disparity domains (cf. Burkitt et al., 1998). Furthermore, at least within the range of disparities tested, there did not appear to be more cortex devoted to some disparities over others or some orientations over others. Although preliminary, these data suggest an orthogonality between orientation and disparity parameters within V2 thick stripes.
In the ventral pathway, disparity-selective responses (e.g., in V4, von der Heydt et al., 2000; Watanabe et al., 2002; IT, Janssen et al., 1999) are thought to contribute to 3D shape perception. Because V2 provides inputs to dorsal and ventral pathways, it sits in a pivotal position. How then are its disparity-selective responses channeled? Anatomically, V2 thick stripes project heavily to MT in the dorsal pathway, and the thin and pale stripes project primarily to areas such as V4 in the ventral pathway (e.g., Shipp and Zeki, 1985). However, prominent intra-V2 connections (Levitt et al., 1994; Malach et al., 1994), which tend to link different functional stripe types, leave open the possibility of disparity signals entering either pathway.
So are disparity responses in V2 tuned preferentially to absolute disparity, similar to observations in MT/dorsal pathway (Uka and DeAngelis, 2006), or to relative disparity, similar to responses in V4/ventral pathway (Umeda et al., 2007)? Using random-dot stereograms and a disparity clamp that permitted stepping absolute disparities without changing any relative disparities within the stimulus, Cumming and Parker (1999) demonstrated that V1 neurons are tuned for absolute disparity. Approximately 80% of their V1 disparity-tuned population exhibited absolute disparity tuning. In contrast, many more V2 neurons exhibit relative disparity tuning (preferences shift in the direction of the background disparity) (von der Heydt et al., 2000; Thomas et al., 2002). Clearly, the influence of surround disparity on center disparity response is greater in V2 than in V1. However, even in V2, very few neurons could be considered tuned for true relative disparity (i.e., had shifts equal to the background shift). Although topographic maps for true relative disparity are not expected in V2, is there evidence for some influence by shifts in background? Preliminary optical imaging evidence suggests this to be so (A. W. Roe, unpublished data). Optical imaging of thick stripes in V2 with a fixed center patch disparity and different background disparities resulted in predicted decreasing activation values from far to near. These effects were not seen in V1. In other words, responses in V2 thick stripes, but not V1, exhibit some influence by shifts in background in a manner consistent with the presence of some relative disparity tuning. Relative disparity selectivity is likely enhanced at later stages of processing. 
Whether there is an area of visual cortex in which neurons represent solely relative disparity remains to be seen; alternatively, partial shifts [seen commonly in V4 (Umeda et al., 2007)] may be all that is necessary to decode relative depth but still allow decoding of absolute disparities as well (Neri et al., 2004).
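The distinction between absolute and relative disparity tuning can be made concrete with a simple index: the shift in a neuron's preferred disparity divided by the shift in background disparity. The Python sketch below is illustrative only. The Gaussian tuning curves, the function names, and the use of the peak location as the preferred disparity are assumptions made for this example, not the analysis used in the studies cited above. An index near 0 corresponds to absolute disparity tuning, an index near 1 to true relative disparity tuning, and intermediate values to partial shifts.

```python
import math

def gaussian_tuning(disparities, preferred, width=0.2):
    """Hypothetical tuning curve: response as a Gaussian of center disparity (deg)."""
    return [math.exp(-0.5 * ((d - preferred) / width) ** 2) for d in disparities]

def preferred_disparity(disparities, responses):
    """Take the preferred disparity to be the location of the largest response."""
    best = max(range(len(responses)), key=lambda i: responses[i])
    return disparities[best]

def shift_ratio(disparities, resp_a, resp_b, background_a, background_b):
    """Shift in preferred disparity divided by the shift in background disparity."""
    peak_shift = (preferred_disparity(disparities, resp_b)
                  - preferred_disparity(disparities, resp_a))
    return peak_shift / (background_b - background_a)

# Center disparities sampled from -1 to +1 deg in 0.01 deg steps.
disparities = [i / 100 for i in range(-100, 101)]
bg_a, bg_b = -0.2, 0.2  # two background disparities (deg)

def simulated_ratio(true_ratio):
    """Simulate a neuron whose peak moves by true_ratio * background disparity."""
    resp_a = gaussian_tuning(disparities, true_ratio * bg_a)
    resp_b = gaussian_tuning(disparities, true_ratio * bg_b)
    return shift_ratio(disparities, resp_a, resp_b, bg_a, bg_b)

absolute_cell = simulated_ratio(0.0)   # peak ignores the background
partial_cell = simulated_ratio(0.3)    # peak follows the background partially
relative_cell = simulated_ratio(1.0)   # peak follows the background fully
```

With real data, the peak would normally be estimated from a fitted tuning function rather than from the raw maximum, because sampling noise makes the raw maximum unreliable.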
Together, these data suggest that topographic organization for “near” and “far” disparities may be established relatively early in the visual pathway and that such organization may be useful for establishing more refined representations in both the dorsal and ventral pathways.
Specialization in MT function
The hypothesis that disparity representations in different cortical areas are specialized to subserve different aspects of 3D vision makes a few specific predictions for how disparity signals should be related to behavior. Recent work, using some of the techniques described above, has begun to provide strong support for the specialization hypothesis.
Coarse versus fine discrimination
The first prediction is that particular areas and/or pathways should contribute to some stereo vision tasks but not others. This prediction has been examined recently in area MT, in which Uka and DeAngelis have evaluated the role of MT in coarse versus fine depth discrimination. In the coarse depth task, monkeys are trained to discriminate between two coarse absolute disparities (near vs far) in the presence of disparity noise. By titrating the noise (or binocular correlation), monkeys are required to perform the task around psychophysical threshold. In this coarse task, the average MT neuron has sensitivity comparable with that of the animal (Uka and DeAngelis, 2003), MT responses are correlated, trial-by-trial, with perceptual reports of the animal [significant choice probabilities (Uka and DeAngelis, 2004)], and electrical microstimulation systematically biases monkeys' judgments (DeAngelis et al., 1998; Uka and DeAngelis, 2006). Collectively, these findings suggest that MT is important for coarse discrimination of absolute disparities.
In contrast, MT does not appear to contribute to a fine depth discrimination task in which monkeys must report the relative depth between two adjacent stimuli. Threshold performance in this task is achieved by presenting very small differences in relative disparity between two stimuli in a center-surround configuration. In this fine task, the average MT neuron is substantially less sensitive than the monkey (Uka and DeAngelis, 2006), MT neurons do not show consistent choice probabilities (G. C. DeAngelis, unpublished data), and microstimulation does not bias percepts (Uka and DeAngelis, 2006). The selective contribution of MT to coarse, but not fine, depth discrimination may be explained by the fact that MT neurons represent absolute, but not relative, disparities in a center-surround stimulus geometry (Uka and DeAngelis, 2006). These findings support the notion that the perceptual role of disparity signals in a particular visual area is determined by the nature of the disparity representation itself.
Moving surfaces and depth order
Whereas area MT does not appear to contribute to fine judgments of relative depth at a step boundary, another series of studies has demonstrated that MT is involved in signaling the depth order of continuous surfaces defined by conjunctions of disparity and motion. This comparison highlights the notion that a particular area may play different roles in two seemingly related tasks, according to the particular constellation of selectivities exhibited by those neurons. Perceptual grouping by motion is quite powerful, to the extent that surfaces defined by differentially moving textures can be configured so as to create an illusion of depth, a class of illusion referred to as “structure-from-motion,” in which, for example, two coplanar sets of dots moving in opposite directions within a rectangular window create a vivid percept of a three-dimensional rotating cylinder. In this case, the assignment of front and rear surfaces is ambiguous and the illusory percept is accordingly bistable with periodic switches in the apparent direction of rotation of the cylinder.
What sort of neural circuitry might be responsible for such a percept? The first important clue came from single-unit studies in MT. From the first quantitative studies, it was clear that most MT cells were selective both for the direction of motion (Maunsell and Van Essen, 1983) and for binocular disparity (Maunsell and Van Essen, 1983). Subsequent studies in alert monkeys showed that this information was integrated in a rather interesting way. In particular, it was shown that motion opponency, that is, the antagonistic interaction between stimuli moving in the preferred versus null (or “antipreferred”) direction of the neuron, was often gated by binocular disparity (Bradley et al., 1995), such that antipreferred direction motion was most effective at suppressing the response of the neuron to preferred direction motion when it was presented at the preferred binocular disparity of the neuron (Fig. 7). In some cases, the antipreferred motion actually produced facilitation when presented at a different binocular disparity.
Bradley et al. (1998) created a model to account for the segregation of overlapping, moving textured surfaces. The basic idea is that pools of neurons tuned to the same disparity, but opposite directions of motion, inhibit each other, whereas there is opposite-direction facilitation between neuronal pools representing different depths. This would tend to force differentially moving surfaces into different depth planes, even in the absence of cues for binocular disparity as in the rotating cylinder illusion described above. Presumably, random fluctuations in the level of firing within the different pools would be enough to tip the system into one of the stable configurations corresponding to one of the two possible percepts. Although the current model does not account for the perceptual switches, one could imagine that some form of neural fatigue might lead to an alternation of configurations.
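The qualitative behavior of such a circuit can be illustrated with a toy rate model. The Python sketch below is not the published model: the four pools, the coupling strengths, the rectified rate dynamics, and the noise level are all assumptions chosen for illustration. It implements only the two interactions named above (opposite-direction inhibition within a depth plane, opposite-direction facilitation across depth planes) and shows that, with no disparity cue, noise alone tips the network into one of two stable configurations.

```python
import random

# Four neuronal pools, labeled depth_direction (hypothetical labels).
POOLS = ("near_left", "near_right", "far_left", "far_right")

def build_weights(w_inh=1.0, w_exc=0.5):
    """Assumed coupling: opposite directions inhibit each other at the same
    depth and facilitate each other across depths; same-direction pools are
    left uncoupled."""
    n = len(POOLS)
    w = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(POOLS):
        depth_a, dir_a = a.split("_")
        for j, b in enumerate(POOLS):
            depth_b, dir_b = b.split("_")
            if dir_a == dir_b:
                continue
            w[i][j] = -w_inh if depth_a == depth_b else w_exc
    return w

def simulate(disparity_cue=0.0, seed=0, steps=1000, dt=0.1,
             noise_sd=0.05, r_max=5.0):
    """Euler integration of dr/dt = -r + clip(drive + W r + noise, 0, r_max).
    A positive disparity_cue favors leftward motion in front."""
    rng = random.Random(seed)
    w = build_weights()
    c = disparity_cue
    drive = [1.0 + c, 1.0 - c, 1.0 - c, 1.0 + c]  # both directions are visible
    r = [0.0] * len(POOLS)
    for _ in range(steps):
        updated = []
        for i in range(len(POOLS)):
            total = drive[i] + sum(w[i][j] * r[j] for j in range(len(POOLS)))
            total += rng.gauss(0.0, noise_sd)
            target = min(max(total, 0.0), r_max)
            updated.append(r[i] + dt * (target - r[i]))
        r = updated
    return dict(zip(POOLS, r))

def percept(rates):
    """Read out which direction of motion is assigned to the front surface."""
    left_front = rates["near_left"] + rates["far_right"]
    right_front = rates["near_right"] + rates["far_left"]
    return "left_in_front" if left_front > right_front else "right_in_front"

# Ambiguous stimulus: the outcome depends on the noise sample (the seed).
ambiguous_percepts = [percept(simulate(seed=s)) for s in range(40)]
# Disparity-defined stimulus: the cue determines the outcome.
cued_percepts = [percept(simulate(disparity_cue=0.2, seed=s)) for s in range(5)]
```

As in the model described above, this sketch settles into one configuration and stays there; producing perceptual alternation would require an additional mechanism, such as adaptation, that is not included here.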
The idea that MT plays an important role in this type of perceptual grouping has received strong support from experiments in which neural activity was directly compared with monkeys' judgments of the direction of rotation of such illusory cylinders (Bradley et al., 1998; Dodd et al., 2001). The monkeys were trained to report the direction of motion of the front surface, which, on most trials, was unambiguously indicated by binocular disparity (Fig. 8a). On a subset of the trials, however, no disparity difference was supplied, just as for the cylinder illusion described above. In these cases, there was no correct answer, so the animals were rewarded randomly with a probability of 0.5. Nevertheless, they had to make a choice, and, given that the possible disparity steps (across trials) were small and the monkeys never knew when a trial was ambiguous, the bet is that they honestly reported what they perceived during the illusory motion condition. The striking finding (Fig. 8b) was that the firing rates of MT neurons were often highly predictive of the monkeys' choices for the ambiguous trials, that is, they exhibited high choice probabilities (CPs). In fact, the CPs measured by Dodd et al. (2001) for the cylinder task are the highest ever reported for single neurons from MT or any other early sensory area.
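Choice probability is conventionally computed as the area under an ROC curve comparing the firing rates on trials that ended in one choice with those that ended in the other, for an identical stimulus. That area equals the probability that a randomly drawn preferred-choice trial has a higher rate than a randomly drawn null-choice trial, with ties counted as one half. The Python sketch below computes it directly from that definition; the function name and the firing-rate values are made up for illustration and are not data from the studies cited above.

```python
def choice_probability(rates_pref_choice, rates_null_choice):
    """Area under the ROC curve for two sets of firing rates, computed as the
    fraction of (pref, null) trial pairs in which the preferred-choice trial
    has the higher rate; ties count as half. A value of 0.5 means the rate
    carries no information about the upcoming choice."""
    if not rates_pref_choice or not rates_null_choice:
        raise ValueError("both choice groups need at least one trial")
    wins = 0.0
    for p in rates_pref_choice:
        for n in rates_null_choice:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(rates_pref_choice) * len(rates_null_choice))

# Hypothetical spike rates (spikes/s) on ambiguous trials, grouped by the
# direction of rotation the animal reported.
pref_choice_rates = [12, 15, 14, 18]
null_choice_rates = [10, 13, 11, 12]
cp = choice_probability(pref_choice_rates, null_choice_rates)
```

In practice, the significance of a measured value is usually assessed by recomputing it many times after shuffling the choice labels across trials.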
The “cylinder model” of Bradley and colleagues thus provides a satisfying connection between the measured integration of direction and disparity by single MT neurons and the perceptual capacity to assign moving surfaces to different depth planes. As detailed above, this integration appears to arise from the convergence of two separate pathways: the direct pathway carrying information about direction of motion and the indirect pathway, via V2, supplying information about binocular disparity.
Dependence on behavioral strategy
The second prediction of the specialization hypothesis is that the coupling of individual neurons to behavior should depend not only on the information that they carry but also on the strategy that the animal adopts to solve the task. In the coarse depth task, Uka and DeAngelis (2004) found that neurons with odd-symmetric disparity tuning curves (near or far disparity preferences) showed significant choice probability effects, whereas neurons with even-symmetric tuning curves (disparity preferences near zero) did not. This was observed despite the fact that the stimulus disparities were optimized to make each neuron most informative. This result was likely obtained because animals were heavily trained to discriminate between near and far disparities that were symmetric about the plane of fixation (zero disparity). Thus, the animals may have learned to selectively monitor the activity of neurons with odd-symmetric tuning, which were most informative during training. This finding suggests that the functional link between single neurons and perception is shaped by the strategy that an animal adopts for “reading out” sensory signals to solve the task.
Even stronger support for this notion comes from a recent study by Nienborg and Cumming (2006a). In the context of the coarse depth discrimination task described above, they used psychophysical reverse correlation to determine that monkeys placed more weight on near disparities than far disparities when attempting to report the depth of binocularly uncorrelated stimuli. Interestingly, while recording from V2 neurons during this task, they found that near-preferring neurons tended to show significant choice probability effects whereas far-preferring neurons did not.
In both of these choice probability studies, a suboptimal behavioral strategy of the monkey was reflected in the pattern of choice probabilities across a population of neurons, suggesting that these experiments are able to decipher which neurons within an area are selectively monitored by the animal to perform a task.
Learning and plasticity
A third prediction of the specialization hypothesis is that learning new tasks will alter how disparity signals are read out from different areas to drive perceptual decisions. Preliminary evidence in support of this prediction comes from reversible inactivation experiments performed by the DeAngelis laboratory in MT. When muscimol was injected into area MT, performance of the coarse depth task was severely impaired, as was performance of a conventional direction discrimination task (Britten et al., 1992). Effects were localized to the region of visual space represented by neurons at the injection site. These experiments were repeated after the animals were trained to perform the fine depth discrimination task, which relies on relative disparities. After fine depth training, muscimol injection continued to devastate direction discrimination performance but no longer produced a significant effect on coarse depth discrimination (nor fine depth discrimination). This finding likely reflects a change in how disparity signals are read out from MT because the disparity tuning of individual MT neurons was unchanged by fine depth training. A likely explanation is that fine depth training recruits areas [perhaps V4 (Umeda et al., 2007)] that represent relative disparities, and that these areas can support coarse depth discrimination when MT is inactivated. One important implication of this finding is that learning can act selectively on the transmission of sensory signals to decision circuitry without modifying the sensory representation itself. Another implication is that it may be crucial to control for order effects when animals are trained to perform multiple tasks, because training on a second task may alter the neural substrates that mediate performance of a previously learned task.
Although Sherrington concluded that an understanding of binocular depth perception was beyond the province of the neurobiologist, the last decade has seen a dramatic increase in our knowledge of the neural basis of stereopsis. Some important principles have emerged from these studies. We now know that, to achieve fully developed sensations of binocular depth, it takes many stages of preliminary processing and that different aspects of binocular function are channeled into different pathways. Areas of the dorsal pathway, such as area MT, are invested in combining disparity signals with motion signals for calculating, for example, structure from motion or for producing signals that guide self-motion. In areas of the ventral pathway, such as V2, V4, and IT, relative disparity signals are combined in multiple stages to achieve perception of three-dimensional object shape. Furthermore, and perhaps most exciting, is the finding that there is flexibility in how information is channeled through the dorsal and ventral pathways. Indeed, how training and experience affect cortical processing of disparity signals is a topic ripe for future exploration.
This work was supported by National Institutes of Health Grants EY11744 (A.W.R.), EY11379 and EY12196 (R.T.B.), and EY013644 (G.C.D.) and by the Wellcome Trust and a Royal Society Wolfson Research Merit Award (A.J.P.). We also thank Carlos Ponce, Dr. Steve Lomber, Andrew Zaharia, and Gang Chen for their contributions to this work.
Correspondence should be addressed to Anna W. Roe, Department of Psychology, 301 Wilson Hall, Vanderbilt University, 111 21st Avenue South, Nashville, TN 37203.