The visual system uses information about the relative depth of contours and surfaces to link and segment elements of visual scenes. The integration of form and depth information was studied in areas V1 and V2 of the alert macaque. Neurons in area V2 used contextual depth information to integrate occluded contours, signal the presence of object boundaries, and segment surfaces: (1) Amodal contour completion occurs when a contour passes behind an occluder. The basis of contour completion, the facilitation of neuronal responses to stimuli located within their receptive fields (RFs) by contextual lines lying outside their RFs, was blocked by orthogonal lines intersecting the contours but was recovered when the orthogonal line was placed in the near depth plane. (2) An illusory contour will modally complete separated elements located across an isoluminant field if the elements are placed in the near depth plane. V2 neurons responded when line segments were placed outside the RF in the near depth plane and a field of uniform luminance covered the RF. (3) Texture elements within a surface will “capture” the perceived depth consistent with the disparity of the surface's boundary, even when given no disparity themselves. V2 neurons responded to the center elements of a grating as if they contained disparity, even though disparity was present only for the grating's end elements located beyond the RF borders. These results, which were more common in V2 than in V1, demonstrate a role for V2 in the three-dimensional representation of surfaces in space.
- surface segmentation
- contour integration
- intermediate level vision
- amodal completion
- modal completion
- disparity capture
Between the analysis of local stimulus attributes, such as edge orientation and contrast, and the recognition of complex objects, such as faces, lies an intermediate stage of vision involving contour integration and surface segmentation (Nakayama et al., 1995). Segmenting natural visual scenes into surfaces belonging to different objects can be particularly challenging when contours intersect or when near surfaces are partially occluded by objects located closer to the observer. A potential source for a solution to this challenge is depth information, which can be used to link spatially separated surfaces that belong to a single partially occluded object (amodal completion) and to link contours that may be incomplete (modal completion) (Nakayama et al., 1995; Nakayama, 1996) (Fig. 1). In this process, depth cues must be propagated from one part of the visual scene to another.
Local depth information is encoded by disparity-selective neurons. These neurons are sensitive to the small difference in the projections of stimuli lying in different depth planes onto the two retinas because of the horizontal displacement of the eyes (Wheatstone, 1838, 1852; Julesz, 1960; Barlow et al., 1967; Nikara et al., 1968; Hubel and Wiesel, 1970). Neurons in cat and monkey that are selective to stimulus depth have been classified according to the range of disparity values to which they respond: Far neurons respond to stimuli located beyond the plane of fixation, whereas Near neurons prefer stimuli located in front of the plane of fixation [cat (Barlow et al., 1967; Pettigrew et al., 1968; Bishop, 1970, 1973, 1974; Blakemore, 1970; Joshua and Bishop, 1970; Hubel and Wiesel, 1973; Pettigrew, 1973; Ferster, 1981; Freeman and Ohzawa, 1990); monkey (Hubel and Wiesel, 1970; Poggio and Fischer, 1977; Poggio and Talbot, 1981; Poggio et al., 1988; Poggio, 1995; Prince et al., 2000)].
To use depth cues in contour integration, cells must integrate information about depth and form and be sensitive to global depth cues. Information about depth in one part of the visual scene has to propagate to other parts of the scene and influence the perception of the form of distant features. In this regard, it is known that the responses of cells are strongly modulated by the context within which local features are embedded in two-dimensional space, enabling integration of information over large parts of the visual field (Maffei and Fiorentini, 1976; Allman et al., 1985; Nelson and Frost, 1985; Gilbert and Wiesel, 1990; Kapadia et al., 1995, 1999; Zipser et al., 1996). In this study, we have explored the sensitivity of cells to global depth information, in particular, to determine whether this sensitivity is consonant with the perceptual effects of modal and amodal completion and of depth capture.
The contribution of depth information to contour integration and surface segmentation was studied by manipulating contextual depth information outside of the neurons' classical receptive field (RF). We show the sensitivity of cortical cells to partially occluded contours, to illusory contours generated only by depth cues, and to texture elements whose depth is determined only by distant contextual depth cues.
MATERIALS AND METHODS
The experiments were performed with two adult male macaque monkeys (Macaca mulatta) weighing 5.2 and 3.4 kg, respectively. All procedures followed National Institutes of Health approved guidelines on the care and use of laboratory animals. Animals were anesthetized with sodium pentobarbital and underwent an initial surgery. Scleral search coils were implanted in both eyes (Judge et al., 1980), recording chambers were positioned over the opercular surface to include cortical areas V1 and V2, and a head post was implanted in the skull. After recovery, the animals were placed on a fluid-restricted diet. During this phase of the experiment, they would receive their fluid-intake allotments during the daily neurophysiological recording sessions.
To receive their fluid rewards, the animals were trained to detect a change in the luminosity of a small target presented on a cathode-ray tube (CRT). During a recording session, the animal was seated comfortably in a primate chair with its head fixed. Pulling a lever attached to the primate chair initiated a trial. A 0.1° square target appeared on a 30 × 40 cm CRT monitor placed 140 cm in front of the animal. The animal had to establish and maintain binocular eye fixation within 0.5° radius of the target for the trial to continue. Actual target fixation performance was within 0.1° as measured with a scleral search coil system (CNC Engineering). After a variable time interval (1.0–6.0 sec), the target dimmed, and releasing the lever within 750 msec resulted in the delivery of several drops of water. Failure to maintain binocular fixation ended the trial immediately without reward. Animals were allowed to continue for as long as they were willing to work. Training sessions typically lasted 3–4 hr, with the animals performing correctly on ∼80–85% of the ∼1000–1300 trials performed during the session and consuming 120–150 ml of water. Animals received unlimited access to water during weekends, and food was available ad libitum. All stimulus presentations, eye position monitoring, and reward contingencies were controlled by a Pentium-based personal computer (PC) with in-house software.
Binocular stimuli were presented on a standard CRT placed behind a liquid crystal-modulated polarizing filter (NuVision Technologies) that alternated states in synchronization with the monitor's frame rate. A pair of opposed circularly polarized lenses that matched the two states of the liquid crystal-modulated polarizing filter was placed directly in front of the subject's eyes. Stimuli were generated on a CRT monitor (Nanao Flexscan F2-21; 460 by 512 pixels; 60 Hz refresh rate) through a Univision Piranha PC-based graphics card using in-house-written proprietary software (STIM). Disparity was limited to the horizontal direction only; the stereo stimuli were generated by alternating the horizontal location of the stimulus slightly in alternate frames. Because no polarizing filter system is 100% effective in preventing stimulus leakage through the “closed” shutter, it is possible for cells to respond to the ghost stimulus, rather than the intended stimulus. To avoid activating cells by the ghost image in the closed state of the liquid crystal shutters, we chose a luminance level that would be well below the threshold of cells when projected through the closed shutter but would elicit a brisk response when projected through the open shutter. The luminance–response curves of a large number of superficial layer cortical neurons were measured, and we selected a luminance value appropriate for meeting the above criteria. The level chosen resulted in a stimulus luminance of 1.51 cd/m² against a uniform background of 0.23 cd/m² when viewed through the open shutter (both measured by a Minolta Luminance Meter LS-110 viewing the CRT through the circular polarized lenses and the liquid crystal-modulated polarizing filter). An effective 10-to-1 rejection ratio in the polarizing filter system led to a ghost stimulus luminance of only 10% of the original luminance value when the stimulus was viewed during the out-of-phase state.
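The ghost-image estimate above amounts to a single division. The following is a minimal sketch of that arithmetic, assuming the 10-to-1 rejection ratio acts as a simple multiplicative attenuation of the open-shutter luminance; the function name `ghost_luminance` is hypothetical and is not part of the STIM software.

```python
def ghost_luminance(stimulus_cd_m2: float, rejection_ratio: float) -> float:
    """Luminance leaking through the closed shutter.

    Assumption: the filter attenuates the stimulus by a fixed ratio.
    """
    if rejection_ratio <= 0:
        raise ValueError("rejection_ratio must be positive")
    return stimulus_cd_m2 / rejection_ratio


STIMULUS_CD_M2 = 1.51    # open-shutter stimulus luminance reported in the text
REJECTION_RATIO = 10.0   # effective 10-to-1 rejection ratio reported in the text

ghost = ghost_luminance(STIMULUS_CD_M2, REJECTION_RATIO)
print(f"ghost luminance: {ghost:.3f} cd/m^2")
```

Under this assumption the ghost stimulus is about 0.15 cd/m², i.e., 10% of the intended stimulus luminance.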
Neural recordings were made in three hemispheres. A craniotomy was made over the V1–V2 border, and a small fiberglass recording chamber was affixed to the skull after the animal's initial acquisition of the behavioral task. Penetrations were made through the dura mater with glass-coated Pt-Ir microelectrodes (Wolbarsht et al., 1960) (FHC, Inc) with typical impedance between 1.0 and 3.0 MΩ at 1 kHz. Electrodes were driven by the use of a stepping motor microdrive (Narishige MO-951). Successive penetrations were positioned ∼0.5 mm apart.
Neuronal activity was amplified, filtered (model 1800; A-M Systems), and passed into a time-amplitude window discriminator (Tucker Davis Technologies) before being displayed on a digital storage oscilloscope (Tektronix). Individual spike times of window discriminator output pulses were stored and sorted on-line according to stimulus condition by the use of proprietary software (HIST) on a Pentium-based PC. On-line analysis was limited to peristimulus time histograms and rasters. Statistical analysis was completed off-line.
Activity was sampled at a single site during each recording session. Recordings were targeted for the upper layers of either V1 or V2, and recording depths varied from 150 to 650 μm. After isolation of a single neuron, the RF size, length tuning, orientation tuning, direction preference, ocular dominance, and disparity selectivity were measured. The eccentricity of RF positions ranged from 3 to 6°.
During a single fixation trial, the animal received 1–5 stimulus presentations. Stimuli were presented in a random manner until 5–10 presentations of each condition were completed. Recordings were collected for 600–800 msec; this included a 200 msec preperiod with no visual stimulation, the stimulus presentation period, and a 200 msec poststimulus period. If the animal's fixation deviated beyond the small fixation window at any time during the trial, the trial was aborted, and the recorded activity from that trial was discarded.
Because vergence movements may occur when presenting stimuli located in depth, it is important to show that the animal was maintaining steady vergence on the plane of fixation and not changing its vergence in response to the target or contextual stimuli presented. Vergence measurements were calculated off-line for time epochs before, during, and after stimulus presentation. Neither subject in this study made stimulus-induced vergence movements that correlated with stimulus disparity (Fig. 2 B, filled circles). However, when the fixation point was moved in depth, vergence movements were recorded (Fig. 2 B, open squares). We therefore conclude that, while the animal maintained fixation on the fixation point, it did not make vergence movements driven by the contextual depth information. Accordingly, we are confident that our contextual stimuli remained outside of the cell's classical RF and that the neural responses they evoked were not caused by a vergence movement simply shifting the stimulus into the neuron's core RF.
Spikes occurring within the 200 msec preperiod were used to calculate the background firing rate of the cell. The magnitude of any stimulus-driven activity was represented by the mean firing rate during stimulus presentation minus the rate of background activity. The time window of the response was adjusted individually for each cell within the range of 100–250 msec after stimulus onset, depending on the latency and the duration of the cell's response profile. Latency was measured from stimulus onset to the first bin containing activity greater than the average background spike rate obtained during the prestimulus period.
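The response and latency measures described above can be sketched as follows. This is an illustration under stated assumptions and not the authors' HIST software: spike times are taken to be in milliseconds relative to stimulus onset for a single trial, the 5 msec bin width is an arbitrary choice (the text does not give one), and all function names are hypothetical.

```python
from typing import Optional, Sequence


def baseline_rate(spikes_ms: Sequence[float], pre_ms: float = 200.0) -> float:
    """Background rate (spikes/s) in the prestimulus window [-pre_ms, 0)."""
    count = sum(1 for t in spikes_ms if -pre_ms <= t < 0.0)
    return count / (pre_ms / 1000.0)


def driven_response(spikes_ms: Sequence[float], start_ms: float,
                    end_ms: float, pre_ms: float = 200.0) -> float:
    """Mean rate in [start_ms, end_ms) minus the background rate."""
    count = sum(1 for t in spikes_ms if start_ms <= t < end_ms)
    rate = count / ((end_ms - start_ms) / 1000.0)
    return rate - baseline_rate(spikes_ms, pre_ms)


def latency_ms(spikes_ms: Sequence[float], bin_ms: float = 5.0,
               pre_ms: float = 200.0, max_ms: float = 400.0) -> Optional[float]:
    """Start of the first poststimulus bin whose rate exceeds background."""
    base = baseline_rate(spikes_ms, pre_ms)
    start = 0.0
    while start < max_ms:
        count = sum(1 for t in spikes_ms if start <= t < start + bin_ms)
        if count / (bin_ms / 1000.0) > base:
            return start
        start += bin_ms
    return None  # no bin exceeded the background rate


# Hypothetical spike train (msec from stimulus onset); not data from the study.
spikes = [-150.0, -50.0, 62.0, 66.0, 70.0, 74.0, 120.0, 180.0]
print(driven_response(spikes, 50.0, 250.0), latency_ms(spikes))
```

With single-trial data a lone spike in a narrow bin can exceed a low background rate, so in practice the same comparison would more plausibly be applied to a histogram pooled across presentations.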
In these experiments contextual disparity information was manipulated for features lying outside of the recorded neuron's classical RF. We defined the classical RF by measuring the minimum response field using a single, small line segment. All contextual stimuli were placed beyond RF borders determined by the outer edges of stimulus positions that elicited the smallest significant increase above background firing levels. In other words, we tried to ensure that presentation of the contextual stimuli alone did not elicit responses from the recorded neurons. However, the contextual stimuli were capable of strongly modulating the responses of the neurons to target stimuli placed within the neuron's classical RF.
Recordings were made from both areas V1 and V2, the boundary being determined by the visuotopic maps obtained by a series of electrode penetrations. The signature of a transition from V1 to V2 was the reversal in the progression of receptive fields toward and away from the midline and an increase in their size (Fig. 3 A,B). This boundary correlated well with cell selectivity for stimulus direction and for disparity (Fig. 3 C,D). Basic RF characteristics were obtained from 94 cells in area V1 and 115 cells in area V2, obtained from three hemispheres in two macaque monkeys. After analyzing the cells' local receptive field properties, we then explored their sensitivity to global disparity cues. We were able to characterize the response of 76 V1 cells and 97 V2 cells to at least one of the experimental stimulus programs containing global stereopsis cues.
Contour integration: amodal completion
The responses of cells in V1 have been shown to be as dependent on the global characteristics of contours extending far beyond the minimum response field as they are on the local characteristics of line elements within the receptive field core (Kapadia et al., 1995, 1999). The response of a cell to a line within the receptive field can be as much as tripled by adding a colinear line outside the receptive field (Fig. 4 B). This observation correlates with experiments on line detection, which can be performed at a much lower contrast when the target line is presented along with a colinear flanking line separated by a small gap (Dresp, 1993; Field et al., 1993; Polat and Sagi, 1994; Kapadia et al., 1995). It has been suggested that these neural and psychophysical observations are related to the perceptual linkage of line elements within an extended contour (Fig. 4 A), in part because the insertion of an orthogonal line between the lines inside and outside of the receptive field blocks both the facilitatory effects on neuronal response rates and the brightness-induced perceptual effect (Kapadia et al., 1995) (Fig. 4 B).
On the basis of the idea that flank facilitation contributes to contour integration, we wished to determine how flank facilitation might be affected by occlusion. Objects located closer to the observer partially occlude more distant objects, yet contours belonging to the distant objects are perceived as continuous rather than separate edges. If flank facilitation is a signal for continuity between contour elements, then one might suppose that it is maintained under circumstances that indicate occlusion but is diminished under those that indicate that the separated colinear line segments belong to distinct objects. Presenting the orthogonal segment closer to the observer than the colinear segments (by giving the orthogonal bar crossed disparity) would be consistent with partial occlusion of a single, unitary line segment. Presenting the orthogonal bar farther from the observer than the colinear segments (by giving the orthogonal bar uncrossed disparity), on the other hand, would result in the colinear line segments being perceived as discrete objects. We therefore tested the effect of varying the disparity of the orthogonal bar on flank facilitation (Fig. 4 C).
As illustrated in Figure 4 B, placing a flank outside of the RF core produced a marked facilitation in the response of this V2 cell to the preferred stimulus within the RF. The flank presented alone did not drive the cell. The orthogonal line segment blocked the facilitation when placed in the plane of fixation or behind the plane of fixation, situations that interrupted the linkage of the contour elements. However, the facilitation was preserved when the orthogonal bar was given near disparity and appeared to occlude a continuous contour behind it (Fig. 4 C, arrow).
To measure the effect of moving the orthogonal bar in depth on the facilitation observed, we calculated the amount of facilitation that occurred in the presence of an orthogonal line segment for each of the three disparity conditions (orthogonal bar given crossed, zero, or uncrossed disparity): Values >100% indicate that facilitation was preserved in the presence of an orthogonal line segment. Values <100% indicate that the addition of the orthogonal bar reduced the response of the neuron to a level less than its response to presentation of the target and flank together.
Flank facilitation was defined to occur when the response to the stimulus and the flank was at least 20% greater than the response observed to the target alone. This occurred in 17 of 39 (44%) V1 cells and 26 of 68 (38%) V2 cells tested. Of these, placing an orthogonal bar in the same depth plane as the target and flank stimuli blocked facilitation (facilitation < 20%) in 11 (65%) V1 cells and 19 (73%) V2 cells. These 30 cells became the data set used to investigate the effect that moving the orthogonal bar in depth might have on flank facilitation.
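The 20% criterion and the cell counts above can be checked with a short sketch. The function names and the example firing rates are hypothetical; only the criterion and the counts (17 of 39, 26 of 68, 11 of 17, 19 of 26) come from the text.

```python
def is_facilitated(target_alone: float, target_plus_flank: float,
                   criterion: float = 0.20) -> bool:
    """True if adding the flank raises the response by at least `criterion`
    (as a fraction) over the response to the target alone."""
    if target_alone <= 0:
        raise ValueError("target-alone response must be positive")
    return (target_plus_flank - target_alone) / target_alone >= criterion


def percent(count: int, total: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Hypothetical background-subtracted rates (spikes/s), not data from the study.
print(is_facilitated(10.0, 13.0))  # 30% increase, meets the criterion
print(is_facilitated(10.0, 11.0))  # 10% increase, does not

# Counts reported in the text.
print(percent(17, 39), percent(26, 68))  # cells showing flank facilitation
print(percent(11, 17), percent(19, 26))  # of those, blocked by the coplanar bar
```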
To compare the amount of facilitation that occurred when the orthogonal bar was in front of versus behind the plane of the target and flanking stimuli, we plotted the normalized response of each cell for the two conditions versus the amount of driving obtained when the orthogonal bar was presented in the same plane as the target and the flank. Moving the orthogonal bar in front of the plane of the target and flank stimuli resulted in facilitation in only 1 (9%) V1 cell (Fig. 5 A, filled diamonds) but in 8 (42%) V2 cells (Fig. 5 B, filled diamonds). Moving the orthogonal bar behind the plane of the target and flank stimuli never resulted in facilitation of the neuronal response (Fig. 5, open diamonds). Thus, facilitation could be preserved when the orthogonal segment was given crossed disparity (i.e., appeared in front of the flank) and therefore was occluding the colinear target and flank, but was blocked when the orthogonal segment was behind the plane of fixation. This finding is consistent with flank facilitation signaling an occluded, but continuous, contour belonging to a single object.
Depth-induced illusory contours—modal completion
The Kanizsa illusion illustrates how the visual system generates contours from partial information to form surface boundaries. Depth information, generated by adding disparity to the apexes of the triangle (the cutout portions of the pac-man shapes), can produce dramatic shifts in the perceived location of the illusory contours (Fig. 6 A,B). We tested disparity-sensitive neurons with a simplified illusory border (Fig. 6 C,D). Two colinear line segments were placed beyond the borders of a large field stimulus of uniform luminosity. An illusory contour was seen between the outer line segments when they were placed in the near depth plane, appearing in front of the larger field stimulus. In contrast, when the disparity of the line segments was reversed, an illusory contour was perceived to run along the top and bottom of the large field stimulus orthogonal to the orientation of the colinear line segments. Finally, when no disparity was presented to the outer line segments, no illusory contours were perceived. By aligning the orientation of the field and colinear line segments with the orientation axis of the classical RF, it was possible to generate illusory contours that either crossed or did not cross the cell's classical RF. A key feature of this stimulus is that no disparity information and no stimulus contrast were presented within the cell's classical RF.
Quantified responses obtained from a V2 cell in response to contextual disparity-induced illusory contours are illustrated in Figure 7. Over the range tested for this experiment, this cell showed only a slight preference for far stimuli, as indicated by the responses obtained to a single stimulus within its classical RF given disparity [Fig. 7 B; over a larger disparity range, this cell would have been classified a Far cell according to Poggio (1995)]. Placing a large stimulus over the entire RF resulted in little activity from the cell (Fig. 7 C). The cell did not respond when small line segments given either uncrossed or no disparity were presented in combination with the large field (Fig. 7 D,E). The cell responded vigorously, however, when the ends of the bars lying outside of the classical RF were presented at near disparity (Fig. 7 F, right), the stimulus condition that produced a strong illusion in human observers of a bar bounded by illusory contours crossing in front of a large square. This increase in the response of the cell to the combined tab and square stimulus was nonlinear, because it was greater than the mathematical sum of the responses obtained when the tab ends were presented alone with crossed disparity (Fig. 7 F, left) and the response of the neuron to the large square stimulus presented alone (Fig. 7 C).
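The nonlinearity criterion used here, a combined response exceeding the sum of the component responses, can be expressed as a simple ratio. The sketch below is illustrative only: `summation_index` is a hypothetical name, the index itself is not a measure reported in the text, and the firing rates are invented.

```python
from typing import Sequence


def summation_index(combined: float, components: Sequence[float]) -> float:
    """Ratio of the combined-stimulus response to the linear sum of the
    component responses; values above 1 indicate supralinear summation."""
    linear_sum = sum(components)
    if linear_sum <= 0:
        raise ValueError("sum of component responses must be positive")
    return combined / linear_sum


# Hypothetical background-subtracted rates (spikes/s), not data from the study:
# near-disparity tab ends alone, large square alone, and the two combined.
tabs_alone, square_alone, combined = 4.0, 6.0, 30.0
index = summation_index(combined, [tabs_alone, square_alone])
print(index, index > 1.0)
```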
To establish that the cell's response was caused by the presence of the depth-induced illusory contour extending across the RF, we performed two control experiments. The illusory contour, which is perceptually required to form the boundary of the nearer surface, is generated as a response to the difference in depth between the large field and the contextual line segments. Removing the relative depth step between the large field and the contextual line segments by adding equal disparity to the large field generated a percept of a cross-like surface at a uniform near depth plane and eliminated any internal illusory contours. In the absence of this relative depth step, this cell failed to respond (Fig. 8 H) even though it did respond to the contextual depth-induced illusory contour (Fig. 8 G). Because all stimuli were presented in a random manner, this implies that the cell distinguished between stimulus conditions with a near illusory contour (Fig. 8 G) and those without (Fig. 8 H).
A second control manipulated the presence of the illusory contour by manipulating the luminance of the visual stimuli. A subtle shift in relative luminance can dramatically alter the perception of the illusory contour. By manipulating the luminance conditions of the large field stimulus and of the contextual bar ends, it is possible to generate two different visual stimuli with the same disparity relationships that do or do not contain an illusory contour (Metelli, 1974; Nakayama et al., 1990) (Fig. 9 A). If the luminance of the end line segments is greater than that of the intervening large field, an illusory contour extending across the field is perceived when the contextual line segments are presented in the near depth plane. If, however, the relative luminance is reversed and the end segments are darker than the intervening large field, no subjective contour is perceived, and the two end segments now appear to be distinct stimuli located in front of the large field. One configuration (Fig. 9 A, left, light tabs in front of a darker field) produces an illusory contour that forms the boundary of a transparent bar, whereas the other (Fig. 9 A, right, dark tabs in front of a lighter field) produces a percept of two separate tabs hovering in front of a larger background field. Two of the three units that demonstrated sensitivity to the Metelli color rules are illustrated in Figure 9, H, I, P, and Q.
Thus, neurons in V2 appear to respond to the subjective contour induced by contextual disparity cues, as indicated by the strong responses recorded when the contextual end segments were presented in the near depth plane (Fig. 10, filled circles) and the lack of such responses when the end segments were presented either in or behind the plane of fixation (Fig. 10, open triangles, x's, respectively), conditions that do not generate illusory contours. When a neuron responded to the modally completed stimulus, the magnitude of its response was greater than the simple summation of the cell's responses to the individually presented component stimuli (Fig. 10). Eleven of 35 (31%) disparity-sensitive cells tested in V2 signaled modal completion, i.e., completion of a surface bounded by illusory contours in front of the larger background field. In contrast, only 1 of 13 (8%) cells in V1 responded to contextual line segments given crossed disparity. Interestingly, there was no relationship between the disparity tuning of the cell and the signaling of modal completion. As shown in Figures 7 B, 8 C, and 9, B and J, both tuned Far and tuned Near cells could signal modal completion (the breakdown of disparity types that responded to a modally completed stimulus was two Far, one tuned Far, two tuned Zero, five tuned Near, and two Near cells). Finally, no cells in V1 or V2 exhibited amodal completion in response to the stimuli used in this stimulus set (no stimulus boundary within the RF core).
Elements in a textured surface assume the depth defined by the boundaries of the surface, a phenomenon known as the “wallpaper illusion” (Brewster, 1884; Mitchison and McKee, 1985, 1987a,b; McKee and Mitchison, 1988). Observing a grating of equally spaced dots will lead to the judgment of a texture at a single depth, but that depth will vary depending on the observer's trial-by-trial vergence or the disparity apparent at the edges of the gratings. Closely related to this illusion is disparity capture (Fig. 11 A,B) (Ramachandran and Cavanagh, 1985; Ramachandran, 1986), in which global surface or grouping issues dictate the local interpretation of stimulus depth. We designed a simplified version of this stimulus. The center line segment within a grating of equally spaced line segments will assume the depth value determined by the disparity present at the ends of the grating, even though there is a perfectly matched stereo pair element with zero disparity presented to the other eye (Fig. 11 C,D). We used this illusion to probe the ability of disparity-sensitive cells in V1 and V2 to integrate contextual depth information by placing the end elements of gratings beyond the borders of the neuron's classical receptive field. The gratings were composed of line segments of preferred orientation and length, and the interelement spacing was determined by the neuron's own disparity tuning. Responses were recorded to the center-preferred stimulus alone, the entire grating, and the grating ends alone, presented at each of five disparities selected for the cell.
Responses of a single neuron in area V2 to stimuli showing disparity capture (Fig. 12 A) are presented and quantified in Figure 12 B. This neuron preferred far stimuli, as indicated by the strong responses to stimuli of preferred orientation and length given uncrossed disparity (Fig. 12 B, open squares). Presenting horizontally displaced gratings of iso-oriented line segments to each eye with the same disparity steps as before resulted in moderate responses from the cell (Fig. 12 B, filled circles) when the gratings were given far disparity, but no response from the cell when the grating ends were given crossed, or near, disparity. Note that in both cases, the elements presented within the cell's classical RF had no disparity information, because there were matched stimulus pairs presented to each eye. Rather, in terms of pairs, disparity existed only at the ends of the gratings. However, the responses of this cell were not simply responses to the ends of the texture grating, because the cell did not respond to isolated stimuli given disparity and located in the same position as the grating ends (Fig. 12 B, open triangles, dotted line). Thus, this cell responded as if the elements within its classical RF were captured to the depth plane of the elements at the end of the texture grating.
The response profile of a V2 Near cell is illustrated in Figure 13. This cell also responded to the capture stimuli in a manner similar to its response pattern obtained for a single stimulus within its classical RF core. Note that presenting the ends of the grating alone, with the same amount of disparity, results in no significant neuronal activity, indicating that these stimuli were indeed outside of the cell's classical RF (Fig. 13 B, right).
As a whole, neurons in V2 were much more sensitive than neurons in V1 to contextual stimuli containing disparity. As illustrated in Figure 14 A, individual cells in V2 showed a greater range in their responses to the capture-grating stimulus. Plotted on this graph is the response of the cell to the grating stimulus averaged over the five disparity conditions tested, as well as the cell's maximum and minimum responses obtained to grating stimuli. The cells are sorted in ascending order of response range: the longer the vertical line (Fig. 14 A), the greater the amount of response modulation induced by the contextual disparity cues. With the exception of a single cell, V1 cells showed little modulation in their firing pattern in response to the contextual stimuli. In contrast, a greater percentage of cells in V2 were influenced by the disparity-containing stimuli outside the core RF, as indicated by the abundance of long vertical lines in the right-hand side of the graph (Fig. 14 A). It should be noted that it is not the case that the population of cells tested in V1 was poorly driven by all stimuli. Responses of the cells to preferred stimuli presented within the core RF are indicated in Figure 14 A by the x's. Note that both V1 and V2 cells showed strong responses to disparity-containing stimuli presented within their core RF.
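The per-cell summary plotted in Figure 14 A (mean, minimum, and maximum response across the five disparity conditions, with cells ordered by response range) can be sketched as follows. The cell labels and firing rates are hypothetical, and `modulation_summary` is an illustrative name, not the authors' analysis code.

```python
from statistics import mean
from typing import Sequence, Tuple


def modulation_summary(responses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (mean, minimum, maximum, range) of one cell's grating responses."""
    lowest, highest = min(responses), max(responses)
    return mean(responses), lowest, highest, highest - lowest


# Hypothetical rates (spikes/s) at five disparities; not data from the study.
cells = {
    "v1_a": [10.0, 11.0, 10.5, 9.5, 10.0],
    "v2_a": [4.0, 6.0, 12.0, 25.0, 30.0],
    "v2_b": [8.0, 9.0, 15.0, 14.0, 8.5],
}

# Sort cells in ascending order of response range, as in the figure.
ranked = sorted(cells, key=lambda name: modulation_summary(cells[name])[3])
for name in ranked:
    avg, lowest, highest, span = modulation_summary(cells[name])
    print(f"{name}: mean={avg:.1f} min={lowest:.1f} max={highest:.1f} range={span:.1f}")
```

A long range corresponds to a long vertical line in the figure, that is, strong modulation by the contextual disparity.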
To determine whether cells responded to this depth illusion with the same disparity preference as that obtained for a single stimulus presented within their classical RF, we plotted the disparity values that produced the peak response to the two conditions (Fig. 14 B). Data points located along the diagonal indicate cells that responded with an identical peak preference for each of the stimulus conditions. Twenty-nine of 47 (62%) disparity-sensitive cells in V2 responded to the texture grating with the same preference as to a single stimulus within the classical RF given disparity (Fig. 14 B, open circles). In contrast, no V1 cells (0 of 11) responded in such a manner to the grating stimulus (Fig. 14 B, filled squares).
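The peak-preference comparison amounts to finding, for each stimulus condition, the disparity that gave the largest response and checking whether the two agree (a point on the diagonal of Fig. 14 B). A minimal sketch, assuming both conditions were tested at the same five disparities; the disparity values, firing rates, and function names are hypothetical.

```python
from typing import Sequence


def peak_disparity(disparities: Sequence[float], responses: Sequence[float]) -> float:
    """Disparity that produced the largest response (first one on ties)."""
    if not disparities or len(disparities) != len(responses):
        raise ValueError("need equal-length, non-empty sequences")
    best = max(range(len(responses)), key=lambda i: responses[i])
    return disparities[best]


def same_preference(disparities: Sequence[float], single: Sequence[float],
                    grating: Sequence[float]) -> bool:
    """True if the single-stimulus and grating conditions peak at the same disparity."""
    return peak_disparity(disparities, single) == peak_disparity(disparities, grating)


# Hypothetical disparities (deg; negative = crossed) and rates (spikes/s).
disparities = [-0.4, -0.2, 0.0, 0.2, 0.4]
single_stimulus = [2.0, 3.0, 8.0, 20.0, 35.0]
capture_grating = [0.5, 1.0, 3.0, 9.0, 14.0]
print(same_preference(disparities, single_stimulus, capture_grating))
```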
Latency of responses to stimuli containing contextual depth information
It is important to note that the capture-induced responses had the same latency as that induced by a single stimulus presented to the center of the RF (Fig. 15, ∼65 msec in one monkey). In no observed case among the population of cells responding to the disparity capture, illusory contour, or flank facilitation stimuli was the latency of the capture response significantly different from that elicited by a single stimulus placed within the RF.
Physiological identification of recording sites
In addition to using physiological properties to identify the V1–V2 border, it was possible to tentatively locate the borders of a V2 thick stripe by comparing additional RF characteristics, such as the strength of disparity tuning, direction preference, and color sensitivity (Hubel and Livingstone, 1987). This putative thick stripe is indicated in Figure 16 as the gray shaded area (these maps are from a single hemisphere, the same one depicted in Fig. 3 in the maps of RF properties). Assuming the localization of the thick stripe on the basis of physiological criteria, it is possible to determine whether the sensitivity to global depth information correlated with this subcompartment of V2 or showed a more uniform distribution across V2 and V1. From all of the classes of experiments shown above, it is clear that the V2 thick stripe region is more likely to be influenced by contextual depth information than are either V2 nonthick stripe cells or V1 cells. This was seen in both experimental animals.
V2 neurons signal partially occluded contours via flank facilitation, respond to disparity-defined illusory contours, and respond to illusions that support disparity capture. These results demonstrate that V2 cells integrate stimulus information from beyond their classical RF, similar to previous descriptions of contextual stimuli on the responses of neurons in V1 (Maffei and Fiorentini, 1976; Allman et al., 1985; Gilbert and Wiesel, 1990; Gilbert et al., 1990; Gilbert, 1992; Knierim and Van Essen, 1992; Kapadia et al., 1995, 1999; Ito et al., 1998; Ito and Gilbert, 1999). The contextual influences in V1 have been attributed to a plexus of long-range horizontal connections, whose extent and specificity can account for the interactions observed in that area (Gilbert and Wiesel, 1979, 1983, 1989; Rockland and Lund, 1982, 1983; Martin and Whitteridge, 1984; Ts'o et al., 1986; Malach et al., 1993; Das and Gilbert, 1995). This intracortical network provides information to a neuron from well beyond the area covered by its own classical RF, or those of its nearby neighbors. The circuits responsible for the lateral interactions observed in V2 remain to be explored.
Critical to the design of the stimuli involved in this study was the use of contextual depth information by placing disparity-containing stimulus elements outside of the RF core. All three experiments described demonstrate that local disparity cues alone do not determine to which stimulus configurations V2 cells will respond. For example, V2 cells responded to the disparity capture stimulus even though the grating elements within the RF core contained no disparity (i.e., there was a matched element presented to the other eye in the same position; Figs. 12-14).
The preferred disparity tuning of the cell did not always predict the ultimate response to the complex surface stimuli, because it varied depending on the particular stimulus configuration studied. For example, V2 cells responded with a similar preference for disparity capture stimuli as they did for a single stimulus within their RF core (Figs. 12-14). However, there was no relation between the disparity tuning of the cell and the signaling of modal completion (Figs. 7-9). Both Near and Far cells responded to an illusory contour generated in response to crossed disparity given to contextual bar segments located beyond the RF core.
Although disparity-sensitive neurons in V1 track absolute disparity changes and not relative disparity (Cumming and Parker, 1999), visual perception tracks relative disparity relationships (Werner, 1938; Gogel, 1965; Nelson, 1977; Westheimer, 1979; Erkelens and Collewijn, 1985; Regan et al., 1986). Relative disparity of the stimuli, and not absolute disparities, determines how the visual system interprets the surface relationships present in the stimuli (Ramachandran, 1986; He and Nakayama, 1992, 1993). V2 cells responded to the subjective contours generated by these relative disparity differences that support the surface interpretations, as demonstrated in the modal completion experiments (Figs. 7, 8). Furthermore, V2 neurons responded to the disparity capture stimulus, indicating that they might contribute to the perception of depth, rather than local depth per se. In contrast, V1 neurons failed to respond in the same way to these stimuli. This failure of V1 neurons to respond to global disparity cues is not limited to the particulars of the stimulus used in this study, as it has been reported recently using an analogous stimulus consisting of displaced sine wave gratings observed through shifted apertures (Cumming and Parker, 2000). On the other hand, there is evidence of amodal completion in V1 (Sugita, 1999), although this report found the phenomenon limited to cells with RFs within the central 2° of eccentricity.
It is well documented that area V2 has compartments specialized for the analysis of depth, motion, and color (Hubel and Wiesel, 1970; Livingstone and Hubel, 1984; Hubel and Livingstone, 1985, 1987; Roe and Ts'o, 1995; Zeki and Shipp, 1987). In addition to these attributes, V2 also seems to be more responsive to illusory contours than is V1 (von der Heydt et al., 1984; von der Heydt and Peterhans, 1989a,b; Peterhans and von der Heydt, 1993; Sheth et al., 1996). Previous experiments have demonstrated responses of cells in V2 to illusory contours formed by offset gratings or pac-man-like supports (von der Heydt et al., 1984; von der Heydt and Peterhans, 1989a,b; Peterhans and von der Heydt, 1991, 1993; Sheth et al., 1996). Here we show that V2 cells can respond to disparity-induced illusory contours. V2 sensitivity to surface interpretations of visual stimuli has also been suggested by recent evidence that pairs of V2 neurons in the cat can develop synchronous activity that correlates with surface segmentation rules determined by manipulation of stimulus transparency (Castelo-Branco et al., 2000), but there is debate about the general role of synchrony in surface segmentation (Lamme and Spekreijse, 1998). Synchrony between neurons develops when two intersecting plaids are perceived as separate surfaces, with one transparent surface moving above a background surface, but not when they are perceived as a single surface moving in an intermediate direction.
Further evidence that the phenomena we have studied relate to the higher order perceptual rules of surface relationships, rather than to local disparity selectivity within the RF core or surround, is the sensitivity of the cells to the Metelli rules of luminance and transparency (Fig. 9). In this case, all disparity relationships were identical across the stimuli tested; only the local stimulus contrast changed. Yet, these manipulations alter the percept of the surfaces, from a transparent surface bounded by illusory contours in front of a darker background square to two independent darker bar segments floating in front of a lighter background, with no illusory contours connecting the bar segments. V2 cells respond to the former stimulus configuration but not the latter, indicating sensitivity to the surface representation and not only to local stimulus disparity cues.
We found much more pronounced effects of global depth cues in area V2 than in area V1. The differences were not merely attributable to differences in spatial scale, because our stimuli were tailored to the sizes of the individual receptive fields of the recorded neurons. The difference in the propensity of V2 neurons to be sensitive to global depth cues compared with that of V1 suggests different functional roles for the two areas. Several lines of evidence point toward a role for V1 in coding image characteristics, i.e., orientation, spatial frequency, luminance, wavelength, and binocular disparity. This proceeds more or less independent of what real world scene may have given rise to a particular image. In contrast, the three experiments included in this study suggest that V2 plays a role in the coding of surface properties of a scene, including surface contours, regions of surfaces, opacity, and transparency, particularly as they are influenced by the relative depth between image regions.
This work was supported by National Institutes of Health Grant EY07968. J.S.B. was supported by National Institutes of Health Fellowship F32EY06842. We thank Kaare Christian for programming and Steven Kane and Joel Lopez for surgical assistance.
Correspondence should be addressed to Dr. Charles D. Gilbert, The Rockefeller University, 1230 York Avenue, New York, NY 10021.