Abstract
In the context of motion detection, the endings (or terminators) of 1-D features can be detected as 2-D features, affecting the perceived direction of motion of the 1-D features (the barber-pole illusion) and the direction of tracking eye movements. In the realm of binocular disparity processing, an equivalent role for the disparity of terminators has not been established. Here we explore the stereo analogy of the barber-pole stimulus, applying disparity to a 1-D noise stimulus seen through an elongated, zero-disparity aperture. We found that, in human subjects, these stimuli induce robust short-latency reflexive vergence eye movements, initially in the direction orthogonal to the 1-D features, but shortly thereafter in the direction predicted by the disparity of the terminators. In addition, these same stimuli induce vivid depth percepts, which can only be attributed to the disparity of line terminators. When the 1-D noise patterns are given opposite contrast in the two eyes (anticorrelation), both components of the vergence response reverse sign. Finally, terminators drive vergence even when the aperture is defined by a texture (as opposed to a contrast) boundary. These findings demonstrate that terminators contribute to stereo matching, and constrain the type of neuronal mechanisms that might be responsible for the detection of terminator disparity.
Introduction
Binocular disparity plays a central role in our ability to infer the 3-D organization of objects in the environment. Detecting binocular disparities requires solving the so-called “stereo correspondence problem” (Julesz, 1971), i.e., correctly identifying corresponding features in the images from the two eyes. When stimuli that vary along only one dimension are viewed, this problem is compounded by the “stereo aperture problem” (Morgan and Castet, 1997; Rambold and Miles, 2008): because neurons in early visual cortex have small receptive fields, when a line (a 1-D feature) traverses their receptive field they are only capable of extracting the component of disparity orthogonal to the line (Cumming and Parker, 2000). Since the end points of 1-D features are 2-D features, they could be expected to play a prominent role in determining how both problems are solved. For example, our ability to localize in depth a horizontal bar might be based on the disparity of the two bar endings (or terminators). However, as previously pointed out (McKee et al., 2004; Wilcox and Allison, 2009), under such conditions mechanisms that match the location of the bar in its entirety could also be used. Accordingly, hard evidence for the role played by terminators per se in extracting depth information is currently lacking.
In the realm of motion detection, a role for terminators has been proposed based on the “barber-pole” illusion (Wallach, 1935; Lorenceau and Shiffrar, 1992; Kooi, 1993). When a drifting oblique 1-D pattern (e.g., a sinusoidal or a square-wave grating) is seen through a stationary, vertically elongated, aperture, subjects initially perceive the pattern as moving in the direction orthogonal to its orientation. However, shortly thereafter the pattern appears to be moving vertically, parallel to the long edges of the aperture. The deviation of the perceived direction of motion from that of the 1-D features is usually imputed (Fisher and Zanker, 2001; Edwards et al., 2013) to the line endings along the long edges of the aperture, i.e., 2-D features that indeed move vertically.
Here we study, for the first time, the stereo analogy of the barber-pole motion stimulus: we dichoptically presented horizontal or vertical 1-D binary random noise line patterns, having disparity, viewed through a zero-disparity oblique elongated aperture. To assess the impact of the terminator disparity signal, we measured, in three human subjects, the short-latency reflexive vergence eye movements that are induced when an image is suddenly presented binocularly (Busettini et al., 1996). These movements, which are termed the disparity vergence response (DVR), are induced by vertical, horizontal, and oblique disparities (Rambold and Miles, 2008). Here we show that they are also strongly driven by terminator disparity, although with a slightly longer latency. In addition, we report that terminator disparity strongly drives perception, even with ultrashort exposures (60 ms).
Materials and Methods
Subjects.
Three male subjects participated in all the experiments; one (CQ) was an author and one (JH) was unaware of the experimental questions being investigated. Two additional male subjects, one (BGC) an author and the other (AB) naive, participated only in the depth perception experiments. All had normal or corrected-to-normal visual acuity and normal stereo acuity. Experimental protocols were approved by the institutional review board concerned with the use of human subjects.
Visual apparatus.
The subjects sat in a dark room and were positioned so that their eyes were located approximately in the center of a cubic box (70 cm/side) containing orthogonal magnetic field-generating coils. The chin and forehead of each subject rested on padded supports, and the head was stabilized using a head band. Visual stimuli were presented dichoptically using a Wheatstone mirror stereoscope. Each eye saw a CRT monitor (ViewSonic G225f) through a 45° mirror, creating a binocular image straight ahead at a distance of 521 mm from the corneal vertex, which was also the optical distance to the two monitor screens. Each monitor screen covered 42° (horizontal) by 32° (vertical) of visual angle, and was set at a resolution of 1600 columns by 1200 rows, and a refresh rate of 100 Hz. A single video card (EVGA GEForce GTX 580 Classified) was used to provide the inputs to both monitors. Using the Nvidia Control Center configuration tool, the two monitors were set up so as to appear to the operating system (Microsoft Windows XP) as a single monitor with a resolution of 3200 columns by 1200 rows. Using two photocells connected to a digital oscilloscope, we verified that the refresh timing of the two monitors was tightly synchronized, with the left eye image consistently preceding the right eye image by 0.4 ms, a delay that is inconsequential for the processing of disparity information (Julesz and White, 1969; Read and Cumming, 2005). Each of the video card output ports was fed to an attenuator (Pelli and Zhang, 1991) whose output was connected to a single channel of a video signal splitter (AC085A-R2, Black Box); the video outputs of the splitter were then connected to the RGB inputs of the monitor. In this way, only gray-scale images could be presented, but with a higher luminance resolution (12 bits) than normally possible (8 bits). 
Luminance linearization was performed by interpolation following dense luminance sampling (using a Konica Minolta LS100 luminance meter) independently for each monitor.
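As a rough check on the display geometry described above, the visual angle subtended by a single pixel at the screen center can be estimated from the viewing distance and the angular extent of the screen. The sketch below assumes a flat screen that is centered on, and perpendicular to, the line of sight; the function name is illustrative only.

```python
import math

def pixel_subtense_arcmin(distance_mm=521.0, width_deg=42.0, n_columns=1600):
    """Visual angle (arcmin) of one pixel at the screen center.

    Assumption: flat screen, centered on and perpendicular to the line of sight.
    """
    half_width_mm = distance_mm * math.tan(math.radians(width_deg / 2))
    pixel_mm = 2 * half_width_mm / n_columns
    return math.degrees(math.atan(pixel_mm / distance_mm)) * 60

print(round(pixel_subtense_arcmin(), 2))
```

Under these assumptions one pixel subtends about 1.65′, in line with the one-pixel disparity of 1.7′ quoted for the perceptual experiments.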
Eye movement recording.
A scleral search coil embedded in a silastin ring (Skalar; Collewijn et al., 1975) was placed in each of the subject's eyes following application of topical anesthetic (proparacaine HCl). The horizontal and vertical orientations of the eyes were recorded using an electromagnetic induction technique (Robinson, 1963). These outputs were calibrated at the beginning of each recording session by having the subject look at targets of known eccentricity. Peak-to-peak noise levels resulted in an uncertainty in eye position recording of <0.03°. Coil signals were sampled at 1000 Hz.
Experiment control.
The experiment was controlled by two computers, communicating over an Ethernet with TCP/IP. The Real-time EXperimentation software package (Hays et al., 1982), running on the master computer under the QNX operating system, was responsible for providing the overall experimental control as well as acquiring, displaying, and storing the eye movement data. The other machine, directly connected to the CRT displays, ran under the Windows XP operating system and generated the required visual stimuli in response to REX commands. This was accomplished using the Psychophysics Toolbox 3.0.8, a set of Matlab (Mathworks) scripts and functions (Brainard, 1997).
Behavioral paradigm.
Trials were presented in blocks; each block contained one trial for each stimulus condition. All conditions within a block were randomly interleaved. Eye movements and perceptual reports were collected on separate days. For eye movement recordings, each trial began with the appearance of a central fixation cross (width, 10°; height, 2°; thickness, 0.15°) on a gray background (20.8 cd/m2). The subject was instructed to look at the center of the cross and avoid making saccadic eye movements. After the subject maintained fixation within a small (1° on the side) invisible window around the fixation point for 800–1100 ms, the fixation cross disappeared and the visual stimuli appeared simultaneously on both screens. After 200 ms, both screens turned gray (again at 20.8 cd/m2), signaling the end of the trial. After a short intertrial interval, a new trial was started. If the subject blinked, or if saccades were detected during the stimulus presentation epoch, the trial was discarded and repeated within the block. With few exceptions, a single experiment required multiple daily recording sessions to collect enough trials from a subject (we collected between 150 and 450 trials for each condition, depending on the signal-to-noise ratio, and between 1000 and 2400 trials in a session; the number of conditions varied across experiments).
In perceptual reports sessions, each trial began with the appearance of a central fixation cross (width, 2°; height, 2°; thickness, 0.15°) on a gray background (20.8 cd/m2). The subject was instructed to look at the center of the cross. After 1400–1700 ms, the visual stimuli appeared simultaneously on both screens with the fixation cross superimposed on them. After 60 ms both screens turned gray (again at 20.8 cd/m2), and the subject had an unlimited time to report by pressing one of two buttons whether the pattern, or any part of it, was perceived in front of the zero-disparity fixation cross or behind it (two-alternative forced choice). The button press triggered the start of a new trial. The subjects were never given any feedback about their performance, but before data collection commenced they were allowed to practice with the task for as long as they wished.
Visual stimuli.
In all experiments described herein, we used 1-D binary random noise line patterns, either horizontally or vertically oriented. Each 4 pixel-wide (0.1°) line in a pattern was randomly assigned either a high (34.1 cd/m2) or a low (7.5 cd/m2) luminance value. The average luminance of each line pattern was equal to 20.8 cd/m2. In the first, third, and fifth experiment, the patterns were presented within an oblique (long sides tilted by ±45°; width, 11°; height, 27°) parallelogram. Outside the aperture, the screen was gray with a luminance of 20.8 cd/m2. In the first experiment, the pattern had a disparity of 20′ (orthogonal to the orientation of the lines), whereas the aperture had zero disparity (Fig. 1A). In the third experiment (Fig. 1C), both pattern and aperture had the same disparity (20′, orthogonal to the orientation of the lines). In the fifth experiment (Fig. 1E), the aperture had once again zero disparity, but the pattern was anticorrelated (i.e., once the 20′ disparity was applied, one of the two images was contrast-reversed). In the second experiment, the pattern was the same as in the first experiment, but instead of showing it through a hard aperture, we multiplied its contrast by a 2-D Gaussian function (Fig. 1B; 1:4 ratio for the minor/major axis SD, equal to 1.67 and 6.67°, respectively). Finally, in the fourth experiment, we used rectangular apertures (10 × 30°), to which we applied a 20′ disparity orthogonal to their long axis. In this case, a different zero disparity (or uncorrelated) noise pattern, oriented parallel to the short axis of the aperture, was used within and outside the aperture, which was thus defined by a texture change, as opposed to a contrast change (Fig. 1D).
For the perceptual experiments, only the configuration shown in Figure 1A was used, with disparities ranging from 20 to 1.7′ (one pixel).
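The construction of the first and fifth stimulus configurations can be sketched in a few lines. This is a minimal illustration, not the Psychophysics Toolbox code used in the experiments: it builds a horizontal line pattern with vertical disparity, seen through a zero-disparity oblique aperture, with optional anticorrelation. The aperture shape (horizontal short sides, long sides at 45°), the sign conventions, the pixel units, and all function names are assumptions made for the example.

```python
import random

HIGH, LOW, GRAY = 34.1, 7.5, 20.8   # luminances in cd/m^2, from the text
LINE_PX = 4                          # line width in pixels, from the text

def inside(x, y, ap_w, ap_h, tilt_sign):
    """True if pixel (x, y), relative to the screen center, is in the aperture.

    Assumed shape: horizontal short sides, long sides at 45 deg (slope tilt_sign).
    """
    return abs(y) <= ap_h / 2 and abs(x - tilt_sign * y) <= ap_w / 2

def stereo_pair(n_rows, n_cols, disp_px, ap_w, ap_h,
                tilt_sign=1, anticorrelated=False, seed=0):
    """Return (left, right) images as lists of rows of luminance values."""
    rng = random.Random(seed)
    shift = abs(disp_px)
    # One luminance per row, constant within each LINE_PX-row line.
    base = []
    while len(base) < n_rows + shift:
        base.extend([rng.choice((HIGH, LOW))] * LINE_PX)
    left, right = [], []
    for r in range(n_rows):
        y = r - n_rows // 2
        lum_a, lum_b = base[r + shift], base[r]
        # The sign of disp_px decides which eye sees the shifted pattern.
        l_lum, r_lum = (lum_a, lum_b) if disp_px >= 0 else (lum_b, lum_a)
        if anticorrelated:
            r_lum = HIGH + LOW - r_lum   # contrast reversal about the mean
        # The same mask is used for both eyes: the aperture has zero disparity.
        mask = [inside(c - n_cols // 2, y, ap_w, ap_h, tilt_sign)
                for c in range(n_cols)]
        left.append([l_lum if m else GRAY for m in mask])
        right.append([r_lum if m else GRAY for m in mask])
    return left, right
```

Because the pattern is shifted between the eyes while the mask is not, the points at which each line meets the tilted aperture edge are displaced obliquely between the two images, which is the terminator disparity signal studied here.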
Data analysis.
All the measures reported herein are based on vergence velocity. The calibrated eye position traces (see Eye movement recording) were differentiated using a 21-point finite impulse response acausal filter (47 Hz cutoff frequency). The difference between the horizontal (vertical) velocity of the left and right eyes was then computed, yielding the horizontal (vertical) vergence velocity for each trial. Trials with saccadic intrusions and unstable fixation that went undetected at run time were removed through an automatic procedure aimed at detecting outliers. For each velocity signal (left eye horizontal, right eye horizontal, left eye vertical, right eye vertical, horizontal vergence, vertical vergence) at each time point (0–199 ms from stimulus onset, in 1 ms increments), trials for which the velocity deviated more than ±4.5 SDs from the mean (across all the valid trials for a given condition) were excluded. This was repeated iteratively until no trials were excluded. The fraction of trials so excluded varied from session to session and usually increased during a session. Typically, during the first 20 min within a session, ≤5% of the trials were excluded, but usually by the end of the session this fraction increased, sometimes to as much as 20%. Eye discomfort due to the anesthetic effect wearing off and decrease in concentration were the most likely causes. Average temporal profiles, time-locked to stimulus onset, were then computed over the remaining trials, separately for each stimulus condition.
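The vergence computation and the iterative outlier rejection described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code itself: it handles a single velocity signal (the text applies the criterion to six signals), it uses the sample standard deviation, and the function names are hypothetical. The differentiating filter is not reproduced because its coefficients are not given.

```python
import statistics

def vergence_velocity(left_vel, right_vel):
    """Vergence velocity: left-eye minus right-eye velocity, sample by sample."""
    return [l - r for l, r in zip(left_vel, right_vel)]

def reject_outliers(trials, n_sd=4.5):
    """Iteratively drop trials that deviate by more than n_sd SDs at any time point.

    trials: list of equal-length velocity traces for one condition.
    Returns the indices of the trials that survive.
    """
    keep = list(range(len(trials)))
    n_t = len(trials[0])
    while len(keep) >= 2:
        bad = set()
        for t in range(n_t):
            vals = [trials[i][t] for i in keep]
            mu = statistics.fmean(vals)
            sd = statistics.stdev(vals)      # assumption: sample SD
            for i, v in zip(keep, vals):
                if abs(v - mu) > n_sd * sd:
                    bad.add(i)
        if not bad:
            break                            # converged: nothing left to exclude
        keep = [i for i in keep if i not in bad]
    return keep
```

Because the mean and SD are recomputed after each pass, a trial that was masked by a more extreme outlier in one pass can still be removed in a later pass.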
Opposite stimulus disparities (e.g., crossed and uncrossed by the same magnitude) are known to elicit DVRs that are not simply opposites (Busettini et al., 1996). The most striking example of this asymmetry can be seen when very large disparities are applied: very large positive and negative disparities induce the same (nonzero, and different across subjects) DVR. This response, independent of the sign of the stimulus disparity, is called a default response, and is probably partly related to the disengagement of fixation. To remove this component, and to increase the signal-to-noise ratio, most disparity vergence studies (Sheliga et al., 2006; Quaia et al., 2013) report not the raw DVR, but rather the difference between the DVRs to opposite disparities. We did so here as well. The traces and measurements reported here are thus based on the difference between the average response to a crossed (left-hyper) disparity pattern and that to the same size uncrossed (right-hyper) disparity pattern (seen through the same aperture). For the fourth experiment (Fig. 1D), when only the aperture had disparity, we show the difference between the average responses to the crossed (left-hyper) and uncrossed (right-hyper) aperture disparity conditions.
We used bootstrap-based methods (Efron, 1982) for all our statistical analyses (for a detailed description of the procedures used, see Quaia et al., 2013). For the component of the DVR orthogonal to the pattern (i.e., the early response), the latency was determined as the time at which the response (mean of the responses to the +45° and −45° aperture orientations) became significantly (p < 0.05) different from zero. For the component of the DVR parallel to the pattern (i.e., the late response), the latency was determined as the time at which the response to the +45° aperture orientation first differed significantly (p < 0.05) from that to the −45° orientation. Bootstrap techniques were used to compute these measures and the confidence intervals around them (Quaia et al., 2013).
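One simple way to turn the early-response latency criterion into code is a percentile bootstrap over trials. The sketch below is only a generic illustration of that idea; it is not the procedure of Quaia et al. (2013), whose details are not given here, and the function name and parameters are hypothetical.

```python
import random

def bootstrap_latency(trials, n_boot=500, alpha=0.05, seed=0):
    """First time index at which the mean response differs from zero.

    trials: list of equal-length response traces (one per trial).
    A time point counts as significant when the (1 - alpha) percentile
    bootstrap interval of the across-trial mean excludes zero.
    Returns None if no time point qualifies.
    """
    rng = random.Random(seed)
    n, n_t = len(trials), len(trials[0])
    boot_means = []
    for _ in range(n_boot):
        # Resample trials with replacement and average them.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_means.append([sum(trials[i][t] for i in idx) / n
                           for t in range(n_t)])
    lo_k = int((alpha / 2) * n_boot)
    hi_k = int((1 - alpha / 2) * n_boot) - 1
    for t in range(n_t):
        col = sorted(m[t] for m in boot_means)
        if col[lo_k] > 0 or col[hi_k] < 0:
            return t
    return None
```

With noisy data a pointwise criterion of this kind will occasionally flag a time point before the true response onset, so a practical implementation would likely add a further safeguard, such as requiring significance over several consecutive samples.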
Simulations.
To investigate the role that classical mechanisms might play in determining the responses to our stimuli, we simulated the behavior of V1 complex cells by implementing the binocular energy model (Ohzawa and Freeman, 1986; Ohzawa et al., 1990; Anzai et al., 1999). Because the stimuli we used are broadband, the spatial frequency to which the model units are tuned is probably not crucial. However, since the stimulus disparities were 0.2° (pattern disparity) and 0.28° (terminator disparity), we chose the spatial scale for the model so that this range of disparities corresponded to a phase disparity of ∼90°, in accordance with the size–disparity correlation hypothesis (Schor and Wood, 1983; Smallman and MacLeod, 1994; Prince et al., 2002). Thus, we chose Gabor filters with a carrier spatial frequency of 1.0 cycles per degree, which was multiplied by a 2-D Gaussian having a SD equal to 0.4° in the direction orthogonal to the preferred orientation of the unit, and 0.52° in the direction parallel to the preferred orientation of the unit. The values for the Gaussian envelope were chosen a priori based on averages for V1 neurons in the macaque monkey (Ringach, 2002). We then manipulated the orientation, receptive field location, and preferred phase disparity of the units and computed their output for the stimuli used in our experiments. We introduced phase disparity by changing the phase of the sinusoidal carriers defining the receptive field in just one eye. This produces the same phase difference between the eyes for each member of the quadrature pair. In all cases, disparity tuning curves were obtained by averaging the output of the units over 1000 randomly generated noise patterns.
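The core of the binocular energy computation can be illustrated with a deliberately simplified, one-dimensional sketch. It is not the 2-D simulation described above and cannot capture terminators; it only shows how a quadrature pair of phase-disparity units responds to a shifted noise pattern, and how anticorrelation inverts the tuning. The pixel pitch, the truncation of the receptive field, the sign convention for phase disparity, and all names are assumptions made for the example.

```python
import math
import random

PIX_DEG = 0.025     # assumed pixel pitch (deg); 4-pixel lines then span 0.1 deg
F_CPD = 1.0         # carrier spatial frequency (cycles/deg), from the text
SIGMA_DEG = 0.4     # envelope SD orthogonal to the preferred orientation
HALF_PIX = 48       # receptive field truncated at +/-3 SD (illustrative)

def gabor(phase):
    """1-D Gabor receptive-field profile with the given carrier phase."""
    xs = [i * PIX_DEG for i in range(-HALF_PIX, HALF_PIX + 1)]
    return [math.exp(-x * x / (2 * SIGMA_DEG ** 2)) *
            math.cos(2 * math.pi * F_CPD * x + phase) for x in xs]

def noise_lines(n_pix, rng, line_pix=4):
    """Binary noise, expressed as contrast (+1 or -1) per line."""
    out = []
    while len(out) < n_pix:
        out.extend([rng.choice((-1.0, 1.0))] * line_pix)
    return out[:n_pix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_energy(dphi_deg, shift_pix=8, n_patterns=200,
                anticorrelated=False, seed=0):
    """Mean output of one energy unit over random noise patterns.

    dphi_deg: phase added to the right-eye carrier only (phase disparity).
    shift_pix: right image equals left image displaced by this many pixels.
    """
    rng = random.Random(seed)
    dphi = math.radians(dphi_deg)
    # Quadrature pair: same phase offset between the eyes for both members.
    pairs = [(gabor(p), gabor(p + dphi)) for p in (0.0, math.pi / 2)]
    n = 2 * HALF_PIX + 1
    total = 0.0
    for _ in range(n_patterns):
        base = noise_lines(n + shift_pix, rng)
        left, right = base[shift_pix:], base[:n]
        if anticorrelated:
            right = [-v for v in right]
        for rf_left, rf_right in pairs:
            s = dot(left, rf_left) + dot(right, rf_right)   # binocular simple cell
            total += s * s                                   # squaring and summing
    return total / n_patterns
```

With a shift of 8 pixels (0.2° under the assumed pixel pitch), the binocular term varies approximately as cos(2πfd + Δφ), so under this sign convention the correlated response should be largest near Δφ = −72°, and the anticorrelated tuning curve should be inverted.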
Results
Vergence eye movements induced by terminator disparity
In our main experiment, we measured the short-latency vergence eye movements (DVRs) induced by a vertical binary random noise line pattern to which 20′ of horizontal (crossed or uncrossed) disparity was applied, and by a horizontal noise pattern having 20′ of vertical (left-hyper or right-hyper) disparity. The patterns were viewed through a zero-disparity oblique (±45°) aperture. A sample stimulus (horizontal noise pattern with left-hyper disparity, seen through a +45° aperture) is shown in Figure 1A. Note (central inset) that along the tilted sides of the aperture, the terminators of each line in the pattern have an oblique disparity: the disparity vector (white arrow) is aligned with the aperture edge, and its magnitude is 28.3′. Thus, both the disparity of the pattern and the orientation of the aperture contribute to determining the terminator disparity vector.
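The 28.3′ figure follows from simple geometry: a terminator is the point where a line of the pattern meets the zero-disparity aperture edge, so when the line is displaced between the eyes the terminator can only slide along the edge. A minimal sketch of this calculation is given below; the function name, the convention of measuring the edge tilt from the horizontal, and the sign conventions are assumptions made for the example.

```python
import math

def terminator_disparity(pattern_disp_arcmin, edge_tilt_deg, lines="horizontal"):
    """Disparity vector (dx, dy, magnitude) of terminators along a tilted edge.

    pattern_disp_arcmin: pattern disparity, orthogonal to the lines.
    edge_tilt_deg: tilt of the zero-disparity aperture edge from the horizontal.
    """
    t = math.radians(edge_tilt_deg)
    if lines == "horizontal":
        if math.sin(t) == 0:
            raise ValueError("edge parallel to the lines: no terminators")
        dy = pattern_disp_arcmin          # pattern disparity is vertical
        dx = dy / math.tan(t)             # terminator slides along the edge
    else:
        dx = pattern_disp_arcmin          # pattern disparity is horizontal
        dy = dx * math.tan(t)
    return dx, dy, math.hypot(dx, dy)

print(terminator_disparity(20, 45))
print(terminator_disparity(20, -45))
```

For a 20′ pattern disparity and a 45° edge the magnitude is 20′·√2 ≈ 28.3′, and reversing the tilt of the aperture reverses the sign of the component parallel to the lines while leaving the orthogonal component unchanged.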
In Figure 2, we plot the DVRs induced by these stimuli in our three subjects. As explained in Materials and Methods (Data analysis) we actually report the difference between the DVRs to the crossed (or left-hyper) patterns and those to the uncrossed (or right-hyper) patterns, seen through the same aperture (we also inspected the raw responses, and found that opposite signed disparities induced opposite signed vergence responses, but there were some small idiosyncratic asymmetries in the magnitude of the responses). In the top row, the patterns were vertical (with horizontal disparity). Around 65 ms from stimulus onset (for latency measures, see Table 1), a horizontal DVR emerged. This response reflects the disparity of the pattern, and was the same whether the aperture had an inclination of +45° (thick black lines) or −45° (thin black lines). Approximately 35 ms later, a vertical DVR appeared, with opposite sign in the two cases (thick and thin gray lines, respectively). The sign of this response is congruent with the sign of the vertical component of the disparity vector of the line terminators along the tilted sides of the apertures. The time difference between the onset of the two components of the vergence response was always significantly different from zero (p < 0.05 in all subjects).
The responses shown in Figure 2, bottom row, were instead induced by horizontal patterns (having vertical disparity). In this case, the vertical DVR was the same whether the oblique aperture had an inclination of +45° (thick gray lines) or −45° (thin gray lines), and the latency of this response was very short (∼72 ms), but slightly longer than that of the horizontal DVRs elicited by the vertical patterns (top row). The timing difference between first-order responses to horizontal and vertical patterns is larger than previously reported (Busettini et al., 2001; Quaia et al., 2013); we suspect that this might be due to the smaller stimulus area used in the current experiment (Howard et al., 2000). Later on, a horizontal DVR (black lines), opposite in sign depending on the orientation of the oblique aperture, emerged. The sign of this response followed the sign of the horizontal component of the disparity vector of the terminators along the long sides of the aperture. This response was for all subjects significantly (p < 0.05) delayed relative to the vertical DVRs, but the delay was considerably shorter (by 10–20 ms) than that observed with vertical patterns (Table 1).
With these stimuli, the contrast envelope (i.e., the aperture) had zero disparity, and thus could not have been responsible for either of the DVR components. Since the direction of the late DVR component was congruent with the disparity vector of the line terminators, it is tempting to conclude that line endings were responsible for this component. Before reaching this conclusion, an alternative explanation must be ruled out. In the context of motion detection, it has been argued that the barber-pole illusion might be at least partially attributed to the orientation of the aperture itself and not to terminator motion per se (Badcock et al., 2003). Under this hypothesis, the visual system extracts from the 1-D pattern the component of the speed vector orthogonal to the pattern, and the component parallel to it is then determined from the orientation of the aperture (static form information). This scheme, which is a special case of the well established motion-from-form mechanism (Geisler, 1999; Mather et al., 2013), thus posits that the static structure of image elements constrains the motion direction computation, with the motion of the 2-D terminators playing no role. In analogy to this motion theory, it might thus be argued that, in the context of our experiments, the orientation of the aperture itself (i.e., monocular form information), paired with the disparity of the 1-D pattern (i.e., first-order binocular disparity information), is sufficient to explain the sign of the late DVR component (without needing to postulate any role for the disparity of the terminators). We call this the “disparity-from-form” hypothesis, and we tested it by carrying out two control experiments. First, we presented these same noise patterns within a zero-disparity aperture whose transparency was determined by a tilted 2-D Gaussian function (Fig. 1B). 
Since with this stimulus the line endings fade away smoothly, any terminator disparity signal is sharply weakened, but the orientation of the aperture (form information) is preserved. The energy of the 1-D pattern disparity was also somewhat diminished and, as shown in Figure 3, the early DVR components were accordingly delayed (Table 1) and attenuated compared with those measured with the hard aperture (Fig. 2). This attenuation was, however, much more dramatic in the late DVR components, which were virtually obliterated. One subject (JH) still exhibited some small responses, with a sign tied to the orientation of the aperture, but in the other two subjects there was no significant off-axis response.
A limit of this experiment is that the Gaussian aperture might have weakened not only the terminator disparity signal, but also form information. To further test the disparity-from-form hypothesis, we thus presented the same patterns used in the main experiment, but we now applied to the aperture the same disparity applied to the pattern (Fig. 1C). In this configuration, the disparity of the line endings was equal to that of the pattern (i.e., purely horizontal or vertical), and thus could not induce an off-axis vergence response. However, the disparity of the pattern and the monocular shape/orientation of the aperture (i.e., the signals used under the disparity-from-form hypothesis) were identical to those used in the main experiment. We found that a vertically oriented pattern (horizontal disparity of pattern and aperture) induced a short-latency (Table 1) purely horizontal vergence response: any vertical vergence response observed was weak and the same regardless of aperture orientation (Fig. 4, top row). When the pattern was oriented horizontally (vertical disparity of pattern and aperture), an early vertical vergence response was followed by a late horizontal DVR (Fig. 4, bottom row). Note, however, that the sign of this response is opposite that predicted by the disparity-from-form hypothesis (and cannot be explained by terminators, which have a purely vertical disparity). One possible explanation for this behavior is that a fast response to the disparity of the pattern was followed by a delayed response to the disparity of the aperture (presumably the contrast envelope), which was matched not vertically (as it should have been based on the actual disparity applied), but rather horizontally. 
This explanation is compatible with the observation that stereo matching for large oblique stimuli favors the horizontal direction (van Ee and Schor, 2000; van Dam and van Ee, 2004; Rambold and Miles, 2008), and correctly predicts the sign of the late horizontal vergence response. Regardless of the origin of this late component, this experiment provides no support for the operation of a hypothetical disparity-from-form mechanism with our stimuli. Accordingly, these two control experiments leave terminator disparity as the only plausible explanation for the direction of the late DVR component in our first experiment.
In an effort to further quantify the strength of terminator disparity, we used in our fourth experiment stimuli in which two binary random noise line patterns, uncorrelated to each other, were presented, one in a central rectangular aperture, and the other in the surround. We used horizontal patterns presented in a vertical aperture having horizontal disparity (see sample in Fig. 1D), and vertical patterns in a horizontal aperture with vertical disparity. The center and the surround patterns either had zero disparity or were uncorrelated across the eyes. In both cases, the location of the aperture could be extracted from the texture changes in each monocular image, and the disparity of the monocular apertures could then be used to drive vergence. However, when the center and surround patterns had zero disparity, the line endings along the aperture long sides also carried disparity information. If the visual system is capable of extracting this information, a stronger vergence response might thus be expected. We found that in all three subjects this was indeed the case (Fig. 5). Faster (Table 2) and stronger DVRs were observed when the patterns had zero disparity (thick lines), compared with when they were uncorrelated across eyes (thin lines), especially when the aperture had vertical disparity (bottom row). To quantify the strength of the responses, we measured separately for each condition the average vergence speed in a 60 ms time window starting from the latency of the response to the correlated, zero-disparity, patterns (Table 3). The mean speed for the correlated patterns was significantly larger than that for the uncorrelated patterns in all conditions (p < 0.05). We attribute this response enhancement with correlated patterns to terminator disparity, since even a disparity-from-form mechanism would not predict it.
To further guide the identification of the neuronal mechanism(s) that underlie the processing of terminator disparity, we conducted one additional experiment. We used a stimulus configuration identical to that of the first experiment (i.e., zero-disparity oblique aperture), but now the images seen by the two eyes were anticorrelated (i.e., the pattern seen by one eye was obtained by first applying disparity to, and then reversing the contrast of, the pattern seen by the other eye; Fig. 1E). It is well known that with anticorrelated patterns, the sign of DVRs flips (Masson et al., 1997; Takemura et al., 2001), and so does the tuning of neurons in monkey V1 (Cumming and Parker, 1997), MT (Krug et al., 2004), and MST (Takemura et al., 2001) areas. Both vergence and neuronal responses are also significantly reduced in magnitude. These results with first-order disparities can be explained based on simple disparity detection mechanisms (Cumming and Parker, 1997; Haefner and Cumming, 2008; Tanabe and Cumming, 2008; Samonds et al., 2013), but if the detection of terminator disparity relies on a 2-D feature detector (i.e., a mechanism that extracts features from the monocular images, and then computes their disparity), anticorrelation would be expected to obliterate the terminator response. We found (Fig. 6) that the early DVR component flipped sign and was attenuated (but not consistently delayed; Table 1). As noted above, this is compatible with this component being generated by first-order mechanisms. The late DVR component was also strongly attenuated, but it is clear, especially when the pattern was horizontal (with vertical disparity, bottom row), that it also reversed sign. This finding clearly cannot be explained by a strict 2-D feature detection mechanism, and shows that at least some of the neurons that respond to terminator disparity undergo a tuning reversal with anticorrelated patterns.
Perception of terminator disparity
As noted in the Introduction, we are unaware of psychophysical studies that tested the perceptual detectability of terminator disparity using a configuration equivalent to the barber-pole illusion, which rules out mechanisms based on contrast envelopes. Accordingly, we performed some very simple experiments to test the perceptual impact of terminator disparity. We presented the same stimuli used in our first experiment to our subjects (plus two additional observers) in a depth discrimination task. In this case the fixation cross, at zero disparity, was visible not only before, but also during stimulus presentation; subjects were asked to report whether the stimulus, or any part of it, appeared in front of the fixation cross or behind it. Since perception is dominated by the horizontal component of disparities, and all sources of horizontal disparity might be used to guide perception, care must be taken in the selection of the stimulus (Farell, 1998; Delicato and Qian, 2005). With a vertical pattern, the pattern and the terminators have disparity vectors with the same horizontal component, and it would thus not be possible to determine whether a subject responded to one or the other. In contrast, when the pattern is horizontal, its disparity is purely vertical, and does not drive the perceptual system. With this stimulus (Fig. 1A), a depth judgment could thus only be based on the horizontal component of the oblique terminator disparity vector. Because the shortest latency of the vergence responses we measured with our stimuli was 64 ms, we limited the stimulus duration to 60 ms, thus preventing eye movements from playing any role. When the vertical disparity of the pattern was 20′ (as for the eye movements recordings), all subjects performed well above chance, close to 100% correct (Table 4). Even with vertical disparities as small as 1.7′ (1 pixel on our monitor), all subjects performed above chance (between 70 and 93% correct at the smallest disparity, p < 0.001). 
Note that, since we intermixed two aperture orientations and opposite vertical disparities for the pattern (which cannot be perceptually discriminated), no cues other than the disparity of the terminators could be used to perform the task. We also tested our subjects using a tilted 2-D Gaussian aperture (Fig. 1B), and they all performed at chance. These results demonstrate that terminator disparity supports not only vergence eye movements, but also perception. Obviously much remains to be quantified regarding the spatial distribution and summation properties, spatial and temporal frequency sensitivity, and effectiveness relative to other depth cues of terminator disparity signals.
Neuronal mechanisms of terminator disparity detection
When the neuronal mechanisms responsible for the perception of the barber-pole motion illusion are discussed, end-stopped cells in area V1 (Hubel and Wiesel, 1968) are usually considered a prime candidate for the detection of terminator motion (Pack et al., 2003). These cells discharge strongly when the end of a bar is at the center of their receptive field, but much less when the bar extends across their entire receptive field, and represent a large fraction of V1 neurons (Jones et al., 2001; Sceniak et al., 2001). Howe and Livingstone (2006) identified in monkey V1 a subset (∼20%) of disparity-selective neurons that can be considered the stereo equivalent of end-stopped cells. These neurons, which they labeled end-selective cells, respond preferentially when the ends of bar stimuli fall in the center of the receptive field, coding the disparity of the bar terminators, and not simply the disparity orthogonal to the bar orientation. End-selective cells thus appear to be an ideal candidate for generating the responses we reported in our main experiment (Fig. 2). Unfortunately, it is not yet known how they behave with anticorrelated patterns, or whether they are able to detect texture-defined terminators, such as those used in our fourth experiment. Referring once again to the literature on end-stopped cells, studies in cats indicate that, when tested with sinusoidal stimuli, end-stopping in V1 is phase insensitive (Nelson and Frost, 1978; DeAngelis et al., 1994), which would make them unable to detect a texture-defined edge. A study in monkeys appeared to confirm this (Bair et al., 2003), but a more recent study indicated that end-stopped cells are sensitive to contrast, at least when the stimulus is a single bar (Yazdanbakhsh and Livingstone, 2006). If this last finding were to hold also for the noise patterns we used, end-selective cells might be able to account for all of our findings.
End selectivity is, however, not the only mechanism that might be used to extract terminator disparity, just as end-stopped cells are not necessary to extract terminator motion. For example, Löffler and Orbach (1999) showed that a two-pathway (labeled Fourier and non-Fourier) model, similar to the one proposed by Wilson and colleagues to account for the perceived motion direction of sinusoidal plaids (Wilson et al., 1992), can correctly detect the direction of motion of a tilted bar, without relying on end-stopped cells. Obviously, an analogous model could be developed for the extraction of terminator disparity. Furthermore, we will now show that binocular energy units (Ohzawa and Freeman, 1986; Ohzawa et al., 1990; Anzai et al., 1999), which mimic the behavior of V1 binocular complex cells and are usually associated with a Fourier energy analysis of the stimulus, are also sensitive to terminator disparity, at least under our experimental conditions.
We simulated the response of V1 complex cells (binocular energy model units; see Materials and Methods for implementation details) to a vertical noise pattern (0.2° of disparity, correlated or anticorrelated) seen through a zero-disparity aperture (tilted 45°). In Figure 7A, we plot the response of units centered on the middle of one of the tilted edges of the aperture, with a preferred orientation orthogonal to the edge, as a function of their preferred phase disparity. With correlated patterns, units tuned to ∼90° of phase disparity discharged most vigorously; the tuning was reversed when anticorrelated patterns were used instead. These units thus correctly detected the terminator disparity vector, and their tuning rotated by ∼180° when presented with anticorrelated patterns. Of course, since the pattern itself did not match the preferred orientation of the units, their response was rather weak (approximately one-eighth of the response of vertically oriented units), but nevertheless a selective read-out from a population of such off-axis cells could reproduce our results with both correlated and anticorrelated patterns.
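The inversion of phase-disparity tuning with anticorrelation can be illustrated with a much reduced, 1-D sketch of a binocular energy unit. This is not the simulation reported here (it omits the aperture, the 2-D receptive fields, and the off-axis geometry); the function names, the Gabor parameters, the binary noise, and the 4-pixel shift are all assumptions chosen so that the expected peak falls at 90° of phase disparity.

```python
import numpy as np

def gabor(x, freq, sigma, phase):
    """1-D Gabor receptive-field profile."""
    return np.exp(-x**2 / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * freq * x + phase)

def energy_tuning(shift, anticorrelated=False, n_trials=400, seed=0):
    """Mean binocular-energy response versus preferred phase disparity.

    shift: displacement (pixels) of the right-eye noise relative to the left.
    Returns (phase_deg, mean_response).
    """
    rng = np.random.default_rng(seed)
    x = np.arange(-64, 64, dtype=float)
    freq, sigma = 1.0 / 16.0, 8.0           # assumed values: cycles/pixel, pixels
    phase_deg = np.arange(-180, 180, 30)    # preferred phase disparities of the units
    response = np.zeros(phase_deg.size)
    for _ in range(n_trials):
        left = rng.choice([-1.0, 1.0], size=x.size)   # binary 1-D noise
        right = np.roll(left, shift)                  # same noise, displaced
        if anticorrelated:
            right = -right                            # opposite contrast in one eye
        for i, dphi in enumerate(np.deg2rad(phase_deg)):
            for base in (0.0, np.pi / 2.0):           # quadrature pair
                simple_left = left @ gabor(x, freq, sigma, base)
                simple_right = right @ gabor(x, freq, sigma, base + dphi)
                # Binocular simple-cell output, squared and summed over the pair.
                response[i] += (simple_left + simple_right) ** 2
    return phase_deg, response / n_trials

phase_deg, corr = energy_tuning(shift=-4)
_, anti = energy_tuning(shift=-4, anticorrelated=True)
print("correlated peak (deg):", phase_deg[np.argmax(corr)])
print("anticorrelated peak (deg):", phase_deg[np.argmax(anti)])
```

In this sketch the binocular interaction term is proportional to the cosine of the sum of the stimulus phase shift and the preferred phase disparity, so contrast reversal in one eye moves the peak by 180°. That is the qualitative behavior described above for the off-axis units.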
Next, we used as stimuli for our simulations the texture-defined apertures we used in the fourth experiment. We simulated the condition in which the pattern is horizontal, and the texture-defined vertical aperture has horizontal disparity (Fig. 1D). In Figure 7B we show the simulated response of V1 complex cells with vertical preferred orientation, centered on the cyclopean location of the texture-defined aperture edge, as a function of their phase disparity. The population of cells shows a clear tuning to the disparity of the aperture when the patterns have zero disparity (black line), but not when they are uncorrelated. This indicates that the neurons are capable of extracting the disparity of the terminators, but not the disparity of the monocular texture edge itself.
Since the energy units used here are the same as those used by Löffler and Orbach (1999) in their model's Fourier pathway (or, more precisely, they carry out the same operation in the stereo domain as theirs do in the motion domain), it is natural to ask why their model requires an additional (non-Fourier) pathway to extract terminator motion. The reason is quite simple: in their Fourier pathway, the output of energy units is pooled across all orientations, obliterating the terminator information carried by the off-axis units, which as noted above respond less vigorously than units aligned with the pattern. Obviously, a more sophisticated read-out scheme is necessary to preserve this information. For example, estimates of disparity along multiple directions (at a minimum horizontal and vertical) might be computed, pooling selectively across orientations [i.e., a horizontal (vertical) disparity estimator would rely more, but not exclusively, on the output of vertically (horizontally) oriented units]. Schemes similar to this, but limited to the extraction of horizontal disparity for perception, have been proposed and experimentally supported (Matthews et al., 2003; Patel et al., 2003, 2006). Alternatively, a scheme in which the activity from all units is used in a template-matching operation, as found in models that account for transparency (Lehky and Sejnowski, 1990; Treue et al., 2000; Tsai and Victor, 2003), could also work. What such schemes have in common is that even neurons with weak responses can make important contributions.
Discussion
We investigated whether the human visual system extracts the disparity of terminators, 2-D features which can be used to resolve the ambiguity associated with the disparity of 1-D features (the stereo aperture problem). We designed our stimuli after the barber-pole motion illusion, which has been used to demonstrate that terminators affect both perceived motion direction (Wallach, 1935; Lorenceau and Shiffrar, 1992; Kooi, 1993) and ocular following eye movements (Masson et al., 2000; Barthélemy et al., 2010). We found that terminators induce vivid depth percepts and generate robust disparity vergence eye movements (DVRs). As far as we are aware, ours is the first investigation of the binocular equivalent of the barber-pole illusion. The advantage of this approach is that it rules out explanations based on the disparity of contrast envelopes (McKee et al., 2004; Wilcox and Allison, 2009), since the contrast envelopes for the stimuli used in our experiments either had no disparity (Figs. 2, 6), or could account only for a fraction of the response (Fig. 5). Furthermore, by relying on 1-D patterns, which only constrain the component of the disparity vector orthogonal to their orientation, our stimuli are free of the conflicts between disparity cues present in some previous studies (Anderson, 1994; Malik et al., 1999; van Ee et al., 2001). Anderson and colleagues, using untextured stimuli, concluded that the visual system does not extract terminator disparities. Our results prove that this does not apply in general, and demonstrate that line terminators can play a significant role in binocular vision.
The stereo aperture problem manifests itself in quite different ways, depending on the stimulus. With plaids (the sum of two 1-D patterns having different orientations), its solution requires integrating information across orientations; with barber-pole stimuli, integrating information across space is more fruitful. Accordingly, different neural mechanisms are probably involved. We had previously shown (Quaia et al., 2013) that plaids induce robust DVRs, characterized by two components: one driven by the disparity of the components (first-order), and another, delayed by 10–15 ms, driven by the disparity of the plaid. Here we also found a similar delay between first-order and terminator responses, but we think that the source of the delay is quite different in the two cases. For the plaids, we suggested that plaid disparity is computed by properly combining, according to the so-called intersections of constraints (IOC) rule, the signals associated with the disparity of the components (serial processing). We interpreted the delay as the time needed to carry out this computation. For the current results, the evidence points instead to two computations performed in parallel: one to extract first-order disparity and the other to extract terminator disparity. Note in fact (Fig. 2) that the delay varies considerably depending on the orientation (and thus disparity direction) of the noise pattern (the delay is significantly longer when the first-order response is faster, i.e., with vertical patterns; Table 1), which would not be expected under a serial processing scheme. A similar parallel processing scheme has been proposed to account for the ocular following responses induced by the barber-pole motion stimulus, which also exhibit an early first-order response and a delayed response to terminator motion (Masson et al., 2000; Barthélemy et al., 2010).
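For reference, the IOC rule mentioned above amounts to solving a small linear system: each 1-D component constrains only the projection of the 2-D disparity onto its normal, and two non-parallel components determine the vector. The sketch below is a generic illustration of that rule, not the model used in our earlier study; the function name and the example values are assumptions.

```python
import numpy as np

def ioc_disparity(normal_deg_1, d1, normal_deg_2, d2):
    """2-D disparity consistent with two 1-D component constraints.

    Each component k constrains the projection of the pattern disparity D
    onto its unit normal n_k:  n_k . D = d_k.
    Two non-parallel components give a 2x2 linear system; parallel components
    make the system singular and numpy raises LinAlgError.
    """
    angles = np.deg2rad([normal_deg_1, normal_deg_2])
    normals = np.column_stack([np.cos(angles), np.sin(angles)])
    return np.linalg.solve(normals, np.array([d1, d2], dtype=float))

# Illustrative plaid: component normals at 45 and 135 deg, with component
# disparities obtained by projecting a purely horizontal disparity of 10 arcmin.
d1 = 10.0 * np.cos(np.deg2rad(45.0))
d2 = 10.0 * np.cos(np.deg2rad(135.0))
print(ioc_disparity(45.0, d1, 135.0, d2))
```

Because the rule needs the disparity of both components before it can be applied, it is inherently a serial computation, which is the property contrasted above with the parallel scheme suggested by the terminator responses.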
Although the IOC mechanism proposed to solve the stereo aperture problem with plaids cannot also extract terminator disparity, there are other neuronal mechanisms that could account for our results. In the context of the barber-pole motion illusion, it has been argued that at least part of the directional bias induced by the aperture might be attributed to local form information at the aperture edges, and not to terminator motion per se (Badcock et al., 2003). This would be a special case of the well established motion-from-form mechanism, in which static structure affects motion direction computations (Mather et al., 2013). We ran two experiments to directly test a hypothetical disparity-from-form mechanism (Figs. 3, 4). They failed to provide any evidence in support of such a mechanism, indicating that it either does not operate or is too weak under our experimental conditions to be of significance. Similarly, our results with anticorrelated patterns are incompatible with a mechanism that extracts 2-D features from the monocular images, and computes their disparity. However, at least three mechanisms might account for our results. First, area V1 in monkeys contains neurons (dubbed end-selective cells) that, when tested with a single dichoptic bar, are sensitive to the 2-D disparity of a bar ending falling within their receptive field (Howe and Livingstone, 2006). These neurons, and possibly V2 neurons whose response is affected by the disparity of features outside of their receptive field (Bakin et al., 2000), are potential sources of the responses that we have measured in our main experiments. From here on we refer to this mechanism as end selection. The second mechanism is the stereo equivalent of the two-pathway model proposed by Löffler and Orbach (1999) for extracting terminator motion. 
In that model, the output of a first-order Fourier pathway is combined with that of a second-order non-Fourier pathway, leading to an estimate of motion direction that approximates that of the terminators. We refer to this mechanism as non-Fourier. Finally, our simulations show that V1 complex cells located at the aperture edge, and with an orientation orthogonal to the aperture orientation (and thus off-axis relative to the 1-D noise stimulus), are also tuned to terminator disparity (Fig. 7A). We refer to this mechanism as off-axis terminator detection.
In addition to accounting for the presence of terminator responses in our main experiments (Fig. 2; Table 4), all three mechanisms are compatible with the delay we observed between pattern and terminator vergence responses (Table 1). End-selective cells initially respond equally well to the center and ends of the bar, with their preference for the bar ends emerging only after 10–20 ms (Howe and Livingstone, 2006), approximately matching the delay observed. Similarly, the non-Fourier mechanism has been explicitly modeled as slower than its Fourier counterpart. Finally, off-axis terminator detection relies on a signal carried by a small population of cells (those along the aperture), responding to a suboptimal (off-axis) stimulus. Hence, it is reasonable to expect that these weaker signals would lead to longer latencies.
Additional experiments will thus be needed to narrow down the mechanism(s) at play, and the novel stimuli that we introduced could prove helpful. It is well known that with anticorrelated patterns the sign of DVRs flips (Masson et al., 1997; Takemura et al., 2001). We found (Fig. 6) that when an anticorrelated 1-D noise pattern is seen through a zero-disparity aperture, the DVR component driven by terminator disparity reverses sign as well. Using texture-defined apertures (Fig. 5), we found that when the stimuli presented within and outside the aperture are correlated, the vergence response is significantly stronger than when they are uncorrelated. Unfortunately, it is not currently possible to use these data to definitively rule out any of the mechanisms outlined above. It is presently not known how end-selective cells behave with anticorrelated patterns, but it is certainly possible that their tuning might reverse. Similarly, their response to texture-defined edges has not been studied. The two-pathway model, which we have not explicitly simulated, might very well predict the reversal with anticorrelated stimuli, although it is hard to envision how it could respond to the texture-defined terminators. Finally, we found that anticorrelation reverses the tuning of off-axis units to terminator disparity. Furthermore, when presented with our texture-defined apertures, units oriented parallel to the aperture are capable of signaling its disparity, but only when terminators are present (Fig. 7B). The off-axis terminator detection mechanism thus has the potential to account for all of our results. However, this apparently simple mechanism would have to rely on a relatively complex read-out of local disparity signals, since schemes that pool responses across all orientations (Mikaelian and Qian, 2000) would predict very weak terminator-related responses, and could not explain the presence of early and late DVR components having different directions.
It is also important to keep in mind that multiple mechanisms might be at work. For example, vergence and depth perception often respond quite differently to the same stimulus, indicating that they must rely on partially separate pathways. Accordingly, different mechanisms may underlie the vergence responses and the perceptual reports that we recorded. Similarly, different mechanisms might be responsible for the vergence responses to the various stimuli we used.
In conclusion, our findings provide compelling evidence that terminator disparities play a strong role in guiding vergence eye movements and depth perception. Although our findings constrain the type of mechanism that might be responsible for extracting terminator disparity, it is currently not possible to pinpoint a single mechanism, and multiple mechanisms might in fact cooperate. Valuable information to further our understanding of this aspect of binocular vision might be gained by using the stimuli introduced here in the context of neurophysiological recordings from visual cortex.
Footnotes
This work was supported by the Intramural Research Program of the National Eye Institute, National Institutes of Health, Department of Health and Human Services. We thank Dr. Boris Sheliga for assistance in running the experiments and for comments on the manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Christian Quaia at the above address. quaiac{at}nei.nih.gov