Abstract
A plane lying in depth is vividly perceived by viewing a random-dot stereogram (RDS) with a slight binocular disparity. Perception of a plane-in-depth is lost by reversing the contrast of dots seen by one of the eyes to generate an anticorrelated RDS. From a computational perspective, the visual system cannot find a globally consistent solution for matching the left and right eye images of an anticorrelated RDS. The neural representation of a global match should therefore be insensitive to binocular disparity in an anticorrelated RDS. Most neurons in the striate cortex (V1) respond to binocular disparity in anticorrelated RDSs, suggesting that further cortical processing in extrastriate areas is necessary to fully account for the matching computation. We examined neural responses to dynamic RDSs, both normal (correlated) and anticorrelated, in area V4 of the monkey visual cortex. More than half of the V4 cells were sensitive to the horizontal disparity embedded in a correlated RDS. Most of them greatly attenuated their selectivity for disparity when the RDS was anticorrelated. This attenuation was apparent from the response onset, and the degree of attenuation did not correlate with neuronal response latencies. Unlike the disparity tuning of V1 neurons to anticorrelated RDSs, that of V4 neurons was not an inversion of tuning to normal RDSs. Our results suggest that responses to false matches between contrast-reversed dots in the left and right eye images elicited in V1 are substantially reduced by the stage of V4.
Introduction
The visual system uses binocular disparity to derive the depth of visual targets and reconstruct three-dimensional scenes. The underlying computation involves determining which visual target in one eye corresponds to which of the numerous targets in the other, dubbed the “correspondence problem” (Julesz, 1971; Marr and Poggio, 1979).
In cats and monkeys, the first stage of stereoscopic processing is the striate cortex (V1) (Barlow et al., 1967; Pettigrew et al., 1968; Poggio and Fisher, 1977). Many V1 neurons are sensitive to horizontal binocular disparity embedded in a random-dot stereogram (RDS). A plane-in-depth can be derived by correctly matching patterns in the left and right eye images. Neural activity, as early as in V1, therefore was previously thought to reflect the global match solution of the stereo correspondence computation (Poggio et al., 1985, 1988). The responses of these neurons, however, are not the direct neural correlate of stereoscopic depth perception. When the contrast of an RDS presented to one of the eyes is reversed (see Fig. 1A), the stereo correspondence does not have a global-match solution, and the perception of a depth-plane is greatly diminished or abolished (Cogan et al., 1993). In contrast, although the tuning profile is inverted compared with the one obtained by a normal RDS, the disparity selectivity of V1 neurons is retained by the contrast reversal (Ohzawa et al., 1990; Cumming and Parker, 1997). Because of their local filter-like characteristics, most V1 neurons respond to local false matches that do not coherently constitute a plane lying in depth. Further processing in extrastriate areas is thus required for the rejection of false-match solutions (see Fig. 1B).
Figure 1. Random-dot stimuli and a schematic illustration of disparity tuning curves of cells with activity that correlates with depth perception. A, The left and center dot patterns correspond to the left and right eye images, respectively, of a cRDS. In this example, the center region of the cRDS patch has a crossed disparity. The center and right dot patterns are mutually anticorrelated in the center region, composing the left and right eye images, respectively, for an aRDS. The center region has the same binocular disparity for both cRDS and aRDS. B, Hypothetical response tuning curves of a cell selective to horizontal disparity. The thin solid curve represents the tuning profile to cRDS. The dashed curve is the tuning profile to aRDS in the case in which the neuronal response can be described by the disparity energy model, as is typically seen in V1. The thick solid curve is the tuning profile to aRDS when the neuronal response correlates closely with depth perception. The peaks and troughs should approach the overall response average and exhibit little or no disparity sensitivity.
Neural activity in the middle temporal area (MT/V5) is closely linked to the perceptual judgment in some stereoscopic depth discrimination tasks (Bradley et al., 1998; DeAngelis et al., 1998; Dodd et al., 2001; Uka and DeAngelis, 2003). Neither this area nor the medial superior temporal area (MST) appears to be the site at which the correspondence problem is solved, because most of these cells respond to a contrast-reversed RDS (Krug et al., 1999; Takemura et al., 2001). Neurons responding solely to global matches are found in the inferior temporal (IT) cortex (Janssen et al., 2003). A quantitative examination of neural responses at a stage between V1 and IT is necessary to understand how the correspondence computation is performed in the visual cortex.
Area V4, an intermediate stage along the ventral visual pathway, is involved in the processing of form, wavelength, and texture (Van Essen and Gallant, 1994), as well as binocular disparity (Hinkle and Connor, 2001; Watanabe et al., 2002). To examine the process of the false-match rejection, we studied the disparity tuning of V4 neurons to dynamic RDSs, both normal (correlated) and contrast-reversed (anticorrelated). A quantitative description of the disparity tuning of V4 neurons revealed that these cells reduced their disparity selectivity more than V1 neurons when presented with anticorrelated stimuli.
A portion of these results has been published previously (Tanabe and Fujita, 2003).
Materials and Methods
Surgery. Experiments were performed using one male and two female Japanese macaque monkeys (Macaca fuscata) weighing 4-8 kg. The detailed surgical procedures, performed under aseptic conditions and full anesthesia, have been described previously (Uka et al., 2000). Briefly, for each monkey, a plastic head holder was chronically cemented onto the skull with either plastic or stainless steel bolts drilled through holes in the skull. Acrylic resin was daubed onto the exposed part of the skull. A plastic recording chamber was placed over the prelunate gyrus, centered 25 mm dorsal and 5 mm posterior to the ear canals. Scleral search coils were implanted into both eyes to monitor eye movement (Judge et al., 1980). All animal care and experimental procedures were approved by the Animal Experiment Committee of Osaka University in compliance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals (1996).
Task and visual stimulation. Each monkey was seated with its head fixed in a primate chair. The chair was placed in front of a 21 inch screen (21MX, NuVision, Beaverton, OR) 57 cm away from the monkey's eyes. The screen subtended ∼40 × 30° of the monkey's visual field. A liquid crystal polarizing filter in front of the screen switched polarity every frame (120 Hz). Fixed polarity filter glasses were placed in front of the monkey's eyes. The filters in front of the left and right eyes were of orthogonal polarity, enabling dichoptic stimulation. Tasks were controlled using a commercial software package (TEMPO, Reflective Computing, St. Louis, MO). When a white dot (0.2 × 0.2°) appeared at the center of the screen, the monkeys were required to make an eye movement toward it within 500 msec. One second from the onset of the fixation point, an RDS was presented parafoveally. Monkeys were required to keep fixation within a range of 0.7-1.0° from the center of the fixation point. They also had to keep the vergence angle within ±0.5° of the screen plane. The RDS was removed after 1 sec of presentation. Monkeys were still required to keep fixation for an additional 500 msec. For maintaining their fixation during the task period, the monkeys were rewarded with a drop of juice or water. If the animals broke their fixation, the trial was immediately aborted, and the animals were not rewarded. Intertrial intervals were randomly selected to be either 1, 1.5, or 2 sec.
A computer graphics program was developed for visual stimulation using a graphics application programming interface (OpenGL Utility Toolkit). The RDS contained 50% dark (0.6 cd/m2) and 50% bright (3.5 cd/m2) dots presented on a midlevel background (1.5 cd/m2). The dots in the RDS were placed randomly; a new dot pattern was presented every 10 frames (12 Hz). The dot size was 0.17 × 0.35°, with a dot density of 26%. Subpixel antialiasing was achieved by the automatic function of the video board (Oxygen GVX1, 3Dlabs, Milpitas, CA). Disparity was only applied to the dots within the center portion of the RDS patch (see below). The outer annulus was maintained at zero disparity. The width of the outer annulus was 1° to ensure the absence of a positional shift in the monocular images, even at the largest tested disparities. For anticorrelated RDSs (aRDSs), the contrast of the dots within the center portion was reversed for only one of the eyes (Fig. 1 A, center and right).
Only red guns were used (except for the white fixation point), because the red phosphors have the shortest decay time, thus minimizing interocular cross talk. To measure the cross-talk intensity, we presented a bright rectangle (3.5 cd/m2) to the left eye only with a midlevel background. The luminance of the “ghost” image stimulating the right eye was 1.7 cd/m2, a 10% cross talk [(1.7-1.5)/(3.5-1.5)]. No measurable ghost image was detected in the left eye when the same stimulus was presented to the right eye. All luminance levels were measured after the lights passed through both the liquid crystal and fixed polarizing filters. The cross talk may reduce the luminance contrast if the dots overlap in an aRDS. This occurs only for zero disparity aRDS, because the minimum disparity step was 0.2°, whereas the dot width was 0.17°.
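The cross-talk figure quoted above is the ghost image's luminance increment over background, expressed as a fraction of the intended image's increment. A minimal sketch of that arithmetic follows; the function name is hypothetical and serves only to restate the bracketed calculation.

```python
def cross_talk_fraction(ghost, background, target):
    """Fraction of the target's luminance increment that leaks into the other eye.

    All three luminances are in cd/m2 and are measured through the filters.
    """
    return (ghost - background) / (target - background)


# Values reported in the text: ghost 1.7, background 1.5, bright rectangle 3.5 cd/m2.
print(f"{cross_talk_fraction(1.7, 1.5, 3.5):.0%}")
```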
The binocular correlation of an aRDS was ∼20% smaller than that of a correlated RDS (cRDS) in our stimulus. The anticorrelation was thus not perfect; however, the two binocular correlations deviated by nearly equal amounts from the baseline correlation, which was slightly above zero. As long as the binocular correlations are equidistant from baseline, the contrast imbalance of the bright and dark dots relative to the background does not affect the modulation amplitude ratio of the disparity tuning curves. It would affect only the baseline level of the disparity tuning curves, because all of our neural response data include responses to this baseline correlation.
Electrophysiology. One day or a few days before starting recording sessions, a recording hole was drilled through the skull inside the recording chamber under anesthesia. On each day of recording sessions, a tungsten-in-glass microelectrode (0.2-1.2 MΩ at 1 kHz) was set on a micromanipulator (MO-95S, Narishige, Tokyo, Japan) and attached to the recording chamber. Voltage signals were amplified and filtered using custom-made instruments and then monitored on an oscilloscope (COR-5521, Kikusui, Yokohama, Japan). Extracellular action potentials from a single neuron were isolated using either a custom-made window discriminator or template-matching software (Multi Spike Detector, Alpha-Omega Engineering, Nazareth, Israel). The discharge timings were recorded at a 1 msec resolution for subsequent data analysis. Voltage signals were stored on disks at a 20 kHz sampling rate. The positions of both the left and right eyes were monitored using magnetic search coils (MEL-25, Enzansi Kogyou, Tokyo, Japan) with analog-to-digital conversion at a 1 kHz sampling rate.
After isolating action potentials from a single cell, we estimated the classical receptive field (RF) of the cell using the stimulus that elicited the strongest response: a bright bar, a small patch of drifting sinusoidal grating, or a small RDS patch. We then presented an RDS slightly larger than the RF, such that the center portion of the RDS to which the horizontal disparity was applied completely covered the RF. Stimuli extending beyond the RF suppressed the responses of a subset of V4 cells (Desimone and Schein, 1987). For recordings from such cells, the size of the RDS patch was successively reduced until a sufficient visual response was elicited. Before the recording sessions were performed for each monkey, we identified the location of area V4 within our recording chamber based on the RF size-eccentricity relationship of neurons, the visuotopic map, and the estimated locations of the surrounding sulci (Gattass et al., 1988; Watanabe et al., 2002).
For all of the cells from the first monkey and several cells from the second, the tested disparity levels ranged from -0.8 to +0.8° at a 0.2° step or -1.6 to +1.6° at a 0.4° step (negative and positive values refer to crossed and uncrossed disparities, respectively). Stimuli were presented in blocks of randomly interleaved trials with 21 stimulus conditions: 9 disparity levels for cRDS and for aRDS, left eye only and right eye only presentations, and a binocularly uncorrelated RDS. For most of the cells from the second monkey and all of the cells from the third, the tested disparity levels were -1.6, -1.2, -0.8, -0.6, -0.4, -0.2, 0, +0.2, +0.4, +0.6, +0.8, +1.2, and +1.6°, bringing the total of stimulus conditions to 29. Stimuli were presented at least six times (median 10) for each condition.
Histology. After all recording experiments were completed, one of the monkeys was anesthetized with an overdose of pentobarbital sodium (60 mg/kg body weight). The monkey was transcardially perfused with PBS and then with 4% paraformaldehyde. Four pins were inserted into the brain at the corners of the recording chamber. A block of the brain tissue delineated by the pins was cut into 50 μm frozen coronal sections, and the sections were stained by standard Nissl staining methods. Scars of electrode penetrations were found only in the prelunate gyrus; the recording site was histologically identified as area V4. The other two monkeys are still alive and being used in related experiments.
Data analysis. The firing rate of each stimulus condition was calculated by counting the number of discharges during the 1 sec period from 80 msec after the onset to 80 msec after the offset of RDS presentation. The v-sync pulses for the display were used to align spikes accurately with the timing of stimulus presentation. Ongoing (spontaneous) firing rates were calculated from the 250 msec period immediately before stimulus onset. In this period, the monkey had already finished making an eye movement to the fixation point, whereas the RDS had not yet been presented.
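The two counting windows described above can be sketched as follows. This is an illustration only: the function names are hypothetical, spike times are assumed to be in milliseconds relative to RDS onset, and the half-open window convention is an assumption rather than something stated in the text.

```python
def window_rate(spike_times_ms, start_ms, end_ms):
    """Mean firing rate (spikes/sec) in the half-open window [start_ms, end_ms)."""
    n_spikes = sum(start_ms <= t < end_ms for t in spike_times_ms)
    return 1000.0 * n_spikes / (end_ms - start_ms)


def response_rate(spike_times_ms, duration_ms=1000.0, delay_ms=80.0):
    """Rate from 80 msec after RDS onset to 80 msec after RDS offset."""
    return window_rate(spike_times_ms, delay_ms, duration_ms + delay_ms)


def ongoing_rate(spike_times_ms, pre_ms=250.0):
    """Rate in the 250 msec immediately before RDS onset."""
    return window_rate(spike_times_ms, -pre_ms, 0.0)


# Made-up spike times (msec relative to RDS onset), for illustration only.
spikes = [-300, -200, -100, 50, 80, 500, 1079, 1080]
print(response_rate(spikes), ongoing_rate(spikes))
```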
We assessed the disparity selectivity of each cell by calculating the disparity discrimination index (DDI) (Prince et al., 2002; DeAngelis and Uka, 2003). First, we square-root transformed the response firing rate data of each trial. This transformation reduces the typical proportional relationship of the mean and variance of the firing rate across trials (Tolhurst et al., 1981; Shadlen and Newsome, 1998), allowing pooling of all response variabilities across different stimulus conditions. The DDI was then calculated for cRDS and aRDS separately, as follows:
$$\mathrm{DDI} = \frac{R_{\max} - R_{\min}}{R_{\max} - R_{\min} + 2\sqrt{\mathrm{SSE}/(N - M)}}$$

where Rmax and Rmin are the maximum and minimum mean responses among the disparities tested. SSE is the sum of the squared error of the responses at each disparity. N is the total number of trials, and M is the number of disparities.
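As an illustration, the DDI can be computed from per-trial firing rates as sketched below, assuming the form used by Prince et al. (2002): (Rmax - Rmin) / (Rmax - Rmin + 2 sqrt(SSE / (N - M))), evaluated on square-root-transformed rates. The function name and the dictionary layout are hypothetical, and the handling of a zero denominator is an added assumption.

```python
import math


def ddi(trials_by_disparity):
    """Disparity discrimination index from per-trial firing rates.

    trials_by_disparity maps each disparity (deg) to a list of single-trial
    firing rates (spikes/sec). Rates are square-root transformed first.
    """
    sqrt_resp = {d: [math.sqrt(r) for r in rates]
                 for d, rates in trials_by_disparity.items()}
    means = {d: sum(v) / len(v) for d, v in sqrt_resp.items()}
    r_max, r_min = max(means.values()), min(means.values())

    # SSE: squared deviations of each trial from the mean at its own disparity.
    sse = sum((x - means[d]) ** 2 for d, v in sqrt_resp.items() for x in v)
    n_trials = sum(len(v) for v in sqrt_resp.values())   # N
    n_disparities = len(sqrt_resp)                        # M
    rms_error = math.sqrt(sse / (n_trials - n_disparities))

    span = r_max - r_min
    denom = span + 2.0 * rms_error
    return span / denom if denom > 0 else 0.0   # assumption: flat, noiseless data gives 0


# Made-up rates for two disparities, two trials each.
print(round(ddi({-0.2: [16.0, 36.0], 0.2: [0.0, 4.0]}), 3))
```

The index approaches 1 when the tuning span is large relative to the trial-to-trial variability and approaches 0 when the variability dominates.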
Disparity tuning curves were quantitatively described and analyzed by fitting a Gabor function and extracting its parameters. By denoting the disparity as x and the response as R, this function is expressed as follows:

$$R(x) = y_0 + A \exp\left(-\frac{(x - x_0)^2}{2\sigma^2}\right) \cos\left(2\pi f (x - x_0) + \phi\right)$$
The disparity tuning curves to cRDS and aRDS were fit with the four parameters, y0, x0, σ, and f, using the same values for both curves. Two parameters, A and φ, were selected independently for the tuning curves for cRDS and aRDS. The fit was performed with the following six constraints. (1) The baseline offset, y0, was constrained to values between zero and the maximum response to all trials of cRDS. (2) The amplitude of the Gaussian envelope, A, was constrained to values between zero and twice the difference between the maximum and minimum responses to all trials. (3) The horizontal offset of the Gaussian envelope relative to zero disparity, x0, was constrained to values within the disparity range being tested. (4) The width of the Gaussian envelope, σ, was constrained to values between 0.1° and the total range of tested disparities. (5) The disparity frequency was estimated from the power spectrum of the mean data points in response to cRDS (Prince et al., 2002). Next in the fitting procedure, the frequency of the cosine carrier, f, was constrained to ±10% of the disparity frequency. (6) The phase, φ, relative to the center of the Gaussian envelope was constrained to be within ±3 π. To describe the disparity tuning of each neuron, the “fmincon” function in MATLAB's Optimization Toolbox (MATLAB, The Mathworks, Natick, MA) was used to extract the parameter combination that minimized the summed squared-error of responses in all trials using the above constraints. We calculated the summed squared-error from a half-wave rectified Gabor function to tolerate data points that were clipped near zero discharge rate.
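The objective minimized in this fit can be sketched as follows, assuming a Gabor of the form R(x) = y0 + A exp(-(x - x0)^2 / (2σ^2)) cos(2πf(x - x0) + φ). The sketch is in Python rather than MATLAB, shows only the half-wave rectified model and the joint squared error with y0, x0, σ, and f shared between cRDS and aRDS, and omits the bounded minimization itself. All function and argument names are hypothetical.

```python
import math


def gabor(x, y0, amp, x0, sigma, f, phi):
    """Gabor tuning function: Gaussian envelope times a cosine carrier."""
    envelope = math.exp(-((x - x0) ** 2) / (2.0 * sigma ** 2))
    carrier = math.cos(2.0 * math.pi * f * (x - x0) + phi)
    return y0 + amp * envelope * carrier


def rectified_gabor(x, y0, amp, x0, sigma, f, phi):
    """Half-wave rectified Gabor: predicted firing rates cannot fall below zero."""
    return max(0.0, gabor(x, y0, amp, x0, sigma, f, phi))


def joint_sse(shared, amp_c, phi_c, amp_a, phi_a, trials_c, trials_a):
    """Summed squared error over all cRDS and aRDS trials.

    shared   = (y0, x0, sigma, f), common to both tuning curves
    trials_* = lists of (disparity, single-trial firing rate) pairs
    """
    y0, x0, sigma, f = shared
    sse = 0.0
    for trials, amp, phi in ((trials_c, amp_c, phi_c), (trials_a, amp_a, phi_a)):
        for x, rate in trials:
            sse += (rate - rectified_gabor(x, y0, amp, x0, sigma, f, phi)) ** 2
    return sse
```

A bounded minimizer would then search the eight free values (four shared, plus A and φ for each stimulus type) within the six constraints listed above.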
Processing to reduce disparity selectivity for aRDS must take place somewhere higher in the visual processing hierarchy than V1. We examined two possible effects that this processing could have on the temporal pattern of neural responses. First, if the reduction of response selectivity is achieved through neural activity propagating recurrently within a neuronal network or through feedback signals from higher areas, then at the onset of the neural response, this activity might display the characteristics of the cells providing the primary input. With time, the selectivity to disparity in aRDS would be progressively reduced. Given that onset delay is ∼10 msec per cortical area and that both feedforward and feedback connections have similar conduction velocities (Nowak and Bullier, 1997), the fastest feedback is expected to have a delay of 20 msec from response onset in V4. On the other hand, a recent report shows that lateral interaction in V1 has a delay of ∼20 msec compared with the initial excitation (Ringach et al., 2003). Therefore, the fastest expected latency of response changes attributable to feedback or lateral interaction is ∼20 msec from response onset in V4. This estimate is based on signals along the same pathway. If we consider signals from areas MT, MST, and frontal eye field, responses in V4 can be modified at the onset. Average latencies, however, are expected to be magnitudes longer, because latencies are generally diverse for neurons in a given cortical area (Nowak and Bullier, 1997; Schmolesky et al., 1998). Second, not all cells in V4 may constitute the same processing stage but may instead constitute multiple stages of processing. Cells at earlier stages receiving direct input from earlier areas presumably have relatively shorter response latencies (Maunsell and Gibson, 1992; Raiguel et al., 1999), and the level of each cell within the processing hierarchy was estimated by examining their response latency. 
Instead of defining the neuronal response latency as the time at which the mean discharge rate crosses a predefined level, we used the Poisson spike train analysis (Legéndy and Salcman, 1985; Hanes et al., 1995; Thompson et al., 1996; Schmolesky et al., 1998; Nieder and Wagner, 2001) by taking advantage of the null hypothesis that spike sequences follow a Poisson process. This method estimates the latency of neural response for every trial based on a statistical criterion; a single trial is sufficient to estimate the latency of a cell. Moreover, the estimation accuracy is available when multiple trials are tested.
Given the observed spike count during the entire period of each trial, we obtained a probability density function, g(r), of the spike count distribution for the hypothetical Poisson process. The probability of obtaining a higher spike rate than r = R is equal to the area under g(r) exceeding R. Spike rates were calculated successively for time intervals between the first spike after stimulus onset and subsequent spikes. As more spike intervals are included, the probability of observing a higher spike rate than the observed spike rate decreases until spikes become sparse. If this probability does not fall below p = 0.05, then we judge that a burst of firing (i.e., visual response) never occurred in that trial. If this probability does fall below p = 0.05, the timing of the spike minimizing the probability is determined as the end of the burst firing period. Next, the time intervals between the last spike of the burst and preceding spikes were analyzed. The spike minimizing the probability of observing a higher spike rate than the observed one was determined as the beginning of the burst firing. We used only spike sequences within 300 msec after the stimulus onset from the three stimuli giving the highest mean responses. The median latency of the trials analyzed in each cell was determined as its response latency.
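The forward and backward passes described above can be sketched as follows. This is a simplified illustration, not the published implementation: the spike-counting convention (spikes after the anchor spike, up to and including the candidate spike), the use of the Poisson upper tail P(N >= n) as the probability, and all function names are assumptions.

```python
import math
import statistics


def poisson_tail(n, mu):
    """P(N >= n) for N ~ Poisson(mu)."""
    term = math.exp(-mu)
    cdf = 0.0
    for k in range(n):
        cdf += term
        term *= mu / (k + 1)
    return max(0.0, 1.0 - cdf)


def burst_onset(spikes_ms, trial_ms, window_ms=300.0, alpha=0.05):
    """Onset time (msec from stimulus onset) of the burst in one trial, or None.

    spikes_ms: all spike times in the trial, relative to stimulus onset
    trial_ms:  total trial duration, used for the mean (null) rate
    """
    rate = len(spikes_ms) / trial_ms                 # spikes per msec over the trial
    post = sorted(t for t in spikes_ms if 0.0 <= t <= window_ms)
    if len(post) < 2:
        return None
    # Forward pass: anchor at the first spike, extend to each later spike.
    first = post[0]
    p_end, end_idx = min((poisson_tail(i, rate * (post[i] - first)), i)
                         for i in range(1, len(post)))
    if p_end >= alpha:
        return None                                  # no burst in this trial
    # Backward pass: anchor at the burst end, extend to each earlier spike.
    last = post[end_idx]
    _, start_idx = min((poisson_tail(end_idx - j, rate * (last - post[j])), j)
                       for j in range(end_idx))
    return post[start_idx]


def cell_latency(trials, trial_ms):
    """Median burst onset across the trials in which a burst was detected."""
    onsets = [t for t in (burst_onset(s, trial_ms) for s in trials) if t is not None]
    return statistics.median(onsets) if onsets else None
```

In this sketch a trial contributes a latency only when the forward pass reaches the p = 0.05 criterion, which matches the rule that trials without a detected burst are treated as having no visual response.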
Psychophysics. To ensure that stereoscopic depth perception is abolished with our aRDS stimuli in monkeys, we trained one of the monkeys to discriminate between ±0.2° disparity with cRDS. The monkey was required to report the perceived depth (“near” or “far” relative to the screen) of the RDS by making an eye movement to one of two saccade targets (same color and size as the fixation point) that was presented 0.5 sec after the RDS was turned off. Targets were placed 5° to the left or right of the center of the screen. The monkey was rewarded after making an eye movement to the left target when the disparity of the RDS was +0.2° or to the right target when the disparity was -0.2°. In the first 6 d of training, only the correct target was turned on to guide the monkey's eye movement. In the next series of behavioral experiments, again only the correct target was illuminated in 9 of 10 trials. In 1 trial of 10, both targets were turned on, invoking a two-alternative forced-choice (2AFC) task. The proportion of correct choices was calculated from these 2AFC trials. After 16 d of training with cRDS, we interleaved 2AFC trials with aRDS presentation. In a block of 10 trials within this set of behavioral experiments, the monkey was confronted with 2 2AFC trials, 1 with cRDS and 1 with aRDS.
Results
Psychophysical performance
In 12,878 trials (single choice plus 2AFC) during the 16 d after the initial 6 d training, the monkey reached 83% correct judgments in the 2AFC task using cRDS. In the 12,245 trials (16 d) of aRDS-interleaved experiments, the proportion of correct trials for cRDS was preserved (62-81% correct; last 10 d, binomial test; p < 0.0001); however, the proportion of correct trials for aRDS showed no tendency to increase but remained fluctuating near the chance level (41-59% correct; last 10 d, binomial test; p > 0.8). These behavioral results demonstrate that learning of stereoscopic depth discrimination was not possible with the aRDS within this time period. Our data also indicate that discrimination performance failed to transfer to aRDS stimuli. We reproduced and extended the behavioral results of previous studies (Cumming and Parker, 1997; Janssen et al., 2003) that stereoscopic depth perception is abolished with aRDS in monkeys.
Responses to cRDS and aRDS
We examined the responses of 139 V4 cells in at least six trials of each stimulus condition. Of these cells, 115 cells were visually responsive to at least one of the stimulus conditions (t test; the significance level p = 0.05 was divided by the number of stimulus conditions to correct for multiple comparisons). The 115 visually responsive cells were the database for the following analysis.
More than half of the tested cells (66 of 115; 57%) were sensitive to horizontal disparities in cRDS (Kruskal-Wallis test; p < 0.05). Only 24 of these 66 cells altered their responses depending on horizontal disparity in aRDS (p < 0.05); most disparity-selective cells (42 of 66; 64%) lost their sensitivity when the RDS became anticorrelated (Kruskal-Wallis test; p > 0.05). Only four cells were sensitive to disparities in aRDS while being insensitive to disparities in cRDS. Typical V4 cell responses to cRDS and aRDS are shown in Figure 2. This neuron responded vigorously to -0.2° horizontal disparity in cRDS but not to +0.2° disparity (Fig. 2A). For aRDS, this neuron responded only minimally to either crossed or uncrossed disparities, although slight increases in the discharge rate were visible. The disparity tuning curve for this cell to cRDS exhibits a preference for a range of crossed disparities (Fig. 2B, filled circles). Responses to aRDS did not differ across different disparities (Kruskal-Wallis test; p > 0.05). The insensitivity to disparity in aRDS is captured by the flat tuning curve (Fig. 2B, open squares).
Figure 2. Responses of a V4 cell that lost sensitivity to horizontal disparities of anticorrelated stimuli. A, The raster plots and the corresponding PSTHs for the preferred disparity of the cell to cRDS of -0.2° (left column) and nonpreferred disparity of +0.2° (right column). The top row represents responses to cRDS, whereas the bottom represents responses to aRDS. B, The horizontal disparity tuning curve to cRDS (solid circles) and aRDS (open boxes). Error bars show the mean ± SEM of 10 trials. The Gabor functions fitted to the raw data points are superimposed (solid and dashed curves, respectively). The amplitude ratio for this cell was 0.06. The bottom dotted line represents the ongoing activity level, whereas the mean responses to left- and right-eye monocular presentations are shown on the right-hand side, marked as L and R, respectively. The U mark indicates the mean response to uncorrelated RDS. C, The time-averaged vergence angle of the stimulation duration with respect to the prestimulus period is plotted as a function of horizontal disparity of the stimulus. Error bars represent the mean ± SD.
If the monkey adjusted the vergence angle to allow the RDS to come into registration (zero retinal disparity), then the vergence angle would have a positive correlation with the image disparity on the screen. The vergence response of the monkey was unaltered by disparity (two-way ANOVA; p > 0.1), by the sign of binocular correlation (p > 0.5), or by the interactive effect of disparity and sign of binocular correlation (p > 0.9) (Fig. 2C). Thus, vergence eye movements account for neither the sensitivity of the cell to disparity nor its attenuation of disparity selectivity for aRDS. The time-averaged vergence angle depended on stimulus disparity in only 9 of the 115 cells (8%) and on the sign of binocular correlation of the stimulus in 17 of the 115 cells (15%; two-way ANOVA; p < 0.05).
Examples of the responses of four additional V4 cells are shown in Figure 3. The top row demonstrates two cells with disparity tuning that was lost to aRDS (Kruskal-Wallis test; p > 0.05). A substantial portion of cells tested (Fig. 3A) responded to both cRDS and aRDS at all disparities but demonstrated disparity modulation only to cRDS. Similar to the cell shown in Figure 2, some cells responded only weakly to all disparities for aRDS but responded vigorously to a particular range of disparities for cRDS (Fig. 3B). Because of an abrupt drop in the responsiveness of the cell near zero disparity, the fitted Gabor function was clipped at zero firing rate. Although the response modulation is weak, the bottom row exhibits two cells retaining their disparity selectivity to aRDS (Kruskal-Wallis test; p < 0.05): one exhibiting an inverted profile (i.e., a phase shift of approximately π) compared with that for cRDS (Fig. 3C) and the other with a phase shift of 0.15 π from the disparity tuning profile for cRDS (Fig. 3D).
Figure 3. Horizontal disparity tuning curves and the fitted Gabor functions for four additional cells. The other conventions are the same as seen in Figure 2. The amplitude ratios for these cells are 0.07, 0.15, 0.46, and 0.43, respectively.
Disparity discriminability
DDI was calculated for all of the 115 visually responsive cells. The attenuation of disparity selectivity was assessed by changes in DDI values from cRDS to aRDS. For the cell examined in Figure 2, the DDI to cRDS and aRDS was 0.62 and 0.25, respectively. For all cells tested (n = 115), the median of the distribution of DDI to cRDS and aRDS was 0.45 and 0.36, respectively (Fig. 4, top and right histograms, respectively). The median DDI of cells with statistically significant disparity selectivity for cRDS (n = 68) was 0.54 for cRDS and 0.39 for aRDS. The scatter plot demonstrates that disparity-insensitive cells (open circles) lie close to the diagonal line and do not exhibit any differences between the values for cRDS and aRDS (Wilcoxon test; p > 0.06). Disparity-selective cells for cRDS (filled circles) displayed higher values of DDI to cRDS than to aRDS (Wilcoxon test; p < 0.0001), resulting in the data points distributed below the diagonal. There was a positive correlation between DDI to cRDS and DDI to aRDS (r = 0.51; p < 0.0001). Sensitivity of V4 neurons to horizontal disparities thus was not completely lost by contrast-reversal of dots in an RDS. We assessed decreases in disparity selectivity for each cell by subtracting the DDI to aRDS from that to cRDS. A positive value for this DDI difference indicates the attenuation of disparity selectivity for aRDS. The decline of disparity selectivity is conspicuous: the distribution of DDI differences (top right) exhibits a clear shift toward positive values in disparity-selective cells (Wilcoxon test; p << 0.0001) but not in disparity-insensitive cells (p > 0.06). These results show that although V4 cells retained a weak sensitivity to disparity under anticorrelation, cells highly sensitive to disparities showed a greater decrease in disparity sensitivity than disparity-insensitive cells.
Figure 4. Disparity discriminability across the whole population of V4 cells analyzed (n = 115). The center panel depicts a scatter plot of DDI to cRDS on the horizontal axis and DDI to aRDS on the vertical axis. Cells displaying statistically significant disparity modulation to cRDS are plotted in black (Kruskal-Wallis; p < 0.05). The dashed diagonal line indicates the level of no difference in disparity discriminability regardless of whether the RDS is correlated or anticorrelated. The top and right histograms exhibit the distribution of DDI from the same data set. The top right histogram shows the distribution of DDI differences between correlated and anticorrelated conditions, with positive values representing reduced discriminability to an aRDS.
Gabor-fitted parameters
We fitted Gabor functions to the disparity tuning curves of all cells that had statistically significant disparity selectivity either for cRDS or aRDS (n = 70). The Gabor function provided a fairly good fit for the disparity tuning in most cells examined (Fig. 5A). The goodness-of-fit values (R2) for cRDS are 0.90 for the cell in Figure 2 and 0.79-0.99 for the cells in Figure 3. The majority of neurons demonstrated a relatively high goodness-of-fit (R2 > 0.6) either for cRDS or for aRDS (Fig. 5A). Two cells exhibiting poorly fitted tuning curves are shown in Figure 5, B and C. The 11 cells including these 2 cells for which the R2 fell below 0.6 for both cRDS and aRDS were discarded in the following analysis, because the Gabor functions were not adequate to describe the tuning profiles.
Figure 5. Quality of the Gabor function fit. A, The goodness-of-fit, R2, for cRDS on the horizontal axis and for aRDS on the vertical axis is shown for all cells subject to the Gabor function fit. The dotted lines represent the borders of the criterion, R2 > 0.6, for subsequent data analysis of Gabor function parameters. B, C, Horizontal disparity tuning curves and the fitted Gabor functions of two poorly fit cells are shown. The labeling conventions are the same as seen in Figure 2.
The ratio of the amplitude parameter, A, between the cRDS disparity tuning curve and the aRDS tuning curve gives a quantitative measure of the decline in disparity sensitivity. An inverted tuning profile, on the other hand, would appear as a π shift in the phase, φ. A two-dimensional scatter plot of these two measures for V4 cells shows that most of the data points have amplitude ratios substantially <1 (Fig. 6A). This result demonstrates that the disparity modulation is reduced for aRDS in most V4 neurons. The median amplitude ratio was 0.24 (mean, 0.38). A direct comparison with V1 data from a previous study (Cumming and Parker, 1997) shows that the amplitude ratio of V4 cells is significantly lower (Fig. 6B) (Mann-Whitney test; p < 0.05). Although phase differences of V1 cells tend to be concentrated near π, phase differences of V4 cells did not differ significantly from a uniform distribution between 0 and 2π (χ² test; p > 0.09).
Quantitative analysis of the tuning curve profiles. A, Gabor amplitude ratio is plotted against the phase difference between the disparity tunings to cRDS and aRDS for V4 neurons (n = 59). The open square indicates where the plot would lie if the responses were perfectly described by the disparity energy model (Ohzawa et al., 1990). The distributions of phase differences and amplitude ratios are plotted in the top and right histograms, respectively. Filled symbols represent cells that have significant disparity sensitivity to both cRDS and aRDS (Kruskal-Wallis test; p < 0.05). B, The same plots for V1 neurons (n = 72) studied by Cumming and Parker (1997).
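The two measures plotted in Figure 6 can be illustrated with a minimal sketch. It assumes the commonly used Gabor form R(d) = y0 + A exp(-(d - d0)²/2σ²) cos(2πf(d - d0) + φ); the exact parameterization used for the fits is not restated in this section, so the function signature and all parameter values below are hypothetical. The ratio is taken as aRDS amplitude over cRDS amplitude, which is inferred from the statement that values <1 indicate reduced modulation for aRDS.

```python
import math


def gabor(d, y0, A, d0, sigma, f, phi):
    """Gabor tuning curve: baseline plus Gaussian envelope times cosine carrier.

    Assumed parameterization (not confirmed by this section of the paper).
    """
    envelope = math.exp(-((d - d0) ** 2) / (2.0 * sigma ** 2))
    carrier = math.cos(2.0 * math.pi * f * (d - d0) + phi)
    return y0 + A * envelope * carrier


def amplitude_ratio(A_ards, A_crds):
    """Fitted amplitude for aRDS divided by fitted amplitude for cRDS."""
    return A_ards / A_crds


def phase_difference(phi_ards, phi_crds):
    """Phase difference between aRDS and cRDS fits, wrapped into [0, 2*pi)."""
    return (phi_ards - phi_crds) % (2.0 * math.pi)


# Hypothetical cRDS fit parameters for one cell.
crds = dict(y0=20.0, A=10.0, d0=0.1, sigma=0.4, f=0.8, phi=0.3)
# Energy-model-like aRDS fit: same amplitude, phase shifted by pi.
ards = dict(crds, phi=crds["phi"] + math.pi)

ratio = amplitude_ratio(ards["A"], crds["A"])
dphi = phase_difference(ards["phi"], crds["phi"])

# A pi phase shift mirrors the curve about the baseline y0 (an inverted profile).
d = 0.25
mirrored = 2.0 * crds["y0"] - gabor(d, **crds)
inverted = gabor(d, **ards)
```

In this toy case the point (dphi, ratio) lies at (π, 1), the location marked by the open square in Figure 6A; a V4-like cell would instead have a ratio well below 1 and no preferred phase difference.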
Fitting a Gabor function to disparity tuning data provides several advantages in estimating disparity modulation over simply calculating the peak-to-trough ratio of raw data. Because neural activity in V4 originates in V1, one would expect some characteristics of V1 responses to remain in V4. The similarity, as well as the difference, in the response properties between V4 and V1 cells can be captured by a quantitative comparison of fitted function parameters between the two areas. In practice, measures of the tuning profile derived from the fitted parameters are more robust to noise than from the raw data. The fitting algorithm was also tailored to tolerate the iceberg effect, i.e., clipping of tuning curves at zero spike rate. Despite these advantages of fitting a Gabor function to the tuning curves, it is important to carefully evaluate the fitting results. We compared the ratio of modulation amplitude from the raw data with that from the Gabor amplitude parameter, A (Fig. 7A). To estimate the raw modulation amplitudes, we analyzed the peak-to-trough amplitude, Rmax-Rmin, of raw plots for the disparity tuning for cRDS as well as aRDS and then calculated their ratio. In the scatter plot of the ratios obtained from raw and fitted tuning curves, data points from most cells lay below the diagonal line, demonstrating that the amplitude ratio estimated from Gabor fitting was lower than that estimated from the raw data (Wilcoxon test; p < 0.0001). To differentiate whether this discrepancy is attributable to modulation estimates for cRDS or those for aRDS, the modulation estimates from Gabor parameters were compared with those from the raw data for cRDS and aRDS. There were no significant differences in the disparity modulation for cRDS when estimated either by the Gabor parameter, A, or from the raw data (Fig. 7B) (Wilcoxon test; p > 0.06). 
In contrast to that for cRDS, the disparity modulation for aRDS was substantially smaller when estimated by the Gabor parameter, A, than when estimated from the raw data (Fig. 7C) (Wilcoxon test; p < 0.0001). These results indicate that fitting Gabor functions to disparity tuning resulted in a smaller estimate of the tuning amplitude compared with the raw data for aRDS only.
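The raw modulation measure used in this comparison, the peak-to-trough amplitude Rmax - Rmin and its ratio between aRDS and cRDS, can be sketched as follows. The function names and the firing rates are hypothetical and serve only to illustrate the arithmetic.

```python
def peak_to_trough(mean_rates):
    """Raw modulation amplitude, Rmax - Rmin, of a disparity tuning curve."""
    return max(mean_rates) - min(mean_rates)


def raw_modulation_ratio(ards_rates, crds_rates):
    """Ratio of aRDS to cRDS peak-to-trough modulation."""
    crds_modulation = peak_to_trough(crds_rates)
    if crds_modulation == 0:
        raise ValueError("cRDS tuning curve is flat; the ratio is undefined")
    return peak_to_trough(ards_rates) / crds_modulation


# Hypothetical mean firing rates (spikes/sec) at seven disparities.
crds_rates = [12.0, 15.0, 30.0, 48.0, 33.0, 18.0, 14.0]
ards_rates = [22.0, 25.0, 21.0, 19.0, 24.0, 23.0, 20.0]

raw_ratio = raw_modulation_ratio(ards_rates, crds_rates)
```

Because the maximum and minimum are taken over noisy means, this raw ratio can only overstate the modulation of a nearly flat curve, which is the direction of the discrepancy reported for aRDS.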
A, The amplitude ratio calculated from the fitted Gabor function is plotted against the modulation ratio calculated from peak-to-trough of the raw mean responses. B, The amplitude of the fitted Gabor function is plotted against the peak-to-trough modulation of the raw mean responses for cRDS disparity tuning. C, The amplitude of the fitted Gabor function is plotted against the peak-to-trough modulation of the raw mean responses for aRDS disparity tuning. D, Histogram of the minimum mean responses to aRDS is plotted.
We checked whether the underestimation of the tuning amplitude for aRDS was caused by our fitting constraints. We loosened the constraint on the frequency, f, of the cosine carrier, because it was tighter than the others: the allowed range was widened from ±10 to ±50% of the mean of the disparity frequencies for cRDS and aRDS. All 70 cells were refitted with the loosened constraint. There was no statistically significant change in the estimated amplitude ratio (Wilcoxon test; p > 0.8). Some cells that had passed the R² > 0.6 criterion in the original fit failed it with the loosened constraint, and some others that had failed it in the original fit passed. The median amplitude ratio for cells that passed the criterion was 0.30 (mean, 0.43). Loosening the fitting constraint thus altered the median and mean amplitude ratios only moderately.
If we expand the scale of Figure 7B and look into cells that have both Gabor amplitude and peak-to-trough for cRDS <30 spikes/sec, the Gabor amplitude is systematically smaller than the peak-to-trough (Wilcoxon test; p < 0.004). Thus for small modulation curves, whether the curves are for cRDS in Figure 7B or for aRDS in Figure 7C, the amplitude of Gabor function is smaller than peak-to-trough. If the response magnitude of the small modulation went down near zero, then the variability should also drop, because the variance/mean ratio is typically unity for cortical neurons; however, baseline responses of V4 neurons to RDSs are typically much higher than the spontaneous level. Therefore, variability does not necessarily change drastically between the responses to preferred and to baseline. This accounts for a better smoothing effect with curve-fitting for small modulation data than for large modulation data.
The smoothing effect of fitting a Gabor function is visible in Figure 3, A and B. The modulation of responses in the raw plots for aRDS is larger than the amplitude of the fitted Gabor function. The modulation in the raw plots, however, does not systematically change with disparity. Therefore, the modulation conceivably reflects neuronal noise that was not completely averaged out within the number of trials tested. For cells in which the modulation was systematic, as those illustrated in Figure 3, C and D, the Gabor function fitted the raw plots fairly well.
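The argument that unsystematic modulation in raw plots can arise from noise alone can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis performed in the study: trial-by-trial variability is approximated by Gaussian noise with variance equal to the mean (variance/mean ratio of 1), and the baseline rate, number of disparities, and number of trials are hypothetical.

```python
import math
import random


def simulate_mean_rates(true_rates, n_trials, rng):
    """Trial-averaged rates with Gaussian noise whose variance equals the mean.

    Gaussian noise is an approximation to Poisson-like spiking (assumption).
    Negative draws are clipped at zero because firing rates cannot be negative.
    """
    means = []
    for rate in true_rates:
        trials = [max(0.0, rng.gauss(rate, math.sqrt(rate))) for _ in range(n_trials)]
        means.append(sum(trials) / n_trials)
    return means


rng = random.Random(0)          # fixed seed for reproducibility
flat_curve = [20.0] * 9         # hypothetical cell with no true disparity tuning
observed = simulate_mean_rates(flat_curve, n_trials=10, rng=rng)

# Peak-to-trough modulation of the noisy means, although the true amplitude is 0.
spurious_modulation = max(observed) - min(observed)
```

The peak-to-trough of the simulated means is greater than zero even though the underlying curve is flat, whereas a smooth function fitted across all disparities would average much of this scatter out.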
Neurons cannot transmit negative signals through their discharge rates. Therefore, the tuning curve can be clipped at the trough if the firing rate goes down to zero. This threshold property could bias the amplitude ratio toward lower values. The distribution of the firing rate at the trough of the tuning curve, Rmin, for aRDS is shown in Figure 7D. Most of the cells had >4 spikes/sec at the trough of their tuning curves, and only in four cells was the trough <4 spikes/sec. We analyzed whether the ratio between the tuning amplitude estimated with Gabor fitting and that estimated from the raw data for aRDS correlated with the ongoing firing rate or with the vertical offset of the Gabor function, y0. We did not observe a significant correlation in either case (r = -0.11, p > 0.3 for the former; r = -0.12, p > 0.3 for the latter). This result was confirmed using the amplitude ratio; no significant correlation was found with either the ongoing firing rate (r = 0.04; p > 0.7) or the vertical offset of the Gabor function, y0 (r = 0.05; p > 0.7). When clipped, the tuning amplitude of even-symmetric tuning curves is more likely to be underestimated by Gabor fitting than that of odd-symmetric ones. We therefore tested whether the ratio between the amplitude derived from Gabor fitting and that from the raw data for aRDS correlated with the symmetry of the Gabor function for aRDS. The symmetry was assessed as the absolute value of the Gabor phase for aRDS relative to π/2 or 3π/2, depending on whether the phase is between 0 and π or between π and 2π. We could not identify a significant correlation (r = 0.20; p > 0.1). These results demonstrate that the clipping effect imposed by the threshold did not bias our estimate of the amplitude ratio toward lower values.
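The symmetry measure described above can be written out as a short sketch. The function name is hypothetical, and the interpretation of the result assumes a cosine carrier, for which phases of 0 or π give even-symmetric curves and phases of π/2 or 3π/2 give odd-symmetric curves.

```python
import math


def symmetry_index(phi):
    """Absolute distance of the Gabor phase from pi/2 or 3*pi/2.

    The reference is pi/2 for phases in [0, pi) and 3*pi/2 for phases in
    [pi, 2*pi). The result is 0 for odd-symmetric curves and pi/2 for
    even-symmetric curves (assuming a cosine carrier).
    """
    phi = phi % (2.0 * math.pi)
    reference = math.pi / 2.0 if phi < math.pi else 3.0 * math.pi / 2.0
    return abs(phi - reference)


even_peak = symmetry_index(0.0)                  # even-symmetric, peak at center
even_trough = symmetry_index(math.pi)            # even-symmetric, trough at center
odd_curve = symmetry_index(math.pi / 2.0)        # odd-symmetric
```

Treating the boundary phase π as belonging to the upper half is an arbitrary choice of this sketch; either reference gives the same value there.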
If the disparity selectivity of cells in V4 merely reflects signals originating in V1 and conveyed to V4, then the tuning profile of V4 cells should inherit the characteristics of V1 cell responses. The phase difference of the fitted Gabor function between cRDS and aRDS, however, did not cluster around π (Fig. 6). We examined whether this lack of clustering was an artifact produced by an inaccurate estimation of the phase parameters for the flat tuning curves of V4 neurons for aRDS. The accuracy of the estimated phase difference was assessed with the width of the 95% confidence interval of the phase difference. The confidence interval of any of the parameters, as well as the confidence interval of a linear combination of any of the parameters, is given by the elements of the covariance matrix (Press et al., 1992). The covariance matrix, C, is given by C = (H/2)⁻¹, where H is the Hessian matrix. The confidence interval obtained in this way gives the range in which one would find estimated parameters, assuming that the estimated values are distributed normally. For a circular variable, this is not the case when the distribution width is close to one cycle. For example, the ±2σ range of a uniformly distributed phase is wider than 2π. Caution is needed in interpreting the confidence interval of a non-normally distributed parameter. Following convention, we use it simply as a measure of estimation accuracy. The width of the 95% confidence interval of the phase difference was plotted as a function of the estimated phase difference (Fig. 8). We did not observe a tendency for cells with a smaller confidence range than the median (<0.20π), i.e., cells with a relatively accurate estimation of phase difference, to cluster between the phase differences 0.5π and 1.5π (χ² test; p > 0.1). For a handful of cells, we also analyzed the confidence interval of the phase difference with the bootstrap method, which does not assume a Gaussian distribution of noise. 
Again, we did not see a tendency for phase-difference estimates near zero to have wider confidence intervals than estimates near π, even with the bootstrap method. This indicates that the uniform distribution of phase difference did not result from inaccurate fitting. The results suggest that responses of V4 neurons do not simply reflect the propagation of activities initially elicited in V1, but rather reflect further processing of disparity information downstream of V1.
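The confidence-interval calculation from the covariance matrix can be sketched for a toy problem with only two parameters, the cRDS phase and the aRDS phase. This is a simplification: in a real fit the full Hessian over all parameters must be inverted before the phase entries are read out. The Hessian values are hypothetical, and the factor 1.96 assumes normally distributed estimates.

```python
import math


def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def covariance_from_hessian(hessian):
    """Covariance matrix C = (H/2)^-1, following the relation quoted in the text."""
    half = [[h / 2.0 for h in row] for row in hessian]
    return invert_2x2(half)


def ci_width_of_difference(cov, z=1.96):
    """Width of the 95% confidence interval of (param1 - param0).

    Var(p1 - p0) = C00 + C11 - 2*C01 for the linear combination [-1, +1].
    """
    variance = cov[0][0] + cov[1][1] - 2.0 * cov[0][1]
    return 2.0 * z * math.sqrt(variance)


# Hypothetical Hessian of the fit error with respect to the two phases.
hessian = [[8.0, 2.0], [2.0, 4.0]]
cov = covariance_from_hessian(hessian)
width = ci_width_of_difference(cov)
```

A width above 2π would mean that the phase difference is effectively unconstrained, which is why such cells are plotted at the ceiling in Figure 8.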
The width of 95% confidence interval of the phase difference is plotted against the estimated value of the phase difference. Data for cells with confidence ranges >2π are plotted on the ceiling. Filled symbols represent cells that have significant disparity sensitivity to both cRDS and aRDS (Kruskal-Wallis test; p < 0.05).
Response time course
To examine whether transformation of a V1-type inverted tuning profile to a flat tuning profile gradually proceeds as time evolves, we analyzed the temporal profile of V4 responses. We created poststimulus time histograms (PSTHs) for each cell for four stimulus conditions: preferred and nonpreferred disparities for cRDS and the corresponding disparities for aRDS. For each cell, the response magnitude was normalized such that the maximum of the four PSTHs was assigned a value of 1. The four PSTHs were then averaged across the 59 cells for which the tuning curves were analyzed with Gabor functions (Fig. 9A). We observed a difference in the responses to preferred and nonpreferred disparities for cRDS from the beginning of the response (Fig. 9A, thick solid and thick dashed lines). In contrast, responses to the two disparities in aRDS remained at similar levels during the entire time course of the responses (Fig. 9A, thin solid and thin dotted lines) (linear discriminant analysis; p > 0.1). This result may seem to contrast with the result that highly disparity-sensitive cells for cRDS retained their disparity sensitivity, albeit weakly, for aRDS (Fig. 4). One possible explanation for this discrepancy is that the preferred and nonpreferred disparities were determined from cRDS responses, and the phase difference between the cRDS and aRDS tuning curves is distributed uniformly (Fig. 6). Therefore, response differences, if any, may have been averaged out by summing responses across neurons. If, however, either the recurrent propagation of activity within V4 or the feedback activity from higher areas is involved in reducing disparity selectivity for aRDS, then V4 neurons should reflect the characteristics of V1 neurons at the response onset. This reflection should be observed as a separation in the responses to the two disparities for aRDS (Fig. 9A, thin solid and thin dotted lines). 
The absence of separation in the responses suggests that selectivity to disparities in aRDS is reduced either in a feedforward manner or in earlier areas before signals reach V4.
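The normalization and population averaging of the PSTHs can be sketched as follows. The condition labels, bin counts, and firing rates are hypothetical; only the procedure (scale each cell by the maximum across its four PSTHs, then average across cells) follows the description above.

```python
def normalize_cell(psths):
    """Scale a cell's PSTHs so that the largest bin across all conditions is 1."""
    peak = max(max(trace) for trace in psths.values())
    if peak <= 0:
        raise ValueError("cell has no response; cannot normalize")
    return {cond: [v / peak for v in trace] for cond, trace in psths.items()}


def population_average(cells):
    """Average normalized PSTHs bin by bin across cells, per condition."""
    normalized = [normalize_cell(cell) for cell in cells]
    averaged = {}
    for cond in normalized[0]:
        traces = [cell[cond] for cell in normalized]
        averaged[cond] = [sum(bins) / len(traces) for bins in zip(*traces)]
    return averaged


# Two hypothetical cells, three time bins each (spikes/sec).
cell_a = {
    "cRDS_pref": [10.0, 40.0, 20.0],
    "cRDS_nonpref": [10.0, 20.0, 10.0],
    "aRDS_pref": [10.0, 25.0, 15.0],
    "aRDS_nonpref": [10.0, 25.0, 15.0],
}
cell_b = {
    "cRDS_pref": [5.0, 20.0, 10.0],
    "cRDS_nonpref": [5.0, 8.0, 6.0],
    "aRDS_pref": [5.0, 12.0, 8.0],
    "aRDS_nonpref": [5.0, 12.0, 8.0],
}

average = population_average([cell_a, cell_b])
```

Normalizing before averaging prevents cells with high firing rates from dominating the population curve.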
Temporal profile of responses of V4 neurons. A, Average normalized PSTHs of cells with tuning curves that were fit to a Gabor function are plotted as a function of time from stimulus onset. Thick solid and thick dashed lines represent the responses to cRDS at preferred and nonpreferred disparities, respectively. Thin solid and thin dotted lines indicate the responses at the same disparities, with anticorrelated RDS. The inset indicates the line styles and the corresponding responses in a disparity tuning curve. B, The amplitude ratio is plotted against the response latency. The population of cells is the same as in A.
If processing within the neuronal network of V4 cells is involved in the reduction of selectivity to disparity in aRDS, then cells receiving direct input from earlier areas should have higher amplitude ratios. Assuming that response latency correlates with the number of processing stages involved, we would expect a correlation between the latency and the degree of reduction of disparity tuning for aRDS. The latency was distributed across a wide range of values (114 ± 37 msec; n = 59). No correlation, however, was found between the amplitude ratio and the response latency (r = -0.1; p > 0.4) (Fig. 9B). Because many of the cells with the shortest response latencies had low amplitude ratios, this result may reflect considerable progression of false-match rejection by the time visual signals arrive at V4. If one assumes that this reduction takes place inside a single cortical area, then the candidate site would be an area earlier than V4, possibly V2. Response properties of neurons, however, can also be created by convergence of inputs across areas. For instance, orientation selectivity of V1 cells is achieved mainly by convergence of lateral geniculate neurons. By analogy, reduction of responses to aRDS could be achieved by V2 neurons converging onto a V4 neuron. Thus, it is equally possible that the reduction of disparity selectivity for aRDS is accomplished inside V2 or at the V4 site where projections from earlier areas terminate.
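The correlation test between amplitude ratio and latency can be sketched with a plain Pearson coefficient. That the reported r is a Pearson coefficient is an assumption of this sketch, and the latencies and amplitude ratios below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical response latencies (msec) and amplitude ratios for six cells.
latencies = [70.0, 85.0, 100.0, 120.0, 150.0, 180.0]
amplitude_ratios = [0.20, 0.45, 0.15, 0.30, 0.25, 0.35]

r = pearson_r(latencies, amplitude_ratios)
```

If false-match rejection were built up within V4, short-latency cells would be expected to have higher ratios, giving a clearly negative r; a value near zero, as reported, gives no support for that scenario.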
Discussion
We studied the horizontal disparity tuning of cells in V4 to dynamic RDS. V4 neurons were sensitive to binocular disparities embedded in a dynamic cRDS. In contrast to previous studies using solid bar stimuli (Hinkle and Connor, 2001; Watanabe et al., 2002), we exploited dynamic RDS stimuli to isolate response modulations attributable to binocular interaction from those attributable to positional shifts of stimuli. The results confirm that a population of V4 neurons is genuinely binocular disparity selective (Hegdé and Van Essen, 2001). These cells, including those with the shortest response latencies, substantially reduced their selectivity to disparities when the RDS was contrast-reversed in one of the eyes. This result suggests that responses to false matches are mostly rejected by the stage of V4. Testing of a wide range of disparities allowed a quantitative description of the profiles of disparity tuning curves. This approach enabled a comparison of V4 with other cortical areas tested in similar experimental paradigms, giving an insight into where and how the correspondence problem is solved in the visual cortex.
Disparity selectivity tested with RDS
As a prerequisite for neurons to transmit or represent depth information from disparity, neuronal responses must be reliably modulated by the disparity in cRDS. The DDI analysis allows us to compare the reliability of the disparity selectivity of V4 neurons with that of V1 (Prince et al., 2002) and MT (DeAngelis and Uka, 2003). The median value of DDI in V4 is 0.44 compared with 0.54 in V1 and 0.74 in MT. To evaluate the relative reliability of these areas for coding stereoscopic depth, it is necessary to take into account at least three factors. First, the selection of neurons based on their disparity sensitivity biases the population data. Unlike the V1 study, the MT study as well as our sample included all visually responsive cells regardless of their disparity sensitivity. Second, neural responses are generally more reliable for longer stimulus durations. The stimulus duration in this study was the shortest; we adopted a 1 sec duration for V4, whereas the other studies adopted 1.5 sec for MT and 2 sec for V1. MT neurons show only a mild decrease in disparity discriminability for shorter durations. Third, the stability of fixation by the subjects, especially in the depth direction, can also have an impact on the DDI value if neurons are selective to the “absolute” disparity (Cumming and Parker, 1999). Although some fluctuation is inevitable in behaving subjects, we kept the vergence fluctuation within ±0.5° during the experiments. The vergence control in the MT study was similar to ours, whereas it is not explicitly stated in the V1 study. The available information about the database across these three areas indicates that MT has the most reliable disparity sensitivity, whereas V1 and V4 have comparable disparity sensitivities.
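For readers unfamiliar with the index, the DDI can be sketched as follows. The sketch assumes the commonly used definition, DDI = (Rmax - Rmin) / (Rmax - Rmin + 2 RMSerror), computed on square-root-transformed firing rates with RMSerror taken as the square root of the residual variance around the mean responses; the exact definition applied in this study is given in its Methods, which are not reproduced in this section. The function name and the firing rates are hypothetical.

```python
import math


def ddi(trials_by_disparity):
    """Disparity discrimination index from per-trial firing rates.

    trials_by_disparity: one list of trial firing rates per tested disparity.
    Assumed definition: (Rmax - Rmin) / (Rmax - Rmin + 2 * RMSerror),
    computed on square-root-transformed rates.
    """
    transformed = [[math.sqrt(r) for r in trials] for trials in trials_by_disparity]
    means = [sum(t) / len(t) for t in transformed]
    modulation = max(means) - min(means)
    # Residual sum of squares around each disparity's mean response.
    ss_within = sum((r - m) ** 2 for t, m in zip(transformed, means) for r in t)
    dof = sum(len(t) for t in transformed) - len(transformed)
    if dof <= 0:
        raise ValueError("need more trials than disparities")
    rms_error = math.sqrt(ss_within / dof)
    if modulation + rms_error == 0:
        raise ValueError("responses are constant; DDI is undefined")
    return modulation / (modulation + 2.0 * rms_error)


# Hypothetical cell tested at two disparities, two trials each (spikes/sec).
example_ddi = ddi([[4.0, 16.0], [36.0, 64.0]])
# A noise-free cell with any modulation reaches the ceiling value of 1.
noise_free_ddi = ddi([[9.0, 9.0], [25.0, 25.0]])
```

The index thus weighs the tuning modulation against trial-to-trial variability, which is why stimulus duration and fixation stability enter the comparison across areas.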
Neural correlates of stereoscopic depth perception
Analysis of the amplitude of the disparity tuning curve by fitting the raw data to a Gabor function indicates that the correspondence computation is advanced in V4 (mean amplitude ratio, 0.38; median, 0.24) in comparison with V1 (mean, 0.52; median, 0.39) (Cumming and Parker, 1997). The reduction of responses to contrast-reversed stimuli is even more prominent in comparison with cat V1 neurons (mean amplitude ratio, 0.79) (Ohzawa, 1998). We confirmed that this finding was not an artifact resulting from inaccurate fitting (see Results). An additional comparison across different areas could be made based on the distribution of DDI to cRDS and aRDS, which does not assume a fitting of tuning curves by any mathematical function. A comparison of these values was not possible, however, because data are not available from the literature for areas V1, MT, or MST. We predict that if the distribution of DDI of V1 cells were superimposed onto our data (Fig. 4), these values would be shifted toward the top right corner, lying between the diagonal line and a regression line for the V4 data.
An issue related to the reduction in disparity selectivity for aRDS is the possible effects of attention. In an orientation discrimination task, attention increases the amplitude of the orientation tuning curve of V4 neurons by a ratio of 1.26 (McAdams and Maunsell, 1999). If the same ratio applies to disparity tuning curves, and attention is always directed toward cRDS and away from aRDS, then our attention-corrected estimate of the mean amplitude ratio is 0.38 × 1.26 = 0.48. Although this is comparable with the mean amplitude ratio for V1 (0.52) (Cumming and Parker, 1997), attentional modulation alone cannot explain the low amplitude ratio observed in V4. Although attentional modulation increases with time (McAdams and Maunsell, 1999), the amplitude ratio of V4 cells did not change with time (Fig. 9A). We consider it unlikely that attention is a major factor for causing the amplitude ratio to be reduced in V4.
A portion of V1 cells reduce their disparity selectivity when the stimulus is anticorrelated (Cumming and Parker, 1997) (Fig. 6B). Such responses may reflect feedback from higher cortical areas that are responsible for solving the correspondence problem (Ohzawa, 1998). Modification of the disparity energy model without any feedback components, however, can also describe these responses (Read et al., 2002). In V4, rejection of responses to false matches is unlikely to have resulted from feedback, because no changes were observed in the response time course. These results raise the following question: do V4 neurons receive their inputs predominantly from cells with low amplitude ratios in earlier areas? The V1 cells with low amplitude ratios and various phase differences may specifically feed input to V4. Alternatively, processing may progressively advance as disparity signals propagate along the visual processing hierarchy, gradually losing the characteristics of the neural responses seen in earlier areas. At the final stage of the ventral visual stream, IT neurons lose their sensitivity to surface concavity and convexity defined by disparity gradients for aRDS (Janssen et al., 2003). These data suggest that the stereo correspondence problem is “fully” solved by the stage of IT. To determine whether processing advances progressively along the hierarchy, continuing to advance from V4 to IT, it will be helpful to examine the disparity tuning of IT cells in a similar experimental paradigm. Superimposing the amplitude ratios and latencies of visual responses collected from different visual areas onto the scatter plot of Figure 9B may help address this issue.
Several studies (Bradley et al., 1998; DeAngelis et al., 1998; Dodd et al., 2001; Uka and DeAngelis, 2003) provide evidence that neurons in area MT, an area along the dorsal visual pathway, are functionally involved in some stereoscopic tasks. These studies used RDSs composed only of bright dots. Potential matches between opposite contrast patterns do not exist; thus these tasks do not require the rejection of false matches in the sense discussed here. When aRDS is used as a stimulus, neural responses in MT and MST areas appear to detect matches between contrast-reversed stimuli in a manner similar to that observed in V1 (Krug et al., 1999; Takemura et al., 2001). The responses to aRDS do not necessarily contradict the notion that these neurons contribute to stereo depth perception. In natural images, regions of pure binocular anticorrelation are very rare, if they occur at all, and multiple surfaces often fall within the RF of a neuron. A peak in the cross-correlation of the left and right eye images computed by the neuron can signal the average depth of the surfaces within the RF. This local disparity signal may support coarse stereopsis (Tyler, 1990), as well as vergence eye movement (Masson et al., 1997; Takemura et al., 2001).
Comparison with the visual Wulst of owls
Disparity-selective neurons in the visual Wulst of owls consist of a spectrum of cells, including cells with reduced disparity selectivity for aRDS and cells that retain their disparity selectivity for aRDS (Nieder and Wagner, 2001). A wide spectrum of cell properties suggests that multiple visual areas reside in the owl Wulst. Cells with lower amplitude ratios tend to have longer latencies and smaller secondary peaks in the disparity tuning curves for cRDS. These authors propose that stereo correspondence is hierarchically processed in the visual Wulst by several mechanisms, including the convergence of multiple disparity frequency channels, nonlinear threshold operation, and stimulus-induced synaptic inhibition (Nieder and Wagner, 2001). The key observation for their model is the increasingly pronounced suppression at the trough of tuning curves for cells with smaller secondary peaks and smaller amplitude ratios. In monkey V4, however, we did not observe a relationship either between the amplitude ratio and the ongoing firing rate or between the amplitude ratio and the vertical offset, y0, of the fitted Gabor function.
In conclusion, we found that responses to false matches are rejected to a considerable extent by the stage of V4, raising the possibility that the ventral processing stream may be a neural substrate for global matching computation and the representation of stereoscopic depth. Further studies relating these activities to specific behavior of monkeys performing disparity discrimination tasks should elucidate whether neural activity in these areas is functionally involved in stereoscopic depth judgment.
Footnotes
This work was supported by grants to I.F. from the Ministry of Education, Culture, Science, Sports and Technology (13308046, 15016067), Uehara Memorial Foundation, and Toyota Physical and Chemical Institute. S.T. was supported by the Japan Society for the Promotion of Science Research Fellowship for Young Researchers. We thank I. Ohzawa, H. Tamura, and T. Uka for helpful comments on this manuscript and M. Yamamoto and K. Kotani for help in collecting the data. We are grateful to B. Cumming for providing V1 data and to J. Schall and his colleagues for kindly providing their source code of the Poisson spike train analysis.
Correspondence should be addressed to Dr. Ichiro Fujita, Laboratory for Cognitive Neuroscience, Graduate School of Frontier Biosciences, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan. E-mail: fujita@fbs.osaka-u.ac.jp.
Copyright © 2004 Society for Neuroscience 0270-6474/04/248170-11$15.00/0