Abstract
Using event-related functional magnetic resonance imaging, we studied the activation correlating with the awareness of stereoscopic depth using a bistable slanted surface (slant rivalry). Bistability resulted from incongruence between two slant-defining cues: binocular disparity and monocular perspective. The stimulus was perceived as alternating between the perspective-dominated percept (monocular depth) and the disparity-dominated percept (stereopsis), while sensory input remained constant, enabling us to study changes in awareness of depth associated with either cue. Transient activation relating to perceptual alternations was found bilaterally in the caudal part of the intraparietal sulcus, in the right-hemispheric anterior intraparietal sulcus, within visual area V4d-topo, and inferior to area MT+. Transient activation correlating specifically with alternations toward the disparity-dominated percept was found in a number of visual areas, including dorsal visual areas V3A, V7, and V4d-topo and visual areas MT+ and lateral occipital complex. No activation was found for alternations toward the perspective-dominated percept. Our results show that of all visual areas responsive to disparity-defined depth, V4d-topo shows the most robust signal changes correlating with the instigation of stereoscopic depth awareness (stereopsis).
Introduction
A fundamental question in neuroscience is how the brain combines sensory information into a unified conscious experience. Studying perceptual processes in relation to their underlying neurobiological mechanisms could reveal how a biological system goes beyond sensory input to produce awareness. Human depth perception depends on several cues, both monocular (e.g., linear perspective) and binocular (disparity). Linear perspective refers to the foreshortening of distant objects; disparity refers to the difference between the two retinal images, which gives rise to the sensation of depth called stereopsis (Howard and Rogers, 2002). A majority of visual areas contain disparity-tuned neurons (Barlow et al., 1967; Pettigrew et al., 1968), which can be classified on the basis of their tuning (Poggio and Fischer, 1977; Poggio et al., 1988). Furthermore, the human occipito-parietal junction is activated during viewing of stimuli containing stereoscopic depth (Gulyas and Roland, 1994; Backus et al., 2001; Nishida et al., 2001). This region includes visual areas V3A, V7, V4d-topo, and the caudal parietal disparity region. V4d-topo has been termed the topographic homolog of macaque V4d (Tootell and Hadjikhani, 2001); the caudal parietal disparity region could be a homolog of macaque area CIP, because neurons in this macaque area represent surface orientation based on disparity (Taira et al., 2000), linear perspective (Tsutsui et al., 2001), and texture (Tsutsui et al., 2002). It remains unknown to what extent the different cortical areas containing disparity-tuned neurons contribute to stereopsis.
To study the perceptual representation of depth, we must dissociate neural activation of perceptual representations from underlying sensory processing. This requirement is met by the phenomenon of bistability (Blake and Logothetis, 2002), in which perception alternates between two different perceptual interpretations (e.g., binocular rivalry, the Necker cube). The dissociation between constant sensory input and alternating perceptual states can be used to assess the contribution of neural systems to perception using single-cell neurophysiology (Logothetis and Schall, 1989; Leopold and Logothetis, 1996, 1999) and functional magnetic resonance imaging (fMRI) (Kleinschmidt et al., 1998; Lumer et al., 1998; Tong et al., 1998; Lumer and Rees, 1999; Inui et al., 2000; Polonsky et al., 2000; Tong and Engel, 2001; Muckli et al., 2002).
We developed a slant rivalry stimulus containing perspective and disparity cues that can be manipulated independently. When the slant cues are incongruent (indicating opposite slants), the stimulus (see Fig. 1) becomes bistable: subjects perceive either a perspective-dominated percept (slanted rectangle) or a disparity-dominated percept (slanted trapezoid). This stimulus allows us to examine which cortical areas are associated with the perceptual representation of depth. We have previously studied slant rivalry in relation to slant estimation (van Ee et al., 2002) and to other bistable stimuli (van Ee, 2005; van Ee et al., 2005), showing it to be a representative example of a bistable stimulus. Here, we use fMRI to identify areas showing activation related to stereoscopic depth. Subjects indicated perceptual alternations by pressing buttons. To partial out signal changes related to motor responses, we occasionally presented stimuli containing congruent cues, during which subjects were instructed to press buttons at random times.
Materials and Methods
Subjects
Seven subjects participated, and informed written consent was obtained before every scanning session. Subjects had normal or corrected-to-normal vision and good stereo-acuity. All procedures were approved by the F. C. Donders Centre for Cognitive Neuroimaging.
Visual stimuli
Stimulus presentation. Stimuli were presented using an EIKI projector (model LC-X986; resolution, 800 × 600 pixels; Eiki International, Rancho Santa Margarita, CA) onto a projection screen positioned at the rear end of the MR scanner bore. Subjects viewed stimuli through a mirror attached to the head coil. The distance to the screen via the mirror was 80 cm. Red and green filters were attached to MR-suitable glasses for viewing of stereoscopic stimuli.
Slant rivalry stimuli. We used a conventional red-green anaglyphic technique to present stimuli. Using OpenGL (SGI, Mountain View, CA), a wire-frame rectangle was rotated 70° about the vertical axis. By horizontally minifying one half-image and magnifying the other, we created a disparity gradient specifying a slant about the vertical axis. Four disparity-defined slants were used (-70, -56.5, -35, and 70°), yielding four different stimuli. When disparity and perspective both specify a slant of 70° (congruent cues), the stimulus is stable and no alternations are experienced. In the other three stimuli, disparity and perspective specify opposite slants, and this incongruence between cues results in bistability. The stimulus width was 2.1°; its height was 4.8° on the left side and 3.3° on the right side. Stimuli were presented within an aperture in a surrounding pattern (13.8 × 11.9°) of small squares (0.5 × 0.5°), which provided a zero-slant reference and prevented depth contrast illusions. Only 80% of the squares were shown, to prevent fixation in the wrong depth plane (Fig. 1).
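The horizontal minification and magnification described above can be expressed as a horizontal size ratio (HSR) between the two half-images. The sketch below is an illustration under stated assumptions, not the authors' stimulus code: it uses the local relation at fixation for a surface straight ahead, HSR = (1 + k)/(1 - k) with k = (I/2D)·tan S, where D is the 80 cm viewing distance and I is an assumed interocular distance of 6.5 cm. The symmetric split of the ratio between the two eyes and the sign convention (positive slant: right side farther away) are also assumptions.

```python
import math

# Assumed values: the interocular distance is not reported; 80 cm is the
# viewing distance (via the mirror) given in the Methods.
IOD_CM = 6.5
VIEW_DIST_CM = 80.0


def hsr_from_slant(slant_deg, iod=IOD_CM, dist=VIEW_DIST_CM):
    """Horizontal size ratio (right/left half-image) at fixation for a surface
    slanted about the vertical axis (positive slant: right side farther)."""
    k = 0.5 * (iod / dist) * math.tan(math.radians(slant_deg))
    return (1.0 + k) / (1.0 - k)


def slant_from_hsr(hsr, iod=IOD_CM, dist=VIEW_DIST_CM):
    """Inverse of hsr_from_slant: slant (deg) specified by a horizontal size ratio."""
    return math.degrees(math.atan((2.0 * dist / iod) * (hsr - 1.0) / (hsr + 1.0)))


def half_image_scales(slant_deg):
    """Split the ratio symmetrically: minify one half-image, magnify the other (assumed split)."""
    root = math.sqrt(hsr_from_slant(slant_deg))
    return 1.0 / root, root  # (left-eye horizontal scale, right-eye horizontal scale)


for slant in (-70.0, -56.5, -35.0, 70.0):
    left, right = half_image_scales(slant)
    print(f"{slant:6.1f} deg: HSR {hsr_from_slant(slant):.3f}, left x{left:.3f}, right x{right:.3f}")
```

Under these assumptions a 70° slant corresponds to an HSR of roughly 1.25, i.e., about a 12% horizontal stretch of one half-image relative to the unscaled image and a matching compression of the other.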
Shape experiment stimuli. For the shape control experiment, we used a similar stimulus configuration, presenting subjects with a wire-frame rectangle and a wire-frame trapezoid, both containing zero disparity. Their width and height were similar to those of the bistable stimuli, and they were presented within the same aperture in the surrounding pattern of squares.
Depth congruency stimuli. For the depth congruency experiment, we used 25 different stereo-photographs of natural scenes, objects, and faces. The stimulus width and height were 4°. Images were presented with zero disparity (by presenting one half-image to both eyes), incongruent disparity (by presenting the left-eye half-image to the right eye and vice versa), or congruent disparity.
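The three disparity conditions differ only in which half-image each eye receives. A minimal NumPy sketch, assuming grayscale half-images stored as 2-D arrays and assuming that the red filter covers the left eye (the filter-to-eye assignment is not reported); the function names are hypothetical.

```python
import numpy as np


def make_anaglyph(left_img, right_img):
    """Red channel carries the left-eye image, green the right-eye image (assumed filter assignment)."""
    rgb = np.zeros(left_img.shape + (3,), dtype=left_img.dtype)
    rgb[..., 0] = left_img
    rgb[..., 1] = right_img
    return rgb


def disparity_condition(left_img, right_img, condition):
    """Build the anaglyph for one of the three depth congruency conditions."""
    if condition == "congruent":      # original eye assignment
        return make_anaglyph(left_img, right_img)
    if condition == "incongruent":    # half-images swapped between eyes: disparity sign reversed
        return make_anaglyph(right_img, left_img)
    if condition == "zero":           # same half-image to both eyes: no disparity
        return make_anaglyph(left_img, left_img)
    raise ValueError(f"unknown condition: {condition!r}")


rng = np.random.default_rng(0)
left = rng.random((64, 64))        # toy left-eye half-image
right = np.roll(left, 2, axis=1)   # toy right-eye half-image: uniform 2-pixel horizontal shift
stimuli = {c: disparity_condition(left, right, c) for c in ("zero", "congruent", "incongruent")}
```

Swapping the half-images reverses the sign of every disparity while leaving monocular content unchanged, which is what makes the incongruent condition a useful control.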
Polar retinotopic mapping. Polar retinotopic mapping was done using methods described in detail previously (DeYoe et al., 1996; Tootell et al., 1997, 1998; Wandell, 2000). We used two rotating, contrast-reversing (8 Hz) wedges (thickness, 30°; rotation speed, 5°/s) developed recently (Slotnick and Yantis, 2003).
MT+ mapping. We used a conventional block design experiment to localize MT+. Epochs consisted of random dots that were either stationary or moving outward toward the observer (starfield motion). Dots (11 × 11 arcmin) were created at random positions and moved radially outward from the origin (fixation point) at 3°/s. The contrast between the stationary and moving dots localized MT+ and other areas sensitive to visual motion. It is important to note that the full-field method we used did not allow us to distinguish MT from its satellite areas MST and FST; we therefore refer to the localized activation as MT+.
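The starfield display can be sketched as dots translating radially away from fixation at the stated 3°/s. In this NumPy sketch the field radius, the respawn rule, and the 60 Hz frame rate in the usage lines are hypothetical choices, not reported parameters.

```python
import numpy as np

SPEED_DEG_S = 3.0      # radial speed from the Methods
MAX_RADIUS_DEG = 7.0   # hypothetical field radius; the dot-field size is not reported


def step_dots(xy, dt, rng, speed=SPEED_DEG_S, max_radius=MAX_RADIUS_DEG):
    """Advance dots radially away from the fixation point (origin) at constant speed.

    xy: (n, 2) float array of dot positions in degrees. Dots that leave the field
    are respawned at uniformly distributed random positions inside it.
    """
    r = np.hypot(xy[:, 0], xy[:, 1])
    # unit vectors pointing away from fixation; a dot exactly at the origin stays put
    unit = np.divide(xy, r[:, None], out=np.zeros_like(xy), where=r[:, None] > 0)
    new = xy + unit * speed * dt
    out = np.hypot(new[:, 0], new[:, 1]) > max_radius
    n_out = int(out.sum())
    theta = rng.uniform(0.0, 2.0 * np.pi, n_out)
    rad = max_radius * np.sqrt(rng.uniform(0.0, 1.0, n_out))  # sqrt gives uniform density over the disk
    new[out] = np.column_stack((rad * np.cos(theta), rad * np.sin(theta)))
    return new


rng = np.random.default_rng(1)
dots = rng.uniform(-5.0, 5.0, size=(200, 2))
for _ in range(60):                 # one second at an assumed 60 Hz refresh
    dots = step_dots(dots, 1.0 / 60.0, rng)
```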
Lateral occipital complex mapping. The lateral occipital complex (LOC) was identified using conventional methods described in detail previously (Kourtzi and Kanwisher, 2000). Images consisted of faces, houses and landscapes, common objects, textures, and scrambled versions of both faces and houses/landscapes. All images were 4° in size. The contrast between faces, houses, landscapes, and objects on the one hand and textures and scrambled images on the other localizes object-sensitive areas, including LOC.
Figure 1. Stimulus and associated percepts. A, The size of the stimulus and the surrounding pattern. The surrounding pattern of squares served as a zero-slant reference. B, The two percepts when disparity and perspective cues are incongruent. In the perspective-dominated percept, a slanted rectangle is perceived; in the disparity-dominated percept, a slanted trapezoid is perceived. See also our demo page at http://www.phys.uu.nl/~vanee/Demos/BiStabDemo/BiStabDemo.html.
Procedure
Anatomy. A high-resolution T1-weighted anatomical scan (3D-MPRAGE; optimized contrast between gray and white matter; field of view, 256 × 256; voxel size, 1 mm³) was obtained for segmentation and flattening purposes.
Slant rivalry experiments. In the main slant rivalry experiments, subjects viewed four different slant rivalry stimuli per run. Each stimulus was presented for 132 s, and stimulus epochs were interleaved with black displays (16 s) containing a yellow fixation dot. Subjects were instructed to fixate either the fixation dot (during fixation epochs) or the center of the stimulus (Fig. 1). Three of the four stimuli were bistable, and subjects indicated their perceptual alternations using two buttons, one for each perceptual state (disparity/perspective). The fourth stimulus was a perceptually stable version of the stimulus, during which subjects were instructed to press either of the two buttons at random times. These random button presses were included to partial out activation related to the act of pressing a button when reporting an alternation. Stimulus order was randomized, and all seven subjects participated in four runs, each lasting 608 s.
Shape experiments. For the shape experiments, we used a block design similar to that of the slant rivalry experiment. Runs consisted of three blocks (132 s each) during which subjects viewed a rectangle wire frame alternating with a trapezoid wire frame at random times, interleaved with black displays (16 s) containing a yellow fixation dot. Subjects indicated each change in stimulus using two buttons, one for each shape. All seven subjects participated in one run lasting 460 s. Although the stimuli contained no disparities, subjects nevertheless wore anaglyphic glasses to keep viewing conditions similar to those of the slant rivalry experiment.
Depth congruency experiments. For the depth congruency experiments, we presented multiple stimuli during 16.8 s epochs. Each epoch contained zero-disparity, congruent, or incongruent stereo-images, each image presented for 1200 ms and followed by a 100 ms blank display. Each of the three conditions was repeated six times, and between epochs a fixation display containing only a yellow dot was shown for 16.8 s. Two of the original seven subjects participated in three runs, each lasting 621.6 s.
Retinotopic mapping and the localization of MT+ and LOC. All seven subjects performed three polar mapping runs, consisting of 10 cycles (full hemifield rotation of both wedges), lasting a total of 456 s. In addition, subjects performed one run of MT+ localization and one run of LOC localization. MT+ localization runs (400 s) consisted of six 16 s epochs of stationary dots and six 16 s epochs of moving dots, interleaved with 16 s blank fixation displays. LOC localization runs (520.8 s) consisted of three epochs of each category (face, house, objects, texture, and scrambled). During epochs (16.8 s), different stimuli were presented for 1200 ms, interleaved with a 100 ms blank display. Epochs were interleaved with a black fixation display of 16.8 s.
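The reported run lengths can be checked against the epoch counts given above. This check assumes that each run starts and ends with a fixation epoch, so that a run with n stimulus epochs contains n + 1 fixation epochs; that layout is inferred from the totals rather than stated in the text. The phase-encoded retinotopy runs are not block designs and are omitted.

```python
import math


def block_run_duration(n_epochs, epoch_s, fixation_s):
    """Run length when every stimulus epoch is flanked by fixation epochs (n_epochs + 1 fixations)."""
    return n_epochs * epoch_s + (n_epochs + 1) * fixation_s


# (name, stimulus epochs, epoch length in s, fixation length in s, reported run length in s)
RUNS = [
    ("slant rivalry", 4, 132.0, 16.0, 608.0),
    ("shape", 3, 132.0, 16.0, 460.0),
    ("depth congruency", 3 * 6, 16.8, 16.8, 621.6),
    ("MT+ localizer", 6 + 6, 16.0, 16.0, 400.0),
    ("LOC localizer", 5 * 3, 16.8, 16.8, 520.8),
]

for name, n, epoch, fix, reported in RUNS:
    total = block_run_duration(n, epoch, fix)
    print(f"{name:17s} {total:6.1f} s (reported {reported} s) match={math.isclose(total, reported)}")
```

For example, the slant rivalry runs give 4 × 132 + 5 × 16 = 608 s, and the LOC runs give 31 × 16.8 = 520.8 s.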
Magnetic resonance imaging. All images were acquired using a 3 tesla Siemens (Erlangen, Germany) TRIO, with the exception of the high-resolution T1 anatomical scan, which was acquired using a 1.5 tesla Siemens Sonata. Scanners were located at the F. C. Donders Centre for Cognitive Neuroimaging. All functional images were recorded using gradient echo planar imaging. For the slant rivalry, shape, and depth congruency experiments, we used 30 horizontal slices [repetition time (TR), 2400 ms; echo time (TE), 30 ms; flip angle, 75°; 64 × 64 matrix; voxel size, 3.5 mm³]. For the MT+ and LOC localizer runs, scanning parameters were similar, but only 25 horizontal slices were used. For retinotopic mapping, we used 25 horizontal slices (TR, 3000 ms; TE, 30 ms; flip angle, 75°; 64 × 64 matrix; voxel size, 3 mm³). Scanning parameters were optimized for each experiment; for example, the relatively long volume TR (3000 ms) used for retinotopic mapping provides a good signal-to-noise ratio.
Cortical flattening and area border delineation. The cortical sheets of individual subjects were reconstructed as polygon meshes from the high-resolution T1 scans. The white-gray matter boundary was segmented, reconstructed, smoothed, inflated, and flattened (Kriegeskorte and Goebel, 2001). Area borders were delineated from the polar retinotopic mapping using methods described previously (DeYoe et al., 1996; Tootell et al., 1997, 1998; Wandell, 2000). Using the correlation between wedge position and neural activity, borders were identified on the basis of field sign alternations, and areas were drawn manually on the flattened sheet. We state our criteria for area delineation explicitly because they are subject to some uncertainty, especially for higher retinotopic visual areas. Some groups define V7 as an area adjacent and anterior to V3A that contains a crude representation of at least the upper visual field, mirror-symmetric to that in V3A (Tootell et al., 1998; Press et al., 2001; Tsao et al., 2003). Tootell et al. (1998) also define V4d-topo as the human topographic homolog (topolog) of macaque V4d, an area situated (1) superior to V4v, (2) anterior to V3A, and (3) posterior to MT+. This area was previously called lateral occipital central/lateral occipital peripheral (LOC/LOP) but was renamed V4d-topo to avoid confusion with the more ventral lateral occipital complex. These definitions, however, do not provide a clear criterion for drawing the border between V7 and V4d-topo, because both lie between V3A and MT+. Our retinotopic mapping experiments suggest that these areas are indeed separate visual areas but that a clear border between them is not easily found using conventional mapping techniques. Nevertheless, we adhered to these criteria when identifying areas. After border delineation, we combined all four quadrants of V1 and V2 (left/right, ventral/dorsal) to yield single areas V1 and V2 for our region-of-interest (ROI) analysis. In addition, we combined left- and right-hemispheric V3, V3A, V7, V4d-topo, VP, and V4V.
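In phase-encoded mapping of this kind, the preferred polar angle of a voxel is commonly read from the phase of its time course at the stimulus frequency. The NumPy sketch below is offered as a stand-in for the correlation-based procedure described above, not as the BrainVoyager implementation. Because two opposite wedges sweep the field, one stimulus cycle is a hemifield rotation, so the angle is returned modulo 180°; the hemodynamic delay is not removed. Area borders would then be drawn where the progression of preferred angle across the flattened cortex reverses, a step not shown here.

```python
import numpy as np


def preferred_polar_angle(ts, n_cycles, cycle_span_deg=180.0):
    """Polar angle (deg) at which a voxel responds, from its phase at the stimulus frequency.

    ts: voxel time course (one value per volume); n_cycles: stimulus cycles per run.
    With two opposite wedges one cycle is a hemifield rotation, so the angle is
    returned modulo cycle_span_deg. The hemodynamic delay is not removed here.
    """
    ts = np.asarray(ts, dtype=float)
    spectrum = np.fft.rfft(ts - ts.mean())
    phase = np.angle(spectrum[n_cycles])          # rfft bin k = k cycles per run
    frac = (-phase / (2.0 * np.pi)) % 1.0         # response peak delay as a fraction of one cycle
    return frac * cycle_span_deg


# Synthetic voxel: 152 volumes (456 s at TR = 3 s), 10 cycles, peak a quarter-cycle into each cycle
n_vols, n_cycles = 152, 10
t = np.arange(n_vols)
voxel = np.cos(2.0 * np.pi * n_cycles * t / n_vols - 2.0 * np.pi * 0.25)
print(preferred_polar_angle(voxel, n_cycles))  # approximately 45 deg
```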
Analysis
Psychophysical analysis. Psychophysical data were analyzed by comparing the durations of percepts and the intervals between random button presses. In addition, we fitted the durations to a two-parameter gamma distribution:
$$f(x) = \frac{x^{k-1}\, e^{-x/\lambda}}{\lambda^{k}\, \Gamma(k)}, \qquad x > 0.$$
Here Γ is the gamma function, the continuous extension of the factorial: Γ(n) = (n − 1)! for natural n. The parameters k and λ are usually referred to as the shape and scale parameters, respectively.
We determined optimal fits using the cumulative distribution function rather than the probability density function, which avoids histogram binning problems. Fit quality was quantified with the Kolmogorov-Smirnov test, which is based on the maximal deviation between the cumulative distribution function of the data and that of the fit. Any fit with a Kolmogorov-Smirnov p value > 0.05 was considered to be of high quality (for a detailed description, see Brascamp et al., 2005).
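The CDF-based fitting and Kolmogorov-Smirnov check can be sketched with SciPy. This is an illustration under assumptions, not the procedure of Brascamp et al. (2005): it fits the gamma CDF by least squares (scipy.optimize.curve_fit) to the empirical CDF, uses method-of-moments starting values, and fixes the location parameter at zero. The durations in the usage lines are synthetic.

```python
import numpy as np
from scipy import optimize, stats


def fit_gamma_cdf(durations):
    """Least-squares fit of a gamma CDF (shape k, scale lam) to the empirical CDF of durations.

    Returns (k, lam, ks_pvalue). Fitting the CDF avoids choosing histogram bins.
    """
    x = np.sort(np.asarray(durations, dtype=float))
    ecdf = (np.arange(1, x.size + 1) - 0.5) / x.size   # empirical CDF at each sorted duration

    def model(t, k, lam):
        return stats.gamma.cdf(t, k, scale=lam)

    mean, var = x.mean(), x.var()
    p0 = (mean ** 2 / var, var / mean)                  # method-of-moments starting values
    (k, lam), _ = optimize.curve_fit(model, x, ecdf, p0=p0,
                                     bounds=([1e-6, 1e-6], [np.inf, np.inf]))
    # Kolmogorov-Smirnov test against the fitted distribution (loc fixed at 0)
    ks = stats.kstest(x, "gamma", args=(k, 0.0, lam))
    return k, lam, ks.pvalue


rng = np.random.default_rng(0)
durations = rng.gamma(shape=4.0, scale=1.5, size=2000)  # synthetic percept durations (s)
k, lam, p = fit_gamma_cdf(durations)
print(f"shape k = {k:.2f}, scale lambda = {lam:.2f}, KS p = {p:.3f}")
```

Because the parameters are estimated from the same data that are then tested, the standard Kolmogorov-Smirnov p value is optimistic, i.e., it tends to overstate fit quality.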
fMRI analysis. We used BrainVoyager (BrainInnovation, Maastricht, The Netherlands) for all functional data analyses and for the creation of flattened cortical representations. Before analysis, we removed the first three volumes of every scan. All remaining functional images were subjected to the following preprocessing steps: (1) motion correction; (2) slice timing correction; (3) linear trend removal using a high-pass filter; and (4) transformation into Talairach coordinate space (Talairach and Tournoux, 1988). We convolved random button presses and perceptual alternations (indicated by button presses) with a standard hemodynamic model of the blood oxygenation level-dependent (BOLD) response (Boynton et al., 1996). Similar models were created for the perceptual phases. To exclude nonspecific stimulus onset and offset effects, perceptual alternations occurring within five volumes after stimulus onset or within five volumes before stimulus offset were excluded. Because multiple stimuli were presented during one run, we also included a predictor for the expected signal changes related to stimulus presentation, which was used to partial out activation specifically related to these signal changes. The predicted time courses were then used to find significant activation using a voxel-by-voxel, fixed-effects general linear model (GLM) analysis (Friston et al., 1995). In this analysis, all significance values were corrected for serial correlations and multiple comparisons (Bonferroni's correction), with a threshold of p < 0.005 (corrected). In addition, we analyzed signal changes within the mapped visual areas (retinotopic areas, MT+, and LOC), using only those voxels of each area that were activated by the stimulus itself, to allow a fair comparison between visual areas. Because these regions were selected a priori, no correction for multiple comparisons is required, and uncorrected p values are therefore reported (significance values were corrected for serial correlations). We contrasted perceptual alternations with random button presses to obtain signal changes related to the alternations. To investigate activation related to either alternation type, we contrasted alternations toward the disparity-dominated percept with alternations toward the perspective-dominated percept. The same was done for the perceptual phases. Event-related averages were created by averaging time courses around perceptual alternations (from 5 s before to 14 s after each alternation) at a 250 ms resolution. The stochastic nature of the perceptual alternations provided the naturally occurring jitter needed for event-related averaging (Rosen et al., 1988; Buckner, 1998).
Eye movement recording and analysis. Eye movements were recorded using a custom-made camera and mirror system. The camera was positioned at the subject's feet, pointing toward the head coil, where a mirror reflected the image of one eye toward the camera. An infrared light-emitting diode illuminated the eye so that it was visible to the camera. The iView software package was used to record eye movements at 50 Hz. Drift was corrected by high-pass filtering the data, removing components with frequencies < 0.1 Hz. Blinks were detected as samples for which no eye position was recorded. Saccade onsets were detected as samples at which eye velocity exceeded 18°/s, and saccade end points as the samples at which velocity returned below this threshold. We analyzed the eye movements by examining the density of the mean fixation location during the entire experimental session and during either perceptual state. In addition, we analyzed both the position and the velocity of eye movements around perceptual alternations by testing whether, at any time around perceptual alternations, the position and/or velocity deviated significantly from those at other times.
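The velocity-threshold rule can be sketched as follows. The 50 Hz sampling rate and the 18°/s threshold come from the text; computing speed from differences between consecutive samples, coding missing samples as NaN, and omitting the 0.1 Hz high-pass drift correction are simplifying assumptions.

```python
import numpy as np

FS_HZ = 50.0           # sampling rate from the Methods
VEL_THRESHOLD = 18.0   # deg/s, saccade threshold from the Methods


def detect_saccades(x_deg, y_deg, fs=FS_HZ, threshold=VEL_THRESHOLD):
    """Velocity-threshold saccade detection on (drift-corrected) gaze positions in degrees.

    Returns a list of (start, end) sample indices, where start is the first sample whose
    velocity exceeds the threshold and end is the first sample back below it, plus a
    boolean blink mask (samples with no recorded position, stored as NaN).
    """
    x = np.asarray(x_deg, dtype=float)
    y = np.asarray(y_deg, dtype=float)
    blink = np.isnan(x) | np.isnan(y)
    speed = np.hypot(np.diff(x), np.diff(y)) * fs          # deg/s between consecutive samples
    fast = np.zeros(x.size, dtype=bool)
    fast[1:] = np.nan_to_num(speed, nan=0.0) > threshold   # blink samples never count as saccades
    edges = np.diff(np.concatenate(([0], fast.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist())), blink


x = np.zeros(100)
x[41:] = 1.0      # a 1 deg step between samples 40 and 41 (50 deg/s at 50 Hz)
x[70] = np.nan    # one missing sample (blink)
saccades, blinks = detect_saccades(x, np.zeros(100))
print(saccades, int(blinks.sum()))
```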
Results
Psychophysical results
Figure 2A depicts the mean durations of both the disparity- and perspective-dominated perceptual states and the mean interval between random button presses. These mean durations illustrate a common property of the dynamics of bistable perception: individual durations of perceptual phases vary widely. To characterize the behavioral data further, Figure 2B shows the best-fitting gamma distribution (see equation) for the intervals between random button presses (left) and the durations of each perceptual state (middle and right), and Figure 2C shows the associated fitted parameters. The quality of the fit between data and distribution was determined using a Kolmogorov-Smirnov test, revealing that all fits were acceptable (Fig. 2C). [It has been shown (Brascamp et al., 2005) that for some bistable stimuli fit quality improves when the rate (1/duration) of alternations is fitted instead of the duration, but that for slant rivalry no significant difference was observed between these two fitting methods. Consistent with these findings, no significant differences in quality were observed between fitting durations or rates in the present study.] Figure 2D overlays the group gamma distributions, illustrating their similarity. These findings are consistent with previous findings on the dynamics of slant rivalry (van Ee, 2005).
Activation correlating with perceptual alternations
We first investigated the presence of transient activation related to the perceptual alternations (regardless of direction, i.e., toward either the disparity- or the perspective-dominated percept), defining transient activation as a response that is correlated only with the onset of an event. In contrast, sustained responses refer to activation that is present for the duration of the event. To test for transient activation, we convolved the events (perceptual alternations and random button presses) with the hemodynamic response function (HRF) (Boynton et al., 1996). We contrasted the perceptual alternations with the random button presses to test for activation correlating with perceptual alternations. A number of clusters showed significant signal changes (Fig. 3A, Table 1), including bilateral activation within area V4d-topo and in the caudal part of the intraparietal sulcus (cIPS), superior to V7. In the right hemisphere, we found significant activation in the more medial and anterior part of the intraparietal sulcus (aIPS) and posterior to MT+. This last cluster partially overlapped with area LOC, identified in all subjects using a conventional LOC localizer (see Materials and Methods). Finally, we found activation in the left hemisphere, inferior to area MT+. The absence of activation around the left-hemispheric central sulcus indicates that the transient activation is related to the perceptual alternations rather than merely reflecting the button presses subjects used to indicate them: the contrast between actual alternations and random button presses partialed out this motor activation. Indeed, analyzing the data using all button presses revealed significant activation in the left-hemispheric central sulcus, indicative of motor responses (Fig. 3A, inset).
Focusing on individual activation, Figure 3B shows the left hemispheres of four subjects, illustrating that the group activation located inferior to MT+ (Fig. 3A, displayed on the hemispheres of one subject) does not overlap with MT+ at the individual level either.
Table 1. Statistical significance of the activation clusters found for all three contrasts in the slant rivalry experiment
Figure 2. Psychophysical data. A, Average time between random button presses and average duration of perceptual phases for all subjects. No significant differences were found between these durations, and no clear trend is observed because subjects differ in their perceptual bias toward one of the two perceptual states. B, Histograms of all durations overlaid with the best-fitting gamma distribution. C, Individual fitted shape and scale parameters. All fits were of acceptable quality. Note that high p values are associated with high fit quality (Kolmogorov-Smirnov test). D, Comparison of the gamma distributions (the inset shows the fitted scale and shape parameters for all subjects), demonstrating their similarity.
Subsequently, we examined activation related to each type of alternation (toward the disparity-dominated percept or toward the perspective-dominated percept) by contrasting the two types of alternations, using the same convolution of alternations with the HRF. This revealed that the cluster within area V4d-topo related to perceptual alternations (see above) showed significantly increased activation for alternations toward the disparity-dominated percept. The same was found for the cluster inferior to left-hemispheric MT+. Finally, this contrast also revealed bilateral activation within LOC. No significant activation was found for alternations toward the perspective-dominated percept (Fig. 4A, Table 1).
Figure 3. Transient activation correlating with perceptual alternations. A, Group data shown on the inflated hemispheres of one subject. Visual areas were mapped using conventional retinotopic mapping procedures. Significance values were corrected for serial correlation and multiple comparisons (Bonferroni's correction). Transient activation correlating with the perceptual alternations was found bilaterally superior to V7 on the cIPS and on the right-hemispheric medial IPS and aIPS. In addition, we found bilateral activation within visual area V4d-topo, within right-hemispheric LOC, and on the left inferior temporal gyrus, inferior to area MT+. The inset shows activation correlated with the button presses on the left-hemispheric central gyrus (cG) and postcentral sulcus. Table 1 provides full statistical results for the activated clusters. B, Individual data of four subjects (left hemispheres). These individual data demonstrate that the activation overlaps with V4d-topo, although not completely. They also demonstrate that the activation on the inferior temporal gyrus does not overlap with MT+.
Because the significant signal changes appear to overlap with, or lie near, known visual areas (V4d-topo, MT+, and LOC), we used an ROI-based analysis to examine signal changes related to each type of perceptual alternation (toward the disparity-dominated percept/toward the perspective-dominated percept) and to random button presses within these and other mapped visual areas. Before analysis, we combined all four quadrants of V1 and V2 (left/right, ventral/dorsal) to yield single areas V1 and V2. In addition, we combined left- and right-hemispheric V3, V3A, V7, V4d-topo, VP, and V4V. Analyses were performed on the voxels activated by the stimuli, ensuring a fair comparison between visual areas. We repeated both contrasts used in the whole-brain analysis, determining whether signal changes within the visual areas correlated with alternations and whether these areas showed significant signal changes related to each type of alternation. This revealed that visual areas V3A, V7, V4d-topo, MT+, and LOC showed significant transient signal changes during alternations. More specifically, all of these areas showed significant signal changes in response to alternations toward the disparity-dominated percept. Of these areas, the largest and most significant effects were found in V4d-topo, MT+, and LOC (Fig. 4B,C). Figure 4C suggests that all visual areas show increased activation for both types of alternations. However, when these signal changes are contrasted with those for random button presses, no difference is found in a number of visual areas (V1, V2, V3, VP, and V4V), because the random button presses are associated with signal increases similar to those for the alternations.
Although we included a specific predictor in the analysis to partial out signal changes related to stimulus presentation, it is likely that some residual activation related to stimulus presentation still accounts for the apparent positive signal changes for both alternations and random button presses. Our specific contrasts, however, partial out such residual activation.
Event-related activation
Figure 5 shows event-related averages of four subjects, taken from the activated voxels of V4d-topo, illustrating the apparently transient nature of the signal. The decreasing signal for alternations toward the perspective-dominated percept in three of four subjects (S3, S7, and, to some extent, S4) could indicate a sustained response during the preceding disparity-dominated phase that returns to baseline after subjects alternate back to the perspective-dominated percept. Because the signal change at the time of the alternation is set to zero (a normalization step required to obtain relative signal changes), this signal appears negative. We tested for such sustained signal changes by reexamining the data within the visual areas, assessing the correlation between signal changes and perceptual phases. This revealed significant sustained activation during the disparity-dominated percept in LOC (t(7027) = 2.39; p < 0.02) and in visual areas VP (t(7027) = 2.62; p < 0.009) and V4v (t(7027) = 3.07; p < 0.003). Note that neither VP nor V4v showed significant transient signal changes, whereas LOC did. In the remaining visual areas showing significant transient responses for alternations toward the disparity-dominated percept (V3A, V7, MT+, and V4d-topo), no significant sustained signal changes were identified. However, as pointed out above, the event-related averages from V4d-topo did show a decrease in activation for alternations toward the perspective-dominated percept (with the exception of subject S6), suggestive of sustained responses. This discrepancy can be explained by the difference between the statistics provided by the GLM analysis and those provided by the event-related averages. In our study, the events (alternations) are relatively close together in time, to a degree that depends on the subject. Because convolving the HRF with these events takes their temporal proximity into account, the GLM properly weighs the contribution of every individual trial. Averaging time courses around events, in contrast, ignores the close temporal proximity of trials, so these averages can be misleading unless compared with the GLM analysis. Most probably, the decrease seen in the event-related averages reflects the return to baseline of the transient activation related to the disparity-dominated percept (a consequence of the close proximity of the alternations). As an extreme example of this effect, one subject (S6) showed increased activation in V4d-topo for both types of alternations in the event-related averages, with the response for alternations toward the perspective-dominated percept shifted slightly earlier in time. Taking into account this subject's short durations of the disparity-dominated percept (Fig. 2) revealed that the signal apparently related to alternations toward the perspective-dominated percept was actually a copy of the disparity-dominated activation: the temporal shift of the event-related activity matched this subject's average duration of the disparity-dominated percept.
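The argument that closely spaced events can produce a spurious, time-shifted response in event-related averages can be illustrated with a toy simulation. All settings are hypothetical: only switches to the disparity-dominated percept evoke a response, modeled with a gamma-function HRF; disparity-dominated phases last exactly 2 s; perspective-dominated phases last 6 to 12 s; and the averages are plain event-locked means on a 250 ms grid. The average locked to alternations toward the perspective-dominated percept then shows a copy of the disparity response, shifted earlier by the duration of the disparity-dominated phase, whereas a GLM containing both event types would, in this noise-free case, assign the response entirely to the disparity events.

```python
import math
import numpy as np

DT = 0.25  # s; simulation step, equal to the 250 ms averaging resolution


def hrf(t, n=3, tau=1.25, delay=2.5):
    """Gamma-function HRF with assumed parameters (zero before the delay)."""
    s = np.clip(t - delay, 0.0, None) / tau
    return s ** (n - 1) * np.exp(-s) / (tau * math.factorial(n - 1))


def simulate(n_cycles=300, dis_samples=8, seed=1):
    """Alternating percepts; only switches to the disparity percept evoke a response.

    Disparity phases last dis_samples (8 x 0.25 s = 2 s); perspective phases last
    24-48 samples (6-12 s). All values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    dis_onsets, per_onsets = [], []
    t = 40
    for per_samples in rng.integers(24, 49, size=n_cycles):
        dis_onsets.append(t)
        per_onsets.append(t + dis_samples)
        t += dis_samples + int(per_samples)
    n = t + 100
    impulses = np.zeros(n)
    impulses[dis_onsets] = 1.0
    signal = np.convolve(impulses, hrf(np.arange(0.0, 30.0, DT)))[:n]
    return signal, np.array(dis_onsets), np.array(per_onsets)


def event_average(signal, onsets, pre=5.0, post=14.0):
    """Plain event-locked average from -pre to +post s, ignoring overlap between trials."""
    lags = np.arange(-int(pre / DT), int(post / DT) + 1)
    return lags * DT, np.stack([signal[o + lags] for o in onsets]).mean(axis=0)


signal, dis_onsets, per_onsets = simulate()
lag, avg_dis = event_average(signal, dis_onsets)
_, avg_per = event_average(signal, per_onsets)
shift = lag[np.argmax(avg_dis)] - lag[np.argmax(avg_per)]
print(f"disparity-locked peak {lag[np.argmax(avg_dis)]:.2f} s, "
      f"perspective-locked peak {lag[np.argmax(avg_per)]:.2f} s")
```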
Eye movements
To exclude eye movements as a possible confound, we recorded them during scanning for the subject showing the most robust BOLD activation. After drift correction of the data, we detected both saccades and blinks (see Materials and Methods) and determined the average fixation position across both states and within each state separately. Density maps (Fig. 6A) demonstrate that this subject maintained stable fixation at the center of the stimulus. Furthermore, there is no apparent difference between the fixation positions of the two perceptual states (disparity/perspective). Indeed, no significant difference was found between perceptual states in either the horizontal (t(121) = 0.20; p = 0.84) or vertical (t(121) = 1.22; p = 0.22) component of the fixation positions. In an additional analysis, we examined whether the absolute fixation position changed consistently from before to after an alternation. As can be observed from Figure 6B, the average position does not appear to change consistently in either the horizontal or vertical direction for either type of alternation (toward the perspective-dominated percept/toward the disparity-dominated percept). Statistical analysis revealed no significant deviations in the mean position at any time around perceptual alternations toward the disparity-dominated percept (horizontal: F(200,12260) = 0.39, p = 1; vertical: F(200,12260) = 0.40, p = 1) or toward the perspective-dominated percept (horizontal: F(200,12260) = 0.47, p = 1; vertical: F(200,12260) = 0.25, p = 1).
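The comparison of fixation positions between perceptual states is a standard two-sample test. The following is a minimal numpy sketch of Student's pooled-variance t statistic; the unit of observation (here, one mean position per perceptual phase) and the sample values are assumptions for illustration only, since the exact procedure is described in Materials and Methods rather than in this section. A two-sided p-value follows from the t distribution with df degrees of freedom (for example, via the survival function of scipy.stats.t).

```python
import numpy as np


def pooled_t_test(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    df = na + nb - 2
    # Pool the unbiased (ddof=1) variances of both groups.
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / df
    se = np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (a.mean() - b.mean()) / se, df


# Hypothetical per-phase mean horizontal positions (deg); not data from this study.
dis_x = [0.02, -0.05, 0.01, 0.03]
per_x = [0.00, 0.04, -0.02, 0.01]
t, df = pooled_t_test(dis_x, per_x)
print(round(t, 2), df)
```

The same function applies unchanged to the vertical component.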
Transient activation correlating with the perceptual alternations toward the disparity-dominated percept. A, Group data on the flattened hemispheres of one subject. We found bilateral transient signal changes relating to perceptual alternations toward the disparity-dominated percept in V4d-topo. In addition, an area just inferior to left-hemispheric MT+ was found. Significance values were corrected for serial correlation and multiple comparisons (Bonferroni's correction). IPS, Intraparietal sulcus. B, Statistical results associated with the ROIs for both contrasts (alternations > random button presses; disparity-dominated percept > perspective-dominated percept). RP, Random button presses; DIS, alternations toward the disparity-dominated percept; PER, alternations toward the perspective-dominated percept. C, ROI-based analysis of signal changes related to alternations toward either the perspective-dominated percept (dark gray) or the disparity-dominated percept (light gray) in all mapped areas, demonstrating the correlation of signal changes of V3A, V7, V4d-topo, LOC, and MT+ with alternations toward the disparity-dominated percept but not the perspective-dominated percept. Error bars represent SEM.
Similar to absolute positions, the velocity of eye movements (distance covered in degrees per millisecond) did not change significantly around alternations (Fig. 6C), either for alternations toward the disparity-dominated percept (F(199, 12399) = 0.85; p = 0.94) or toward the perspective-dominated percept (F(199, 12399) = 0.91; p = 0.82). Finally, we compared the number of blinks and saccades during each perceptual state. The mean blink rate did not differ significantly between states (6.6 blinks/min during the disparity-dominated percept and 9.6 blinks/min during the perspective-dominated percept; t(123) = 0.55; p = 0.58). Likewise, the mean number of saccades did not differ significantly (2.46 saccades/min during the disparity-dominated percept and 2.76 saccades/min during the perspective-dominated percept; t(125) = -0.21; p = 0.77). Note that these saccades were few and small in amplitude (microsaccades), because the subject was instructed to maintain strict fixation during the entire session.
Together, these results demonstrate that eye movements are not responsible for the activation changes around these alternations. Hardware limitations of the eye-tracking device used in combination with the MR scanner did not allow us to measure both eyes simultaneously (e.g., to record vergence). However, detailed experiments in our laboratory have demonstrated the lack of consistent variations in vergence or in the number of blinks and microsaccades between alternations toward either percept (van Dam and van Ee, 2005).
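The velocity and event-rate measures above can be sketched from raw gaze samples as follows. This is a minimal illustration, assuming gaze coordinates already converted to degrees and a known sampling rate; the velocity threshold of 0.03 deg/ms (30 deg/s) is a hypothetical value, not the saccade criterion used in this study, and blink detection (typically based on pupil loss) is not shown.

```python
import numpy as np


def gaze_velocity(x_deg, y_deg, sample_rate_hz):
    """Sample-to-sample gaze speed in degrees per millisecond."""
    dt_ms = 1000.0 / sample_rate_hz
    return np.hypot(np.diff(x_deg), np.diff(y_deg)) / dt_ms


def count_runs(flags):
    """Number of contiguous runs of True values (one run = one event)."""
    flags = np.asarray(flags, dtype=bool)
    if flags.size == 0:
        return 0
    return int(flags[0]) + int(np.sum(flags[1:] & ~flags[:-1]))


def count_saccades(x_deg, y_deg, sample_rate_hz, threshold=0.03):
    """Count supra-threshold velocity episodes; the 0.03 deg/ms default
    is a hypothetical threshold chosen for illustration."""
    return count_runs(gaze_velocity(x_deg, y_deg, sample_rate_hz) > threshold)


def rate_per_minute(n_events, duration_s):
    """Convert an event count over duration_s seconds to events per minute."""
    return n_events / (duration_s / 60.0)


# Synthetic 500-Hz trace with one rapid 1-deg horizontal shift.
x = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]
y = [0.0] * 7
print(count_saccades(x, y, 500), rate_per_minute(3, 30.0))
```

Counting contiguous supra-threshold runs, rather than individual samples, ensures that one saccade spanning several samples is counted once.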
Shape processing
The two perceptual states differ not only in which depth cue dominates the percept but also in the shape they represent. The perspective-dominated percept is associated with a rectangular shape, and the disparity-dominated percept with a trapezoidal shape (Fig. 1), so the differential activity we found could, in principle, relate solely to shape. To exclude this alternative explanation, we performed a control experiment in which our seven subjects were shown trapezoidal and rectangular wire frames with zero disparity. These stimuli alternated at random times, mimicking the timing of the alternations experienced while viewing our bistable stimulus. We determined whether any activation correlated with either shape (sustained response) or with a change in shape (transient response). Changing the stimulus led to strong transient responses in retinotopic visual areas, but, more importantly, no differential activation was found between the trapezoidal and rectangular shapes for either transient or sustained signals. The lack of shape-related signal changes indicates that the difference in shape between perceptual states is not responsible for the differential activation found between these same perceptual states (disparity/perspective).
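The text states only that the control stimuli alternated at random times mimicking the perceptual timing. One plausible way to do this, assumed here purely for illustration, is to shuffle a subject's recorded phase durations, assigning disparity-dominated durations to the trapezoid and perspective-dominated durations to the rectangle (following the shape correspondence in Fig. 1). The function name replay_schedule is hypothetical.

```python
import random


def replay_schedule(dis_durations_s, per_durations_s, seed=None):
    """Build a shape-alternation schedule from shuffled percept durations.

    Trapezoid phases reuse disparity-dominated durations and rectangle phases
    reuse perspective-dominated durations. Returns ((onset_s, shape) pairs,
    total duration). zip() stops at the shorter of the two duration lists.
    """
    rng = random.Random(seed)
    trap = list(dis_durations_s)
    rect = list(per_durations_s)
    rng.shuffle(trap)
    rng.shuffle(rect)
    schedule, onset = [], 0.0
    for d_trap, d_rect in zip(trap, rect):
        schedule.append((onset, "trapezoid"))
        onset += d_trap
        schedule.append((onset, "rectangle"))
        onset += d_rect
    return schedule, onset


# Hypothetical durations (s) for one subject.
schedule, total = replay_schedule([2.0, 3.0], [10.0, 12.0], seed=1)
print(schedule, total)
```

Shuffling preserves each subject's distribution of phase durations while randomizing their order, so the shape changes occur at comparable, unpredictable times.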
Event-related averaging for four subjects, taken from activated voxels within individual V4d-topo. The black lines represent activity correlating with perceptual alternations toward the disparity-dominated percept, and the gray lines represent perceptual alternations toward the perspective-dominated percept. All subjects show a clear increase in activation for alternations toward the disparity-dominated percept. In addition, S7 and S3 (and, to a lesser extent, S4) show a clear decrease for alternations toward the perspective-dominated percept. In contrast, the event-related averages of subject S6 show similar time courses for both types of alternations, with the signal related to the alternations toward the perspective-dominated percept shifted in time relative to the signal changes related to alternations toward the disparity-dominated percept. Taking into account the short durations of perceptual states for this subject (Fig. 2), it is likely that the activation that appears to relate to the perspective-dominated percept is in fact related to the alternation toward the disparity-dominated percept, which typically occurred shortly before it. Error bars represent SEM.
Suppression of stereoscopic depth in familiar images
When familiar images are presented with incongruent disparity (e.g., by swapping the half-images), this disparity information appears to be suppressed. Using stereograms of photographs depicting familiar objects, people, and scenery (Fig. 7A), we examined whether visual areas were differentially activated by images containing zero disparity, congruent disparity, or incongruent disparity. Analysis revealed that a number of areas prefer images containing non-zero disparity (regardless of the congruency of this disparity): V3A (t(727) = 3.88; p < 0.0001), V4d-topo (t(727) = 4.31; p < 0.0001), V7 (t(727) = 4.49; p < 0.0001), V4V (t(727) = 2.30; p < 0.05), and LOC (t(727) = 3.178; p < 0.002). More importantly, of these areas, both V4d-topo and V7 responded with significantly greater signal changes to images containing congruent disparity than to images containing incongruent disparity: V4d-topo (t(727) = 2.79; p < 0.005); V7 (t(727) = 3.08; p < 0.002). These findings provide additional evidence for the role of V4d-topo in stereopsis: this area shows a clear increase in activation when the stereoscopic depth information is perceived, relative to a situation in which the same stereoscopic information is present but suppressed because it is incongruent with the familiarity of the pictorial information (Fig. 7).
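The t values reported above are contrast statistics from a general linear model. A minimal ordinary-least-squares sketch of such a contrast, t = c'b / sqrt(s^2 c'(X'X)^-1 c) with df = n - rank(X), is shown below. It assumes independent, identically distributed errors and therefore omits the correction for serial correlation mentioned elsewhere in this section; the regressors and data in the usage example are hypothetical.

```python
import numpy as np


def contrast_t(X, y, c):
    """t statistic and df for contrast c in an OLS GLM, y = X @ beta + e.

    Assumes i.i.d. errors; no correction for serial correlation is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / df  # residual variance estimate
    var_c = sigma2 * c @ np.linalg.pinv(X.T @ X) @ c
    return float(c @ beta / np.sqrt(var_c)), int(df)


# Hypothetical design: congruent and incongruent stereogram regressors plus a constant.
rng = np.random.default_rng(1)
congruent = np.tile([1.0, 0.0, 0.0], 40)
incongruent = np.tile([0.0, 1.0, 0.0], 40)
X = np.column_stack([congruent, incongruent, np.ones(120)])
y = 1.0 * congruent + 0.5 * incongruent + rng.normal(0.0, 0.3, 120)
t, df = contrast_t(X, y, [1.0, -1.0, 0.0])  # congruent > incongruent
print(round(t, 2), df)
```

Using the pseudo-inverse keeps the sketch usable when a design matrix is rank-deficient, in which case df is computed from the actual rank.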
Eye-movement data. A, Density of fixation positions during both states (left), during the disparity-dominated percept (middle), or during the perspective-dominated percept (right). Grid spacing equals 1° of visual angle. The inset shows the enlarged center, including the mean fixation position and SD for both components (horizontal/vertical). B, Average horizontal and vertical components around perceptual alternations (-2 to 2 s) toward the disparity-dominated percept (left) and toward the perspective-dominated percept (right). The red line represents the average location over time; the blue area represents the SD of this position. No significant change in position (horizontal or vertical) is observed for either alternation. C, Average velocity (degrees per millisecond) around the moment of perceptual alternations. The red line represents the average velocity, and the blue area represents its SD. Similar to the average position, no significant changes in velocity are observed for either type of alternation.
Discussion
Using a slant rivalry stimulus, evoking perceptual alternations between perspective- and disparity-dominated perceptual representations, we examined the neural correlates of alternations toward either percept. We identified a number of cortical areas that show significant signal changes when subjects experience a perceptual alternation, including V4d-topo, bilateral cIPS, and right-hemispheric aIPS. Right-hemispheric activation of the intraparietal sulcus fits with a number of experimental observations suggesting that activation of the right parietal lobe correlates well with alternations evoked by other perceptually bistable images (Kleinschmidt et al., 1998; Lumer et al., 1998). In addition, damage to the right-hemispheric parietal lobe is more likely to lead to neglect than damage to the left hemisphere (Driver and Mattingley, 1998).
Activation of visual areas when viewing stereograms of familiar objects. A, Example stimulus used in the experiment. B, Activation changes for visual areas. C, Event-related averages from area V4d-topo. A number of visual areas show greater signal changes to images containing disparity (V3A, V4d-topo, V7, V4V, and LOC). More importantly, of these areas, only V4d-topo and V7 show increased signal changes for images containing congruent disparity, relative to images containing incongruent disparity. These results provide additional evidence for the role of V4d-topo in the awareness of stereoscopic depth perception: it shows a significant increase in activation when the stereoscopic depth information in the stereograms is perceived (and not when it is suppressed).
Previous studies have identified a number of visual areas responsive to disparity-defined depth. In this study, we demonstrate the presence of transient activation for alternations toward a disparity-dominated percept in V4d-topo, indicating that its activity correlates with the instigation of stereopsis (awareness of stereoscopic depth). Neighboring areas V3a and V7 also showed correlated activation, but with lower significance than V4d-topo. It has been reported (Backus et al., 2001) that when disparity no longer leads to stereopsis (because it is outside the fusible range), the largest activation changes occur in V3a. In our data, the area anterior to V3a, namely V4d-topo, correlates its activation with stereopsis even more strongly; a direct comparison is not possible, however, because V4d-topo was not identified in that study. A block design fMRI study localizing disparity-based depth perception in both humans and macaques found disparity-related activation in the parieto-occipital junction in humans, including V3a, V7, and V4d-topo (Tsao et al., 2003). Compared with our data, their activation was more widespread. This is to be expected, considering that their stimulus covered a large portion of the visual field and that a block design is statistically better suited to finding activation related to stimulation. The size of our stimulus was deliberately kept small to limit eye movements. More importantly, our stimulus allows us to dissociate sensory from perceptual processes. By correlating activation with perceptual changes during constant sensory input, we were able to investigate the representation of stereoscopic depth perception.
Although no visual motion was present in our stimuli, the fact that MT+ also shows activation changes for the disparity-dominated percept fits well with the existing literature: studies have shown the presence of disparity-sensitive cells in MT+ and that these neurons can detect surface orientation on the basis of disparity gradients (Palanca and DeAngelis, 2003; Nguyenkim and DeAngelis, 2003). We also identified an area approximately inferior to left-hemispheric MT+ in which the signal changes related to alternations, and especially those related to alternations toward the disparity-dominated percept, are much more pronounced than in MT+.
Several studies have demonstrated that area LOC is sensitive to disparity-defined shape. For example, it has been shown recently that LOC combines disparity and perspective information to represent perceived three-dimensional shape (Welchman et al., 2005). In our study, we elaborate on these findings, demonstrating that LOC shows both transient and sustained signal changes correlating with perceiving the disparity-dominated percept.
We did not find any activation related to alternations toward the perspective-dominated percept when these alternations were contrasted with the alternations toward the disparity-dominated percept. However, our stimulus might not have been suitable for determining neural correlates of perspective, because the perspective cue was always present, regardless of the percept. This could explain the lack of activation for alternations toward the perspective-dominated percept.
The activation in V4d-topo appears transient rather than sustained. Possibly, V4d-topo detects disparity-defined depth but does not sustain that activity. We identified three areas in which signal changes did correlate (albeit weakly) with the perceptual states (as opposed to alternations): V4v, VP, and LOC. This suggests an involvement of dorsal visual areas in the alternations (V4d-topo, V3A, and V7) and an involvement of ventral visual areas in the perceptual phases (V4v, VP, and LOC). Perhaps these latter areas respond to the detection of disparity-defined depth by the former areas and code for the shape defined by this disparity-defined depth for as long as it is seen.
Are there alternative explanations for the activation of V4d-topo that we relate to stereopsis? We have ruled out the possibility that the activation relates to shape rather than to depth perception: a control experiment using trapezoidal and rectangular shapes did not provide evidence for this alternative explanation. Also, using both on-line and off-line (van Dam and van Ee, 2005) eye-movement recordings, we have shown that there are no differences in fixation positions, the number of eye movements, or vergence between perceptual states, excluding eye movements as an alternative explanation. We also provided additional evidence for the role of dorsal visual areas in stereoscopic depth perception. When viewing stereograms of familiar objects, disparity information appears to be suppressed if it is incongruent with other cues, such as the familiarity of the object, shading, and texture. Indeed, the dorsal areas V4d-topo and V7 show increased activation for congruent stereograms, during which stereoscopic depth is perceived, compared with incongruent stereo-images, during which the stereoscopic depth is suppressed. Alternatively, the activation could be attributable to attention. It remains to be investigated to what extent attention influences the activation of V4d-topo and other areas that show transient activation for alternations toward the disparity-dominated percept. The activation could be related solely to attention to disparity, but if this activation reflects modulation of the neurons detecting disparity, it would still mean that the activation in these areas is related to the processing of stereoscopic depth. Finally, we note that our stimulus is not completely visually symmetric (because of foreshortening, the left side of the stimulus is somewhat smaller than its right side). We reasoned that this asymmetry is counterbalanced by the larger, symmetrical background surrounding the stimulus. Indeed, we found that the asymmetry produces no significant differences between the activation of left- and right-hemispheric visual areas (our unpublished observations). We therefore conclude that this asymmetry could not have affected the outcome of the experiment and the conclusions that we draw from it.
In conclusion, we used a bistable slant rivalry stimulus that enables us to correlate activation associated with either perspective or stereoscopic depth perception under constant sensory input. We found transient activation correlating with the perceptual alternations in V4d-topo and in both bilateral cIPS and right-hemispheric aIPS. More importantly, we found transient activation related to perceptual alternations toward the disparity-dominated percept in a number of visual areas (V3A, V7, MT+, LOC, and V4d-topo, with the latter showing the most robust effect). Sustained activation was found in areas VP, V4V, and LOC. Different paradigms, such as bistability (Kleinschmidt et al., 1998; Tong et al., 1998; Polonsky et al., 2000; Tong and Engel, 2001), visual discrimination (Ress and Heeger, 2003), and imagery (Kosslyn et al., 2001) have shown that activation in the visual cortex correlates with perception and awareness. Our results indicate that human V4d-topo serves an important role among areas responsive to disparity in that its activation correlates with the awareness of stereoscopic depth or stereopsis.
Footnotes
This work was supported by grants from the Netherlands Organization for Scientific Research and the Helmholtz Institute (R.v.E.). We thank the reviewers for thoughtful and constructive comments. We also thank our colleagues for their support and additional comments, Frank Tong for fruitful discussions on the design of the experiments, and Paul Gaalman at the F. C. Donders Centre for Cognitive Neuroimaging for technical support.
Correspondence should be addressed to Dr. Raymond van Ee, Helmholtz Institute, Faculty of Physics and Astronomy, Utrecht University, Princetonplein 5, 3584 CC Utrecht, The Netherlands. E-mail: r.vanee{at}phys.uu.nl.
Copyright © 2005 Society for Neuroscience 0270-6474/05/2510403-11$15.00/0