Abstract
Processing of binocular disparity is thought to be widespread throughout cortex, highlighting its importance for perception and action. Yet the computations and functional roles underlying this activity across areas remain largely unknown. Here, we trace the neural representations mediating depth perception across human brain areas using multivariate analysis methods and high-resolution imaging. Presenting disparity-defined planes, we determine functional magnetic resonance imaging (fMRI) selectivity to near versus far depth positions. First, we test the perceptual relevance of this selectivity, comparing the pattern-based decoding of fMRI responses evoked by random dot stereograms that support depth perception (correlated RDS) with the decoding of stimuli containing disparities to which the perceptual system is blind (anticorrelated RDS). Preferential disparity selectivity for correlated stimuli in dorsal (visual and parietal) areas and higher ventral area LO (lateral occipital area) suggests encoding of perceptually relevant information, in contrast to early (V1, V2) and intermediate ventral (V3v, V4) visual cortical areas that show similar selectivity for both correlated and anticorrelated stimuli. Second, manipulating disparity parametrically, we show that dorsal areas encode the metric disparity structure of the viewed stimuli (i.e., disparity magnitude), whereas ventral area LO appears to represent depth position in a categorical manner (i.e., disparity sign). Our findings suggest that activity in both visual streams is commensurate with the use of disparity for depth perception but the neural computations may differ. Intriguingly, perceptually relevant responses in the dorsal stream are tuned to disparity content and emerge at a comparatively earlier stage than categorical representations for depth position in the ventral stream.
Introduction
The horizontal separation of the eyes provides a powerful cue to depth in the form of binocular disparity. Humans are exquisitely sensitive to disparity and exploit it for multiple purposes [e.g., breaking camouflage, perceiving albedo, and grasping objects (Julesz, 1971; Blake and Bülthoff, 1990; Bradshaw et al., 2004)]. Computationally, different processing stages are required to extract useful information from the images registered by the two eyes (Marr and Poggio, 1976). Although neurons sensitive to disparity are found throughout visual cortex (Orban et al., 2006; Parker, 2007), our understanding of the circuits supporting different stages of disparity computation and the functional utility of these computations is still underdeveloped.
In the human brain, disparity-evoked responses are widespread with perhaps greatest sensitivity in dorsal visual cortical areas V3A and V7 (Backus et al., 2001; Tsao et al., 2003; Neri et al., 2004; Tyler et al., 2006). However, studying the neural code for disparity in the human brain is limited by the spatial resolution of conventional functional magnetic resonance imaging (fMRI). In particular, neurons sensitive to different disparities are known to be colocated within cortex (DeAngelis and Newsome, 1999; Chen et al., 2008), making it difficult to reveal differential responses when averaging across voxels. Here, we trace disparity selectivity using multivoxel pattern analysis (MVPA) that extracts information distributed across voxels (Cox and Savoy, 2003; Kamitani and Tong, 2005; Haynes and Rees, 2006; Norman et al., 2006). Using this sensitive technique, we investigate selectivity for depth position across retinotopic visual cortex, higher ventral [lateral occipital complex (LOC)] and dorsal [human motion complex (hMT+/V5) and intraparietal sulcus (IPS) regions] areas. To test the perceptual relevance of fMRI responses, we contrast pattern-based decoding of fMRI responses evoked by planes rendered by correlated and anticorrelated random dot stereograms (RDSs). The latter, anticorrelated stimuli [in which the contrast polarity of dots in the two eyes is reversed (see Fig. 1A)], are informative because the disparities they contain do not support the perception of depth (Cogan et al., 1993; Cumming et al., 1998; Read and Eagle, 2000). We show that correlated stimuli are reliably decoded from fMRI signals across visual areas, whereas decoding accuracies for anticorrelated stimuli are attenuated in dorsal areas [retinotopic regions, hMT+/V5, and IPS regions] and ventral stream area LO (lateral occipital). 
Importantly, using parametric stimulus manipulations and high-resolution fMRI, we demonstrate that responses in dorsal areas are highly selective for the viewed disparity, whereas disparity processing in LO appears to encode near and far positions in a categorical manner. In contrast to the traditional dichotomy of ventral and dorsal disparity processing, our results suggest that perceptually relevant information is processed in both pathways. However, the neural code for disparity signals appears to differ between the two pathways in two main respects: multivoxel pattern selectivity for perceptually relevant disparities (1) develops at comparatively earlier stages in the dorsal stream and (2) reflects metric representations of disparity content in contrast to categorical representations of depth position at later processing stages in the ventral stream.
Figure 1. Stimulus illustration and regions of interest. A, A representation of RDSs similar to those presented to observers. The illustration is designed for crossed-eye fusion. This can be achieved by first fixating the white space between the left and center images and then slowly crossing the eyes until the white-edged squares at the center of each image are lined up and two planes rendered in a correlated RDS are revealed (left plane near, right plane far as is illustrated by the right diagram in B). The same disparities are rendered in the fusion of the center and right images, but the luminance polarity of dots in the two eyes is reversed (a white dot in one eye matches a black dot in the other). This anticorrelated stimulus does not evoke a reliable disparity-defined depth. (Note that our subjects were not required to free fuse, and the fixation marker was considerably smaller.) B, A schematic representation of the disparity-defined depth structure of the stimuli. One of two depth configurations was presented on each trial: one in which the left plane was further and the right plane closer to the observer (diagram on left) or one in which the left plane was closer to the observer and the right plane further (diagram on right). The planes to the left and right of fixation always had different signs (i.e., one near, one far) but the magnitude of the disparity was the same (i.e., disparity left = − disparity right). C, Regions of interest in one subject showing retinotopic areas, V3B/KO, hMT+/V5, LOC (LO and pFs subregions), and three-dimensional shape-related areas in the parietal cortex (VIPS, POIPS, and DIPS). Regions were defined using independent localizers (see Materials and Methods). Sulci are coded in darker gray than the gyri. Major sulci are labeled: STS, superior temporal sulcus; ITS, inferior temporal sulcus; CS, central sulcus.
Materials and Methods
Observers
Eight observers from the University of Birmingham participated in experiment 1 and experiment 2. Four participated in both experiments. Mean age was 26.7 years (range, 20–42 years). All observers had normal or corrected-to-normal vision and were screened for stereo deficits using a dynamic stereo test (van Ee and Richards, 2002). All stimuli presented in the study were above observers' detection thresholds. Experiments were approved by the local ethics committee. All observers gave written informed consent.
Stimuli
Stimuli consisted of dense (15 dots/deg²) random dot stereograms (22 × 16°) presented on a mid-gray background (Fig. 1A). Dots in the stereogram had a Gaussian luminance profile with the diameter (at half-height) of 0.15° and were randomly black (lowest screen luminance at center) or white (highest screen luminance at center). For comparison with previous studies on depth from anticorrelated RDS (Cogan et al., 1993; Cumming et al., 1998), these parameters correspond to a dot density of ∼61%. The stereogram region was surrounded by a grid of black and white squares (75% density) designed to provide an unambiguous background reference and promote a stable vergence posture. Within the stereogram region, two coherent planes were presented: one to the left and one to the right of the fixation point (Fig. 1B). One plane had a crossed horizontal disparity and the other had an uncrossed horizontal disparity of equal magnitude, thereby minimizing the net stimulus vergence demand. Planes were 8 × 12° in size and were surrounded by random dots that depicted a surface in the plane of the screen (“the surround”). The disparity-defined planes were separated from the fixation marker by 1° in the horizontal direction. The surround extended 2° beyond the edges of the planes in the horizontal and vertical directions. Stimuli were constructed to ensure no monocular cues were available; psychophysical judgments based on monocular views of the stimuli were at chance. Anticorrelated stimuli were created by reversing the contrast of all the dots in the random dot stimulus, except a 1° row at the top and bottom of the surround. Thus, the stimulus could be segregated on the basis of binocular correlation, but segregation was not differential depending on the depth position of the disparity-defined planes (the measure of interest in our study).
Although some depth can be perceived in anticorrelated RDS when dot density is low (less than ∼16%) (Cogan et al., 1993; Cumming et al., 1998; Read and Eagle, 2000), psychophysical tests confirmed that our dense stimuli (∼61% density) did not support depth discrimination. A fixation marker was presented at the center of the stimulus that consisted of a square (0.5° side length) with horizontal and vertical nonius lines (length 0.375°).
Stereoscopic stimulus presentation was achieved using a pair of video projectors (JVC D-ILA SX21) containing separate interference filters (INFITEC) whose projected images were optically combined using a beam-splitter cube before being passed through a wave guide into the scanner room. The INFITEC interference filters produce negligible overlap between the emission spectra for each projector, meaning that there is little crosstalk between the signals presented on the two projectors for an observer wearing a pair of corresponding filters. Projectors were color and luminance calibrated. Stimuli were projected onto a translucent plastic screen behind the head coil inside the bore of the magnet. Observers viewed the screen via a mirror angled at 45° above their heads. The viewing distance was 65 cm, and the entire random dot pattern was visible within the binocular field of view.
Experiment 1.
A 2 × 2 experimental design was used that manipulated the sign of the disparity (crossed vs uncrossed) of the planes presented on each side of the fixation marker. In all cases, the plane on one side of the fixation marker had a crossed disparity, whereas the plane on the other side had an uncrossed disparity (Fig. 1B). Anticorrelated random dot stereograms were created by inverting the polarity of corresponding dots in the two eyes (i.e., a black dot in the left eye corresponded to a white dot in the right eye). For both correlated and anticorrelated stimuli, planes were presented with ±8, 10, 12, 14, or 16 arcmin disparity.
Experiment 2.
Stimuli were identical in structure to those used in experiment 1 but were centered around six different depth positions (±3, ±9, and ±15 arcmin). To minimize adaptation, the stimulus disparity was randomly jittered around each of these depth positions by ±1 arcmin (e.g., the +3 condition consisted of stimuli with +2, +3, and +4 arcmin disparity). Only correlated stereograms were used.
Imaging
Data were acquired at the Birmingham University Imaging Centre using a 3 tesla Philips MRI scanner with an eight-channel head coil. Blood oxygenation level-dependent signals were measured with an echo-planar imaging (EPI) sequence [echo time (TE), 35 ms; repetition time (TR), 2000 ms] at standard fMRI resolution (voxel size, 2.5 × 2.5 × 3 mm; 33 slices) for the localizer scans and experimental runs for experiment 1. For experiment 2, EPI data (TE, 35 ms; TR, 2000 ms) were collected at a higher resolution (localizer scans: voxel size, 1.5 mm isotropic, 27 slices; experimental scans: 1.5 × 1.5 × 2 mm near coronal, 28 slices). A high-resolution anatomical scan (1 mm³) was also acquired for each participant.
Mapping regions of interest.
For each subject, we identified regions of interest (ROIs) using standard retinotopic mapping procedures (Fig. 1C). Retinotopic areas V1, V2, V3d, V3A, V7, V3v, and V4 were defined using rotating wedge stimuli and expanding concentric rings (Sereno et al., 1995; DeYoe et al., 1996; Aguirre et al., 1998). Area V4 was defined as the region of retinotopic activation in ventral visual cortex adjacent to V3v that contained a full hemifield representation of the upper visual field (Tootell and Hadjikhani, 2001; Tyler et al., 2005). Area V7 was defined as a region anterior and dorsal to V3A that contains a representation of the lower visual field (Tootell et al., 1998; Tsao et al., 2003; Tyler et al., 2005). Furthermore, we identified the lateral geniculate nucleus (LGN) and higher dorsal [V3B/kinetic occipital area (KO), hMT+/V5], ventral (LOC), and parietal regions in independent localizer scans that followed standard procedures. In particular, the LGN was localized using a contrast-reversing checkerboard stimulus (4 Hz) presented in quadrants to the left or right of the fixation point (Kastner et al., 2004). The LGN was defined as the set of contiguous voxels centered on the thalamus that showed greater activation for the reversing checkerboard than fixation. Area V3B/KO (Dupont et al., 1997; Zeki et al., 2003) was defined retinotopically as the region of cortex with a full hemifield representation located inferior to, and sharing a foveal representation with, V3A (Tyler et al., 2005). This retinotopically defined area overlapped with the set of contiguous voxels that responded significantly more (p < 10⁻⁴) to kinetic boundaries than transparent motion of a field of black and white dots. Area hMT+/V5 was defined as the set of voxels lateral in the temporal cortex that responded significantly more strongly (p < 10⁻⁴) to a coherently moving array of dots than to a static array of dots (Zeki et al., 1991).
Area MT was separated from the hMT+/V5 complex by mapping the set of voxels within hMT+/V5 that showed retinotopic organization for a rotating wedge comprising coherently moving dots (Huk et al., 2002). The LOC was defined as the set of voxels in lateral occipito-temporal cortex that responded significantly (p < 10⁻⁴) more strongly to intact than scrambled images of objects (Kourtzi et al., 2005). LOC subregions (LO, extending into the posterior inferotemporal sulcus; posterior fusiform sulcus (pFs), posterior to mid-fusiform gyrus) were defined based on the overlap of functional activations and anatomical structures, consistent with previous studies (Grill-Spector et al., 2000). Finally, we localized areas along the intraparietal sulcus [ventral IPS (VIPS); parieto-occipital IPS (POIPS); dorsal IPS (DIPS)] that responded significantly more strongly to three-dimensional shape defined by both disparity and structure-from-motion cues than random patterns (shuffled disparities and motion speeds) (Orban et al., 1999; Chandrasekaran et al., 2007).
fMRI design
Experiment 1.
Stimuli with one of the two possible spatial configurations (left plane near–right plane far or left far–right near) were presented in blocks lasting 20 s. In each block, 20 stimuli (chosen randomly from a set with different magnitudes of disparity ranging from 8 to 16 arcmin in 2 arcmin steps) were presented for 500 ms. Four blocks of each disparity configuration were presented on an individual run in a counterbalanced randomized order, and the scan started and ended with a 16 s fixation interval. Scans lasted a total of 192 s. Data for correlated and anticorrelated stimuli were collected on separate runs (eight runs each). Observers were required to perform an attentionally demanding task on the fixation point (detect a luminance increment of a small square marker that was flashed on and off once per second). This ensured similar levels of attentional engagement across all stimulus conditions (correlated and anticorrelated) and during the fixation periods.
Experiment 2.
Experiment 2 comprised six different stimulus conditions on each run corresponding to the six different depth plane positions (±3, ±9, and ±15 arcmin). The blocked presentation of each condition lasted 16 s. Within a single block, 16 different RDS stimuli were presented (500 ms stimulus on; 500 ms stimulus off). Each condition block was repeated three times per experimental run, and the order of conditions was randomized across runs and subjects. Each scan lasted 320 s, and subjects completed nine runs. Observers performed an attentionally demanding luminance increment task at the fixation point.
fMRI data analysis.
For each subject, BrainVoyager QX (Brain Innovation) was used to transform anatomical scans into Talairach space, inflate the cortex, and create flattened surfaces of both hemispheres. Functional runs were preprocessed using three-dimensional motion correction, slice scan time correction, linear trend removal, and high-pass filtering (three cycles per run cutoff). No spatial smoothing was performed on the functional data used for the multivariate analysis. Functional runs were aligned to the subject's corresponding anatomical scan and transformed into Talairach space.
Multivoxel pattern analysis.
For each ROI, we selected gray matter voxels from both hemispheres and sorted them according to their response (t statistic) to all stimulus conditions compared with fixation baseline across all experimental runs. We selected the same number of voxels across ROIs and observers, restricting the pattern size to those voxels that showed a t value larger than 0 for the contrast of “all stimuli versus fixation.” This procedure resulted in the selection of 100 voxels per ROI (across both hemispheres) for experiment 1, comparable with the dimensionality used in previous studies (Haynes and Rees, 2005; Kamitani and Tong, 2005). For some cortical areas in some subjects, 100 voxels were not available (3.5% of cases in retinotopic areas; 3.9% in parietal areas), in which case we used the maximum number of voxels that had a t value greater than 0. For the LGN, only 57 voxels were chosen because of the smaller volume of this area. To ensure that this smaller voxel number did not produce results that discriminated against the LGN, we analyzed classification accuracies at a pattern size of 57 voxels in all regions of cortex. This showed a pattern of results similar to that of the analyses using 100 voxels (supplemental Fig. S1, available at www.jneurosci.org as supplemental material). For experiment 2, the higher spatial resolution (smaller voxel size) necessitated the use of more voxels. Using the same procedure as for experiment 1, we selected 600 voxels to be used for pattern classification (same volume of cortex as 100 voxels at lower resolution).
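The voxel selection rule described above (rank by t statistic, keep only voxels with t > 0, cap at a fixed pattern size) can be illustrated with a minimal sketch. The function name `select_voxels` and the example t values are hypothetical and are not taken from the study; the sketch only mirrors the selection logic stated in the text.

```python
def select_voxels(t_values, n_voxels=100):
    """Return indices of up to n_voxels voxels with the largest positive t values."""
    # Rank voxel indices by t statistic ("all stimuli versus fixation"), largest first.
    ranked = sorted(range(len(t_values)), key=lambda i: t_values[i], reverse=True)
    # Keep only voxels with t > 0, then truncate to the requested pattern size.
    positive = [i for i in ranked if t_values[i] > 0]
    return positive[:n_voxels]


# Hypothetical t statistics for a six-voxel ROI (illustration only).
roi_t = [2.1, -0.4, 3.5, 0.0, 1.2, 0.7]
selected = select_voxels(roi_t, n_voxels=3)
print(selected)
```

When fewer than `n_voxels` voxels have t > 0, the function returns all of them, which corresponds to the fallback described for ROIs in which 100 voxels were not available.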
We normalized (z-score) each voxel time course separately for each experimental run to minimize baseline differences across runs. The data vectors for the multivariate analysis were generated by shifting the fMRI time series by 4 s to account for the hemodynamic response lag and then averaging all time series data points of an experimental condition. We used a support vector machine (SVM, SVMlight toolbox, http://svmlight.joachims.org) for classification and performed an n-fold leave-one-run-out cross-validation. For experiment 1, this consisted of an eightfold cross-validation in which data from seven runs were used as training patterns (56 patterns, eight per run) and data from the remaining run were used as test patterns (eight patterns). For experiment 2, this consisted of a ninefold cross-validation (144 training patterns, 18 test patterns). The prediction accuracy of the classifier corresponded to the proportion of trials on which it correctly predicted the stimulus based on the pattern of fMRI responses, in which chance performance would be 0.5 for a two-way classification. For each subject, we took the mean accuracy across cross-validations. Plotting prediction accuracy across pattern size (number of voxels) showed that classification values had saturated by 100 or 600 voxels, respectively, for experiments 1 and 2 (supplemental Fig. S1A, available at www.jneurosci.org as supplemental material), validating the choice of this pattern size. Statistical significance of the results (accuracies across subjects at the selected pattern size) was evaluated using either repeated-measures ANOVAs or paired t tests.
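The pattern construction and cross-validation scheme can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the helper names are hypothetical, the population standard deviation is assumed for the z-score (the text does not specify which estimator was used), and the SVM training step itself is omitted.

```python
from statistics import mean, pstdev

TR_S = 2.0   # repetition time reported in the Imaging section
LAG_S = 4.0  # hemodynamic shift stated in the text


def zscore(series):
    """Z-score one voxel's time course within a single run.

    Assumes a non-constant time course (pstdev > 0).
    """
    mu, sd = mean(series), pstdev(series)
    return [(x - mu) / sd for x in series]


def block_pattern(run, onset_vol, n_vols):
    """Average each voxel over one block after shifting by the hemodynamic lag.

    run: list of voxel time courses (one list of volumes per voxel).
    """
    shift = int(LAG_S / TR_S)  # 4 s at TR = 2 s corresponds to two volumes
    start = onset_vol + shift
    return [mean(zscore(voxel)[start:start + n_vols]) for voxel in run]


def leave_one_run_out(n_runs):
    """Yield (training run indices, test run index) for each fold."""
    for test_run in range(n_runs):
        yield [r for r in range(n_runs) if r != test_run], test_run


# Experiment 1: eight runs with eight blocks each, so each fold trains on
# 7 * 8 = 56 patterns and tests on the 8 patterns of the held-out run.
folds = list(leave_one_run_out(8))
print(len(folds), len(folds[0][0]) * 8)
```

Splitting by run rather than by individual pattern keeps all blocks from the held-out run out of training, which is the property the eightfold and ninefold schemes described above rely on.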
In experiment 1, we used an index to quantify the difference between the near–far prediction accuracy for correlated stimuli (Acorr) compared with anticorrelated stimuli (Aanti). This was defined as follows:
where c is the probability associated with chance performance (in this case, 0.5). This index provides a value between −1 (perfect classification performance for anticorrelated RDS, chance for correlated RDS) and +1 (perfect classification performance for correlated RDS, chance performance for anticorrelated RDS). (Note that accuracy is defined in the same way for anticorrelated RDS and correlated RDS: it simply reflects discriminability of the stimulus class and does not indicate the sign of the estimated disparity.)
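The defining equation for the index is not reproduced in this text. As a minimal sketch, the function below uses one normalization that matches the stated endpoints (−1, 0, and +1): the accuracy difference divided by the headroom above chance, (Acorr − Aanti) / (1 − c). This form is an assumption made for illustration; other normalizations would satisfy the same endpoints, and the function name is hypothetical.

```python
def correlation_preference_index(acc_corr, acc_anti, chance=0.5):
    """Contrast correlated against anticorrelated decoding accuracy.

    Assumed form (not confirmed by the source): the accuracy difference is
    scaled by (1 - chance), so accuracies in [chance, 1] map onto [-1, +1].
    """
    return (acc_corr - acc_anti) / (1.0 - chance)


# Endpoints described in the text.
print(correlation_preference_index(1.0, 0.5))  # perfect correlated, chance anticorrelated
print(correlation_preference_index(0.5, 1.0))  # chance correlated, perfect anticorrelated
print(correlation_preference_index(0.7, 0.7))  # equal performance
```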
Six-way classifier.
In experiment 2, we determined whether we could predict the viewed stimulus from the six possible alternatives (depth plane positions) using a one-against-one system of binary classifiers. In particular, we trained and tested all possible pairwise classifiers (15 comparisons) and collated their results for each test pattern. The selected stimulus category corresponded to the category that received the fewest “votes against” when collating results across pairwise classifications. In the event of a tie, the prediction was randomly assigned to one of the tied categories. We expressed the accuracy of the six-way classifier as the proportion of test patterns for which it correctly predicted the viewed stimulus. Testing the six-way classifier on shuffled data demonstrated the technique to be unbiased (predictions equally distributed between stimulus classes).
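The voting scheme can be sketched as below. The callable `pairwise_decide` stands in for a trained binary classifier and is hypothetical; in this illustration it is replaced by a toy rule so that the example runs without any fMRI data.

```python
import random
from itertools import combinations


def six_way_predict(pairwise_decide, classes, rng=random):
    """Combine one-against-one decisions by counting votes against each class.

    pairwise_decide(a, b) must return whichever of a or b the a-vs-b binary
    classifier selects for the current test pattern.
    """
    votes_against = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):  # 15 pairs for six classes
        winner = pairwise_decide(a, b)
        loser = b if winner == a else a
        votes_against[loser] += 1
    fewest = min(votes_against.values())
    tied = [c for c in classes if votes_against[c] == fewest]
    return rng.choice(tied)  # ties are broken at random, as described in the text


# Depth positions from experiment 2 (arcmin).
positions = [-15, -9, -3, 3, 9, 15]


# Toy stand-in for trained classifiers: prefer the class nearer to +9 arcmin.
def toy_decide(a, b):
    return a if abs(a - 9) <= abs(b - 9) else b


prediction = six_way_predict(toy_decide, positions)
print(prediction)
```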
Results
Classification of depth position for correlated and anticorrelated stimuli
Observers were presented with random dot stereograms (correlated or anticorrelated) in which two disparity-defined planes were depicted: one in front of and the other behind the fixation point (Fig. 1). For each observer, we identified regions of interest: LGN, retinotopic visual areas, and higher dorsal (V3B/KO, hMT+/V5), ventral (LOC), and parietal regions. We then trained a linear support vector machine to associate activity in the population of voxels in each ROI with the disparity of the stimuli that gave rise to that activity. We first tested whether it was possible to predict the viewed stimulus from the fMRI activity, calculating the mean leave-one-out prediction accuracy for classifiers trained to discriminate crossed from uncrossed disparities based on correlated or anticorrelated stimuli. We examined the performance of the classifier as a function of the pattern size (number of voxels) available in each ROI (supplemental Fig. S1A, available at www.jneurosci.org as supplemental material) and quantified accuracies across areas using 100 voxels for each subject.
Figure 2A shows between-subjects mean classification accuracies for each ROI for correlated and anticorrelated stimuli in which accuracy of 0.5 indicates chance performance (two-way classification), and 1 would indicate that the classifier could perfectly predict the viewed stimulus based on fMRI activity. As expected from the lack of direct binocular processing in the thalamus, fMRI responses in the LGN did not contain information that could be reliably used to discriminate depth position (prediction accuracies were not significantly greater than chance). However, the classifier could extract diagnostic information about disparity in correlated stimuli from the voxel activities in all cortical regions tested (F(4,30) = 7.329; p < 0.001) (supplemental Table S1, available at www.jneurosci.org as supplemental material) with the exception of the anterior subregion of the LOC, pFs. Interestingly, we found that fMRI responses evoked by anticorrelated stimuli supported prediction accuracies that were significantly above chance in early visual areas (V1 and V2), intermediate ventral areas (V3v and V4), as well as V3d and V3A (F(3,23) = 5.255; p = 0.005) (supplemental Table S1, available at www.jneurosci.org as supplemental material). This suggests disparity-selective signals in these regions not directly related to perceptual estimates of depth.
Figure 2. Prediction accuracy for near versus far discrimination. A, Mean prediction accuracy for the discrimination of crossed versus uncrossed disparity in different regions of interest. Dark bars represent performance of the classifier for fMRI responses evoked by correlated stimuli, and white bars represent performance for anticorrelated stimuli. The dotted horizontal lines depict chance performance (0.5 accuracy). Error bars depict the SEM across subjects (n = 8). B, Prediction accuracy expressed using an index to represent the preference for correlated stimuli. A value of 1 would indicate chance performance for responses evoked by anticorrelated stimuli and perfect classification performance for correlated stimuli. A value of 0 would indicate equal performance for correlated and anticorrelated stimuli. Error bars depict bootstrapped 95% confidence intervals for the index. *p < 0.05.
Comparing prediction accuracies for correlated and anticorrelated stimuli showed a significant interaction between the type of stimulus viewed and the cortical region of interest (F(3,26) = 4.96; p < 0.01). In particular, high classification accuracies for correlated stimuli were associated with decreased accuracies for anticorrelated stimuli across the dorsal visual hierarchy. In contrast, classification accuracies were similar for correlated and anticorrelated stimuli in the ventral stream, with the exception of LO. To quantify the difference in prediction accuracy for correlated and anticorrelated stimuli, we used a selectivity index that contrasted the near–far prediction accuracy for correlated stimuli against that for anticorrelated stimuli. This statistic could take values from −1 (chance performance on correlated stimuli, perfect performance on anticorrelated stimuli) to 1 (chance performance on anticorrelated stimuli, perfect performance on correlated stimuli) in which 0 represents no difference in performance based on correlated and anticorrelated stimuli. Figure 2B plots this index for each ROI and illustrates the clear preference for correlated stimuli in dorsal, rather than ventral, areas with the exception of LO. The selectivity index was significantly greater than zero (i.e., bootstrapped 95% confidence intervals did not include 0) in dorsal areas (V3d, V3A, V3B/KO, V7, and hMT+/V5), ventral stream area LO, and parietal areas (VIPS, POIPS, and DIPS) (Fig. 2B).
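The significance criterion (a bootstrapped 95% confidence interval that excludes 0) can be illustrated with a percentile bootstrap. The text does not describe the resampling procedure, so resampling subjects with replacement and taking percentiles of the resampled means are assumptions; the index values below are invented for illustration and are not data from the study.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-subject values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample subjects with replacement and record the mean of each resample.
    boot_means = sorted(
        mean([rng.choice(values) for _ in range(n)]) for _ in range(n_boot)
    )
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-subject index values for one ROI (n = 8), illustration only.
index_values = [0.30, 0.50, 0.40, 0.60, 0.20, 0.45, 0.35, 0.55]
lower, upper = bootstrap_ci(index_values)
print(lower > 0)  # the interval excludes 0 when the lower bound is positive
```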
Evidence for the reliable decoding of stimulus disparity in early visual areas based on both correlated and anticorrelated RDS stimuli is consistent with electrophysiological recordings from macaque V1 (Cumming and Parker, 1997) that demonstrate that both types of stimuli are encoded by early binocular neurons. In contrast to V1, we observed dissociated prediction accuracies for correlated and anticorrelated stimuli across dorsal areas, including hMT+/V5. This seems discrepant with reports that some neurons in macaque middle temporal area MT/V5 (Krug et al., 2004) and medial superior temporal area MST (Takemura et al., 2001) respond to anticorrelated planar stimuli. One possibility is that different subregions within the human motion complex (hMT+/V5) have different selectivities for anticorrelated stimuli, thus the generalized response of the complex might not be indicative of properties of the constituent areas. Specifically, there is a degree of overlap between the hMT+/V5 complex and LO (Kourtzi et al., 2003) that has been suggested to correspond to LST/FST (lateral superior temporal area/fundus of the superior temporal areas) (Nelissen et al., 2006). However, excluding the LST/FST region (identified as the overlapping voxels between hMT+/V5 and LO that were activated more strongly by structure-from-motion than random stimuli according to the independent IPS localizer) and separating MT from MST (Huk et al., 2002) confirmed the lack of significant accuracies for anticorrelated stimuli across subregions within hMT+/V5 (supplemental Fig. S2A, available at www.jneurosci.org as supplemental material). A remaining possibility is that biases for crossed versus uncrossed disparities across fMRI voxels are not sufficiently large to extract reliable information about disparities using pattern classification methods.
This could arise from the fact that neuronal responses to anticorrelated RDS are more heterogeneous in their response profiles, with some neurons showing inverted tuning for anticorrelated stimuli and others showing the same tuning for correlated and anticorrelated stimuli (Krug et al., 2004). In addition, neurons responding to anticorrelated RDS may comprise a smaller proportion of the disparity-selective neurons within MT and respond less strongly to anticorrelated stimuli (Cumming and Parker, 1997). This possibility is supported by lower average responses across voxels (percentage signal change) for anticorrelated than correlated stimuli, consistent with a recent fMRI study (Bridge and Parker, 2007) showing that responses to anticorrelated RDS stimuli are generally weaker than responses to correlated stimuli, and this differential response is more pronounced in hMT+/V5 than early visual areas (V1 and V2) (supplemental Fig. S3A, available at www.jneurosci.org as supplemental material). In relation to area MST, the slight dip in decoding accuracy (supplemental Fig. S2A, available at www.jneurosci.org as supplemental material) could be attributable to the relatively small stimuli we used that are potentially suboptimal for MST (Takemura et al., 2001).
In the ventral stream, intermediate visual areas (V3v and V4) showed similar prediction accuracies when classifying crossed and uncrossed disparities on the basis of correlated and anticorrelated stimuli. Only area LO showed higher accuracy for correlated than anticorrelated stimuli, suggesting multivoxel pattern selectivity for perceptually relevant disparity information. This selectivity might reflect the encoding of specific disparities or, more generally, might reflect the particular depth configuration of the scene (i.e., two depth planes presented left and right of the fixation and positioned one in front and the other behind the fixation). The classification analysis so far presented exploits the union of voxels from both hemispheres rather than being limited to disparity-defined depth positions of stimuli on one side of space. However, performance based on a single hemisphere was similar to performance based on both hemispheres combined (supplemental Fig. S4, available at www.jneurosci.org as supplemental material). Furthermore, because neurons in higher visual areas can have receptive fields spanning both sides of visual space, it would be possible for the depth configuration of the scene to be encoded within a single hemisphere. We therefore used the retinotopic mapping data for early visual areas and MT to determine voxels within each ROI that responded to ipsilateral, as opposed to contralateral, visual presentation (i.e., stimuli displayed in the contralateral or ipsilateral visual fields). We found a significant number of voxels sensitive to ipsilateral stimulation in higher visual areas (17% LO; 24% hMT+/V5). Excluding these ipsilateral voxels from the set of voxels used for the pattern classification, we replicated the main results (supplemental Fig. S4, available at www.jneurosci.org as supplemental material), suggesting that any multivoxel pattern selectivity related to depth configuration is confined to the information contained on one side of space (i.e., a disparity-defined plane presented in front or behind the fixation plane).
The discriminable disparity responses we observed in area LO stand in contrast to the poor prediction accuracy in the more anterior ventral stream area pFs. This is perhaps attributable to our use of planar stimuli that are unlikely to excite circuits more concerned with object processing. For instance, it is known that macaque inferior temporal cortex (area TEs) is sensitive to more complex disparity-defined structures (Janssen et al., 2000), and previous human fMRI suggests that differentially informative fMRI responses might have been observed in pFs had we used slanted or curved stimuli (Chandrasekaran et al., 2007). In a similar vein, the effects we observed in parietal regions were generally weaker than those in earlier visual areas, potentially reflecting the use of stimuli that may not drive these areas optimally. Parietal regions have been reported to respond to more complex disparity-defined stimuli (e.g., slant and curvature) than those used here (Sereno et al., 2002; Tsutsui et al., 2002; Durand et al., 2007). Additionally, it is possible that these higher areas encode information about disparity-defined depth position using a sparser code than other areas, resulting in less biased voxel responses and thereby reducing the performance of the MVPA technique.
Finally, the ROI-based analysis used to select relevant voxels and quantify their utility in distinguishing different disparity-defined stimulus classes has the advantage of independent localization of areas that are predominantly known based on anatomical and functional delineation. However, to determine whether, by using this approach, we might have missed important regions outside those independently localized, we used a “searchlight” classification analysis method (Kriegeskorte et al., 2006). In particular, we moved a small aperture (9 mm, 123 voxels) sequentially through the entire cortex and conducted the near–far classification analysis using correlated stimuli. We thus generated a near–far prediction accuracy map for the whole brain that revealed that high classification accuracies were centered on dorsal visual areas (supplemental Fig. S3B, available at www.jneurosci.org as supplemental material) and restricted to the localized regions of interest, thus confirming our choice of these areas for the ROI-based MVPA.
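The size of the searchlight aperture can be checked with a short sketch. This is only an illustration under stated assumptions: the section does not say whether 9 mm is a radius or a diameter, nor the voxel size used in experiment 1, so the code assumes a 9 mm radius and 3 mm isotropic voxels. The helper name `sphere_offsets` is hypothetical. Under these assumptions, the number of voxel centers falling inside the sphere matches the 123 voxels quoted above.

```python
def sphere_offsets(radius_vox):
    """Voxel offsets (in voxel units) whose centers lie within the radius."""
    r = int(radius_vox)
    span = range(-r, r + 1)
    return [(x, y, z) for x in span for y in span for z in span
            if x * x + y * y + z * z <= radius_vox ** 2]

# Assumed geometry (not stated in the text): 9 mm radius, 3 mm isotropic voxels.
offsets = sphere_offsets(9 / 3)
print(len(offsets))  # 123
```

In a searchlight analysis, offsets of this kind would be added to each cortical voxel in turn to select the local pattern passed to the classifier.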
Classification of parametric disparity manipulations
In our second experiment, we conducted a higher-resolution investigation of responses by visual cortical areas to parametric manipulations of disparity-defined depth position. Recording smaller voxels (1.5 × 1.5 × 2 mm) for a finer-grained analysis of visual cortex, we investigated the processing of specific disparity-defined positions. (Note that increasing the slice resolution limited functional data acquisition to occipital and temporal regions.) In contrast to experiment 1, in which different disparity magnitudes were randomly interleaved, we used six different conditions to investigate cortical responses evoked by disparity-defined planes presented at one of three positions in front of (−3, −9, and −15 arcmin) or behind (+3, +9, and +15 arcmin) the fixation point. We confirmed that pooling the data across near versus far positions produced results comparable with those from experiment 1 (supplemental Fig. S5, Table S2, available at www.jneurosci.org as supplemental material).
To move beyond contrasting crossed and uncrossed disparity signals and establish how areas differ in their response to finer-scale disparity information, we used a six-way classification technique. Here, the classifier was trained to distinguish the activity evoked by each stimulus disparity from the other five possible alternatives. We found that a viewed stimulus could be identified with better than chance accuracy (16.67% for this six-way discrimination) in all the areas tested (supplemental Table S3, available at www.jneurosci.org as supplemental material). However, more pertinent in relation to the representation of specific disparity signals was to examine the pattern of predictions made by the classifier when trained on a particular stimulus disparity. In particular, we wanted to determine whether the classifier would show a “tuned” response, mis-predicting stimuli with similar disparities more frequently than stimuli differing by a large amount of disparity. To do this, we calculated the proportion of trials on which the classifier predicted each stimulus disparity from the fMRI activity associated with each of the six different stimulus disparities. This gave us six predictions (i.e., the proportion of predictions for each of the possible alternatives) for every stimulus-evoked fMRI pattern (i.e., the fMRI responses evoked by each stimulus disparity). We plotted these 36 predictions as a function of the difference in disparity between the stimulus that evoked the fMRI response and the disparity predicted by the classifier. This allowed us to generate multivoxel pattern-based tuning functions for disparity in each cortical ROI and suggested differences in the way in which disparities are represented in different visual areas (Fig. 3).
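The bookkeeping behind these pattern-based tuning functions can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function name `tuning_points` is hypothetical, the sign convention (predicted minus viewed disparity) is an assumption, and the example matrix is a uniform chance-level classifier chosen only to show the data layout.

```python
# Disparities (arcmin) of the six stimulus conditions described in the text.
DISPARITIES = [-15, -9, -3, 3, 9, 15]

def tuning_points(pred_props):
    """Convert a 6 x 6 matrix of prediction proportions into tuning points.

    pred_props[i][j] is the proportion of trials on which the pattern evoked
    by DISPARITIES[i] was labeled DISPARITIES[j] by the classifier.
    Returns a list of (predicted - viewed disparity, proportion) tuples.
    """
    points = []
    for i, viewed in enumerate(DISPARITIES):
        for j, predicted in enumerate(DISPARITIES):
            points.append((predicted - viewed, pred_props[i][j]))
    return points

# Hypothetical chance-level classifier: every row is uniform (1/6 each).
chance = [[1 / 6] * 6 for _ in range(6)]
points = tuning_points(chance)
print(len(points))                     # 36 points, as described in the text
print(sorted({d for d, _ in points}))  # differences from -30 to +30 arcmin
```

A classifier with tuned predictions would concentrate large proportions near a difference of zero, whereas the chance-level matrix used here yields a flat function at 1/6.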
Figure 3. fMRI pattern-based tuning curves. A, The proportion of predictions made to each of the stimulus disparity values in terms of the disparity difference between the viewed stimulus and the prediction. Each series corresponds to predictions made for activity evoked by a different disparity. The solid black line shows the best-fitting Gaussian to the data. Random predictions would correspond to a proportion of 0.167. B, The goodness-of-fit for the Gaussian model in each cortical region of interest. Error bars show 95% confidence intervals calculated from 1000 bootstrap samples. C, The full-width at half-maximum of the best-fitting Gaussian in each region of interest. Larger values correspond to a broader spread of predictions. Error bars show 95% confidence intervals calculated from 1000 bootstrap samples.
To quantify the pattern-based tuning of each area of interest, we fit a Gaussian to the set of predictions made in each area. This type of “tuned excitatory” (Poggio, 1995) model provided a good description of the performance of the classifier, although with a notable drop in the amount of variance explained by the model in area LO (Fig. 3B). Comparing the width of the Gaussian fit [quantified by the full-width at half-maximum (FWHM)] in each ROI suggested differences in the pattern-based tuning widths between areas (Fig. 3C, nonoverlapping 95% confidence intervals). Specifically, early areas (V1 and V2) and intermediate dorsal areas (V3d, V3A, and V7) exhibited sharper tuning (small FWHM), compatible with the notion that the responses of neural populations in these regions are specific to a narrow range of presented disparities. In contrast, pattern-based tuning appeared broader (larger FWHM) in intermediate ventral (V3v and V4) regions, with highest values seen in LO and hMT+/V5. This broader tuning in hMT+/V5 remained when we excluded voxels corresponding to subregions MST and LST/FST of the complex, consistent with electrophysiological data showing broader disparity tuning in monkey MT/V5 (DeAngelis and Uka, 2003).
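The relationship between the width of a Gaussian and its FWHM can be made explicit with a short sketch. The parameterization below (a Gaussian centered on zero disparity difference, with an optional baseline) is an assumption for illustration; the text does not specify the exact form of the fitted function, and the value of `sigma` is hypothetical.

```python
import math

def gaussian(x, amplitude, sigma, baseline=0.0):
    """Zero-centered Gaussian; x is the disparity difference in arcmin."""
    return baseline + amplitude * math.exp(-x ** 2 / (2 * sigma ** 2))

def fwhm_from_sigma(sigma):
    """Full-width at half-maximum of a Gaussian with standard deviation sigma."""
    return 2 * math.sqrt(2 * math.log(2)) * sigma

sigma = 10.0  # hypothetical tuning width in arcmin, for illustration only
half_width = fwhm_from_sigma(sigma) / 2
# At x = FWHM / 2 the Gaussian (above baseline) falls to about half its peak.
print(gaussian(half_width, amplitude=1.0, sigma=sigma))
```

Because the FWHM is a fixed multiple of sigma (approximately 2.355), comparing FWHM across ROIs is equivalent to comparing the fitted standard deviations.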
The very broad tuning of the predictions of the classifier in LO, coupled with the poorer model fit, prompted us to test whether an alternative model for disparity selectivity might be more appropriate for describing the classification of responses by LO to planar stimuli. This alternative model was based on the assumption that the sign of the stimulus disparity (i.e., crossed vs uncrossed) would be more important than the magnitude of disparity [i.e., a “categorical” model based on the notion of “near” and “far” neurons (Poggio, 1995)].
To investigate this possibility, we compared accuracies for a binary classification between stimuli that differed by 6 or 12 arcmin in which this difference either crossed the fixation plane (“different-sign” condition: e.g., +3 arcmin discriminated from −3 arcmin) or did not (“same-sign” condition: e.g., +3 arcmin discriminated from +9 arcmin). Our expectation was that areas representing the categorical (near/far) structure of the stimuli would show higher classification accuracies for classification across the fixation plane than for classifications between stimuli positioned either both in front or both behind the fixation plane. This analysis (Fig. 4A) (supplemental Fig. S3B, available at www.jneurosci.org as supplemental material) revealed significantly higher prediction accuracies for the different-sign classification compared with the same-sign classifications across areas (F(1,7) = 7.382; p = 0.03), suggesting that all areas are influenced to some degree by a change of disparity sign. Furthermore, the results suggested that classification accuracies increased as the magnitude of the disparity change increased (F(1,7) = 123.631; p < 0.001). We observed an interaction between the magnitude of disparity change and the cortical region of interest (F(3,23) = 4.869; p = 0.008), indicating a dissociation in the pattern of results between areas. In particular, the magnitude of disparity (i.e., 6 vs 12 arcmin) had a significant effect on classification accuracies in all areas except LO, whereas a change of disparity sign had a significant effect on classification accuracies only in LO (supplemental Table S4, available at www.jneurosci.org as supplemental material).
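The stimulus pairs entering the same-sign and different-sign comparisons follow directly from the six disparities used. The sketch below enumerates them; the function name `binary_pairs` is hypothetical, and how accuracies were pooled across the unequal numbers of pairs in each condition is not specified in this section.

```python
from itertools import combinations

DISPARITIES = [-15, -9, -3, 3, 9, 15]  # arcmin, from the text

def binary_pairs(step):
    """Split all stimulus pairs separated by `step` arcmin by sign relation."""
    same, different = [], []
    for a, b in combinations(DISPARITIES, 2):
        if b - a != step:
            continue
        # A pair crosses the fixation plane when the two signs differ.
        (different if a * b < 0 else same).append((a, b))
    return same, different

for step in (6, 12):
    same, different = binary_pairs(step)
    print(step, "same-sign:", same, "different-sign:", different)
```

For a 6 arcmin difference, only the pair (−3, +3) crosses the fixation plane; for a 12 arcmin difference, two pairs do, namely (−9, +3) and (−3, +9).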
Figure 4. Binary classifications: predicting the sign or magnitude of disparities. A, Prediction accuracy of classifiers distinguishing between two stimuli when those stimuli have either the same sign (red bars) or a different sign (blue bars). The magnitude of the difference in disparity between the two stimuli is illustrated by the bar saturation (less saturated colors indicate a 6 arcmin difference between stimuli, and bolder colors indicate a 12 arcmin difference). The dotted horizontal lines depict chance performance (0.5 accuracy). Error bars show the between-subjects SEM (n = 8). B, Prediction accuracy for a classification across the fixation plane as a function of the difference in disparity between the presented stimuli. The dotted horizontal lines depict chance performance (0.5 accuracy). Area LST/FST corresponds to the voxels that overlap between hMT+/V5 and LO. Areas labeled “MT & MST” correspond to the hMT+/V5 region excluding LST/FST. Error bars show the between-subjects SEM.
Additional support for categorical-type representations of disparity in LO came from comparing classification accuracies for binary near–far classifications when the classifier was trained to discriminate stimuli with different magnitudes of disparity change (i.e., steps of 6, 12, 18, 24, and 30 arcmin) across the fixation plane (Fig. 4B). We found that, in the majority of areas, prediction accuracy increased as the magnitude of the disparity difference between the two planes increased. In contrast, prediction accuracies in area LO remained constant across the different magnitudes of disparity change (supplemental Table S5, available at www.jneurosci.org as supplemental material). Together, these results suggest that activity in LO supports good discrimination of near versus far depth positions but does not support a reliable discrimination of different degrees of crossed or uncrossed disparity, suggesting a more categorical representation of depth position.
Interestingly, V4 showed a response pattern intermediate between LO and the other visual areas. In particular, accuracies increased as the step size increased for initial steps but did not increase further beyond a step size of 18 arcmin. This might suggest relatively coarse encoding of disparity signals within this area or, alternatively, a population response that is biased away from representing disparity information in the vicinity of the fixation plane. We noted a similar pattern of results in hMT+/V5; however, additional investigation suggested that the plateau in performance for larger step sizes was driven by a ventral contribution to the hMT+/V5 region. In particular, separating voxels corresponding to the LST/FST region from hMT+/V5 suggested that this region responds in a similar manner to LO (Fig. 4B). Excluding these voxels from hMT+/V5 revealed a pattern of responses in the remainder of hMT+/V5 (Fig. 4B, MT & MST) similar to the other dorsal visual areas, suggesting that the leveling off in classifier performance at larger disparities could be attributed to the more categorical properties of LST/FST. Furthermore, classification performance in MT and MST was very similar to the remainder of hMT+/V5 (supplemental Fig. S2C, available at www.jneurosci.org as supplemental material).
Together, our analyses suggest that dorsal areas show tuning to the specific disparity content of the viewed stimuli, although this tuning appears broader in hMT+/V5, consistent with a coarse representation of disparities (Uka and DeAngelis, 2006). In contrast, results in LO suggest a different type of disparity representation that is more driven by the sign of the viewed disparity than its magnitude. These findings demonstrate that, by using parametric disparity manipulations and multivoxel pattern analysis, we were able to extract reliable information related to perceived disparity-defined depth across human brain areas. The multivoxel classification methods we used can exploit weak voxel biases that relate to selectivity in the underlying neural populations. However, it is important to note that this fMRI pattern-based selectivity is derived from hemodynamic signals within a voxel pattern that relate to both the input and the output of the neural population rather than reflecting disparity selectivity as measured by the spike output of single neurons. Nevertheless, the similarities between our results and previous neurophysiological findings strengthen the link between the fMRI selectivity we observe and the underlying neuronal responses.
Voxel population analysis
MVPA prediction accuracies serve as a measure of the information related to specific depth positions within a specific cortical region. However, to gain more insight into disparity coding in each area, we examined the extent to which individual voxels showed a preference for near or far disparities. This preference likely reflects the extent to which the neurons within each voxel prefer particular disparities, with a strong preference (bias) suggesting neural populations tuned to a particular stimulus type (e.g., crossed disparity). Such clustering has been suggested previously by monkey electrophysiology (DeAngelis and Newsome, 1999; Chen et al., 2008).
In a first analysis, we examined the distribution of t values obtained by comparing fMRI responses [general linear model (GLM) analysis] for crossed versus uncrossed disparities, in which the t value indexes a biased preference of a voxel for near or far depth positions. The construction of our stimuli (planes of equal and opposite disparities to the left and right of the fixation point) meant that we performed this analysis on a per-hemisphere basis. This yielded two distributions of t values, one in which a preference for crossed disparities is represented by negative t values (black series in Fig. 5A) (supplemental Fig. S6A, available at www.jneurosci.org as supplemental material) and the other in which a preference for crossed disparities is represented by positive t values (gray series in Fig. 5A) (supplemental Fig. S6A, available at www.jneurosci.org as supplemental material). We examined three properties of these distributions: their central tendency, variability, and skew. Although a general population preference for crossed disparity was observed, the extent of this preference and the degree of skew in the voxel distribution differed between areas, with higher dorsal areas V3A, V7, and hMT+/V5 showing large degrees of bias (Fig. 5B) (supplemental Table S6, available at www.jneurosci.org as supplemental material). These findings are consistent with a general preference for near disparities in macaque V1 (Prince et al., 2002), V3 (Adams and Zeki, 2001), V4 (Hinkle and Connor, 2001; Watanabe et al., 2002; Hinkle and Connor, 2005; Tanabe et al., 2005), and MT/V5 (DeAngelis and Uka, 2003). As well as a small shift in the mean of the distributions toward near disparities, we also observed significant skew in the t value distributions toward near disparities, with strongest effects in V3d, V3A, and V7 (Fig. 5C) (supplemental Table S6, available at www.jneurosci.org as supplemental material). 
Finally, the variance of the t value distributions suggested differences in the representation of disparities between ROIs (F(1,9) = 7.639; p = 0.015). In particular, areas V3A and V7 had more variable distributions, indicating that, in these areas, a higher proportion of voxels showed strong preferences for either crossed or uncrossed disparities (Fig. 5D). Based on the logic that the more similar the stimulus preferences of neurons within a voxel, the higher the fMRI bias, this suggests a greater spatial clustering of neurons tuned to particular disparities in these dorsal visual areas. This could arise from a preferential response from a large proportion of neurons within the voxel or, alternatively, from very strong tuning for a small proportion of neurons. On either basis, areas V3A and V7 appear to have greater spatial clustering of neural populations selective to particular disparities than other areas. These results are consistent with previous fMRI studies that have suggested strong disparity modulation in dorsal visual areas V3A and V7 (Backus et al., 2001; Tsao et al., 2003; Neri et al., 2004; Tyler et al., 2006).
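The three distribution properties examined above (central tendency, variability, and skew) can be computed from a set of voxel t values as in the following sketch. The estimators shown (population variance and the moment-based skewness coefficient) are assumptions; the section does not state which estimators were used, and the input values are toy numbers rather than data.

```python
def distribution_summary(t_values):
    """Return mean, population variance, and skewness of voxel t values."""
    n = len(t_values)
    mean = sum(t_values) / n
    m2 = sum((t - mean) ** 2 for t in t_values) / n  # second central moment
    m3 = sum((t - mean) ** 3 for t in t_values) / n  # third central moment
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, m2, skew

# Toy inputs: a symmetric set and a set with a long positive tail.
print(distribution_summary([-2.0, -1.0, 0.0, 1.0, 2.0]))  # (0.0, 2.0, 0.0)
print(distribution_summary([0.0, 0.0, 0.0, 0.0, 5.0]))    # (1.0, 4.0, 1.5)
```

A wider spread of t values (larger variance) corresponds to more voxels with a strong preference for one disparity sign, and a non-zero skew indicates an asymmetric tail toward one preference.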
Figure 5. Distribution of voxel biases. A, The distributions of t values yielded by contrasting crossed and uncrossed disparities in illustrative ROIs. The gray and black curves represent voxels in different hemispheres that have access to stimuli in the left and right visual fields, respectively. Vertical lines indicate the means of the distributions. Differences in the mean value represent the presence of a univariate signal. A preference for crossed disparities is shown by a rightward shift in the gray distribution and a leftward shift in the black distribution. Curves for all areas are shown in supplemental Figure S6 (available at www.jneurosci.org as supplemental material). B, The difference in the mean of the t value distributions for the left and right hemispheres for all regions. Values significantly greater than 0 are marked with an asterisk. Error bars are SEM across subjects (n = 8). C, The difference in the skew of the t value distributions. Values significantly greater than 0 are marked with an asterisk. Error bars are SEM across subjects (n = 8). D, The variance of the t value distributions. A higher variance indicates a wider spread of t values (i.e., a higher proportion of voxels with a strong preference for crossed or uncrossed disparities). Error bars show SEM across subjects (n = 8). E, The difference in the saturation rates for pattern size by prediction accuracy curves when voxels are ordered by their significance compared with being randomly sampled. A larger value indicates a larger difference in saturation rate and provides an indication that the prescribed voxel ordering is more critical. Error bars show SEM across subjects (n = 8).
It is interesting to note that the pattern of variances across ROIs resembles closely the pattern of classification accuracies for the discrimination of crossed versus uncrossed disparities (Figs. 2A, 5D) (Spearman's rank-correlation coefficient, 0.936; p < 0.001). This is expected because the voxels that are most informative for classification are typically those that show the most bias, and thus those areas that contain the most biased voxels produce the highest classification accuracies. The same t value distribution analysis on the data obtained in experiment 1 replicated the findings for correlated stimuli, but differences between ROIs in skew, central tendency, and variance were abolished for anticorrelated stimuli in most of the areas (supplemental Fig. S6B–D, available at www.jneurosci.org as supplemental material). This is a useful confirmation in suggesting that differences between areas are related to the processing of perceptually useful disparity signals rather than simply reflecting differences in the overall fMRI signals in different cortical areas.
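For readers unfamiliar with the statistic, Spearman's rank-correlation coefficient is the Pearson correlation computed on ranks. The sketch below is a generic implementation using only the standard library; the function names are hypothetical and the input numbers are arbitrary, not the per-ROI variances or accuracies reported here.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Arbitrary example: one adjacent swap in the rank order of four items.
print(spearman_rho([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

Applied to one variance value and one classification accuracy per ROI, a coefficient near 1 indicates that the two measures order the areas in nearly the same way.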
In a second analysis, we explored the disparity-selective content of voxels in different cortical areas by performing a permutation analysis on the ordering of voxels considered for the near versus far MVPA classifier. In our standard analysis, we advanced the pattern size (number of voxels) by ranking voxels according to their response to visual stimuli (i.e., the t value from the contrast “all stimuli > fixation”). To examine the relevance of this ranking for the performance of the classifier, we ran the same classifier analysis by taking a simple random sample of the voxels that responded more to visual stimuli than fixation rather than by including them according to their ranked significance. We did this 1000 times for each subject to obtain a bootstrapped estimate of the rate at which classification accuracies increased as a function of the amount of data (number of voxels) available (supplemental Fig. S7, available at www.jneurosci.org as supplemental material). We fit the accuracy functions using a saturating exponential curve (Ostwald et al., 2008) and compared the time constants (i.e., rate of increase) of the curves for random ordering and standard ordering (Fig. 5E). We reasoned that shorter time constants (i.e., faster increases in prediction accuracy) for the rank rather than randomly ordered voxels would suggest neural populations that were less homogeneous in their disparity selectivity. In particular, a region may contain some highly selective voxels that contribute more to the performance of the classifier than other voxels that are included later in the ranking order. Our permutation analysis suggested differences in the information content of individual voxel responses between ROIs (F(9,63) = 2.112; p = 0.041). We found that the ordering of voxels was least important in areas V3A and V7 in which performance was similar when voxels were sampled systematically or randomly. 
In contrast, in other areas (e.g., V1, hMT+/V5, and LO), the ordering of the voxels was more important (the rate of increase for random ordering was much lower), suggesting that voxels responding preferentially to crossed versus uncrossed disparities make up a smaller proportion of the total voxel population in these regions.
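The comparison of saturation rates can be illustrated with a simple sketch. The functional form below is an assumption (the exact equation is not given in this section): accuracy rises from chance toward an asymptote with a constant `tau` expressed in number of voxels. The grid-search fit is a stand-in for whatever fitting procedure was actually used, and the accuracy curve is synthetic.

```python
import math

def saturating_exp(n_voxels, asymptote, tau, chance=0.5):
    """Assumed form: accuracy rises from chance toward an asymptote."""
    return chance + (asymptote - chance) * (1 - math.exp(-n_voxels / tau))

def fit_tau(n_list, acc_list, asymptote, candidates, chance=0.5):
    """Pick the tau from `candidates` that minimizes squared error."""
    def sse(tau):
        return sum((saturating_exp(n, asymptote, tau, chance) - a) ** 2
                   for n, a in zip(n_list, acc_list))
    return min(candidates, key=sse)

# Synthetic curve generated with tau = 50 voxels, then recovered by the fit.
pattern_sizes = list(range(25, 501, 25))
synthetic_acc = [saturating_exp(n, asymptote=0.8, tau=50) for n in pattern_sizes]
print(fit_tau(pattern_sizes, synthetic_acc, asymptote=0.8,
              candidates=range(10, 201, 10)))  # 50
```

Under this form, a smaller `tau` means accuracy saturates with fewer voxels, so a large gap between the constants for ranked and randomly ordered voxels indicates that the ordering matters in that area.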
Together, our analyses show finely tuned responses to binocular disparities that are represented across a large proportion of voxels within intermediate dorsal areas (V3A and V7), whereas ventral stream regions (e.g., LO) may contain more heterogeneous neuronal populations that evoke categorical-like responses to disparity-defined depth position in a smaller proportion of voxels. It is important to note that, because our classification method relies on the biased response of individual voxels, we measure the “strongest voice” within each area. Thus, a small number of disparity-selective neurons within a voxel or a weak nonsystematic response across the voxel pattern to disparity will be difficult to detect (as is the case with any fMRI method). For instance, LO may contain neurons that show more tuned responses to disparities, but they may be fewer in number or respond less vigorously, meaning that the classifier is guided by neurons with more categorical responses.
Control data and additional analyses
We took a number of precautions to avoid experimental artifacts and ensure that our data treatment was appropriate. First, to ensure that attentional allocation under different experimental conditions was equivalent, participants performed an attentionally demanding task on the fixation point during scanning. Second, eye movement measures suggested no systematic differences between experimental conditions (supplemental Fig. S8, Table S7, available at www.jneurosci.org as supplemental material), making an explanation of our results based on eye movements unlikely. The confines of the scanner meant that we were not able to measure eye vergence; however, our stimuli were designed to reduce the likelihood of vergence changes. In particular, (1) planes to the left and right of fixation had equal and opposite disparities, (2) a stable, low spatial frequency pattern in the plane of the screen surrounded the stimuli, and (3) participants were instructed to use the horizontal and vertical nonius lines to assist them in ensuring correct eye alignment at all times. Note also that changes in eye vergence per se are unlikely to account for the prediction accuracies we observe. In particular, because each viewed stimulus contained both crossed and uncrossed disparities (presented to the left or right of the fixation point), a change of vergence induced by the stimulus would not give rise to a differential cortical response between conditions that could be exploited by the classifier.
Third, to ensure that the differences we observed in classification accuracy related to aspects of disparity processing rather than the overall fMRI responsiveness to the stimuli for each ROI, we computed the functional signal-to-noise ratio (fSNR) for each ROI (supplemental Fig. S9A, available at www.jneurosci.org as supplemental material). This suggested that the fSNR did not provide a good account of the classification accuracies we observed: contrasting with our findings, fSNR was greatest in the early retinotopic visual areas and decreased toward higher areas. Furthermore, our consideration of the distribution of biased voxels (above) suggested the presence of a small univariate signal in some regions of interest (i.e., a non-zero mean t value for the contrast “crossed > uncrossed” disparities). To determine whether this univariate signal [confirmed by a standard GLM random effects analysis (Fig. S9B, available at www.jneurosci.org as supplemental material)] could account for the classification accuracies we observed, we ran a univariate classifier analysis. Specifically, we trained the classifier using the mean voxel response of each ROI (i.e., a univariate representation of the multivariate signal). Classification results using this approach were considerably lower (supplemental Fig. S9C, available at www.jneurosci.org as supplemental material), indicating that information used by the classifier was not limited to the overall population preference for near disparities observed in some areas. [Note also that a univariate signal explanation would be insufficient to account for classification performance with anticorrelated stimuli because a univariate response was almost universally absent for anticorrelated stimuli (supplemental Fig. S6B–D, available at www.jneurosci.org as supplemental material).]
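The logic of the univariate control can be illustrated with a toy example. The patterns below are hypothetical four-voxel responses, not data: they have opposite voxel biases but identical means, so a classifier given only the ROI mean cannot separate them, whereas the multivoxel patterns remain distinct.

```python
def roi_mean(pattern):
    """Collapse a multivoxel pattern to a single univariate value."""
    return sum(pattern) / len(pattern)

# Hypothetical four-voxel patterns: opposite voxel biases, identical means.
near_pattern = [1.0, -1.0, 1.0, -1.0]
far_pattern = [-1.0, 1.0, -1.0, 1.0]

print(roi_mean(near_pattern) == roi_mean(far_pattern))  # True
print(near_pattern == far_pattern)                      # False
```

Lower accuracy for a classifier trained on mean responses, as reported above, therefore indicates that the information lies in the distribution of biases across voxels rather than in the overall response level.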
Fourth, to ensure that our classification approach was not overpowered and did not suffer from any bias, we ran the classification with the data labels shuffled. Theoretically, this should result in classification accuracies at chance. The results for the classification of 5000 permutations of shuffled data for the crossed versus uncrossed classification were at chance (supplemental Table S8A, available at www.jneurosci.org as supplemental material), as were 1000 permutations of the six-way classifier (supplemental Table S8B, available at www.jneurosci.org as supplemental material). Furthermore, predictions made by the six-way classifier were equally distributed into the six categories, demonstrating the technique to be unbiased.
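The shuffled-label control can be sketched as follows. This is an illustration only: it uses a nearest-centroid classifier and a single split-half train/test scheme on toy two-voxel patterns, all of which are assumptions standing in for the classifier and cross-validation scheme actually used. The function names are hypothetical.

```python
import random

def nearest_centroid_accuracy(train, test):
    """train/test: lists of (pattern, label). Returns accuracy on test."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x):
        return min(centroids, key=lambda y: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[y])))

    return sum(predict(x) == y for x, y in test) / len(test)

def shuffled_label_null(patterns, labels, n_perm, seed=0):
    """Accuracies obtained after randomly reassigning labels to patterns."""
    rng = random.Random(seed)
    half = len(patterns) // 2
    accuracies = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        data = list(zip(patterns, shuffled))
        accuracies.append(nearest_centroid_accuracy(data[:half], data[half:]))
    return accuracies

# Toy, perfectly separable patterns (hypothetical, interleaved by class).
patterns = [[1.0, 0.0], [0.0, 1.0]] * 20
labels = ["near", "far"] * 20
data = list(zip(patterns, labels))
print(nearest_centroid_accuracy(data[:20], data[20:]))  # 1.0 with true labels
null = shuffled_label_null(patterns, labels, n_perm=500)
print(sum(null) / len(null))  # close to chance (0.5) with shuffled labels
```

If the mean of the shuffled-label accuracies departed systematically from chance, this would point to a bias in the classification procedure rather than to information in the data.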
Finally, we tested our classifier using an image-based classification of monocular views of the stimuli to ensure the absence of monocular artifacts. Specifically, the classifier was trained on different disparity configurations given the image intensities of half a stereo pair (e.g., the image of the left eye only). The classifier was not able to make reliable predictions based on monocular views of the stimuli.
Discussion
Our study provides the first direct test of the fMRI-based selectivity for disparity-defined depth position in the human brain. We provide three main advances in understanding the neural representations underlying depth perception. First, in contrast to the conventional distinction between visual pathways, we provide evidence that both higher dorsal areas (V3B/KO, V7, and hMT+/V5) and ventral stream area LO contain information that is highly diagnostic of perceptually useful, disparity-defined depth. However, we observed distinctions between dorsal and ventral stream responses to disparity at intermediate stages of processing. In particular, we find that dorsal retinotopic areas respond preferentially to the disparity information contained in correlated RDS stimuli that support the perception of depth structure. This contrasts with intermediate ventral areas (V3v and V4) in which fMRI responses contain information sufficient to decode the viewed stimulus for anticorrelated RDS that do not support the perception of coherent depth. This suggests that selectivity for signals relevant for perceived depth develops at comparatively earlier stages in the dorsal stream. Second, we show that, whereas higher ventral stream region LO appears selective for perceptually useful disparities, this multivoxel pattern selectivity differs from that in dorsal areas. In particular, dorsal areas contain information selective for the specific disparity content of the viewed stimulus, whereas LO appears to encode disparity in a categorical manner with a generalized response to planes with the same disparity sign. Finally, by considering the distribution of voxel responses within each cortical region, we find that, in contrast to ventral stream areas, dorsal areas (mainly V3A and V7) have a marked preference for crossed disparities and contain a large proportion of voxels with selective information about the disparity content of presented stimuli. 
These different properties in dorsal and ventral stream areas may support different types of perceptual task, with dorsal activity informative for considerations of the specific depth structure of the viewed scene and ventral stream activity supporting invariant recognition of object configurations across depth position.
Relation to electrophysiological recordings
The first visual region investigated was the LGN in which pathways from the two eyes remain segregated, and sensitivity to binocular disparity is not expected. Reassuringly, we found no significant classification accuracies in the LGN, providing a useful control with which to evaluate our classification analysis. In area V1, in which neurons first receive direct input from both eyes, we found significant prediction accuracy for both correlated and anticorrelated stimuli, consistent with the finding that both types of stimuli are encoded by early binocular neurons (Cumming and Parker, 1997).
Higher up the visual hierarchy, responses to anticorrelated RDS have been measured in areas MT/V5 (Krug et al., 2004) and MST (Takemura et al., 2001), in apparent distinction with our finding that anticorrelated stimuli do not support reliable prediction accuracies. Despite this discrepancy, our pattern-based tuning functions were broader in hMT+/V5 than other visual areas, consistent with electrophysiological recordings in MT/V5 (DeAngelis and Uka, 2003).
Within the ventral pathway, recordings in macaque V4 indicate that a slightly lower proportion of neurons show significant tuning to disparities defined by anticorrelated RDS than are found in V1 (Tanabe et al., 2004; Kumano et al., 2008). Our results suggest very similar prediction accuracies for correlated and anticorrelated stimuli in human V4 (Fig. 2B). This could, in principle, arise from a smaller, yet significantly tuned, population that responds to anticorrelated stimuli, consistent with electrophysiological recordings. Additionally, there is potential for a noncorresponding nomenclature between species. Specifically, studies of disparity processing in monkey V4 have recorded from the dorsal portion of V4 (Watanabe et al., 2002; Tanabe et al., 2005) (Fig. 1B,C). In humans, the locus of a dorsal area V4 is debated (Tootell and Hadjikhani, 2001; Wade et al., 2002; Hansen et al., 2007), but the smaller proportion of cells responding to anticorrelated stimuli in macaque dorsal V4 is consistent with the low prediction performance for anticorrelated stimuli that we observed in area V3B/KO (which falls within the V4d-topologue region). If so, it remains an open question whether the ventral portion of macaque V4 encodes disparity information from anticorrelated stimuli. Our fMRI evidence suggests that it might.
Higher in the ventral pathway, classification accuracies in area LO contrasted with those from earlier ventral areas in that we observed no significant multivoxel pattern selectivity for anticorrelated stimuli. This is consistent with recordings from macaque inferior temporal cortex (area TEs) in which disparity selectivity for anticorrelated stimuli appears absent (Janssen et al., 2003). The result is intriguing, however, in that fMRI responses to anticorrelated stimuli do not appear to be progressively abolished in the ventral hierarchy. Rather, this seems to be a property of the dorsal stream (Fig. 2B), raising the question of how disparity responses in LO arise. fMRI does not enable us to determine the nature of the intracortical interactions that give rise to responses in different areas, so it is possible that our measures do not capture the important contributions to disparity processing made by intermediate ventral areas V3v and V4. Alternatively, disparity responses in LO may be derived from activity in the dorsal stream and exploit cross-connections between streams [e.g., V3A and parietal connections to temporal–occipital area TEO (Webster et al., 1994) and anterior intraparietal area connections to inferior temporal cortex (Borra et al., 2008)]. Even if this is the case, our results suggest that the representation of disparity information in LO differs from that in the dorsal stream. In particular, responses appear specialized for the sign of binocular disparity rather than its magnitude, at least for the stimuli (i.e., disparity-defined planes) and range (±15 arcmin) within which we have tested. That is, prediction accuracy was higher for near versus far classifications across the fixation plane, but it did not change as a function of the disparity step size. Thus, the representation of depth in LO may reflect the categorical distinction between near and far depth positions rather than the specific disparity content of the stimuli.
This is consistent with reports that a large proportion of disparity-selective neurons in macaque IT show responses classified as “near” or “far” rather than tuned responses (Uka et al., 2000). These results suggest a degree of invariance in the response properties of voxels within LO that would be a useful encoding property for object recognition. Previous electrophysiology (Uka et al., 2005) and human brain imaging (Chandrasekaran et al., 2007) studies have suggested a correspondence between cortical responses and perceptual judgments of disparity in temporal cortex. In the future, it would be of interest to determine how selectivity in this region is modulated when subjects make perceptual judgments on disparity-defined stimuli.
Functional distinctions between dorsal and ventral pathways
Understanding the different cortical areas involved in processing disparities and the different uses to which the information is put is a considerable open challenge. Modular processing has distinct computational advantages (Marr, 1976), and several hypotheses regarding the functional specialization of disparity processing in the brain have been proposed, while emphasizing that any broad scheme for the roles of the visual streams is necessarily a simplification (Neri, 2005; Parker, 2007). One current view is that the ventral stream is more specialized for disparity signals that are perceptually useful [e.g., attenuated responses for anticorrelated stimuli and a preference for relative disparities (Janssen et al., 2003; Neri et al., 2004; Tanabe et al., 2004; Umeda et al., 2007; Kumano et al., 2008)]. In contrast, disparity selectivity in the dorsal stream may be more relevant for the visual control of action. However, here we observe signals that are perceptually relevant in both streams. Moreover, intermediate ventral areas appear to encode anticorrelated disparity signals, whereas dorsal visual areas show the progression of responses expected on the basis that disparity estimates are hierarchically refined to remove perceptually irrelevant signals. The categorical-type selectivity we observe in LO may be appropriate for encoding depth configurations (i.e., surfaces in front of or behind their neighbors) to support the invariant recognition of objects across different positions in depth. In contrast, the metric-type activity in dorsal areas (V3A and V7) may be more appropriate for judgments requiring fine positional discriminations and the fine control of body movements, whereas the broader pattern-based tuning in hMT+/V5 might suggest activity consistent with the use of signals for coarse depth discriminations (Uka and DeAngelis, 2006).
In summary, using multivoxel pattern classification methods and high-resolution measurements with parametric stimulus manipulations, our study characterizes fMRI selectivity for disparity-defined depth position across the human ventral and dorsal pathways. The link between the fMRI pattern-based selectivity we observe and the underlying neural code is strengthened by similarities between our results and previous neurophysiology findings. Furthermore, our study identifies two main findings of interest for additional investigation with physiology: (1) the development of a perceptually relevant code for disparity at earlier stages of processing in dorsal rather than ventral areas, and (2) a categorical code for disparity processing in temporal areas in contrast to metric tuning in the dorsal stream. Finally, our methods and results establish a solid ground for future studies investigating whether neural selectivity for disparity as revealed by fMRI patterns is directly linked to, and may predict, human behavior in tasks that exploit disparity information for different functional purposes.
Footnotes
- This work was supported by Biotechnology and Biological Sciences Research Council Grants [C520620, E027436] (A.E.W., Z.K.) and a summer vacation bursary from the Engineering and Physical Sciences Research Council (A.E.W.). Thanks to Rachael Ludford, Ruth Button, and Rishwinder Sethi for help with pilot work and preliminary analysis. We are grateful to Andrew Parker, Holly Bridge, Rufin Vogels, and Suzanne McKee for their valuable comments on a preliminary version of this manuscript.
- Correspondence should be addressed to Andrew E. Welchman, School of Psychology, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK. a.e.welchman{at}bham.ac.uk