Abstract
Three-dimensional (3D) shape is important for the visual control of grasping and manipulation and for object recognition. Although there has been some progress in our understanding of how 3D shape is extracted from motion and other monocular cues, little is known about how the human brain extracts 3D shape from disparity, commonly regarded as the strongest depth cue. Previous fMRI studies in the awake monkey have established that the interaction between stereo (present or absent) and the order of disparity (zero or second order) constitutes the MR signature of regions housing second-order disparity-selective neurons (Janssen et al., 2000; Srivastava et al., 2006; Durand et al., 2007; Joly et al., 2007). Testing the interaction between stereo and order of disparity in a large cohort of human subjects revealed the involvement of five IPS regions (VIPS/V7*, POIPS, DIPSM, DIPSA, and phAIP), as well as V3 and the V3A complex in occipital cortex, the posterior inferior temporal gyrus (ITG), and ventral premotor cortex (vPrCS) in the extraction and processing of 3D shape from stereo. Control experiments ruled out attention and convergence eye movements as confounding factors. Many of these regions, DIPSM, DIPSA, phAIP, and probably posterior ITG and ventral premotor cortex, correspond to monkey regions with similar functionality, whereas the evolutionarily new or modified regions are located in occipital (the V3A complex) and occipitoparietal cortex (VIPS/V7* and POIPS). Interestingly, activity in these occipital regions correlates with the depth amplitude perceived by the subjects in the 3D surfaces used as stimuli in these fMRI experiments.
Introduction
Processing of three-dimensional (3D) shape is important for object recognition and for the control of grasping and manipulation. 3D shape can be recovered from several visual cues, such as motion, texture, shading, or stereoscopic information (Todd et al., 2004). Although there is a growing body of evidence that 3D shape from monocular cues is processed in both the dorsal and ventral visual systems of man (Orban et al., 1999; Paradis et al., 2000; Taira et al., 2001; Georgieva et al., 2008), little is known about how 3D shape is extracted from disparity in the human brain. The only study performed thus far (Chandrasekaran et al., 2007) used stereoscopic 3D stimuli and symmetry judgments, making it difficult to disentangle the processing of 3D shape from disparity from that of symmetry.
In the monkey it has been shown that spatial variations in depth along surfaces, which we shall hereafter refer to as "depth structure," are processed in areas TEs and AIP, which house neurons selective for second-order disparity (Janssen et al., 1999, 2000; Srivastava et al., 2006, 2007). By "second-order disparity," we refer to surfaces curved in depth, and by "first-order disparity," to surfaces slanted in depth. Since the TEs and AIP neurons are also selective for 2D shape in frontoparallel planes, they carry all the information needed to represent 3D shape from stereo. Using fMRI in the awake monkey, we showed that these two regions are sensitive to the interaction between surface order and disparity, responding more to binocular curved 3D surfaces than to binocular flat surfaces at different depths, relative to their respective monocular controls (Durand et al., 2007; Joly et al., 2007). We thus have an MR signature for second-order disparity-selective neurons, and the aim of the present experiments is to use this signature in human subjects to chart the regions of the human brain that may house similar neurons.
We presented exactly the same stimulus conditions to human subjects as those used in the monkey fMRI, including the stereo conditions and their controls, as well as conditions testing for 2D shape sensitivity (Kourtzi and Kanwisher, 2000; Denys et al., 2004). It is worth pointing out that in the monkey we also have an MR signature for first-order disparity-selective neurons (Durand et al., 2007), but this was not used in the present experiments.
In the monkey, five regions are sensitive to depth structure from stereo: TEs, AIP, anterior LIP, V3, and a small subpart of premotor F5a (Nelissen et al., 2005), possibly corresponding to the location of canonical neurons described by Rizzolatti and coworkers (Rizzolatti et al., 1988; Raos et al., 2006). We have proposed (Orban et al., 2006) that human DIPSM and DIPSA correspond to monkey anterior LIP and posterior AIP, respectively, with anterior AIP corresponding to the putative human AIP (phAIP), localized on the basis of grasping movements (Binkofski et al., 1999a,b). Therefore, we expect sensitivity to 3D shape from stereo in human DIPSM and DIPSA. In addition, these experiments should provide invaluable information about the location of possible human homologues of TEs and F5a, and thus about the human cortical region housing canonical neurons (Grèzes et al., 2003).
Materials and Methods
Participants
We performed functional MR measurements on 22 right-handed healthy human volunteers (10 male and 12 female, mean age 23 years, range: 19–35). All participants had normal or corrected-to-normal vision using contact lenses, and were drug free. None of them had any history of mental illness or neurological disease. Written informed consent was obtained from each subject before participating in the study in accordance with the Helsinki Declaration and the study was approved by the Ethical Committee of the Katholieke Universiteit Leuven Medical School.
Three additional criteria were used to select the participants (Fig. 1). Given the large variation in individual stereo capabilities, we required a strong sensitivity to depth structure from disparity. This was assessed by a psychophysical experiment undertaken outside the scanner in which subjects adjusted a curve to match the depth profile of 3D surfaces and the adjustments were then correlated with the actual depth profiles. A total of 38 volunteers were tested, from whom we selected the first 22 subjects having an average correlation between perceived and actual depth profiles of 0.85 or better (Fig. 1A). The second criterion relates to the susceptibility artifacts caused by the air in the temporal sinuses that degrade the MR signal in anterior parts of the temporal lobe. We required these susceptibility artifacts to be located far enough from the presumed location (Janssen et al., 2003) of a depth structure-sensitive region in inferior temporal gyrus (Fig. 1B). All 22 subjects who were selected based on the first criterion also satisfied the second one. These selections were made before scanning the main experiment. Finally, to minimize vergence eye movements, we verified a posteriori that all subjects had indeed maintained excellent fixation during the scanning sessions. The 22 subjects who were included in the main experiment are listed in Table 1.
During the scanning sessions subjects were instructed to maintain fixation on a small yellow cross, composed of two perpendicular bars (0.04 × 0.19°), centered in a small black zero disparity region (0.38 × 0.38°) in the middle of the stimuli. When subjects were performing the high-acuity task (Vanduffel et al., 2001), the target was replaced with a yellow bar whose size was adapted for each subject based on psychophysical testing in the scanner (Sawamura et al., 2005). To reduce the amount of head motion during the scanning sessions, the subjects were asked to bite an individually molded bite bar fixed on the scanner table. Movements of the left eye were recorded (at 60 Hz) during all of the fMRI experiments using the MR-compatible ASL eye tracking system 5000 (Applied Science Laboratories).
Visual stimuli and experimental design
The stimuli were projected with a liquid crystal display projector (Barco Reality 6400i, 1024 × 768, 60 Hz refresh frequency) onto a translucent screen positioned in the bore of the magnet at a distance of 36 cm from the point of observation. Subjects viewed the stimuli through a mirror, tilted 45°, that was attached to the head coil.
The visual stimuli (Fig. 2) were derived from those used by Janssen et al. (1999) and are exactly the same as those used in an fMRI experiment in the behaving monkey (Durand et al., 2007). In brief, four 2D shape outlines (average size 5.6 × 5.6°) filled with a yellow random dot texture (50% density, dot size 0.1°) were displayed through red/green filter stereo glasses. A two-by-two factorial design with factors presence or absence of binocular disparity and high-order (second-order) versus lower-order (0-order) disparity, identical to the design used in the previous macaque fMRI study (Durand et al., 2007), defined the four main stimulus conditions (Fig. 2A). In one condition, six depth profiles (1, ½, and ¼ sinusoidal cycles and their anti-phase counterparts) were defined by vertical sinusoidal gradients of binocular disparity and portrayed at +0.5° and −0.5° disparity (Fig. 2B). This condition, in which second-order disparity surfaces were presented, was labeled the 3D curvature or "curved stereo" (CS) condition. On the basis of the disparity distribution of the 3D curvature condition (Fig. 2C), we defined 12 positions in depth (±0.46°, ±0.35°, ±0.23°, ±0.11°, ±0.04°, and twice 0°, the fixation plane) at which the texture-filled 2D shapes were presented as flat, frontoparallel surfaces in depth. This condition, in which 0-order disparity stimuli were presented, was labeled the 3D position in depth or "flat stereo" (FS) condition. In the two corresponding no-disparity, 2D control conditions the surfaces were presented in the fixation plane. This was obtained by presenting the left or right monocular images of the stereo conditions to both eyes in turn: curved monocular (CM) and flat monocular (FM) conditions. A fifth, fixation (fix) condition, in which only the fixation cross was presented on a dark background, provided a baseline.
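To make the construction of the curved stereo stimuli concrete, the following minimal sketch (in Python, not the software used in the study) generates one random-dot surface whose horizontal disparity follows a vertical sinusoidal depth profile as described above; the variable names and rendering details are illustrative assumptions, not taken from the original stimulus code.

```python
# Minimal sketch of a curved-stereo (CS) stimulus: a random-dot surface whose
# horizontal disparity follows a vertical sinusoidal depth profile.
# Parameter values (dot density, shape size, disparity amplitude) follow the
# description in the text; everything else is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

shape_deg = 5.6          # average 2D-shape extent (deg)
dot_size_deg = 0.1       # dot size (deg)
density = 0.5            # 50% dot density
amp_deg = 0.5            # peak disparity (deg), as in the fMRI experiment
cycles = 0.5             # e.g., half a sinusoidal cycle along the vertical axis

n_dots = int(density * (shape_deg / dot_size_deg) ** 2)
x = rng.uniform(-shape_deg / 2, shape_deg / 2, n_dots)   # deg
y = rng.uniform(-shape_deg / 2, shape_deg / 2, n_dots)   # deg

# Vertical sinusoidal disparity gradient (second-order variation along y).
disparity = amp_deg * np.sin(2 * np.pi * cycles * (y / shape_deg + 0.5))

# Split the disparity symmetrically between the two eyes' images.
x_left = x - disparity / 2
x_right = x + disparity / 2

# For the flat-stereo (FS) control, all dots would receive a single disparity
# value drawn from the set of fixed depth planes; for the monocular controls,
# one eye's image is shown to both eyes (disparity = 0).
```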
The four stimulus conditions (CS, CM, FS, FM) and the fixation condition were presented in a block design and repeated 4 times in a fixed order within a single time series or run. Each block lasted 24 s and consisted of 8 functional images, yielding 160 functional volumes per time series. Within a stimulus block 24 different stimuli were randomly presented for 1 s without an interstimulus gap. Thus all 48 stimuli per condition were divided randomly into 2 blocks and repeated once within a given time series. Presentation order of the conditions was randomized between the time series. Twelve time series were recorded in each subject, divided over two sessions. Sensitivity to 3D shape from binocular disparity was assessed by the interaction between stereo and curvedness (stereo interaction): (CS − CM) − (FS − FM). Sensitivity to binocular disparity (main effect of disparity or stereo) was obtained by the contrast (CS + FS) − (CM + FM). The sensitivity to the visual stimuli used in the main experiment was assessed by the contrast (CS + CM + FS + FM) − 4Fix.
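The block structure and the contrast weights described above can be summarized in a short sketch; the timing values follow the text, but the regressor construction (plain boxcars, no HRF convolution) is a simplification of what SPM actually does, and the fixed condition ordering shown here is assumed only for illustration.

```python
# Sketch of the block design and the contrast weights used in the analysis.
# 24-s blocks, TR = 3 s, 8 volumes per block, 4 repeats of the 5-condition
# cycle = 160 volumes per run (values from the text).
import numpy as np

TR = 3.0
vols_per_block = 8
conditions = ["CS", "CM", "FS", "FM", "Fix"]
order = conditions * 4                      # 20 blocks = 160 volumes (assumed order)

n_vols = vols_per_block * len(order)
design = np.zeros((n_vols, len(conditions)))
for b, cond in enumerate(order):
    design[b * vols_per_block:(b + 1) * vols_per_block,
           conditions.index(cond)] = 1.0    # boxcar; SPM would convolve with an HRF

# Contrast weights over [CS, CM, FS, FM, Fix]:
stereo_interaction = np.array([+1, -1, -1, +1, 0])   # (CS - CM) - (FS - FM)
stereo_main_effect = np.array([+1, -1, +1, -1, 0])   # (CS + FS) - (CM + FM)
visual_vs_fixation = np.array([+1, +1, +1, +1, -4])  # (CS + CM + FS + FM) - 4*Fix
```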
Control experiment: high-acuity task (n = 6)
To validate the data from the main experiment we performed an additional control experiment with two purposes: (1) to minimize the vergence eye movements and (2) to control for variations in attention among the different conditions that may have influenced the results of the main experiment. In the main experiment, subjects were passive and simply fixated the cross in the middle of the screen. Six of these subjects (1, 3, 19, 20, 21, 22 in Table 1) were scanned while performing a high-acuity task (Vanduffel et al., 2001) during which the stimuli used in the main experiment were presented. They were required to interrupt a light beam with their right thumb when a small yellow bar, presented in the middle of the screen, changed orientation (for 1 s) from horizontal to vertical at random intervals between 2 and 10 s. Psychophysical tests performed in the scanner indicated that performance level (percentage correct detection of orientation change) decreased and reaction time increased with decreasing bar size, suggesting that these are sensitive indicators of the subjects' attentional state (Sawamura et al., 2005). Before the scanning session subjects performed a psychophysical test in the scanner in which the level of performance at different bar sizes was measured. The bar size corresponding to 80% correct detection was selected for each subject; these bar sizes ranged from 0.11 × 0.04° to 0.19 × 0.08° across subjects. Six time series were tested in a single scan session for each subject.
Retinotopic mapping
The retinotopic stimuli consisted of clockwise rotating wedges and expanding rings, with one full cycle completed in 32 steps per 64 s. The eccentricity of the stimuli ranged from 0.25 to 7.75°. We used monochromatic, 6 Hz counterphasing checkerboard stimuli [16/(2π) azimuthal spatial frequency]. The radial size and spatial frequencies of the eccentricity rings were scaled according to a log(r) law, as was the radial extent of the checkerboards within the eccentricity rings and polar angle wedges. The sizes of both the polar angle wedge in the azimuthal direction and the expanding circles in the radial direction were designed to illuminate points on the screen for 8 s. In middle-level areas this procedure optimizes the time between two consecutive activations, during which the response can return to baseline before being activated in the following cycle; this interval is determined by the receptive field size of the local neurons. Each time series/run consisted of four cycles, lasting 256 s in total. The eccentricity and polar angle experiments each consisted of four runs, yielding a total of 8 runs collected in a single session. These experiments were performed in subjects 1–5 (Table 1). We performed test and retest measurements of the retinotopy experiments in subjects 1 and 2. To calibrate the color scale for the polar angle distribution we performed an independent test presenting fixed stimuli of the vertical (±6° polar angle) and horizontal (±3° polar angle) meridians filled with the same checkerboard design. A comparison of the location of the vertical versus horizontal activation provided a calibration of the color wheel for the primary visual areas. A possible second-order systematic error due to a difference in lag of the hemodynamic response function between lower and higher visual areas is small compared with the systematic error that would arise from averaging clockwise and anticlockwise runs, given receptive field size and voxel size.
2D shape and motion sensitivity
To assess the 2D shape sensitivity, all 22 participants performed a passive fixation task during the presentation of 2D shapes. The standard (Kourtzi and Kanwisher, 2000; Denys et al., 2004) 2D shape localizer included four conditions: grayscale images and line drawings (12 × 12°) of familiar and nonfamiliar objects as well as scrambled versions of each set. The contrast “intact − scrambled 2D shapes” was used to reveal 2D shape sensitivity. For the 2D shape localizer we tested three time series.
For localizing the motion-sensitive areas we used the result (random effect analyses) from an additional experiment performed with 40 subjects (including 7 of the 22 subjects from the present study). These localizer scans contrasted a moving with a static random texture pattern (7° diameter) (Sunaert et al., 1999) in two time series.
Psychophysical experiments: 3D shape adjustment task (n = 38)
Thirty-eight subjects performed a shape adjustment task adapted from previous psychophysical investigations by Koenderink et al. (2001) and Todd et al. (2004), and used in our previous imaging study (Georgieva et al., 2008). On each trial, a single image of a 3D shape defined by binocular disparity was presented together with five equally spaced dots positioned along a vertical scan line through the center of the depicted surface. An identical vertical line of dots was also presented below the 3D shape on the same display screen. These latter dots, presented in the fixation plane, could be moved independently in a horizontal direction with a handheld mouse. Observers were instructed to adjust the dots of the second line to match the apparent depth profile of the surface along the designated scan line. Once they were satisfied with their settings, observers initiated a new trial by pressing the return key on the computer keyboard. All participants wore red/green filter stereo glasses and their head movements were restricted using a chin rest.
Two of the 4 shape outlines used in the imaging experiment were presented in combination with all possible profiles (n = 12) (Fig. 2B) and 3 different disparity amplitude ranges (±0.23°, ±0.5°, and ±0.7°), yielding 72 (2 × 3 × 12) shapes in total. The middle amplitude range corresponds to that used in the fMRI experiment. The 3D shapes with different disparity amplitudes were presented in separate blocks. Each shape appeared once in a block, in a random order, for only 300 ms to prevent possible disparity-driven vergence eye movements. Subjects were able to redisplay the same shape as many times as they wanted before completing the profile adjustment. To familiarize the observers with the task and to provide some training, subjects performed the task once for each amplitude, with the 3D shapes present for an unrestricted time. After the training trials, two blocks of each amplitude were presented in a random order, yielding a total of 144 scan lines per subject. These two blocks were considered in the analyses. The adjusted profiles were compared with the actual curvature using a simple linear correlation. Correlation coefficients and slopes of the regression were calculated for each subject and for each of the 3 amplitudes (Table 1). The correlation served as a criterion of the quality of the subjects' stereovision and the slope provided a measure of the perceived depth amplitude. The distributions of the correlations averaged over the 3 amplitudes are plotted in Figure 1A for all 38 subjects tested. Given the distribution of correlations obtained, subjects with an average correlation exceeding 0.85 (Fig. 1A, arrow) were considered to have acceptable stereovision and the first 22 to achieve this criterion were allowed to participate in the imaging part of the study.
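As an illustration of the scoring just described, the following sketch computes the correlation and regression slope between adjusted and actual depth values for one subject and one amplitude; the synthetic data and function name are placeholders, not the study's analysis code.

```python
# Sketch of the per-subject scoring of the adjustment task: each adjusted
# depth profile is regressed on the true profile; the correlation indexes
# stereovision quality and the slope the perceived depth amplitude.
import numpy as np
from scipy import stats

def score_adjustments(adjusted, actual):
    """adjusted, actual: arrays of matched depth values pooled over scan lines."""
    slope, intercept, r, p, se = stats.linregress(actual, adjusted)
    return r, slope

# Example with synthetic settings that underestimate depth by ~30%:
actual = np.tile(np.sin(np.linspace(0, 2 * np.pi, 5)), 24) * 0.5   # deg
adjusted = 0.7 * actual + np.random.default_rng(1).normal(0, 0.05, actual.size)
r, slope = score_adjustments(adjusted, actual)
print(f"correlation = {r:.2f}, slope = {slope:.2f}")
```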
Imaging data collection
Data were collected with a 3T MR scanner (Achieva, Philips Medical Systems). The functional images consisted of gradient-echo echo-planar whole-brain images with 50 horizontal slices (2.5 mm slice thickness; 0.25 mm gap), acquired every 3.0 s (TR), echo time (TE) 30 ms, flip angle 90°, 80 × 80 acquisition matrix (2.5 × 2.5 mm in-plane resolution), with a SENSE reduction factor of 2. A 3D high-resolution T1-weighted image covering the entire brain was acquired for each subject (TE/TR 4.6/9.7 ms; inversion time (TI) 900 ms; slice thickness 1.2 mm; 256 × 256 matrix; 182 coronal slices; SENSE reduction factor 2.5). The scanning sessions lasted 90 min, including shimming and anatomical and functional imaging. For the retinotopic sessions, scanning parameters were adjusted as follows: 36 tilted coronal slices (2 mm thickness, 0.2 mm gap), TR of 2.0 s, 96 × 96 acquisition matrix (2 × 2 mm in-plane resolution), and SENSE factor of 2.5.
A total of 63,152 functional volumes were acquired in this study: 42,240 volumes in 22 subjects for the main experiment, 5760 volumes in 6 subjects for the control experiment (high-acuity task), 10,032 volumes in 22 subjects for the 2D shape localizer, and 5120 volumes in 5 subjects for the retinotopy.
Imaging data analysis
Main experiment.
Image processing was performed using Statistical Parametric Mapping Software (SPM2, http://www.fil.ion.ucl.ac.uk/spm, Wellcome Department of Cognitive Neurology), implemented in MATLAB (The MathWorks). The preprocessing steps included the standard SPM procedure: realignment, coregistration of the anatomical images to the functional scans, and spatial normalization into the standard space of the Montreal Neurological Institute (MNI). The data were subsampled in the normalization step to 2 × 2 × 2 mm and spatially smoothed with an isotropic Gaussian kernel of 6 mm (8 mm for the 2D shape localizer) before the statistical analysis.
Statistical analyses were performed at two levels: a fixed-effects analysis using the general linear model (GLM) for single subjects, followed by a random-effects analysis for the group. For every subject, the design matrix was composed of five regressors modeling the five conditions plus six regressors obtained from the motion correction in the realignment preprocessing step. The latter were included to account for voxel intensity variations due to head movement. In the group analysis, the 3D shape from stereo activation was obtained by the interaction (CS − CM) − (FS − FM) and the main effect of stereo by the contrast (CS + FS) − (CM + FM). The contrast images corresponding to these two subtractions were calculated for each of the 22 subjects. These contrast images were then entered into a random-effects group analysis (second-level analysis) using a simple t test model (Henson and Penny, 2003). The result of this analysis for the main effect was used as an inclusive mask, at p < 0.05 uncorrected, in the second-level analysis of the interaction effect. The aim of this masking is to reduce interaction effects arising from strong MR activity in the flat monocular condition, emphasizing instead those driven by strong activity in the curved stereo condition. The statistical results for the stereo interaction and the stereo main effect were thresholded at p < 0.001 uncorrected. This test is more stringent for the interaction than for the main effect, since an interaction is a difference of differences and thus includes two variances in the statistical calculations, compared with only one for a main effect. For both the interaction and the main effect, the p < 0.001 uncorrected level was more stringent than the false discovery rate (FDR) correction for multiple comparisons at 0.05 (Genovese et al., 2002). For the interaction, the p < 0.001 uncorrected level corresponded to a T-score of 3.53, compared with 2.6 for the FDR; for the main effect these T-scores were 3.53 and 2.8, respectively. The FDR correction ensures that on average no more than 5% of the activated voxels are expected to be false positives for a given contrast. For control purposes, a fixed-effects analysis was performed for the group of five subjects in whom the retinotopic organization was mapped. The threshold for the stereo interaction was set at p < 0.001 uncorrected.
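The logic of the second-level test, the inclusive masking, and the uncorrected threshold can be sketched with generic numpy/scipy code rather than SPM2; the array names below are placeholders for the per-subject contrast images.

```python
# Sketch of the second-level (random-effects) test on the interaction contrast
# images, inclusively masked by the main effect of stereo.
# interaction_imgs, main_effect_imgs: per-subject contrast maps (subjects x voxels).
import numpy as np
from scipy import stats

def random_effects_interaction(interaction_imgs, main_effect_imgs,
                               p_mask=0.05, p_thresh=0.001):
    # One-sample t test across subjects at every voxel.
    t_int, p_int = stats.ttest_1samp(interaction_imgs, 0.0, axis=0)
    t_main, p_main = stats.ttest_1samp(main_effect_imgs, 0.0, axis=0)
    # One-tailed p values (positive effects only).
    p_int = np.where(t_int > 0, p_int / 2, 1.0)
    p_main = np.where(t_main > 0, p_main / 2, 1.0)
    mask = p_main < p_mask                      # inclusive mask by the main effect
    significant = (p_int < p_thresh) & mask     # p < 0.001 uncorrected
    return t_int, significant

# Toy data: 22 subjects, 1000 voxels.
rng = np.random.default_rng(2)
t_map, sig = random_effects_interaction(rng.normal(size=(22, 1000)),
                                        rng.normal(size=(22, 1000)))
```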
In the single-subject analysis, the 3D shape from stereo activation was obtained by the interaction (CS − CM) − (FS − FM), the main effect of stereo by the contrast (CS + FS) − (CM + FM) and the sensitivity to the visual stimuli by the contrast (CS + CM + FS + FM) − 4Fix. In this analysis the interaction was again masked inclusively by the main effect at p < 0.05 uncorrected. The same thresholds were applied to these subtractions as for the group analysis. We used these single-subject analyses to calculate the number of single subjects for whom a given depth structure-sensitive region, as revealed by the random effect group analysis (Table 2), was present. We considered such an activation site to be present in a single subject if the difference between the MNI coordinates of the group and subject activation sites did not exceed three voxels (i.e., 6 mm) in any direction for any coordinate. In a number of subjects (1–5) the non-normalized T-score volumes, corresponding to the individual subtractions, were projected onto the flattened hemispheres (see below). The volumes were either 6 mm smoothed as in the original analysis, or minimally smoothed (by 1 voxel of reconstruction space) for comparison with the retinotopic maps.
Relationship between psychophysical and fMRI response in main experiment.
We conducted a linear regression analysis to search for areas in which intersubject differences in the magnitude of fMRI activation in the main experiment were correlated with intersubject differences in magnitude of perceived relief, as measured by the slope of the linear regression obtained in the 3D adjustment tasks (Table 1). In this analysis, we used the SPM2 simple regression model with the contrast images of the interaction and the psychophysical slope as covariate. We used only the slopes defined from those blocks of the psychophysical experiment where 3D shapes were presented with a disparity of ±0.5°, as in the fMRI experiment. The result was masked inclusively by the main effect of stereo (p < 0.05, uncorrected in random effect) to restrict the correlation to voxels processing stereo information and thresholded at p < 0.001 uncorrected for multiple comparisons. Within the local maxima of this SPM regression analysis, we then examined the relationship across subjects between the fMRI activity and the slope values from the psychophysics.
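Conceptually, this analysis amounts to correlating, voxel by voxel across subjects, the interaction contrast values with the psychophysical slopes; a minimal sketch is given below (placeholder variable names, not the SPM2 implementation).

```python
# Sketch of the voxelwise regression of the interaction contrast on the
# psychophysical slope across subjects.
import numpy as np
from scipy import stats

def slope_regression(interaction_imgs, slopes):
    """interaction_imgs: subjects x voxels; slopes: per-subject depth slopes."""
    n_vox = interaction_imgs.shape[1]
    r = np.empty(n_vox)
    p = np.empty(n_vox)
    for v in range(n_vox):
        r[v], p[v] = stats.pearsonr(slopes, interaction_imgs[:, v])
    return r, p
```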
Control experiment.
Preprocessing and statistical analysis followed the same procedures as in the main experiment, except that instead of a second-level random-effects analysis we performed a fixed-effects analysis. To equalize the number of time series, the six time series collected in the high-acuity task were compared with 6 time series randomly selected out of the 12 obtained for each of the 6 subjects in the main experiment. The data from each subject were divided into two groups, separating the runs with and without the high-acuity task. For both sets of data we computed the contrasts for the interaction and main effect of disparity, as in the main experiment. To ensure that the brain regions reflected 3D shape-from-disparity-driven activation, we again inclusively masked the interaction by the stereo main effect at p < 0.05 uncorrected, that is, [(CS − CM) − (FS − FM)] inclusively masked by [(CS + FS) − (CM + FM)].
The same thresholds as in the main experiment (p < 0.001 uncorrected) were used for the control experiment. The p < 0.001 uncorrected level corresponds to a T-score of 3.09 for the interaction compared with 3.3 and 2.6 for the FDR level in passive and active conditions, respectively. The local maxima in the T-score maps of the interaction were attributed to the activation sites of the main experiment (Table 2) in the same manner as for single subjects.
Analysis of retinotopic data.
FREESURFER tools (Fischl et al., 1999) were used for segmentation and registration of the EPI volumes to the anatomical volumes. Each voxel in the functional volumes was analyzed for phase-shift information, which is related to degrees of eccentricity or polar angle. Here, we found that averaging several runs into one data set gave better results than normalizing and concatenating the runs. Consequently, all runs of the eccentricity or polar angle experiments acquired within a given session were averaged into a single run of length 128 time points. Maps of the phase-shift information and corresponding p values were then calculated and registered to the anatomical volumes. The phase-shift data were projected onto several surfaces that represent cortical layers parallel to the pial surface. We chose layers at distances of 0.25–0.75 times the local gray matter thickness below the pial surface, in steps of 0.25. The resulting maps arising from the different cortical depths were averaged across the cortical thickness and painted onto the inflated and flattened surfaces. Further smoothing with a kernel size of 1 voxel was applied on the surface during the painting process. An uncorrected p value of 10⁻³ was used as a threshold for all experiments. The eccentricity and polar angle maps have two probabilities attached to each voxel, related to the real and imaginary components of the phase angle. In a first step we required the combined probabilities of each map to pass the threshold in the two maps; in a second step we required the individual probabilities to exceed the threshold in a single map. The analysis of the primary visual areas V1–V3, which form a sequence of quarterfield representations, involves the field sign map and is described by Sereno et al. (1995). In V1–V3, the locations of the representations of meridians coincide with the borders of regions of opposite field sign. For hemifield representations, however, the vertical meridian coincides with the border between regions in the field sign map and the horizontal meridian represents the centroid of a region. Hence, the field sign map is less useful for identification of areas beyond V1–V3. For these areas we exploited the combined information from the polar angle and eccentricity maps. Generally, we identified possible foveal representations and associated circular or semicircular gradients in the eccentricity maps and looked for convergence of meridians in the polar angle maps, which, combined, would result in a hemifield representation. We further used midperipheral eccentricity values to define borders between hemifield representations, which join at the peripheral borders, so-called eccentricity ridges (e.g., between V3A and V3B).
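The core of the phase-shift analysis is a Fourier decomposition of each voxel's averaged time course at the stimulus frequency (4 cycles per 128-volume averaged run); the sketch below illustrates only this step and omits the surface projection and probability maps, and all names are illustrative assumptions.

```python
# Sketch of phase estimation for the traveling-wave retinotopy: the response
# phase at the stimulus frequency maps onto eccentricity or polar angle
# (after correcting for hemodynamic delay, which is omitted here).
import numpy as np

def voxel_phase(timecourse, n_cycles=4):
    """timecourse: 1D array of length 128 (one averaged run)."""
    spectrum = np.fft.rfft(timecourse - timecourse.mean())
    signal = spectrum[n_cycles]                      # component at the stimulus frequency
    noise = np.delete(np.abs(spectrum[1:]), n_cycles - 1)
    coherence = np.abs(signal) / np.sqrt(np.sum(noise ** 2) + np.abs(signal) ** 2)
    phase = np.angle(signal)                         # radians; converted to visual-field angle
    return phase, coherence
```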
Data visualization
Standard (local) activity profiles.
The raw MRI data were converted to percentage signal change, and the response profiles were plotted relative to the fixation condition. In the standard procedure, activity profiles were calculated for small volumes of 27 voxels centered on the local maxima. For data analyzed with the random-effects analysis, profiles from the single subjects were averaged and SEs were calculated. For data analyzed with the fixed-effects analysis, activity profiles were obtained directly from the group analysis.
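A minimal sketch of the profile computation is given below; it assumes the condition effect sizes have already been extracted for the 27 voxels around a local maximum, which is a simplification of the SPM-based procedure.

```python
# Sketch of the standard activity profiles: percent signal change relative to
# the fixation baseline, averaged over a 27-voxel (3 x 3 x 3) cube centered on
# a local maximum. `betas` stands for condition effect sizes at those voxels.
import numpy as np

def activity_profile(betas, fixation_betas, session_mean):
    """betas: conditions x 27 voxels; fixation_betas: 27 voxels; session_mean: 27 voxels."""
    psc = 100.0 * (betas - fixation_betas) / session_mean   # percent signal change vs fixation
    return psc.mean(axis=1)                                 # average over the 27-voxel cube
```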
Group activation flat maps.
The fMRI data (T-score maps) were mapped onto the human Population-Averaged, Landmark- and Surface-based (PALS) atlas surface in SPM2-MNI space using the Caret software package (Van Essen et al., 2001; Van Essen, 2005). Caret software and the PALS atlas are available at http://brainmap.wustl.edu/caret and http://sumsdb.wustl.edu:8081/sums/directory.do?id=636032. Retinotopic borders (Van Essen, 2004) of areas V1, V2, and V3 were superimposed onto the flat maps. For illustration purposes the voxels exceeding the FDR level (t = 2.6) in the interaction were color coded. For the main effect of stereo and the 2D shape localizer the p < 0.0001 uncorrected (t = 4.5) level was used to generate a contour of the activation peaks. To isolate the peaks of the T-score map of the motion localizer, corresponding to V3A and MT/V5+, this map was thresholded at a very high level (t = 9).
Single-subject activation and backprojection flat maps.
To analyze the individual activation maps, their T-score volumes were imported into FREESURFER and registered to the anatomical volumes. As done for the retinotopic maps, the T-score data were projected on several surfaces that represent cortical layers parallel to the pial surface. Here, we chose layers at distances of 0.1–0.9 times the local gray matter thickness below the pial surface in steps of 0.1. Next, the maximum T-score value was determined along the normal vector to the pial surface projected on the flattened surface. As for the group maps, voxels exceeding the FDR level (t = 2.8) for interaction were color coded, as were those exceeding p < 0.0001 uncorrected (t = 3.75) for the main effect of stereo, the sensitivity to the visual stimuli, and the 2D shape localizer.
To identify the areas that contribute to the local maxima found in the group analysis, we calculated back-projection maps. These maps show the areas (A) on the flat map whose activation, spread out by smoothing, would be sampled above a certain threshold at the location of a local maximum (S) (Fig. 3A). To calculate the back-projection maps, a data volume was created with the same dimensions and registration as the EPI data volumes. The value of the voxel representing the location of a local maximum was set to 1, all others were set to 0, and the result was smoothed with the same kernel as the original data. This procedure creates a 3-dimensional sampling function for each local maximum, as indicated in Figure 3B, yielding a weight factor for each activation value that contributed to the total activation at the sampling point. Final maps were derived by projecting the sampling volumes onto the flattened maps, as described above for the T-scores, and thresholding the volumes at different percentages of the maximum (Fig. 3C). The 10% contour was generally used to define the back-projection region of a group site (Fig. 3D).
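The back-projection construction can be sketched as follows, using scipy's Gaussian filter as a stand-in for the SPM smoothing kernel; the array dimensions and voxel size are assumptions chosen only to make the example self-contained.

```python
# Sketch of the back-projection procedure: a delta volume at the group local
# maximum is smoothed with the analysis kernel, giving a weighting function
# showing which locations could have contributed to that maximum; the 10%
# iso-contour defines the back-projection region.
import numpy as np
from scipy.ndimage import gaussian_filter

def backprojection_volume(shape, peak_voxel, fwhm_mm=6.0, voxel_mm=2.0, cutoff=0.10):
    delta = np.zeros(shape)
    delta[peak_voxel] = 1.0
    sigma = (fwhm_mm / voxel_mm) / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> sigma (voxels)
    weights = gaussian_filter(delta, sigma)
    return weights, weights >= cutoff * weights.max()

# Example with an assumed 80 x 96 x 80 volume and an arbitrary peak voxel:
weights, region = backprojection_volume((80, 96, 80), (40, 48, 40))
```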
Definition of DIPSM, DIPSA, and phAIP.
The outlines of the three areas in the left and right parietal cortex were defined (Jastorff et al., 2007) from a meta-analysis of local maxima derived from previous studies. Each outline is the confidence interval of all local maxima attributed to that area in previous studies, and the mean across all local maxima served as the center of the outline. DIPSM and DIPSA were defined on the basis of the following papers: Claeys et al. (2003), Denys et al. (2004), and Orban et al. (1999, 2003, 2006). The outline of phAIP was defined on the basis of several publications: Begliomini et al. (2007), Binkofski et al. (1999a,b), Cavina-Pratesi et al. (2007), Culham et al. (2003), Frey et al. (2005), and Króliczak et al. (2007).
Susceptibility artifacts
One limitation of the fMRI technique is the loss of signal in brain structures adjacent to regions having markedly different magnetic susceptibilities (Frahm et al., 1988), such as bone and air sinuses. This attenuation leads to a decrease in the ability of fMRI to detect signal changes. These effects are localized mainly in the inferior frontal regions and the inferior lateral temporal lobe (including part of fusiform and inferior temporal gyri) bilaterally (Ojemann et al., 1997).
In an additional analysis, we investigated the susceptibility-induced signal loss in the temporal lobe. For this purpose the anatomical locations and spatial extent of the susceptibility artifacts were defined for each subject and superimposed on the PALS flattened representations of the left and right hemispheres (Fig. 1B). To quantify the magnitude of the signal loss in the right hemisphere, we defined two equal-sized ROIs: one near the artifacts, with center of gravity at 48, −39, −22, and a second, control ROI near the posterior part of the LOC, with center of gravity at 46, −68, −7. Finally, the average fMRI signal intensity was calculated within these ROIs for each of the participants and expressed as a percentage of the average signal intensity over the entire brain.
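As an illustration, the ROI quantification could be implemented along the following lines; the use of nibabel, the file name, the crude intensity-based brain mask, and the ROI size are assumptions, and only the ROI center coordinates come from the text.

```python
# Sketch of the susceptibility check: mean EPI intensity in a cubic ROI,
# expressed as a percentage of the mean intensity over the brain.
import numpy as np
import nibabel as nib

def roi_intensity_percent(epi_path, center_mni, half_width_mm=6):
    img = nib.load(epi_path)
    data = img.get_fdata()
    # Convert the MNI center to voxel indices via the image affine.
    ijk = np.rint(np.linalg.inv(img.affine) @ np.append(center_mni, 1))[:3].astype(int)
    hw = int(round(half_width_mm / 2))              # assuming ~2 mm voxels
    roi = data[ijk[0]-hw:ijk[0]+hw+1, ijk[1]-hw:ijk[1]+hw+1, ijk[2]-hw:ijk[2]+hw+1]
    brain_mean = data[data > 0].mean()              # crude brain mask by intensity
    return 100.0 * roi.mean() / brain_mean

# e.g., roi_intensity_percent("sub01_mean_epi.nii", center_mni=[48, -39, -22])
```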
Results
Behavioral performance
For all 22 subjects there was a strong correlation between the adjustments of perceived depth profiles and the true depth profiles along the 3D test surfaces (Table 1, subjects 1–22). For all subjects the average correlation exceeded 0.85, and in all but 3 subjects the correlations for the individual amplitudes exceeded 0.9. For these subjects the slope was generally smaller than unity, indicating that subjects underestimated the depth amplitude of the 3D surfaces. For example, at the amplitude used in the scanner (±0.5°) the slope of the regression averaged 0.71, indicating that subjects perceived only about 70% of the actual depth variation. In addition, the tendency to underestimate the depth variation increased with the amplitude of the depth profiles. It is interesting to note that subjects who were not selected because of low correlation (Table 1, subjects d–p) also perceived less depth modulation in the 3D surfaces.
All 22 subjects fixated well during the main experiment. On average subjects made 6.2 saccades per block (SD 0.8) and there were no significant differences between conditions (one-way ANOVA p > 0.10).
The mean percentage correct and reaction time for the acuity task in the scanner (n = 6) were 73% (SD 3.5%) and 651 ms (SD 26 ms), respectively. These values were not significantly different among conditions (one-way ANOVA, p > 0.89 and p > 0.61, respectively). Subjects made few saccades per block of the control experiment, averaging 4.3 (SD 0.7) saccades. These numbers did not differ significantly (one-way ANOVA) among conditions (p > 0.83).
Main experiment: group analysis
The interaction between disparity and order of disparity, or curvedness in depth, reached significance in occipital, temporal, parietal, and premotor regions bilaterally. Figure 4 shows the regions active in this interaction, displayed on the rendered brain and on the flat map of the PALS atlas. The local maxima are indicated by brown dots and numbers in Figure 4B and correspond to those in Table 2. Occipital regions include a site in the right lingual gyrus, which might correspond to ventral V3 in the right hemisphere. This was confirmed by the retinotopic mapping, but a contribution from V2v could not be excluded (see below). Two pairs of local maxima are located in symmetrical occipital regions of both hemispheres. The two sites (2 and 3) in the middle occipital gyrus (MOG) might correspond to V3B, according to the coordinates of Larsson and Heeger (2006), while the two sites near the transverse sulcus (4 and 5) might correspond to V3A, in keeping with their motion sensitivity (Tootell et al., 1997). Retinotopic mapping, detailed below, suggests that these sites all belong to the V3A complex, labeled V3A*, the organization of which might be more complicated than was initially suggested (for review, see Wandell et al., 2007). The temporal activation is located bilaterally in posterior ITG, extending to middle ITG in the left hemisphere. Parietal activation sites include bilateral DIPSM (sites 13 and 14) and DIPSA (sites 15 and 16), as predicted, but also VIPS and POIPS (Orban et al., 1999) bilaterally, and what is referred to as the putative homolog of AIP (phAIP) (Binkofski et al., 1999a,b) bilaterally (sites 17 and 18). The retinotopic mapping (see below) indicates that VIPS corresponds either to V7 (Tootell et al., 1998) in the posterior bank of the occipital part of the IPS or to a neighboring area in the depth of the IPS, hence the label VIPS/V7*. Finally, in the premotor cortex there are two pairs of bilateral activation sites, a dorsal pair (sites 19 and 20) and a ventral pair (vPrCS, sites 21 and 22), in the posterior bank of the precentral sulcus. Notice that although the activation pattern is bilateral, significance levels are slightly stronger in the left hemisphere (Table 2).
Main experiment: single-subject analysis
Also indicated in Table 2 is the number of single subjects in whom a local maximum was present for each of the activation sites. The interaction sites were present in 50–80% of the subjects, with the exception of right V3v, left mid ITG, both dorsal precentral sulcus regions, and left ventral precentral sulcus region.
To identify the occipital regions of activation, we performed retinotopic mapping in five subjects (1–5 in Table 1). We performed a control fixed-effects analysis of the stereo interaction in these five subjects to ascertain that they were representative of the group of 22 subjects. The activation pattern of the stereo interaction in these five subjects (supplemental Fig. S1, available at www.jneurosci.org as supplemental material) is indeed very similar to that of the entire sample (Fig. 4). Figure 5 shows the polar and eccentricity maps of the two hemispheres of subject 4. The maps included all major regions described so far, including V4 and LO1/LO2 (Wandell et al., 2007), with two exceptions. First, in every hemisphere we find a map of MT/V5 defined by a representation of central vision distinct from that of V1–V4, with eccentricity increasing dorsally, and a polar map with a lower vertical meridian posterior and an upper vertical meridian anterior, exactly as described recently in the monkey (H. Kolster, L. B. Ekstrom, J. Arsenault, J. B. Mandeville, L. L. Wald, and W. Vanduffel, unpublished work). Second, the organization of the V3A complex (V3A*) is more complicated than initially described. Rather than including a more posterior/dorsal part V3A and a more anterior/ventral part V3B joined by a common central representation (Wandell et al., 2007), it generally consists (8/10 cases) of four components, which we tentatively refer to as V3A, V3B, V3C, and V3D. The dorsal areas of the complex, V3C/D, abut V7. The detailed description of the retinotopic organization will be presented in a later publication, but it is worthwhile to point out the importance of the eccentricity maps for this description. These eccentricity maps reveal landmarks of organization, not only central or paracentral visual representations (Wandell et al., 2007) but also eccentricity ridges (Fig. 5, purple). Such eccentricity ridges segregate V4 from the VO complex and MT/V5 from LO2 (Fig. 5B), but also V3A from V3B (Fig. 5D; supplemental Fig. S2, available at www.jneurosci.org as supplemental material) and in some instances V3D from V3 (Fig. 5B). Critical to the segregation of the V3A* into four parts was the observation that there is a central representation in the posterior bank of the occipital IPS, separate from V3B and V7, which is shared by V3C and V3D (Fig. 5B,D). The relative location of V3C and V3D is variable across hemispheres, explaining why in some hemispheres upper and lower fields are segregated over most of the extent of the V3A* (Fig. 5C; supplemental Fig. S2C, available at www.jneurosci.org as supplemental material) but not in others (right hemispheres in Fig. 5A and supplemental Fig. S2A, available at www.jneurosci.org as supplemental material). These retinotopic distinctions are further supported by the sensitivity to the visual stimuli of the main experiment and by the activity in the 2D shape localizer (supplemental Fig. S3, available at www.jneurosci.org as supplemental material).
Despite the fact that the present technique reveals a more detailed retinotopic organization than has been described so far, we observe only patches of organization in the polar or eccentricity maps at the level of the IPS, beyond V7 (Fig. 5; supplemental Fig. S2, available at www.jneurosci.org as supplemental material). These patches were too inconsistent across hemispheres to allow a systematic description. Yet they suggest that multiple regions may exist in the IPS, probably >4 (Swisher et al., 2007; Konen and Kastner, 2008; Saygin and Sereno, 2008). In particular just beyond V7, where IPS1 has been localized, we obtained evidence scattered across subjects for the existence of 2 or 3 retinotopic regions (Fig. 5; supplemental Fig. S2, available at www.jneurosci.org as supplemental material).
Projection of the stereo interaction T-score maps onto the retinotopic maps of subject 4 (Fig. 6) revealed bilateral activation of central V3A and V3B and of V3C, as well as left-sided activation of V3D and V7. Several sites in the IPS and ITS were also activated. Further examples of activation of the different parts of the V3A complex can be observed in Figure 7, A and C. These comparisons in the five subjects for whom retinotopic maps were available indicate that the different parts of the V3A complex are involved in the processing of depth structure in more than half the hemispheres, as are ventral V2 and V3, IPS regions beyond V7, and regions near the ITS/ITG. V7 was activated in only four hemispheres, but in an equal number of hemispheres an area just in front of V7, in the depth of the IPS, was activated.
We also used the retinotopic maps to back project the location of the group activation sites onto the maps (see Materials and Methods and Fig. 3). Depending on the local anatomy, the back projection includes a single region or multiple regions in opposing banks of a sulcus or in opposite flanks of a narrow gyrus. Which of these regions contributed in a given hemisphere can be determined by the overlap with the activation map in that hemisphere. For example, the VIPS site in the right hemisphere (site 10) of subject 5 could potentially arise from the two banks of the IPS and even from the POS, yet the activation in this subject was restricted to the posterior bank of the IPS, overlapping with V7, which is thus the region of this hemisphere that contributed to the group VIPS site (Fig. 7B,D). In 9 of the 10 hemispheres the VIPS (sites 9, 10) back projection included V7, which in four of the hemispheres overlapped with an activation, as in the right hemisphere of subject 5 (Fig. 7C). The only exception was a region just in front of V7 in the depth of the IPS, which in 3 hemispheres overlapped with an activation, as illustrated by the left hemisphere of subject 1 (Fig. 7A). In fact, the retinotopic maps (Fig. 5; supplemental Fig. S2, available at www.jneurosci.org as supplemental material) suggest the presence at that location of a retinotopic region with the same separation of upper and lower fields as V7, but with a reversed eccentricity gradient. Thus the VIPS sites (Table 2, sites 9 and 10) correspond to V7 and in some instances to a neighboring region, hence we refer to these sites as VIPS/V7*. The sites in the transverse sulcus (sites 2 and 3) and middle occipital gyrus/bottom IPS (sites 4 and 5) could be attributed in the vast majority of cases to the V3A complex, with a tendency for the MOG sites to correspond to V3B or V3C and the transverse sulcus sites to V3A, V3B, or V3C (Fig. 7). Given the small number of hemispheres studied thus far, we can only be confident about the more general identification, the V3A*, which is used in Table 2. The majority of back-projection regions of the V3v site included V3v. Unfortunately, none of these regions overlapped with an activation. Therefore, we tentatively kept the identification of this site as V3v.
Main experiment: additional group analysis
The group analysis indicates that the main effect of stereo is most significant in early visual areas V1–V3 and the V3A complex, but extends forward along the IPS and ITG (Fig. 4B, black outline). As suggested by these group results, the main effect of stereo was significant in single hemispheres most frequently in the early visual areas V1–V3, in LO1 and LO2, the V3A complex, V7, and the middle of the horizontal segment of the IPS (Fig. 6). Yet the large majority of the stereo interaction sites have a significant main effect of disparity (Table 2), and, because of the masking used in the analysis, the main effect in all of them reaches at least the p < 0.05 uncorrected level.
The activity profiles of the interaction sites confirm that the interaction reflects strong MR activity in the curved stereo condition rather than strong activity in the flat monocular condition (Fig. 8). Those regions in which the main effect does not reach significance usually have equal MR activity in the flat stereo condition and the two monocular control conditions. Thus, the interaction genuinely identifies regions in which the MR activity is stronger in the curved than in the flat stereo condition and in which this difference cannot be accounted for by the monocular difference, exactly as intended in the design of the experiments.
Most of the significant stereo interaction sites also exhibit 2D shape sensitivity, as indicated by the blue outlines in Figure 4B and the T-scores in Table 2. Exceptions are right V3, bilateral dorsal precentral sulcus regions and the left ventral precentral sulcus region. Thus most regions listed in Table 2 are in fact sensitive to 3D shape from stereo, since they are sensitive to both depth structure from stereo and 2D shape. On the other hand, the group analysis (Fig. 4B, blue outlines) and the single-subject analysis (supplemental Fig. S3, available at www.jneurosci.org as supplemental material) show that the activation in the 2D shape localizer is much stronger in the ventral regions than in the parietal regions in agreement with earlier studies (Malach et al., 1995; Kourtzi and Kanwisher, 2000; Denys et al., 2004).
Signal loss in the temporal cortex
In Figure 4B the hatched region indicates the union of the regions in individual subjects where the MR signal falls to zero (Fig. 1B). The activation by the 2D shape localizer (Fig. 4B, blue outlines) suggests that the LOC (Kourtzi and Kanwisher, 2000; Denys et al., 2004) extends all the way to this zero-signal region. Yet, the stereo interaction in the main experiment extends further forward in the left hemisphere (Fig. 4B, site 6) than in the right. To investigate whether this might be related to signal loss, we defined a square region of interest in the right hemisphere (Fig. 9A, blue square), centered on the position symmetrical to the left mid ITG site (see Materials and Methods). For comparison we defined a second ROI near the posterior part of the LOC, where signals are strong (Fig. 9A, green square) (see Materials and Methods). While the MR signals in the latter ROI exceeded 80% of the average intensity in all subjects, the variation across subjects was much larger in the ROI neighboring the zero-signal region (Fig. 9B). Thus it is conceivable that signal loss in some subjects prevented the interaction from reaching significance in those individual subjects, and hence also in the random-effects analysis of the group. Therefore, we divided the subjects into three groups of six members each: "best," "worst," and "middle." In the best group (subjects 2, 10, 11, 15, 17, 22) fMRI signal intensity was at least 88% of the average brain intensity; in the worst group (subjects 1, 5, 7, 8, 12, 21) fMRI signal intensity was <80%; and in the middle group (subjects 3, 6, 9, 18, 19, 20) signals ranged between 81 and 86% (Fig. 9B, red, blue, and green bars, respectively).
As one would predict if signal intensity does indeed limit the significance of the stereo interaction, the interaction reached significance in the six best subjects at a site (54, −46, −14, t = 2.81) (Fig. 10A, red dot) symmetrical to the left mid-ITG activation in the random-effects analysis of the complete group (Table 2). The activity profile confirms the presence of a clear interaction (Fig. 10B). The interaction at this site was not significant in either of the other two groups. Since the local maxima might be in slightly different positions in different subjects, we searched for local maxima in the blue square ROI for the middle and worst groups. The middle group yielded a local maximum at 46, −36, −26 (Fig. 10A, green dot), which, although not significant, still exhibited a trend toward an interaction (Fig. 10B). For the worst group, the highest T-score was only 1.29 (Fig. 10A, blue dot), and the profile averaged over the 27 voxels yielded no interaction (Fig. 10B). This indicates that the absence of an interaction reflects signal loss rather than individual variability.
When applying the same procedure to the 2D shape sensitivity in the LOC, we observed no effect of the signal loss. In fact, the activation by the 2D shape localizer was very similar for the 3 groups (Fig. 10C, white area). Since the edge of the LOC traversed the blue ROI diagonally, no local maximum was present in the ROI. Using the most significant voxel for the best group (46, −40, −24) (Fig. 10C, black dot) actually yielded stronger activation in the middle and worst groups (Fig. 10D). Thus the modest signal loss due to susceptibility artifacts had no effect on the LOC activation, confirming the results shown in Figure 4, yet this signal loss had a strong effect on the stereo interaction. This dissociation between the stereo interaction and the 2D shape localizer was also observed at the single-subject level. Subjects 1 and 5 belonged to the worst group defined above, and in both subjects the back projections of the sites showing interactions in the best group yielded regions in the ITS ∼5–10 mm from the bottom of the flat map (white triangle). In this region no stereo interactions were observed (Fig. 7C), but 2D shape activation was still very significant (supplemental Fig. S3B,D, available at www.jneurosci.org as supplemental material).
Control experiment: high-acuity task
The six subjects who performed the control task displayed the same pattern of stereo interaction as the group in the main experiment, although some parietal and premotor interaction sites reached significance only in the right hemisphere (Table 3). This is not surprising, since we used only half as many runs in this analysis compared with the group analysis of the main experiment. These activations were maintained when these subjects performed the high-acuity task. If anything, the interactions were more significant (Table 3), perhaps because of better control of fixation. The activity profiles (Fig. 11) confirm that the interactions were maintained during the task, despite some decrease in signal level (except in DIPSM) (Denys et al., 2004), probably reflecting withdrawal of attention. Figure 11 shows that in the two regions where the interaction reached only the FDR level in these six subjects (left V3A* in the passive task and right vPrCS in the high-acuity task), the activity in the CS condition was still clearly stronger than in the other 3 conditions, indicating the presence of an interaction.
Relationship with perception
In two sites the stereo interaction correlated with the perceived depth amplitude of the 3D surfaces, estimated from the slope of the relationship between the subjects' settings and the actual curvatures in the 3D shape adjustment task. One region (−46, −78, 10, T-score 4.95) was located in the MOG of the left hemisphere; the other (−22, −90, 28, T-score 4.22) was located on the border between V3A* and VIPS/V7*, also in the left hemisphere. In neither of these regions did the interaction as such reach significance: the T-score was 1.97 in the MOG region and 2.25 in the VIPS/V3A* region, and the profile shows a weak average interaction (Fig. 12). Hence, they appear to be located at the edge of the regions with significant interaction in Figure 4B (yellow dots). Back projection indicated that the MOG site corresponded to the anterior edge of the ventral V3A complex and the VIPS/V3A* site to the dorsal edge of V3A* (V3D). In both regions, the MR signal of the interaction correlated significantly (r = 0.64 for both) with the psychophysical slope: the large variation in slope, i.e., in perception, explains about 40% of the variation in the MR signal among subjects (Fig. 12). These results suggest that the occipital and occipitoparietal regions displaying an interaction, particularly the V3A* complex, are more extensive in subjects who perceive strong depth modulations in the stimuli.
Discussion
Our results show that three occipital regions, five IPS regions, an inferior temporal region, and two premotor regions process depth structure from stereo. Control experiments showed that these results did not depend on differences in attention or fixation quality between conditions. Finally, activity in occipital cortex and the ventral IPS, at the edges of the V3A complex, correlated with the amplitude of the depth variation that subjects perceived in the stimuli.
Comparison with earlier studies
While several studies have examined the processing of zero-order disparity stimuli (Backus et al., 2001; Neri et al., 2004), stereo checkerboards (Tsao et al., 2003), and slanted stereo surfaces (Welchman et al., 2005), only one study has so far addressed the processing of 3D shape from disparity (Chandrasekaran et al., 2007). In that study, subjects judged the axis of symmetry of the objects, a task for which they could use both the 2D shape of the outline and the depth structure of the 3D surface. Activity in both hMT+ and LOC was found to correlate with discrimination when stereo coherence was manipulated. Even if the LOC ROI was relatively similar to the posterior ITG region in our study and TEs neurons can be selective for the orientation of curvature in depth (Janssen et al., 2001), it is difficult to conclude that this activation reflects 3D shape processing. Indeed, it could equally well reflect the processing of the contour, which also depended on the disparity coherence. Inferotemporal (IT) neurons are known to process contours from disparity (Tanaka et al., 2001). This factor may also explain the hMT+ contribution in their study, whereas hMT/V5+ displayed no significant stereo interaction in the present investigation.
Comparison with monkey studies
In humans the extraction of 3D shape from stereo activates a larger set of regions than in the monkey, a distinction reminiscent of what we have observed for structure from motion (Vanduffel et al., 2002). Yet the basic plan is similar in the two species: 3D shape from stereo involves occipital, parietal, occipitotemporal, and premotor regions in both species.
The small ventral premotor site, which is also sensitive to 2D shape, might be part of the homolog of monkey F5a, an area known to be activated by observation of action (Nelissen et al., 2005). This view is consistent with the extent of the ventral premotor activation by the observation of actions in humans (Jastorff et al., 2007). It implies that canonical neurons are housed in this small vPrCS region in humans, in agreement with the study of Grèzes et al. (2003).
From the monkey data (Durand et al., 2007) we expected two parietal activation sites, DIPSM and DIPSA. These regions did indeed display significant interactions in humans, supporting the homology between DIPSM and anterior LIP and between DIPSA and posterior AIP (Orban et al., 2006). Yet we obtained three additional sites. The most anterior of these can be identified as phAIP, since its maxima in both hemispheres fall within the confidence limits of maxima derived from earlier studies identifying this area through the execution of grasping movements (Fig. 13). Our results thus suggest that together DIPSA and phAIP constitute the human homolog of monkey AIP, which therefore seems to have expanded in humans, perhaps in relation to humans' increased manipulative activity. Whether DIPSA and phAIP should be considered two regions, as the separation between the local maxima (15–18 mm) suggests, or two parts of a single area, a predominantly visual part and a predominantly motor part, as suggested by Culham et al. (2003), remains an open question. The two other parietal regions activated in humans are VIPS/V7* and POIPS, whose activation likewise had no counterpart in monkey posterior IPS for 3D-SFM (Vanduffel et al., 2002; Durand et al., 2007). These various parietal regions are functionally defined: phAIP by grasping studies (see Materials and Methods) and VIPS/POIPS/DIPSM/DIPSA by sensitivity to motion, 3D-SFM, and 2D shape (Orban et al., 1999; Sunaert et al., 1999; Denys et al., 2004). Retinotopic mapping in the present study allowed identification of the most ventral region, VIPS, as V7 or a neighboring area. Attempts have been made to map the IPS retinotopically beyond V7, describing four regions, IPS1–IPS4 (for review, see Wandell et al., 2007). These studies used only polar-angle maps, and the present results indicate that retinotopic organization cannot be fully described without eccentricity maps. Thus the retinotopic identification of the other parietal regions described here will require further effort.
Is the posterior ITG region showing the stereo interaction in our human subjects the homolog of monkey TEs? TEs is located in the lower bank of the rostral end of the STS (Janssen et al., 2000), ∼20 mm anterior to MT/V5. The dorsal edge of IT in the monkey is marked by the dorsal edge of the 2D-shape sensitivity located in the fundus of the STS (Nelissen et al., 2006). Thus we expect to find the homolog in humans along the dorsal edge of the 2D-shape sensitivity (Fig. 4B, blue outlines), at ∼25 mm or more from hMT/V5+. That is indeed where the posterior ITG site is located. But in the monkey, TEs is located rather rostrally in the IT complex, whereas in humans LOC seems to extend substantially further forward, beyond the posterior ITG site. This difference might, however, be artifactual, as the signal loss due to susceptibility artifacts affects the weak stereo interaction much more than the 2D-shape activation. Hence the anterior border of the posterior ITG site might be located more rostrally than we could reliably measure in the present study, despite the fact that we selected subjects with minimal susceptibility artifacts in the temporal region. This would imply that TEs, just as AIP, to which it is connected in monkeys (Rozzi et al., 2006), has expanded substantially during the evolution of hominoids.
Finally, the occipital activation is also expanded, insofar as two sites in the V3A complex are active in humans in addition to V3, which is active in both monkeys and humans. This adds yet another functional change of human V3A compared with its monkey counterpart, in addition to the differences in motion and 3D-SFM sensitivity (Tootell et al., 1997; Vanduffel et al., 2001, 2002). The retinotopic results suggest that, in addition to the functional change, human V3A is incorporated into a complex of areas, the homology of which is at present uncertain. It is noteworthy that these strongly modified occipitoparietal regions are the very ones that in humans support the perception of depth structure, at least as captured by the 3D shape adjustment task. Thus, these intermediate-level areas carry signals precise enough to support such a task. Whether these signals correspond to neurons selective for second-order disparity, as is the case in AIP and TEs (Janssen et al., 2000; Srivastava et al., 2006), and by inference in DIPSA, phAIP, and possibly posterior ITG, is as yet unclear.
While the present study focused on the stereo interaction, revealing the regions involved in the analysis of depth structure from stereo, it also revealed robust activations for the stereo main effect. The latter regions, in particular early areas V1–V3, which are homologous across primates (Kaas, 2004; Orban et al., 2004), likely house lower-order disparity-selective neurons (Poggio et al., 1988). Such neurons have been modeled extensively and were shown to sustain 3D surface representations (Grossberg and Howe, 2003; Cao and Grossberg, 2005). These surfaces, however, are simply frontoparallel planes at different depths, quite different from the surfaces curved in depth that define the depth structure of objects. Higher-order disparity-selective neurons have yet to be modeled, although computer vision algorithms have been devised to extract this type of information (Li and Zucker, 2006).
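To make the notion of a lower-order (zero-order) disparity-selective unit concrete, the sketch below implements a standard disparity-energy-style model cell: quadrature-pair Gabor filters with a positional shift between the two eyes, whose squared and summed outputs peak at the preferred disparity. This is an assumption for illustration, in the spirit of the classic zero-order models cited above rather than a reproduction of any of them; all parameters and the probe stimulus are hypothetical.

```python
# Minimal sketch of an energy-style zero-order disparity-selective unit
# (illustrative assumption; not the Grossberg-type models cited in the text).
import numpy as np

x = np.linspace(-2.0, 2.0, 401)   # visual-field positions (deg)
sigma, freq = 0.4, 1.5            # Gabor envelope width (deg) and spatial frequency (cyc/deg)
d_pref = 0.2                      # preferred disparity (deg): positional shift between the eyes

def gabor(phase, shift=0.0):
    """1D Gabor receptive field, optionally shifted in position."""
    return np.exp(-(x - shift) ** 2 / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * (x - shift) + phase)

def energy_response(stim_left, stim_right):
    """Sum left- and right-eye simple-cell outputs, square, and pool over a quadrature pair."""
    resp = 0.0
    for phase in (0.0, np.pi / 2):
        s = np.trapz(gabor(phase) * stim_left, x) + np.trapz(gabor(phase, d_pref) * stim_right, x)
        resp += s ** 2
    return resp

# Probe with a narrow bar presented at different binocular disparities.
bar = lambda center: np.exp(-(x - center) ** 2 / (2 * 0.05 ** 2))
for disparity in (-0.4, -0.2, 0.0, 0.2, 0.4):
    r = energy_response(bar(-disparity / 2), bar(+disparity / 2))
    print(f"disparity {disparity:+.1f} deg -> response {r:.3f}")
# The response is largest when the stimulus disparity matches d_pref, i.e., the unit
# signals the depth of a frontoparallel patch, not curvature in depth.
```

Such a unit encodes the depth of a locally flat patch; capturing second-order (curved) depth structure would require combining the outputs of many such units with different preferred disparities across the surface, which is precisely the modeling step the text notes is still missing.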
Comparison with depth structure from other cues
The pattern of activation by depth structure from stereo is remarkably similar to that obtained for the extraction of 3D-SFM (Orban et al., 1999; Vanduffel et al., 2002; Peuskens et al., 2004). Indeed, parietal regions and V3A* are activated similarly by both cues. The activation of DIPSA in 3D-SFM is confirmed by the results of Murray et al. (2003), revealing a parietal shape area that overlaps DIPSA (Fig. 13, yellow dots). The posterior ITG region is also recruited by the motion cue, especially by 3D surfaces: the blue dot (Fig. 13) indicates the location of the mid-OTS region defined in Orban et al. (2006). The stereo cue also matches the texture cue relatively well (Georgieva et al., 2008), but far less so the shading cue, which activates only a small part of post-ITG (Fig. 13). Thus, the extraction of depth structure from stereo, motion, and texture overlaps in many regions, but no supramodal region has emerged where the local maxima for all four cues coincide: the activation sites for depth structure from shading and from stereo barely overlap. Yet the small region in ITG where all four cues reach significance is close to the region involved in tactile shape processing (Amedi et al., 2002) (Fig. 13, white dot), possibly reflecting an abstract 3D shape representation (Amedi et al., 2007).
Footnotes
- This work was supported by Fonds Wetenschappelijk Onderzoek Grant G 151.04 and Katholieke Universiteit te Leuven Onderzoeksfonds Grants GOA 2005/18 and EF 05/014. We are indebted to M. De Paep, W. Depuydt, P. Kaeyenbergh, G. Meulemans, and S. Verstraeten for technical support, to S. Raiguel for comments on an earlier version, and to P. Janssen and O. Joly for help with the stimulus generation.
- Correspondence should be addressed to Guy A. Orban, Laboratorium voor Neurofysiologie en Psychofysiologie, Katholieke Universiteit te Leuven, Faculteit Geneeskunde, Herestraat 49, bus 1021, 3000 Leuven, Belgium. guy.orban@med.kuleuven.be
References
- Amedi et al., 2002.
- Amedi et al., 2007.
- Backus et al., 2001.
- Begliomini et al., 2007.
- Binkofski et al., 1999a.
- Binkofski et al., 1999b.
- Cao and Grossberg, 2005.
- Cavina-Pratesi et al., 2007.
- Chandrasekaran et al., 2007.
- Claeys et al., 2003.
- Culham et al., 2003.
- Denys et al., 2004.
- Durand et al., 2007.
- Fischl et al., 1999.
- Frahm et al., 1988.
- Frey et al., 2005.
- Genovese et al., 2002.
- Georgieva et al., 2008.
- Grèzes et al., 2003.
- Grossberg and Howe, 2003.
- Hansen et al., 2007.
- Henson and Penny, 2003.
- Janssen et al., 1999.
- Janssen et al., 2000.
- Janssen et al., 2001.
- Janssen et al., 2003.
- Jastorff et al., 2007.
- Joly et al., 2007.
- Kaas, 2004.
- Koenderink et al., 2001.
- Konen and Kastner, 2008.
- Kourtzi and Kanwisher, 2000.
- Króliczak et al., 2007.
- Larsson and Heeger, 2006.
- Li and Zucker, 2006.
- Malach et al., 1995.
- Murray et al., 2003.
- Nelissen et al., 2005.
- Nelissen et al., 2006.
- Neri et al., 2004.
- Ojemann et al., 1997.
- Orban et al., 1999.
- Orban et al., 2003.
- Orban et al., 2004.
- Orban et al., 2006.
- Paradis et al., 2000.
- Peuskens et al., 2004.
- Poggio et al., 1988.
- Raos et al., 2006.
- Rizzolatti et al., 1988.
- Rozzi et al., 2006.
- Sawamura et al., 2005.
- Saygin and Sereno, 2008.
- Sereno and Tootell, 2005.
- Sereno et al., 1995.
- Sereno et al., 2001.
- Srivastava et al., 2006.
- Srivastava et al., 2007.
- Sunaert et al., 1999.
- Swisher et al., 2007.
- Taira et al., 2001.
- Tanaka et al., 2001.
- Todd et al., 2004.
- Tootell et al., 1997.
- Tootell et al., 1998.
- Tsao et al., 2003.
- Vanduffel et al., 2000.
- Vanduffel et al., 2001.
- Vanduffel et al., 2002.
- Van Essen, 2004.
- Van Essen, 2005.
- Van Essen et al., 2001.
- Wandell et al., 2007.
- Welchman et al., 2005.