7 Tesla fMRI Reveals Systematic Functional Organization for Binocular Disparity in Dorsal Visual Cortex

The binocular disparity between the views of the world registered by the left and right eyes provides a powerful signal about the depth structure of the environment. Despite increasing knowledge of the cortical areas that process disparity from animal models, comparatively little is known about the local architecture of stereoscopic processing in the human brain. Here, we take advantage of the high spatial specificity and image contrast offered by 7 tesla fMRI to test for systematic organization of disparity representations in the human brain. Participants viewed random dot stereogram stimuli depicting different depth positions while we recorded fMRI responses from dorsomedial visual cortex. We repeated measurements across three separate imaging sessions. Using a series of computational modeling approaches, we report three main advances in understanding disparity organization in the human brain. First, we show that disparity preferences are clustered and that this organization persists across imaging sessions, particularly in area V3A. Second, we observe differences between the local distribution of voxel responses in early and dorsomedial visual areas, suggesting different cortical organization. Third, using modeling of voxel responses, we show that higher dorsal areas (V3A, V3B/KO) have properties that are characteristic of human depth judgments: a simple model that uses tuning parameters estimated from fMRI data captures known variations in human psychophysical performance. Together, these findings indicate that human dorsal visual cortex contains selective cortical structures for disparity that may support the neural computations that underlie depth perception.


Introduction
Understanding cortical organization at the mesoscopic scale is an important step in characterizing local neural circuits. The hypercolumn concept (Hubel and Wiesel, 1974) has been extremely influential in modeling cortical functional architecture (Ts'o et al., 2001) and provides a framework to test the neural machinery that facilitates sensory processing. However, despite good knowledge from animal models, comparatively little is known about the local architecture of human cortex.
Here, we investigate the brain organization that may underlie our ability to make precise depth judgments based on binocular disparities. Extracting depth from disparity is a demanding, yet routine, neural computation, making it plausible that it is supported by systematically organized cortical structures. Recordings from cat and macaque cortex indicate that neighborhoods respond similarly to disparity; that is, electrophysiology and optical imaging showed that disparity populations are structured in V2 (Hubel and Livingstone, 1987;Roe and Ts'o, 1995;Ts'o et al., 2001;Chen et al., 2008;Kara and Boyd, 2009), V3/V3A (Adams and Zeki, 2001;Anzai et al., 2011;Hubel et al., 2013), and MT/V5 (DeAngelis and Newsome, 1999). In contrast, area V1 is reported to show, at best, weak clustering (LeVay and Voigt, 1988;Prince et al., 2002b).
Tests of disparity processing in the primate brain point to strong fMRI responses in area V3A (Backus et al., 2001;Tsao et al., 2003;Preston et al., 2008). This work complements evidence that macaque V3A contains clusters of disparity selective neurons (Adams and Zeki, 2001;Anzai et al., 2011;Hubel et al., 2013). Here, we therefore focus on responses to disparity in the dorsomedial region around V3A using human brain imaging. To benefit from improved signal-to-noise and high BOLD contrastto-noise ratios (van der Zwaag et al., 2009), we used ultra-high field (UHF) 7 tesla (7 T) fMRI. Recent UHF fMRI work indicates that it can link structures observed in animal models with representations in the human brain: organization for ocular dominance and orientation was reported in primary visual cortex (Cheng et al., 2001;, whereas structured responses to motion direction were observed in area MT/V5 . We report that human visual cortex is systematically organized for binocular disparity, with dorsomedial area (V3A/B) showing structure that relates to the functional characteristics of depth perception. First, we test for clustering of disparity response profiles, finding evidence for maps that are reproducible across imaging sessions. We then characterize the selectivity of individual voxels for disparities of different magnitudes. We find different profiles of voxel responses: some show fine-tuned responses whereas others have categorical responses (i.e., near vs far depth). By fitting Gabor models to voxel profiles, we show a relationship between the magnitude of disparity and the tuning width of voxels in V3A and V3B/KO, but not earlier visual areas. Finally, we demonstrate a similarity between these voxel responses and established models of the functional properties of human stereopsis. Together, our findings suggest that dorsal visual cortex (V3A, V3B/KO) contains specialized organization for disparity, which may support the neural computations underlying 3D perception.

Materials and Methods
Participants. Six subjects (three male, aged 25-38 years, including authors N.R.G., H.B., A.E.W.) participated in the study. Participants provided informed consent and procedures were approved by the University of Nottingham Medical School Ethics Committee. All participants had normal or corrected-to-normal vision and did not present stereo deficits. One participant (Participant 5) withdrew from the study after the second scan.
Stimuli and design. Stimuli were presented stereoscopically using red and green anaglyphs (the very tight confines of the head coil meant that other stereoscopic display techniques were not feasible). Participants viewed stimuli projected onto a screen located at their feet (viewing distance ϭ 242 cm). To view the screen from within the head coil, they wore prism glasses (to which the red and green filters of the anaglyphs were attached). Stimuli were rear-projected onto the display screen from an Epson EMP-8300NL using a Nivitar NuView long-throw lens. At the start of each scan session, we verified the correspondence between disparity sign (e.g., "negative") in software with that presented on the projection screen (i.e., perceptually "near"): an experimenter viewed the screen from the front while wearing the prism glasses with anaglyph filters attached.
Stimuli consisted of random dot stereograms (7°ϫ 7°) on a midgray background surrounded by a static grid of black and white squares intended to facilitate stable vergence. Dots in the stereogram followed a black or white Gaussian luminance profile, subtending 0.07°at half maximum. There were 108 dots/deg 2 , resulting in ϳ38% coverage of the background. In the center of the stereogram, 4 wedges were equally distributed around a circular aperture (1.2°), each subtending 3°in the radial direction and 70°in polar angle, with a 20°gap between wedges (Fig. 1A). We varied the depth of the wedges by modulating disparity levels in relation to the fixation point (3,9,12,15,24, and 36 arcmin Ϯ 0.5 arcmin jitter, crossed and uncrossed). At a given time point, all wedges presented the same disparity. To reduce adaptation, we applied a random polar rotation to the set of wedges such that the disparity edges of the stimuli were in different locations for each stimulus presentation (i.e., a rigid body rotation of the four depth wedges together around the fixation point). In the center of the wedge field, we presented a fixation square (side length ϭ 1°) paired with horizontal and vertical nonius lines.
BOLD responses to binocular disparity were estimated using a block design. During each block, stimuli were presented at one of the six disparity levels defined for that session. The block length was 15 s, with 10 stimuli presented for 1 s with an interstimulus interval of 0.5 s. During a run, six different blocks were presented (one for each disparity level) and each was repeated three times (18 blocks). In addition, there was a fixation block at the start and the end of each run. Each run lasted 300 s (20 blocks ϫ 15 s) and we collected eight or nine functional runs in each imaging session. On each run, we asked participants to fixate in the central fixation square while performing a Vernier detection task (Preston et al., 2008).
Imaging. Imaging sessions were performed at the Sir Peter Mansfield Magnetic Resonance Centre, University of Nottingham, on a 7 T Philips Achieva scanner with volume transmit and a 32-channel receive coil. Head motion was restricted by the use of foam padding and a vacuum pillow (B.u.W. Schmidt). Data were acquired using a 3D gradient echo echoplanar imaging [3D GE-EPI with SENSE factor 2.35 in the anteriorposterior (AP) direction and 2 in the foot-head (FH) direction, TE/TR ϭ 28/82 ms, FA ϭ 22°, EPI factor 45; 0.96 ϫ 0.96 ϫ 1 mm 3 ; matrix size: 160 ϫ 160 ϫ 36 (AP ϫ RL ϫ FH); volume acquisition time of 3 s; 100 volumes per run] to acquire blood oxygen level-dependent signals from a field of view (FOV) spanning the dorsomedial visual cortex. The reduced FOV required the use of outer-volume suppression in the phase encoding direction (AP) to prevent signal foldover. We assessed signal quality by contrasting all presented stimuli against fixation and found no reliable differences in signal quality between the separate imaging sessions.
Before 7 T imaging sessions, participants underwent localizer scans at the Birmingham University Imaging Centre (BUIC). A 3 T Philips Achieva scanner was used to collect fMRI data to standard retinotopy experiments (Preston et al., 2008). Retinotopic maps were later used to define regions of interest (ROIs) and to assist in positioning the acquisition volume over dorsomedial visual areas for 7 T data acquisition. An anatomical volume was also acquired (MPRAGE, 1 mm isotropic resolution) and used for surface reconstruction using FreeSurfer (Dale et al., 1999;Fischl et al., 1999). The resulting reconstructed white matter (WM) and gray matter (GM) surfaces were then used to compute cortical profiles. These cortical profiles were defined as vectors that connect corresponding vertices in WM and GM surfaces. These vectors were then used to sample functional data at different relative depths.
We analyzed functional data using mrTools (http://www.cns.nyu. edu/heegerlab) and custom MATLAB code (The Mathworks). We first coregistered functional scans to the anatomical volume used for surface reconstruction and subsequent analyses were performed in the individual native space. Preprocessing consisted of motion correction using linear interpolation and linear detrending. We modeled the BOLD signal using the six disparity levels as regressors of interest and estimated the model parameters (i.e., the ␤-weights) associated with each condition. Voxel preference was assigned on a "winner-take-all" basis with respect to the magnitude of the ␤-weights across conditions. Voxel disparity preferences were mapped onto the cortical surface by sampling the functional data (nearest neighbor interpolation) at intermediate depths along a cortical profile. Although measurements at UHF are less susceptible to vascular influences (Gati et al., 1997;Ogawa et al., 1998;Ugurbil et al., 2003), sampling at intermediate depths further minimizes the influence of superficial veins (Sánchez-Panchuelo et al., 2012) and improves spatial localization (Polimeni et al., 2010). Nevertheless, we also quantified vascular influences by calculating the mean BOLD signal amplitude across the cortical surface.
Multivoxel pattern analysis. To confirm that our target regions carried information about the stimulus dimension that we manipulated, we used standard multivariate analyses on activity from retinotopically defined areas in dorsomedial visual cortex (Preston et al., 2008). For each ROI, we converted voxel time series to z-scores and shifted the respective time course by 2 volumes (equivalent to 6 s) to account for the hemodynamic delay. We then averaged data points within each condition block and used a linear classifier (support vector machine, libsvm toolbox; Chang and Lin, 2011) to discriminate between different stimulus conditions. We ranked voxels according to the z-score of the comparison between all stimuli and the fixation blocks and then used the top 500 voxels in each ROI for the classification analysis. We followed a leave-one-run-out cross-validation procedure, resulting in eight or nine folds depending on the number of completed runs for each participant (we use the term "fold" to refer to the different combinations of independent subsets of the data). In particular, from seven/eight runs (of the total eight/nine runs), we extracted 126/144 patterns to train the classifier and then tested the classifier on 18 patterns extracted from the remaining run. This process was repeated so as to leave out each individual run in turn and the mean accuracy for each subject was computed across folds. We performed two-way (near vs far) and six-way (individual disparities) decoding analysis using this technique.
Calculating the probability of similar voxel preferences in a local neighborhood. To assess the degree of clustering in responses to disparity, we examined the distribution of disparity preferences in the local neighbor-hood of voxels surrounding a given target voxel. We first subdivided the data into two independent sets (one to find the disparity preference of the target, the other to find the preference of the surround) using a leavetwo-runs-out cross-validation procedure. For each individual voxel, the disparity preference was estimated by fitting a general linear model (GLM) to six/seven runs of the total of eight or nine runs acquired. We used the remaining two runs to estimate the disparity preferences of the voxels adjacent to the target voxel. Voxels were considered neighbors if they belonged to the 26-connected neighborhood that shared a vertex with the target voxel and if they were located within the ROI (e.g., V1) under consideration. We calculated the frequency of each disparity preference in the neighborhood of individual voxels and indexed the distribution to the disparity preference of the target voxel. We repeated this process using different subdivisions of the data for cross-validation ( 8 C 2 ϭ 28 or 9 C 2 ϭ 36 folds, depending on the number of experimental runs acquired) for each participant and pooled the resulting frequency distribution across subjects. Frequencies were converted to probabilities by dividing by the total number of adjacent neighbors and then averaged according to the preference of the central (target) voxel. This produced six probability distributions that describe disparity preferences in the Four disparity-defined wedges were simultaneously presented at 1 of 6 disparity-defined depths during each imaging session (Ϯ3, 9, and 15 arcmin in sessions 1 and 2; Ϯ12, 24, and 36 arcmin in Session 3). B, The depth of the wedges was defined by manipulating disparity in random dot stereograms, which were viewed through red-green anaglyphs attached to prism glasses. C, BOLD signals were acquired from dorsomedial visual cortex. Slice placement is illustrated here on a near midsagittal slice in Participant 1. D, Signal changes in response to stimulus delivery (stimulus vs rest) for Participant 1, showing that activity is localized to the gray matter. E, Mean percent-signal change for stimulation versus blank periods across all subjects and sessions (N ϭ 16). Error bars represent the SEM. F, Mean prediction accuracy for the discrimination of crossed "near" versus uncrossed "far" disparities across early and dorsal visual areas (two-way classification). Chance level (50%) is indicated by the dashed gray line. Error bars depict the SEM across subjects and sessions (N ϭ 16). G, Mean prediction accuracy for the discrimination of individual disparity conditions presented within each session (six-way classification). Chance performance (16.7%) is indicated by the dashed gray line. Error bars depict the SEM across subjects and sessions (N ϭ 16).
surround of individual voxels (one distribution per central disparity preference). To compensate for general biases in disparity preference, we divided each probability distribution by the overall disparity preference probability within that ROI. These six relative probability distributions were represented in matrix form, where each distribution is represented by a row vector. Rows indicate the (indexed) central preference and columns represent the local disparity preference.
Simulating columnar organization and local clustering. To quantify the extent to which disparity clustering at the neural level might reasonably be extracted by a coarser-scale sampling grid (i.e., fMRI voxels), we simulated cortical architectures with different spatial scales and then used the clustering analysis described in the previous section. In particular, we simulated cortical columns of different spatial periodicity by band-pass filtering 2D white noise (Rojer and Schwartz, 1990), a method that has been used previously to simulate orientation columns (Boynton, 2005). We started by generating a matrix in which elements were pseudorandomly extracted from a normal distribution, representing a 40 mm 2 patch of cortex. The noise matrix was band-pass filtered to preserve content with a specific periodicity, which determines the columnar width of the pattern. To test different levels of clustering, we simulated cortical columns varying in width between 1 and 4 mm. Having generated the neural map, we simulated the fMRI sampling procedure by placing a 2D grid of 1 mm squares (representing voxels) over the columnar map. We then assigned a preference to each "voxel" using a probabilistic approach. In particular, we defined the probability of voxel preference across trials as the distribution of the underlying neural preferences within each voxel. This provided us with a discrete probability distribution for each voxel, which we then used to generate 500 fMRI preference maps. We then assessed local disparity clustering as above (see "Calculating the probability of similar voxel preferences in a local neighborhood"). The only difference was that we considered the 8-connected neighborhood of each voxel, as the simulation was performed using a 2D representation, rather than the 3D data obtained from our empirical measurements.
Comparison of disparity preference maps across sessions. To determine whether disparity preferences revealed at the voxel level represented a stable property of cortical responses, we sought to compare disparity maps obtained from scans performed on different days. To this end, we first needed to identify those voxels that had reliable disparity responses within each session. We did this by estimating the disparity response of each voxel using a leave-two-runs-out GLM fitting approach. By iteratively leaving two runs out, we identified voxels that responded maximally to a given disparity on at least 50% of the GLM fits. Having identified voxels with stable within-session responses, we re-estimated the disparity response of each voxel using the full dataset (i.e., a GLM fit to all runs within a session). To coregister maps from different sessions into a common space, we transformed measurements from each participant's original functional space to their native anatomical space by applying the transformation matrix computed during anatomical-functional coregistration. We then computed the Pearson correlation between corresponding voxels (nearest neighbors) across sessions (bootstrapping, 10,000 samples). To ensure the stability of the correlations, we systematically varied the within-session repeatability criterion (from 50% to 80% of the same preference using the leave-two-runs-out GLM procedure). We found that our estimates of between-session correlations were stable across this range of within-session repeatability thresholds.
Because the coregistration procedure described above is subject to random error, we also used an additional alignment step to compensate for small misalignments between data acquired in different sessions. In particular, we recomputed correlations between voxels in different sessions after applying an additional iterative alignment procedure to improve coregistration. This procedure adjusted the position of one of the maps to minimize the differences in disparity preference across the ROI (we provide results with and without this extra alignment procedure). We defined the first session map as the reference and the second session map as the source. Let R and S represent the voxels belonging to the reference and source maps, respectively. Each of these is defined as an m-by-four matrix, where m is the number of voxels and each voxel is described by four features: their 3D coordinates (x, y, z) and their peak disparity response. We iteratively adjusted an affine transformation W to the source map S to minimize the disparity preference difference between corresponding nearest neighbors across sessions, d, which we can define as follows: where c(i) is the index of the closest voxel of WS in relation to R i . We restricted the transformation W to be as small as possible by penalizing large deformations. We thus defined our optimization function as follows: where I is the identity matrix and lambda is an empirically defined regularization weight equal to 0.1 that ensured convergence of the minimization algorithm while preventing gross distortions of the maps. The first term of the optimization objective minimizes overall differences in spatial organization of disparity preferences between maps and the second term restricts the spatial transformation to be as small as possible, weighted by lambda. The maximum number of iterations was set to 200 and the resulting spatial transformations were very close to identity. After this alignment step, we recomputed the Pearson correlation coefficient using bootstrapping (10,000 samples, as above).
So far, we have described using the Pearson correlation coefficient to quantify the correspondence between disparity maps obtained in different imaging sessions based on a linear relationship between variables. A more general approach is to ask how much information is shared between disparity maps obtained in different imaging sessions. To do so, we used mutual information (Shannon, 1948) that quantifies the reduction in uncertainty about a variable after the observation of another variable. In particular, we computed the reduction in uncertainty about disparity preference in one map after observing the disparity preferences in the other map (note that we performed this analysis on the data without the additional preference alignment step). In the discrete case, mutual information is defined as follows (Shannon and Weaver, 1949): where P XY (x, y) denotes the joint probability distribution of X and Y and P X (x) and P Y ( y) represent the respective marginal probability distributions. If X and Y are independent, then P XY (x, y) ϭ P X (x)P Y ( y), and therefore I(X;Y ) ϭ 0.
Modeling disparity responses of individual voxels using neuronal templates. To model the responses of individual voxels to different disparities, we used the disparity tuning templates proposed by Poggio (Poggio and Fischer, 1977;Poggio et al., 1988). In particular, we used linear regression to assess how tuned [tuned near (TN)/tuned far (TF)], categorical [near (NE)/far (FA)], and excitatory/inhibitory [tuned excitatory (TE)/tuned inhibitory (TI)] cell models explained individual voxel responses. Regressors consisted of discrete ideal responses for each model type at the preferred disparity of that voxel and an additional offset/ baseline term. Specifically, the discrete realizations of these models were as follows: (1) tuned model, a Kronecker delta shifted to the preferred disparity of the voxel; (2) categorical model, a square wave cycle between 0 and 1, odd around zero disparity, so that the positive step coincides with the preferred disparity of the voxel; and (3) excitatory/inhibitory model, a shifted triangle wave cycle even around zero disparity. The triangle wave had its peak around zero disparity if the disparity preference of the voxel was the smallest disparity magnitude presented and its trough otherwise. After assembling the regressors for each voxel, linear regression was performed using MATLAB. This produced a set of four weights per voxel (one for each tuning model plus the baseline term), which express the extent to which each model explained the response profile of individual voxels. For subsequent analysis, we selected voxels that were well modeled by this approach (R 2 Ͼ 0.8).
Modeling voxel responses to disparity using a Gabor model. The previous section used descriptive neuronal models to examine voxel responses.
Next, we sought to estimate disparity responses more parametrically. To this end, we used a 1D Gabor model that has been used to describe the response profiles of disparity selective neurons in early and extrastriate visual areas (e.g., V1, Prince et al., 2002b;V3/V3A, Anzai et al., 2011;MT, DeAngelis and Uka, 2003). In particular, we used a Gabor function to describe the response of voxels to variations of binocular disparity. For each voxel, we started by removing baseline differences in ␤-weights by subtracting the mean ␤-weight across all of the presented disparities. For each ROI, we then grouped voxels based on their preferred disparity (i.e., maximum ␤-weight), resulting in 60 groups (12 preferred disparities for five ROIs). We then fit a Gabor model to each group of voxels (using the data from all the voxels, rather than the averaged voxel response), where the response to a disparity, d, was defined as follows: where A is the amplitude, A 0 is the baseline, d 0 is the position of the Gaussian envelope, is the width of the envelope, f is the frequency of the cosine, and is the phase shift between the cosine and the center of the Gaussian envelope. We used constrained optimization (fmincon, MATLAB) to find the parameters of the Gabor model that best described each group of voxel responses (least-squares estimation). We constrained the minimizers ad hoc to sensible values given the disparity levels we presented, which ranged from Ϫ36 to 36 arcmin. First, because baseline correction was performed before fitting, we constrained the baseline shift to values between Ϫ1 and 1. Second, because voxels were grouped by their preferred disparity before fitting, the position of the Gaussian was constrained to a window of 10 arcmin around the preferred disparity of each voxel group. Third, the amplitude of the Gaussian envelope was constrained to 1.2 times the amplitude range of voxel responses and the width of the Gaussian was restricted between 5 and 12 arcmin to avoid overfitting the data. Finally, the frequency was allowed to vary between 0 and 1/(d max Ϫ d min ) cycles per arcmin, where d max and d min represent the maximum and minimum disparity presented during the experimental session, respectively. This constrained the frequency to remain below half the sampling frequency. Using these parameter limits enabled us to avoid gross overfitting that can arise from the oscillatory term of the Gabor (a combination of envelope width and frequency of the carrier). We quantified overfitting by contrasting the Gabor model fit against a piecewise linear fit to neighboring points in the response profile. In particular, we started by estimating the slope of the line that connects two consecutive points of the response profile. Next, we computed the maximum instantaneous variation of the fitted Gabor within the same interval and subtracted the slope of the linear fit. If the Gabor oscillates considerably between two consecutive points, there will be a large absolute difference between the maximum variation of the Gabor and the slope computed by linear approximation. In contrast, if the Gabor follows the linear trajectory between two consecutive points closely, this difference will be nearly zero.
Using the constraints described above, the optimization function found good fits across experimental conditions. Specifically, we assessed the quality of fits using a 2 goodness-of-fit test (DeAngelis and Uka, 2003). This test compares the variance of the residuals around the mean tuning profile with the variance of the residuals around the model fit. In particular, we computed the difference between each datum and the model value at that disparity, which provided us with a distribution of residuals around the model. We then compared the variance of this distribution against the variance of the residuals around the mean using a 2 test for equal variances. The fit is considered satisfactory if the variances of these distributions do not differ significantly.
The constraints of fMRI data acquisition meant that we were limited in the number of different disparities that we could measure reliably during each imaging session. Therefore, fits to voxel responses are limited in their resolution along the disparity domain. This presents a challenge in choosing and fitting the correct model to the data. Based on the electrophysiological literature, a Gabor model is a good descriptor of individual neuron responses within the visual cortex (see above). As an alternative, we also considered a Gaussian model that has the advantage of fewer parameters. However, comparison of the models indicated that the Gaussian model was insufficient to capture the different profiles of the voxels we measured. In particular, 30 (of a total of 60) fits did not pass the 2 goodness-of-fit test described above ( p Ͻ 0.05). In contrast, using the Gabor model only four of 60 fits failed this test. The superiority of the Gabor model was also confirmed using a hold-out cross-validation procedure in which half the data were used to fit the model and the other half used to compute the mean squared error around the fit. This is perhaps not surprising because the Gabor model has more free parameters than the Gaussian. Therefore, we also compared these models using the Akaike information criterion (AIC) and found that the mean AIC value across all fits was much lower for the Gabor model. We therefore adopted the Gabor model to describe the response profiles of our sampled voxels.
Using fMRI-based estimates to model disparity population characteristics. The modeling methods so far described allow us to describe properties of disparity-selective populations based on our fMRI recordings. Next, we investigated whether these estimates could be related to the characteristics of neural populations that underlie psychophysical depth judgments. Specifically, modeling and psychophysical investigations point to a relationship between disparity selectivity and disparity magnitude: as disparity magnitude increases, disparity detectors are thought to have larger receptive field sizes (Lehky and Sejnowski, 1990;Stevenson et al., 1992) and this relationship is well approximated by a linear increase for disparities (5-20 arcmin) near fixation (Stevenson et al., 1992). Motivated by these findings, we investigated whether the envelope size of the fitted Gabor models increased with disparity magnitude. In particular, we examined whether there is a correlation between the SD and preferred disparity in each ROI (Pearson's correlation, p Ͻ 0.05). If the correlation was significant, we used linear regression to estimate the best fitting trend that describes the variation of each Gabor parameter as a function of disparity magnitude; otherwise, the Gabor parameters were assumed to be constant and set to the mean value across disparity magnitudes.
We centered this analysis on the relationship between the SD of the Gaussian envelope and disparity magnitude because we were interested in changes in response profile width. The parameters of the cosine term (i.e., the frequency f and phase ) provide insight into disparity selectivity in terms of the presence of on-off subregions within a neuron's receptive field. However, in our case, the limited number of disparities sampled and the aggregated nature of the voxel measurements meant that it would be difficult to draw any strong conclusions from any observed relationship between these parameters; therefore, we limited our analysis to the relationship between the peak response and SD.
Using the estimates of the relationship between disparity magnitude and envelope size, we built a distributed population of disparity selective units. Here, the term "unit" describes a disparity detector, which, in this case, is derived from a population of voxels. For each ROI, the regression fits were used to build Gabor detector units at 17 equally spaced disparity magnitudes between Ϫ40 and 40 arcmin. We then simulated the ability of this bank of detectors to discriminate different disparities (for a thorough description, see Lehky and Sejnowski, 1990). In short, we estimated the smallest disparity difference that could elicit a significant change in activity across the population as a whole. First, we computed the responses of each Gabor unit to two disparity levels and derived the respective variances assuming direct proportionality. Specifically, if we let R ij be the response of unit i to stimulus j, its variance is then given by ij 2 ϭ kR ij with k ϭ 1.5 (for plausibility of this arbitrary parameter, see Lehky and Sejnowski, 1990). The number of SDs separating these responses was defined as follows: Large values of dЈ i suggest that changes in response of the i th unit were stimulus induced, whereas small values indicate chance fluctuations due to noise. Statistically, the probability of observing a stimulus-induced change in individual units was defined as follows: At the population level, however, each unit represents one of many dimensions. In this multidimensional space, assuming uncorrelated noise, variations in responses were tested using the joint probability as follows: Here, p represents the probability of changes across the whole population being stimulus induced. We finally computed the disparity discrimination threshold as the minimum disparity difference for which p Ͼ 0.5 (as in Lehky and Sejnwoski, 1990; see their erratum). Discrimination thresholds were evaluated at disparities between Ϫ40 and ϩ40 arcmin.

Results
We presented participants with disparity-defined wedges at a range of different depth positions ( Fig. 1 A, B) and recorded the BOLD signal from voxels spanning the dorsomedial visual cortex (Fig. 1C). We observed strong BOLD responses to manipulations of disparity that were well localized to the gray matter (Fig. 1D).
To provide a first analysis of these data, we quantified aggregated BOLD responses in the dorsal visual cortex using two approaches. First, we computed the change in the BOLD response for disparity-defined stimuli relative to the fixation baseline in each of the localized ROIs. This revealed large changes in the BOLD signal in V1 and dorsal extrastriate cortex (Fig. 1E), with magnitudes consistent with previous studies at UHF (Hoffmann et al., 2009;van der Zwaag et al., 2009;Polimeni et al., 2010). Second, we quantified responses in different ROIs using a multivoxel decoding analysis approach for disparity-defined stimuli (Preston et al., 2008). In particular, we calculated the accuracy of a support vector machine in predicting whether a stimulus was nearer or farther than the fixation point based on patterns of voxel activity. We found high accuracies for discriminating crossed ("near") versus uncrossed ("far") disparity (Fig. 1F ) that were, on average, highest in V3A and at accuracy levels comparable to previous work (Preston et al., 2008). We also performed a six-way classification analysis of the data, testing how well the presented disparity could be predicted from the six different types of stimuli (i.e., disparity values) presented. We observed performance well above chance, with highest mean performance in V3A (Fig. 1G). These results are consistent with previous work at 3 T in suggesting strong responses to disparity, particularly in areas V3A and V3B/ KO (Backus et al., 2001;Tsao et al., 2003;Preston et al., 2008). This initial examination of the data is confirmatory. However, our primary interest was not in aggregated voxel responses from within different ROIs, but rather whether 7 T fMRI would allow us to detect and quantify consistent spatial organization of individual voxel responses. In our next analyses, we therefore consider the response profiles of individual voxels as a proxy that summarizes the activity of a neural population centered on the voxel. To this end, we used the ␤-weights of the GLM model fit to the fMRI time series to determine how the different presented stimuli explain the activity of individual voxels. We start by defining the disparity preference of a voxel as the condition that yields the highest ␤-weight (i.e., winner-take-all labeling). Later, we consider other models that seek to capture the response profile of a voxel based on all of the estimated ␤-weights.

Spatial clustering of peak responses to disparity
Motivated by reports of disparity clustering in macaque extrastriate cortex (Anzai et al., 2011;Hubel et al., 2013), we tested for clustering within the human visual cortex. In particular, we examined the spatial distribution of disparity responses across the cortical surface by labeling individual voxels according to the disparity value that evoked the highest level of fMRI activity (i.e., maximum ␤-weight of the GLM) during each imaging session. To visualize the data, we color-coded the disparity "preferences" of individual voxels and mapped them onto flattened representations of the cortex. This produced cortical maps with an apparent organization: contiguous spaces across the cortical surface Coarse clusters of peak disparity responses do not overlap with the potential location of large veins. C, The same ROI in Participant 2, but now represented across 11 relative points through the entire range of the cortical sheet (0 to 1 relative depth, sampled at increments of 0.1). The flattened representations for each cortical depth were stacked together and an opacity gradient was applied to aid visualization of peak disparity response across the cortical depth. Note that to assist visualization the cortical depth, dimension is not drawn to scale. D, Sliced view of peak disparity responses in the same ROI (Participant 2, left V3A). Data are cut through the cortical depth along a line extending from the foveal representation of V3A up to the periphery near the border with V3d.
share similar disparity responses ( Fig. 2A). Importantly, these contiguous areas did not overlap with regions where the mean amplitude of the BOLD signal was low, suggesting that clustering was not simply due to macrovasculature (Fig. 2B). Moreover, there was reasonable consistency in the peak disparity response at different cortical depths (Fig. 2C,D; note that the scale of the cortical depth axis here is expanded relative to the cortical location plane to aid visualization).
Although these visualizations are useful in illustrating the general spatial profiles of responses, the process of mapping and interpolating the data from the (raw) native fMRI data space to a flattened representation of the cortical sheet can introduce overrepresentation (or underrepresentation) of individual datum. In particular, there are frequent one-to-many correspondences between voxels in the functional space and pixels visualized on the flat cortical surface (i.e., oversampling), which can inflate the extent of clustering observed on these flat maps. Therefore, we sought to evaluate response clustering by examining the disparity response of neighboring voxels in the (native) functional space, thereby ensuring no overrepresentation or underrepresentation of the data.
To quantify peak disparity response clustering, we assessed the similarity between the preference of a central target voxel and that of its neighbors. We did this by calculating the distribution of disparity preferences in the population of voxels that shared at least one vertex with the target voxel. Thereby, we calculated a probability map for the disparity preference of the neighborhood referenced to the disparity preference of the target voxel (see Fig.  3A for a schematic illustration). Our logical expectation was that if there is clustering in the disparity preferences, target voxels will be surrounded by neighbors with the same (Fig. 3A, top) or similar (Fig. 3A, middle) preferences, in contrast to randomly organized preferences (Fig. 3A, bottom). However, the extent to which this structure will be visible depends on the spatial scale of the underlying neural maps in relation to the fMRI sampling resolution. Before examining the empirical data, we therefore consider the extent to which clustering can be recovered based on a simulated dataset.
To test for clustering at the voxel level, we performed simulations using a model of cortical columns for orientation (Rojer and Schwartz, 1990) because there is no standard model for disparity organization. We supposed neural maps of different spatial scales (columns 1-4 mm in width) and then sampled these maps using a simulated 1 mm isotropic "voxel" grid (Fig. 3B). Thereafter, we computed the voxel similarity of each sampled voxel relative to its neighbors and then averaged together the neighborhood preferences of all voxels that had the same central voxel preference. This resulted in a similarity matrix that shows the statistical relationship between the preference of central voxels relative to their surround (Fig. 3B, bottom), where strong diagonal structure indicates a close relationship between central voxels and their local neighbors. (Note that the higher probabilities in the top left and bottom right corners of these plots arise because orientation is a circular dimension; we would not anticipate these A B C D Figure 3. Local clustering of peak voxel responses to disparity ("preferences") in simulated and empirical datasets. A, Simplified 2D illustration of clustered and disperse preferences for a given voxel. Individual voxels and their neighbors will often share similar peak responses if there is spatial clustering (top and middle). If there is no organization, no relationship should be observed between the peak responses of a target voxel and its neighbors (bottom). B, Simulation of columnar architectures for orientation (after Rojer and Schwartz, 1990) with periodicity varying from 1 to 4 mm (top) and the respective preference maps after simulating voxel sampling using six equidistant conditions (middle). Bottom, The correspondence between the preference of target voxels and their neighborhood is shown in form of a probability matrix for each columnar width. In each matrix, the i th row represents the average probability distribution of preferences around voxels preferring the i th disparity and the probability value is represented in grayscale (green horizontal line on the color bar indicates chance level, 0.167, given six response options). Maps that are clustered display a clear diagonal structure, demonstrating that nearby voxels tend to share similar preferences. C, The same matrix representation for empirical disparity maps from visual areas V1, V2d, V3d, V3A, and V3B/KO. Green horizontal bar on the color bar indicates chance level (0.167). A diagonal structure emerges along the dorsal cortical hierarchy. Note the different grayscale range from part B for empirical fMRI measurements. D, Results of a similar analysis after randomly shuffling the disparity preferences in V3A and V3B/KO. In this case, we do not observe any diagonal structure.
for binocular disparity, which is a more linear dimension.) These simulations indicate that, using 1 mm isotropic voxels, it is realistic to obtain information about the structure of underlying cortical organization if the scale of the neural maps is in the region of 3 mm. This corresponds to the estimated scale of disparity maps in human cortex based on scaling up measurements from macaque MT to account for overall brain size (DeAngelis and Newsome, 1999, and see the supplementary information from Ban et al., 2012).
Having demonstrated proof of concept, we now return to the empirical fMRI data. In principle, we could calculate clustering in exactly the same way as described for the simulations. However, real fMRI voxel responses are not temporally or spatially independent because of the point-spread function (PSF) of the BOLD signal, meaning that a more sophisticated method is required. In particular, we estimated the preference of the central target voxel and its neighbors using independent data subsamples (leave-tworuns-out cross-validation) such that shared preferences for a given measurement could not simply be due to the dependency of BOLD responses for nearby voxels. Although this strategy does not remove the influence of spatial blurring, it eliminates temporal correlations between neighboring voxels because we use different time courses for estimating the preference of central voxels and their surround. We computed preference similarity for each presented disparity value, creating matrices for each ROI (Fig.  3C). We found that diagonal structure in the preference similarity matrices became increasingly apparent for measurements at increasing levels of the dorsal cortical hierarchy. To quantify this observation, we used a reliability statistic that compared the mean probability along the positive diagonal of the matrix, with the distribution of mean values calculated from random sampling from all locations within the matrix (bootstrapping: 10,000 resamples of six values). We found evidence for significant clustering in V2d ( p ϭ 0.04), V3d ( p ϭ 0.01), V3A ( p ϭ 0.02), and V3B/KO ( p ϭ 0.003), but not in V1 ( p ϭ 0.11). As a control, we recomputed matrices after shuffling disparity preferences and found no systematic structure (Fig. 3D). This confirmed that evidence for clustering in higher dorsal areas could not somehow derive from differences in size and/or shape of different ROI. Together, these data provide evidence that clustering of disparity preferences is particularly marked in higher dorsal visual areas V3A and V3B/KO, in contrast to primary visual cortex. It is possible that preference clustering is much less pronounced in early visual areas. However, recall that, from our simulations of maps with different spatial scales, it is possible that there is systematic organization in early areas, but the fMRI sampling resolution does not allow this to be detected.

Testing for the reproducibility of disparity responses
The preceding analyses support the notion that responses to disparity are clustered in dorsal extrastriate cortex. However, to determine the extent to which this clustering represents genuine cortical structure, we next sought to test whether the spatial distribution of disparity preferences is a persistent property of neuronal responses. Specifically, we tested whether preference maps could be reproduced between different imaging sessions.
Comparing functional data across different imaging sessions, especially at very high resolution, is extremely challenging and previous UHF studies have therefore focused on repeatability within sessions (Cheng et al., 2001;. In particular, differences in voxel slab positioning in relation to the cortical sheet affect sampling (Cheng et al., 2001). Moreover, with a functional resolution of 1 mm (near isotropic), we expect to acquire approximately 2 points from a given location on the cortical sheet, meaning that additional discrepancies could arise from sampling at different cortical depths. As a result, we would not necessarily expect one-toone voxel correspondence between functional data acquired in different imaging sessions.
Nevertheless, we were able to capture similarities between individual disparity preference maps across sessions for four of the six participants who took part in repeated sessions (Fig.  4A-D). In these maps, disparity preferences appear to be coarsely organized into bands, which can be clearly identified in maps obtained in different imaging sessions (see the outlines in Fig. 4). For one participant (Participant 5; Fig. 4E), we found similar structures across sessions but with reversed disparity sign (note the correspondence between blue and red regions across sessions, particularly in the right hemisphere). This map inversion is consistent with a change from a rear-to front-projection setting, causing a left-right horizontal flip and thereby reversing the disparity sign presented during the experiment. We suspect that this was the result of restarting the projector immediately before this participant's scan due to technical problems. For the final participant, we did not find apparent correspondence between sessions, although we note that slice positioning was not optimal in the second session and a portion of V3A was omitted (Fig. 4F ).
To quantify the similarity between maps, we used voxelwise correlation in the native functional space. We first selected voxels that had a stable within-session preference and then brought the functional data from each session into common alignment (see Materials and Methods). We then computed the Pearson correlation between corresponding voxels (nearest neighbors) across sessions using bootstrapped resampling (10,000 samples). Confirming our observations from the flattened cortical representations (Fig. 4), we observed reliable correlations between disparity maps for four participants (Fig. 5A, Participants 1-4). In addition, we found reliable negative correlations for one participant (Fig. 5A, Participant 5), which is consistent with the apparent inversion of the disparity maps (Fig. 4E).
As discussed above, small spatial misalignments between sessions can lead to an underestimation of between-session repeatability. To ameliorate small misalignments, we considered an additional processing stage in which we incorporated a preference-based between-session alignment step. In particular, we calculated an affine transform between the 3D maps that sought to improve coregistration while minimizing nonlinear deformations (see Materials and Methods). We then recomputed correlations across sessions (Fig. 5B) and found a small improvement in correlation values for participants with previously reliable between-session correspondence. However, the method itself did not introduce significant correlations (e.g., Participant 6) when there was little common structure before alignment and, in general, the effect of this additional alignment step was slight. In particular, Figure 6 shows the V3A map for Participant 4 (that showed the maximum benefit from this alignment step) with and without the additional alignment step.
To provide an additional measure of reproducibility, we computed the mutual information (Shannon, 1948;Shannon and Weaver, 1949) between maps obtained in different imaging sessions (using the non-preference-aligned data). We compared the empirical mutual information (Fig. 5C) with bootstrapped estimates based on randomly permuted disparity preferences (Fig.  5C, horizontal lines). We found evidence of persistent information for five participants, confirming the presence of disparity selective structures. . Overlaid contours represent the edges of the uncrossed disparity/"far" (red) region from Session 1. These were calculated by binarizing the maps and then applying an edge detection algorithm. The outlines were omitted in D (right V3A) because the fine scale changes in this map mean that superimposed contours masked the data and therefore hindered visualization. E, The distribution of peak disparity responses in Participant 5 reveals similar structures between sessions, but the color map appears to be inverted. F, No correspondence between disparity maps is evident for Participant 6. Slice positioning in the second session was not optimal, with the result that not all of right hemisphere V3A was fully sampled (bottom right).

Quantifying voxel response profiles at different disparity magnitudes
In the previous sections, we tested for clustering of disparity preferences by assigning a single disparity preference to each voxel using a winner-takes-all labeling approach. This is an obvious simplification because neurons sensitive to disparity respond to a range of different disparity values. Moreover, the disparity tuning curves of individual neurons often vary greatly in morphology: some neurons may respond to a limited range of disparities, whereas others respond to a broad range of disparities and are only selective for disparity sign (Poggio and Fischer, 1977;Poggio et al., 1988). Further, fMRI measurements aggregate the activity of many individual neurons within a voxel, meaning that the response profile of an individual voxel may exhibit an even greater variety of responses. We therefore sought to quantify each voxel's response profile to the range of presented disparities. To do so, we first used the tuning templates described by Poggio (Poggio and Fischer, 1977;Poggio et al., 1988). These templates offer a descriptive approximation of the response profiles of many disparity selective neurons (DeAngelis and Newsome, 1999;Prince et al., 2002b) and are simpler (in terms of the number of parameters) than the Gabor models that we will use later. We used these simplified models of disparity selectivity to examine the responses of individual voxels that aggregate the responses of many individual neurons. This voxel-based sampling is quasirandom with respect to the underlying neuronal populations and, as we discuss above, the scale at which underlying neural representations are sampled clearly influences the information available at the voxel level. Nevertheless, on the basis that disparity representations are clustered, it is reasonable to ask whether the local population activity captured by voxels can be related to physiological models of disparity selectivity and how such models are distributed across the cortical surface. We considered three types of selectivity (Fig. 7A): (1) near/far tuned responses (tuned), (2) near/far categorical responses (categorical), and (3) excitatory and inhibitory tuned responses at zero depth (zero-tuned). For each voxel, we regressed the selectivity models aligned to the preferred disparity of the voxel (Fig.  7B) against the ␤-weight response profile of the voxel. This provided us with a set of three weights and a constant for each individual voxel. We selected voxels with activity that was well captured by the regression approach (R 2 Ͼ 0.8), resulting in the selection of approximately half of the voxels in each ROI (45 Ϯ 5%). Of these selected voxels, we found that 65 Ϯ 10% were best described as tuned, 15 Ϯ 6% as categorical, and 20 Ϯ 10% as zero-tuned (mean Ϯ SD).
Psychophysical data and models of stereo-acuity suggest that the selectivity of perceptual disparity detectors varies with dispar-A B C Figure 5. Quantifying correspondence between disparity maps acquired in different imaging sessions. A, Correlation between peak disparity responses. Data show the median Pearson correlation coefficients between voxels' peak responses in Session 1 versus Session 2 obtained by bootstrapping (10,000 resamples). Error bars represent 68% confidence intervals and stars indicate bootstrapped significance at the p Ͻ 0.05 level. B, As in A except that the alignment between the data in the two sessions was improved using an additional alignment procedure. C, Information shared between maps acquired in different sessions. Bars represent the mutual information between maps, with error bars covering the 68% confidence interval. Horizontal lines represent the 97.5 percentile for a bootstrapped control distribution (10,000 estimates) calculated after randomly shuffling labels of peak disparity response.
ity magnitude: detectors sensing larger disparities are thought to have larger receptive field sizes and broader tuning widths (Lehky and Sejnowski, 1990;Stevenson et al., 1992). In the context of our fMRI measurements, this led us to hypothesize that presenting larger disparities would cause an increase in the proportion of voxels identified as "categorical" in cortical areas related to stereopsis. To test this idea, we ran an additional experiment with a larger range of disparities. Four participants undertook a third imaging session during which disparity was varied between 12 and 36 arcmin (crossed and uncrossed). We then used the modelbased analysis of the voxel responses to test whether estimated profiles were affected by the increase in disparity magnitude. Specifically, we pooled the data across subjects for each disparity range and computed the median weight of each model (across voxels). Using this approach, an increased representation of a particular model is demonstrated by an increase in the weight assigned to that model by the regression approach. We visualized the weight for each model in a radar plot with three axes, one for each response model (Fig. 7C), and found an increased representation of categorical responses at greater disparity magnitudes, especially in areas V3A and V3B/KO (Fig. 7C, cf. red vs blue lines; Fig. 7D, purple elements). In other words, a greater proportion of the voxels are best explained by the categorical model when the range of presented disparities was larger. This increase in weights for the categorical model was accompanied by small changes in the distribution of tuned and zero-tuned weights (Fig. 7D, green and orange elements).  Poggio et al. (1988). B, Schematic representation of our model-based approach. For each individual voxel, we assembled regressors based on the hypothetical responses for each model type given the peak disparity response of the voxel. After linear regression, we obtain three weights that approximate the contribution of each model for the response profile of individual voxels. C, Model weights at different disparity magnitudes. The median weights (across voxels) for each model are mapped onto a radar plot with three axis (one for each model). Blue lines represent data from the first two imaging sessions (pooled), during which disparity ranged from 3 to 15 arcmin. Red lines represent the distribution of weights observed at disparities ranging from 12 to 36 arcmin (third session). D, Difference in medians between the distributions illustrated in C for all ROIs. Bars represents the median difference (in medians) obtained by bootstrapping (10,000 resamples). Error bars represent 95% confidence intervals.
Having estimated the type of response profile exhibited by individual voxels, we next sought to determine whether there was any structure in the way in which these voxels are distributed across the cortical surface. In particular, we mapped the weights for each selectivity model onto a representation of cortical surface for each individual (Figs. 8,9). We found three important features in these maps. First, we observed clustering in the weight maps, indicating that nearby voxels share similar disparity response profiles (e.g., voxels described each model are clustered together on the cortex, as shown by colocalized saturated colors). Second, the cortical locations described by categorical versus tuned models appear to be distinct (Figs. 8,9; note the complementarity of the green vs purple maps within session). Third, the consistency across all sessions was particularly marked for categorical disparity processing model (cf. purple maps across sessions one to three, particularly evident in Fig. 8A, but also apparent for the other participants), with enhanced categorical representations in session three as expected from the wider disparity range. In contrast, for the tuned and zero-tuned disparity models, we only observed correspondence across the first two sessions in which exactly the same disparity levels were tested (Figs. 8B, 9A). This highlights the systematic organization of disparity representations, and makes clear that "tuned" responses are very sensitive to the exact disparity presented, whereas "categorical" responses show tolerance to the disparity value.
Our data analysis so far has used relatively simplistic models of disparity selectivity to describe the responses of individual voxels. Biases in the representation of these models indicated an increase in categorical responses when participants viewed stimuli at higher disparity magnitudes. Next, we sought to examine this relationship in greater detail by fitting physiologically inspired models of disparity selectivity. In particular, we investigated whether changes in voxel response width could be observed between groups of voxels preferring different disparity magnitudes.
For each ROI, we grouped voxels according to their peak disparity response (3, 9, 12, 15, 24, and 36 arcmin, crossed and uncrossed) and then fit a Gabor tuning profile models to each of these 12 groups. Figure 10A shows representative responses of voxels with maximal responses for Ϫ3, Ϫ12, Ϫ15, and Ϫ36 arcmin in three different ROIs. For the higher dorsal areas, we observe that the Gabor fit (black line) to the response profile is broader for large disparities than it is for small disparities. We quantified this using the SD parameter of the Gabor model, plotting this as a function of the peak response of the voxels (Fig.  10B). In early visual areas (V1, V2, V3d), we found that the width of voxels' response profiles was not related to the overall disparity magnitude. In contrast, in areas V3A and V3B/KO, we observed significant relationships between the peak disparity response and the width of the Gabor fit.
One potential concern with this analysis is that such a relationship may be a consequence of the differential spacing between the presented disparities in different imaging sessions; that is, the envelope width is broader because of the wider stimulus spacing. Ideally, we would have used a fine spacing of disparities across a broad range of disparity magnitudes, which would rule out such a concern. However, the practicalities of obtaining a sufficient number of fMRI measurements within a timeconstrained imaging session meant that we could not do this. Nevertheless, we judge it unlikely that stimulus spacing per se accounts for the relationship we observe. First, we did not observe significant correlations in V1 to V3d, suggesting that the increase in disparity spacing alone does not result in a relationship between the peak and SD parameters. Second, we can contrast profiles for overlapping points in this space (Fig. 10A): the width of the fit to the Ϫ15 tuned units (from sessions 1 and 2 with 6 arcmin stimulus spacing) is wider than for Ϫ12 (from Session 3 with 12 arcmin stimulus spacing).
Increases in detector tuning width for larger disparity magnitudes are thought to be characteristic of neural populations that underlie human stereoscopic judgments (Lehky and Sejnowski, 1990;Stevenson et al., 1992;Fig. 11A). Based on our fMRI measurements, we sought to test how well estimates of human neural population responses to disparity could account for depth discrimination thresholds. To this end, we built a population of disparity-tuned A B Figure 9. Cortical representation of model weights in areas V3A and V3B/KO in the left and right hemispheres of Participant 3 (A) and Participant 6 (B). This figure follows the format presented in Figure 8. A, Evident correspondence between "tuned" and "zero-tuned" weights was identified across the first two imaging sessions. Correspondence was not observed for the third session, where disparity magnitude was increased. B, Apparent correspondence was absent for Participant 6. Note that the voxel slice placement for Session 2 meant that we omitted coverage of a considerable portion of V3A and V3B/KO in the right hemisphere.
units based on the estimated (linear) relationship between Gabor parameters and disparity magnitude (Fig. 11B). Using these values suggested that V1 responses were unlikely to account for disparity discrimination judgments (Fig. 11B); however, estimated populations in V3A and V3B/KO produce discrimination threshold curves that are qualitatively similar to previously reported behavioral results (Fig. 11C,D;Badcock and Schor, 1985) and stereo acuity modeling ( Fig. 11A; Lehky and Sejnowski, 1990). Although the overall shape of the curves are similar, a closer fit would likely require testing a wider range of disparities to account for the flanks of the curves and denser sampling near the fixation point to capture the fine trough near zero disparity. Stevenson et al. (1992) used psychophysical measurements to describe the relationship between tuning width of the perceptual mechanisms (parameterized as the FWHM) and disparity magnitude. Using data extracted from that study and the linear relationship that they estimated, we plotted our fMRI estimates of voxel response width (FWHM) together with their data and estimated linear relationship (Fig.  11E). This suggests a striking similarity between perceptual and fMRI estimates of variations in the tuning of units that respond to binocular disparity. Together, these results suggest an intriguing analogy between activity in V3A and V3B/KO and depth judgments (consistent with Preston et al., 2009;Dövencioglu et al., 2013;Murphy et al., 2013).

Discussion
Here, we use 7 T fMRI to test whether human dorsomedial visual cortex contains systematic organized representations of binocular disparity. Using a series of computational analysis approaches, we report three main advances in understanding disparity organization in the human brain. First, we show that disparity responses are systematically organized (Figs. 3C,4,8,9) and, importantly, that these responses persist between imaging sessions (Figs. 4,5,8,9). Second, we observed differences between the local distribution of disparity responses in early and dorsomedial visual areas (Figs. 3C, 7, 10), suggesting different properties of cortical organization. Third, by modeling the responses of individual voxels, we show a relationship between tuning width and disparity magnitude (Fig. 10B), indicating more broadly tuned responses to larger disparities, which is consistent with psychophysical and modeling work that posits such a relationship as a characteristic property of neural populations involved in stereopsis (Fig. 11). Together, these findings indicate that human V3A and V3B/KO contain selective cortical structures that are likely to be important in stereoscopic depth processing.

Cortical organization of disparity preferences
Understanding of the cortical structures that support disparity processing is largely informed by recordings in the macaque brain. For example, by systematically assessing disparity preferences at locations across the cortical surface of area MT/V5, A B Figure 10. Voxel response profiles at different disparity magnitudes. We modeled voxel responses using Gabor filters and examined the relationship between the Gabor parameters and preferred disparity magnitude. A, Pooled voxel responses in areas V1, V3A, and V3B/KO modeled by Gabor filters for four preferred disparities (Ϫ3, Ϫ12, Ϫ15, and Ϫ36 arcmin). Gabor models were fit to sets of voxels sharing the same preferred disparity, resulting in 12 groups per ROI. Error bars represent the SD across voxels. B, Relationship between response profile width (SD of the Gaussian envelope) and peak disparity response for early and dorsal visual areas. Each datum represents a group of individual voxels that share the same disparity preference (1 of the 12 preferred disparities examined in our experiments: Ϯ3, 9, 12, 15, 24, and 36 arcmin). A significant positive trend between tuning width and disparity magnitude was found in V3A and V3B/KO, but not in earlier visual areas.
DeAngelis and Newsome (1999) demonstrated smooth changes in preferred disparity across the cortex that indicate systematic organization. More recent electrophysiological evidence indicates that other dorsal visual areas, including V3A, contain clustered representations of disparity (Anzai et al., 2011;Hubel et al., 2013). Based on previous human imaging work (Backus et al., 2001;Preston et al., 2008), dorsomedial visual cortex was likely to show strong responses to disparity-defined stimuli in human participants. We find evidence that similar disparities are clustered together, particularly in areas V3A and V3B/KO (Fig. 3C), and that correspondence between maps can be observed even when the presented disparities differ (Figs. 8, 9, categorical responses). Although we have concentrated on area V3A, it is interesting that our findings suggest cortical organization for disparity that is similar in its basic properties to macaque MT. While these two areas appear to have distinct functional properties for binocular disparity (Cottereau et al., 2011), they receive a large portion of inputs from common areas (Felleman and Van Essen, 1991). In particular, MT and V3A receive inputs from V2 and V3, where disparity organization has been described previously (Roe and Ts Although our imaging data suggest clustered responses, it is clearly not possible for us to infer that the underling organization is columnar. For example, it is possible that the persistent structures that we observe across sessions represent a coarser spatial bias in disparity responses rather than a periodic columnar structure. Nevertheless, it is encouraging that we observed a difference between dorsal visual areas and responses in primary visual cortex. This appears consistent with macaque electrophysiology in suggesting that V1 has only a weak tendency for clustering (LeVay and Voigt, 1988;Prince et al., 2002b).

Benefits and limitations of 7 T imaging for mapping
Our understanding of cortical organization to date is predominantly informed by animal models typically using neurophysiological and optical imaging methods that provide a high level of detail at the cost of invasiveness. Recent advances in UHF fMRI make it possible to investigate mesoscopic properties of the human cortex noninvasively (Cheng et al., 2001;Zimmermann et al., 2011). However, issues in the interpretation of neuroimaging data are usually introduced by potential biases from large vascular structures, which are poorly related to local A B C D E Figure 11. Population encoding mechanisms and stereo-acuity at different disparities. A, Distributed encoding model proposed by Lehky and Sejnowski (1990). A population of 17 nonuniform, largely overlapping units (left) produces a disparity discrimination curve (right) similar to stereo-acuity judgments made by human observers (Badcock and Schor, 1985). B, Interval encoding model derived from voxel response profiles in V1. A population of 17 units with uniform, narrow tuning produces a disparity discrimination curve uncharacteristic of the human visual system. C, D, A neural encoding model derived from the voxel response profiles in areas V3A and V3B/KO (left), and the simulated discriminative performance of these models (right). The performance of these models is more similar to the idealized patterns of psychophysical performance (part A) than a model derived from V1 activity (part B). E, Plot of disparity magnitude against detector tuning width based on psychophysical data published by Stevenson et al. (1992) and our fMRI estimates in V3A and V3B/KO. The trend line reproduces that fit by Stevenson et al. (1992), with blue data points representing their published data (their Fig. 7) as obtained by a "data thief" procedure implemented in MATLAB. Red data points represent fits from on our fMRI measurements. The dashed portion of the fit extends the line of best fit beyond the range of disparities tested by Stevenson et al. (1992).
cortical activity, and insufficient spatial resolution. Here, 7 T fMRI revealed that disparity preference representations are well clustered in human dorsal visual cortex. Although we may be able to map disparity preferences at lower spatial resolution, the increased BOLD CNR of 7 T is likely a requirement to do so. We have previously attempted to investigate disparity maps at 3 T, but without success (N.R.G., H.B., A.E.W., unpublished observations). By imaging at 7 T, the contributions from large vessels, which could mask the distribution of disparity responses, are reduced (Gati et al., 1997;Ogawa et al., 1998;Ugurbil et al., 2003). This gain in spatial specificity is the fundamental benefit for mapping genuine properties of neural subpopulations. Although our 7 T imaging improves spatial specificity, additional care is necessary to avoid the influence of large vessels, especially when mapping functional data onto cortical flattened maps: mapping large veins onto flat maps can result in the emergence of spurious structures unrelated to local activity. We therefore chose to sample functional activity predominantly from the central layers of the cortex to avoid large surface vessels Sánchez-Panchuelo et al., 2012) and improve spatial localization (Polimeni et al., 2010). This is particularly important given that we used a gradient-echo sequence, which is more susceptible to surface macrovascular contributions compared with spin-echo-based sequences (De Martino et al., 2013). In addition, we verified that regions where the mean BOLD amplitude was higher (which could derive from larger vessels) were not colocalized with coarser structures found in preference maps (Fig. 2B).
Finally, it is necessary to consider the possibility that clustering is enhanced by the PSF of the 3D GE-EPI sequence. The PSF of the BOLD signal can reach 2 mm in extent (Gaussian FWHM; Shmuel et al., 2007), meaning that voxel responses may be significantly influenced by the activity of their neighbors. However, it is unlikely that the PSF is a major barrier to the interpretation of our data. First, even a sequence with a broad PSF can be used to map cortical properties provided that the CNR is sufficient . Second, limiting our analyses to voxels from central layers of the cortex (i.e., away from large draining vessels on the cortical surface) is likely to have reduced the spatial spread of the BOLD response (Polimeni et al., 2010). Finally, our data point to differences in disparity clustering between visual areas (Fig. 3C), suggesting that our measurement approach has sufficient dynamic range that we can capture changes related to the underlying structure of the cortical organization.

Disparity selectivity and stereopsis
Models of human stereo-acuity have posited a relationship between the tuning width of disparity-sensitive units as a function of the magnitude of disparity; that is, neurons selective for fine disparities have smaller receptive fields, whereas units preferring coarser disparities have larger receptive fields ( Fig. 11A; Lehky and Sejnowski, 1990). Psychophysical measurements support this conclusion (Stevenson et al., 1992). In our study, we found that the population-estimated responses in human V3A and V3B/KO follow this relationship and a model based on fMRI estimated tuning widths as a function of presented disparity is able to discriminate disparities in a manner similar to the human visual system. This is captured by the slope of the disparity discrimination curves between small and large disparity magnitudes for V3A and V3B/KO (Fig. 11C,D), resulting in greater stereo-acuity for fine rather than coarse disparities. Conversely, a population with invariant tuning properties produces nearly constant disparity dis-crimination thresholds, implying constant stereo acuity for a wide range of disparity magnitudes (Fig. 11B) To examine disparity responses, we used well defined tuning templates to group voxels according to their response type (e.g., tuned vs categorical). These templates can be seen as simplifications of the tuning classes suggested by Poggio et al. (1988) more than two decades ago. Since then, it has been suggested that disparity selectivity is better described by Gabor models with a (continuous) parameter space that explains previously posited discrete types of disparity tuning (Prince et al., 2002a). When we used Gabor models to describe the voxel responses for each disparity level, we found that changes in the envelope width along the disparity domain can be well approximated by a linear function ( Fig. 11E; consistent with Stevenson et al., 1992), suggesting that tuning width varies gradually with disparity magnitude (at least within the range we have tested).

Conclusion
Using 7 T fMRI, we show here that human dorsal visual areas contain systematically organized structures for disparity processing. The responses of these structures vary with disparity magnitude, which aligns well with previous quantifications of stereoscopic perceptual judgments. Together, our results suggest that areas V3A and V3B/KO contain selective, organized structures that support stereoscopic processing in the human brain.