Abstract
Contrary to the conventional assumption that humans perceive shapes of rigid objects as constant despite retinal-image variations caused by changes in orientation and position, we show that the depths of three-dimensional (3D) textured shapes appear to vary when the image is rotated. In paired comparisons of static stimuli, depth amplitude was perceived to be greater at vertical than at oblique orientations. A similar oblique bias was found for simple two-dimensional (2D) obtuse angles. Using projective geometry to link angle magnitude to the orientation flows that convey 3D shape from texture cues, we show quantitatively that the 2D bias predicts the 3D bias. Our finding that perception of angular magnitude is dependent on orientation has broad implications for shape constancy because orientation flows have also been implicated in 3D perception from reflections and shading, and contour curvature is fundamental to uncovering depth and part-structure of shapes. We examined the roles played in the observed biases by anisotropies in numbers and tuning widths of orientation-tuned cells in striate cortex as well as the distribution of oriented energy in natural scenes. An optimal stimulus decoding model for 2D angles revealed that the narrower tuning of cells for horizontal orientations combined with cross-orientation inhibition explains the orientation-dependent angle distortion and hence the 3D shape inconstancy. Variations in properties within neural populations can thus have direct effects on visual perceptions and need to be included in neural decoding models.
Introduction
Shape is the geometrical attribute of an object that is invariant to location, rotation, and scale effects (Kendall et al., 1999). The ability to perceive the shape of a rigid object as constant, despite differences in the viewing angle, has been considered an essential component of representing objects in the visual world accurately. Because the visual system cannot generally discount perspective distortions, shapes of certain classes of three-dimensional (3D) objects are not perceived as constant across viewpoints (Pizlo and Stevenson, 1999; Griffiths and Zaidi, 2000), but shape constancy is expected to hold for simple shapes under two-dimensional (2D) rotations of the image plane (Lawson, 1999). In striate cortex, cells tuned to orientation are sampled unevenly, with greater concentration, as well as narrower tuning, near horizontal and vertical (Mansfield, 1974; Li et al., 2003). These anisotropies raise questions about whether shape constancy can survive image rotations. This report describes two new empirical findings about the orientation dependence of visual shape perception and uses an elaboration of cortical stimulus decoding models to test whether variations in properties within neural populations have significant effects on visual perceptions.
Figure 1 depicts four shapes that appear triangular in depth because of texture cues. The concave and convex 3D wedges with vertical axes (top) appear deeper than the corresponding wedges with oblique axes (bottom), especially when viewed monocularly. However, when the page is rotated counterclockwise by 45°, the bottom shapes appear deeper than the top. The bottom images are simply rotated copies of the top images, revealing that perceived depth depends on shape orientation. Surprisingly, the same violation of constancy also occurs when viewing a single slowly rotating textured wedge, despite cues to object continuity.
Li and Zaidi (2000, 2004) showed that the perception of 3D shapes from texture cues depends critically on the orientation modulations around the axis of maximum curvature. When these orientation flows are visible, distinct patterns of orientation modulation automatically determine perceived signs of curvatures and directions of slants. When the flows are not visible, texture gradients such as density and spatial frequency are not sufficient to convey even a qualitatively accurate shape in many conditions. The critical orientation flows, visible in Figure 1 and highlighted in Figure 3a, form obtuse angles that bow inward in the center of the perspective image of the concave wedge and bow outward in the center of the image of the convex wedge. Changes in the magnitudes of angles above and below eye height determine the perceived depth. This finding was used to test whether the 3D inconstancy results from anisotropy in perception of 2D angles.
Materials and Methods
Experiment 1: failures of 3D shape constancy.
To quantify the orientation dependence of depth perceptions, two 3D wedges (one oriented at 45° and the other oriented at 90°) were viewed successively in random order, and observers identified which appeared greater in depth. 3D wedges were formed by simulating a texture pattern folded into a depth triangle and projected in perspective. Wedges varied in depth within a circular outline 8° of visual angle in diameter and centered on a dark screen. The grayscale texture pattern consisted of 21 sine-wave gratings at three spatial frequencies (1, 3, and 6 cycles per degree) and seven orientations (0, ±22.5, ±45, and ±67.5°) with respect to the axis of maximum curvature. The contrast of each grating was 1/21 so that the contrast of each texture pattern was ∼1.0. Ten different texture patterns were generated for each shape by adding the gratings in random phases. By rotating the orientation of the 3D axis, the same stimuli were used for 90 and 45° presentations.
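The composition of the flat texture described above (before it is folded into a wedge and projected in perspective) can be sketched as follows. This is only an illustrative sketch, not the authors' stimulus code: the function name `make_texture`, the seeding scheme, and the convention that 0° runs along the x axis are assumptions made for the example.

```python
import math
import random

SPATIAL_FREQS = (1.0, 3.0, 6.0)  # cycles per degree, as listed in the text
ORIENTATIONS = (0.0, 22.5, -22.5, 45.0, -45.0, 67.5, -67.5)  # degrees


def make_texture(seed=0):
    """Return the 21 grating components and a function giving texture
    contrast at a point (x, y) in degrees of visual angle.

    Assumption: orientations are measured from the x axis, which stands
    in for the axis of maximum curvature.
    """
    rng = random.Random(seed)
    components = [
        (sf, math.radians(ori), rng.uniform(0.0, 2.0 * math.pi))
        for sf in SPATIAL_FREQS
        for ori in ORIENTATIONS
    ]
    contrast = 1.0 / len(components)  # 1/21 per grating

    def value(x_deg, y_deg):
        total = 0.0
        for sf, ori, phase in components:
            # Position along the direction in which this grating modulates.
            along = x_deg * math.cos(ori) + y_deg * math.sin(ori)
            total += contrast * math.sin(2.0 * math.pi * sf * along + phase)
        return total

    return components, value


components, value = make_texture(seed=1)
print(len(components), value(0.3, -0.7))
```

Because each of the 21 gratings has amplitude 1/21, the summed pattern cannot exceed a contrast of 1.0 in magnitude, which matches the stated pattern contrast of about 1.0. Different seeds give different random-phase patterns.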
Stimuli were presented in a dark room to minimize frame effects, centered at eye level, and viewed monocularly at 1.0 m. To equate orientation expectations, a central fixation dot and two reference dots were displayed as axis orientation cues. A trial (Fig. 2a) consisted of the following progression: (1) fixation dot (500 ms), (2) axis orientation cue 1 (500 ms), (3) test shape 1 (500 ms), (4) axis orientation cue 2 (500 ms), (5) test shape 2 (500 ms), and (6) fixation dot (until response). The observer's two-alternative forced-choice task was to indicate by button press which shape was greater in depth.
Vertical shapes were compared in depth to oblique shapes for 25 convex–convex and 25 concave–concave comparison conditions. Vertical shapes of each of five depths, equivalent to 1, 1.5, 2, 2.5, or 3° of visual angle at the presented distance, were compared with oblique shapes of the same five depths. Each observer ran 20 trials of each condition, presented in random order.
Four paid naive observers and author E.H.C. participated. All had normal or corrected-to-normal vision and exhibited no more than minimal levels of astigmatism (<0.5 cylinders). All observers were given extensive training in the experimental task.
Experiment 2: failures of 2D angle constancy.
In experiment 2, using the same observers, procedures, stimulus extents, and statistical analyses as in experiment 1, we tested whether there is a corresponding anisotropy in the perception of obtuse 2D angles when angles symmetric around 90° are compared with angles symmetric around 45° (Fig. 3b,c).
Test angles were white lines against a gray circular background. Aliasing was equated in the two orientations by adding noise to all lines through repeated image rotation. Angles were presented in one of two ways, either intersecting the fixation dot or set above it, and were randomly varied to point either toward the top or bottom of the screen. Trial sequences were identical to experiment 1, but with 2D oblique angles as the test stimuli. The observer's task was to indicate which angle was sharper.
Results
Experiment 1: failures of 3D shape constancy
Psychometric functions (Wichmann and Hill, 2001) were fit to each observer's data, and the 50% points estimated the depths of vertical wedges judged subjectively equivalent (PSE) to fixed depths of oblique wedges. The average ratios of subjectively equivalent vertical to oblique depths for convex and concave shapes were nearly equivalent, 0.766 (SE, 0.006) and 0.781 (SE, 0.010), respectively. Data from concave and convex trials combined (Fig. 2b) show that almost all PSEs fall below the unit diagonal (mean, 0.771; SE, 0.007). Physically identical shapes were perceived as deeper when oriented vertically than when oriented obliquely; consequently, 3D shapes are not perceived as constant even across rotations in the image plane.
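To make the PSE estimate concrete, the sketch below fits a two-parameter logistic psychometric function by maximum likelihood and reads off its 50% point. It is a simplified stand-in rather than the Wichmann and Hill (2001) procedure (there is no lapse parameter and no bootstrap), the grid search is chosen only to keep the example dependency-free, and the response counts are invented for illustration; they are not data from the experiment.

```python
import math


def logistic(x, pse, slope):
    """Probability of judging the vertical wedge deeper at vertical depth x."""
    return 1.0 / (1.0 + math.exp(-(x - pse) / slope))


def neg_log_likelihood(pse, slope, depths, n_deeper, n_trials):
    nll = 0.0
    for x, k, n in zip(depths, n_deeper, n_trials):
        p = min(max(logistic(x, pse, slope), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_pse(depths, n_deeper, n_trials):
    """Grid-search maximum-likelihood fit; returns (pse, slope)."""
    best = (float("inf"), None, None)
    for i in range(401):            # candidate PSEs: 0.00 to 4.00 deg
        pse = 0.01 * i
        for j in range(1, 101):     # candidate slopes: 0.02 to 2.00 deg
            slope = 0.02 * j
            nll = neg_log_likelihood(pse, slope, depths, n_deeper, n_trials)
            if nll < best[0]:
                best = (nll, pse, slope)
    return best[1], best[2]


# Hypothetical counts (NOT experimental data): number of trials out of 20 on
# which the vertical wedge was judged deeper than a fixed oblique wedge of 2 deg.
depths = [1.0, 1.5, 2.0, 2.5, 3.0]
n_deeper = [4, 10, 16, 19, 20]
n_trials = [20, 20, 20, 20, 20]

pse, slope = fit_pse(depths, n_deeper, n_trials)
print("PSE (deg):", pse, "ratio to oblique depth:", pse / 2.0)
```

A PSE below the fixed oblique depth corresponds to a point under the unit diagonal in Figure 2b, that is, a shallower vertical wedge being judged as deep as the oblique one.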
Experiment 2: failures of 2D angle constancy
Figure 3d shows that all PSEs fall below the unit diagonal (i.e., 2D angles were perceived to be sharper at vertical than at 45°). The average subjectively equivalent vertical angle was 4.5° (SE, 0.38°) shallower than the oblique angle. Consequently, 2D angles are not perceived as constant across plane rotations.
The relationship between 2D angles and 3D curvature
For a 3D wedge at distance d with depth a and width 2w, the 2D slope s of the critical orientation flow, at height y above or below eye height, is equal to ay/(wd) (Li and Zaidi, 2004). Because distance and width are the same for the two orientations, the equation s_{vert}/s_{oblq} = a_{vert}/a_{oblq} holds for all y. The average ratio of perceptually equivalent 2D slopes, calculated from the PSEs for 2D angles, was 0.862 (SE, 0.001), similar to the ratio of perceptually equivalent 3D depths (0.771; SE, 0.007), suggesting that 3D depth inconstancy can be explained by anisotropy in perception of 2D features. The small discrepancy between ratios may reflect greater accuracy with the less noisy angle stimuli, or interactions between angles in the complex shape stimulus.
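The slope relation can be checked numerically with a few lines of code. The sketch below evaluates s = ay/(wd) and confirms that the slope ratio equals the depth ratio at every height; the function names and all numerical values (half-width, distance, heights) are arbitrary choices for illustration, not the experimental geometry.

```python
def flow_slope(depth, y, half_width, distance):
    """2D slope s = a*y / (w*d) of the critical orientation flow."""
    return depth * y / (half_width * distance)


def slope_ratio(depth_vert, depth_oblq, y, half_width, distance):
    """Ratio s_vert / s_oblq for two wedges sharing width and distance."""
    return (flow_slope(depth_vert, y, half_width, distance)
            / flow_slope(depth_oblq, y, half_width, distance))


# Illustrative values only: depths in a 0.771 : 1 ratio, same w and d.
for y in (0.5, 1.0, 2.0, 4.0):
    print(y, slope_ratio(0.771, 1.0, y, half_width=4.0, distance=100.0))
```

Because y, w, and d cancel in the ratio, a given ratio of perceived depths maps onto the same ratio of 2D slopes at all heights, which is what licenses comparing the two PSE ratios directly.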
Perceived angles decoded from cortical responses
Having traced 3D perceptual anisotropy to an oblique bias for 2D angles, we used a probabilistic stimulus decoding model (Sanger, 1996) to test whether this 2D bias could be explained by anisotropies in numbers or tuning widths of cortical cells tuned to different orientations (Li et al., 2003), or the anisotropic distribution of oriented energy in images of natural scenes (Hansen and Essock, 2004). We first derived the probabilities of numbers of spikes from individual orientation-tuned cells in response to an angle stimulus. Using Bayes' formula, we then decoded the most probable angle given the population response. To compare the predictions of the model with the experimental measurements, we assumed that the observer perceives an angle equal to the optimally decoded angle.
Population decoding model
To show how we decode angles from population responses, we begin with decoding the orientation of a single line. We make the assumption that a cell tuned to the orientation θ_{i} gives an action potential within a short time interval δt, in response to any orientation θ, with the following probability:

$$P(\text{spike in } \delta t \mid \theta) = f_{i}(\theta)\,\delta t \qquad (1)$$

where f_{i}(θ) is the tuning curve of the cell with respect to θ (i.e., the average firing rate for each value of θ). If the probability of firing is a constant within time intervals of the same length, then the firing is governed by a Poisson distribution. The probability of the cell tuned to θ_{i} firing n spikes per unit time is given by the following:

$$P(n \mid \theta) = \frac{f_{i}(\theta)^{n}}{n!}\, e^{-f_{i}(\theta)} \qquad (2)$$

If neurons can be treated as independent (i.e., all correlated firing is attributable to overlap in receptive fields or tuning curves), then the probability of n_{i} spikes each from a population of k cells is given by the product of the probabilities:

$$P(n_{1}, \ldots, n_{k} \mid \theta) = \prod_{i=1}^{k} \frac{f_{i}(\theta)^{n_{i}}}{n_{i}!}\, e^{-f_{i}(\theta)} \qquad (3)$$

Using Bayes' formula, the optimal estimate of the stimulus can be decoded from the population response:

$$P(\theta \mid n_{1}, \ldots, n_{k}) = C\, P(\theta) \prod_{i=1}^{k} f_{i}(\theta)^{n_{i}}\, e^{-f_{i}(\theta)} \qquad (4)$$

where P(θ) is the prior probability distribution of orientations in natural scenes and C is a combination of terms that are not functions of θ. The maximum of the posterior probability distribution provides an optimal estimate of the stimulus orientation θ given the population response (Box and Tiao, 1992). It is computationally easier to use the log of the probability distribution (Jazayeri and Movshon, 2006):

$$\log P(\theta \mid n_{1}, \ldots, n_{k}) = \log C + \log P(\theta) + \sum_{i=1}^{k} n_{i} \log f_{i}(\theta) - \sum_{i=1}^{k} f_{i}(\theta) \qquad (5)$$

If there are a considerable number of cells d_{i} tuned to each orientation θ_{i}, and if the average responses of these cells are n̄_{i}, then the total response for each orientation is d_{i}n̄_{i}, and Equation 5 can be grouped within m orientations as follows:

$$\log P(\theta \mid \bar{n}_{1}, \ldots, \bar{n}_{m}) = \log C + \log P(\theta) + \sum_{i=1}^{m} d_{i}\,\bar{n}_{i} \log f_{i}(\theta) - \sum_{i=1}^{m} d_{i}\, f_{i}(\theta) \qquad (6)$$

Because the average responses are given by the values of the tuning curves, Equation 6 can be used to make predictions without running probabilistic simulations.
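The single-line decoding step can be illustrated with a small deterministic sketch. It assumes the grouped log posterior has the standard Poisson form, log P(θ) + Σ d_{i} n̄_{i} log f_{i}(θ) − Σ d_{i} f_{i}(θ) up to an additive constant, and it uses a toy isotropic population with a flat prior. The tuning-curve shape (a Gaussian of the wrapped orientation difference), the 30° full width at half height, the peak rate, the 10° spacing of preferred orientations, and all function names are assumptions for the example, not the parameters of the model in the text.

```python
import math


def tuning(theta, preferred, width=30.0, peak=20.0, floor=0.01):
    """Mean response f_i(theta): Gaussian of the orientation difference,
    wrapped to a 180 deg period. `width` is full width at half height."""
    diff = (theta - preferred + 90.0) % 180.0 - 90.0
    sigma = width / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return floor + peak * math.exp(-0.5 * (diff / sigma) ** 2)


def log_posterior(theta, preferred, counts, mean_resp, log_prior=lambda t: 0.0):
    """Grouped log posterior, dropping the constant term log C."""
    total = log_prior(theta)
    for pref, d, n_bar in zip(preferred, counts, mean_resp):
        f = tuning(theta, pref)
        total += d * n_bar * math.log(f) - d * f
    return total


def decode_orientation(stimulus, step=0.5):
    """Return the candidate orientation with the highest log posterior."""
    preferred = [10.0 * i for i in range(18)]   # assumed sampling, 0..170 deg
    counts = [1.0] * len(preferred)             # d_i: isotropic toy population
    # Noise-free case: mean responses are read off the tuning curves.
    mean_resp = [tuning(stimulus, p) for p in preferred]
    candidates = [step * k for k in range(int(180 / step))]
    return max(candidates,
               key=lambda t: log_posterior(t, preferred, counts, mean_resp))


print(decode_orientation(60.0))
```

In this noise-free toy case with a flat prior, the decoded orientation is expected to coincide with the stimulus orientation, so the block only shows the mechanics of evaluating the posterior over candidate orientations. Passing a non-uniform `log_prior` or unequal `counts` is where anisotropies would enter.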
We now consider angles Ω composed of two lines, θ_{p} and θ_{q}. The orientation tuning functions of cells in striate cortex have been fit satisfactorily by circular Gaussians (Swindale, 1998). However, when more than one orientation is present in the receptive field of a cell, the response to a preferred orientation can be reduced even by an orientation to which the cell does not respond (Morrone et al., 1982). The tuning width of the cross-orientation inhibition is considerably broader than that of the orientation tuning of the cell, but more than one mechanism may be involved and estimates of the extent vary (Bonds, 1989). In the model, we match the excitatory orientation tuning, g_{i}(θ), to average widths for cat striate cortex (Li et al., 2003) (i.e., half-height widths varying from 29° for horizontal orientations to 38° for oblique orientations). We explored widths of the cross-orientation inhibition, h_{i}(θ), ranging from two times the excitatory width to broader. For each orientation-tuned cell, we derived a matrix-valued tuning function for angles, F_{i}(Ω) (Equation 7), in which ε is a tiny constant used to make F_{i}(Ω) all-positive, as is required for the Poisson distribution in Equation 2. We could not find published data giving frequencies of a sufficient sample of angles P(Ω) in natural scenes, so using P(θ), the prior probability distribution of orientations in images of natural scenes (Hansen and Essock, 2004), for all combinations of orientations θ_{p} and θ_{q}, we made the following rough approximation:

$$P(\Omega) = P(\theta_{p})\,P(\theta_{q}) \qquad (8)$$

Angles can be decoded from the population responses of orientation-tuned cells using an equation analogous to Equation 6:

$$\log P(\Omega \mid \bar{n}_{1}, \ldots, \bar{n}_{m}) = \log C + \log P(\Omega) + \sum_{i=1}^{m} d_{i}\,\bar{n}_{i} \log F_{i}(\Omega) - \sum_{i=1}^{m} d_{i}\, F_{i}(\Omega) \qquad (9)$$

We took estimates of the distribution of orientation-tuned cat striate cells (Li et al., 2003) for d_{i}. Using n̄_{i} equal to average responses to the stimulus angle, we found that decoded angles differed from stimulus angles.
Our psychoneural linking hypothesis is that the optimal estimate of the perceived angle is the angle at which the posterior probability is maximum. Figure 4 presents the posterior probability distribution functions of angles in the neighborhoods of vertically and obliquely oriented angles of 140° for one choice of inhibitory tuning width. The decoded vertical angle is 138°, and the decoded oblique angle is 142°, so the oblique bias of 4° is similar to the empirically measured bias. For the range of angles used in experiment 2, the decoded oblique angles were wider than the decoded vertical angles. In numerous simulations, as long as the anisotropy in the excitatory bandwidths and a constant ratio of excitatory to inhibitory tuning widths were maintained, the oblique angle was decoded as broader than the vertical angle. The model will thus make correct predictions even if tuning widths of human cells differ from those of cat cells. The anisotropy in numbers of cells, maximum at horizontal and 28% less at oblique (Li et al., 2003), tended to pull the posterior estimates of the arms of the angles toward the horizontal, creating a bias in perceived angles that is opposite to the empirical results but weaker than the bias resulting from tuning-width anisotropy. The posterior probability was insensitive to the prior probability of image angles (i.e., a uniform prior led to the same predictions as the empirical frequency distribution). The model thus shows that the narrower tuning of cells for horizontal orientations and inhibitory orientation effects explain the orientation-dependent angle distortion and hence the 3D shape distortion.
Justifications, alternatives, and implications of the model
To obtain the decoding solution, a number of assumptions were made: (1) shapes of orientation tuning curves are not constrained but are assumed to be invariant to signal strength, based on orientation tuning curves in V1 being contrast invariant (Sclar and Freeman, 1982); (2) the variation in firing rates of cortical neurons is described by Poisson statistics, but more general Poisson-like exponential distributions would suffice (Ma et al., 2006); and (3) the assumption that responses of cells tuned to different angles at different orientations are independent leads to a simple Bayes-optimal solution, but noise in the cortex is correlated across cells. However, a decoding model incorporating the structure of neural correlations (Ma et al., 2006) requires empirical estimates that do not yet exist, and the relatively constant variability observed across cortical stages suggests that noise correlations may be propagated by downstream neurons (Shadlen and Newsome, 1998). In addition, a neural correlation function based on similarities between preferred stimuli changes the variance of the likelihood function but not measures of central tendency (Jazayeri and Movshon, 2006).
Equation 7 provides a way to simulate cells sensitive to specific angles at specific orientations and could be used to predict tuning curves for such cells in V2 and V4 (Pasupathy and Connor, 1999; Ito and Komatsu, 2004). This equation takes into account cross-orientation inhibition between V1 cells, so although responses of angle-selective cells are assumed to be independent in the model, V1 cells are not. V1 cells also have anatomical links to cells with co-oriented and coaxially aligned receptive fields (Bosking et al., 1997). Such long-range excitatory connections could facilitate the extraction of curved contours (Ben-Shahar and Zucker, 2004). Because there are more co-oriented cells available for connections to horizontal cells than to oblique cells, if co-oriented excitation replaces cross-orientation inhibition in the model, it enhances the effect of anisotropy in numbers of cells, which leads to predictions that overestimate the vertical angle and underestimate the oblique angle (i.e., distortions opposite to the observed perceptual bias). A stronger divisive gain for horizontally tuned cells than for obliquely tuned cells (Hansen and Essock, 2004) attenuates the effect of the anisotropy in numbers, but a multiplicative gain cannot reverse the effect, so cross-orientation inhibition is still needed to account for the observed perceptual bias.
This model can be viewed as a formal embodiment of Mach's (1914) idea of “contrast in directions,” which he proposed as an explanation for why obtuse angles tend to appear contracted and acute angles tend to appear expanded (Wundt, 1862). In the simulations, we found that contraction of obtuse angles is not a general result of orientation contrast, as presupposed by Blakemore et al. (1970), but occurs only for certain relative widths of excitation and inhibition. Mach's (1914) second explanation for angle distortions invoked the projective tendency of acute angles in the image to originate from 3D angles that are greater than their projections and obtuse projections to arise from smaller 3D angles (quantified by Nundy et al., 2000). To explain our results, this hypothesis requires that the 3D angles in the world that project to oblique obtuse angles be wider, on average, than the 3D angles that project to vertical obtuse angles. We used Equation 8 as an approximation to the frequency of image angles in natural scenes. The model was insensitive to P(Ω). It is likely that tuning-width anisotropies will also explain Bouma and Andriessen's (1970) result that the magnitude of the induced effect on the perceived orientation of a line segment depends on the orientations of the inducing and test lines.
Discussion
Perhaps because investigations of oblique effects concentrated on detection and discrimination (Appelle, 1972; Cohen and Zaidi, 2007) or memory of oriented information (Essock, 1980), the oblique bias for angles remained undiscovered. The oblique bias has direct consequences for a variety of shape and space constancies. First, it suggests that 2D shapes defined by contours will also not be perceived as constant across axis orientations. In addition, contour curvature is known to be fundamental to uncovering depth (Koenderink, 1984) as well as representation of the part-structure of 2D and 3D shapes (Hoffman and Richards, 1984; Cohen and Singh, 2006), and cells in area V4 have been shown to be selective for angles and curves in particular orientations (Pasupathy and Connor, 1999). Our finding that perception of even a single angle is dependent on image orientation has broad implications for object–shape perception.
The link between cortical processing and 3D shape inconstancy revealed by this study relied on the previous identification of 2D orientation flows critical for 3D perception from texture cues (Li and Zaidi, 2000, 2004; Knill, 2001). Because orientation flows have also been implicated in 3D perception from reflections and shading (Breton and Zucker, 1996; Ben-Shahar and Zucker, 2001; Fleming et al., 2004), it is likely that shapes defined by these cues also show orientation dependence. The result that the perceived shape of a simple 3D stimulus is not invariant to rotation of the image plane suggests that object identity computations overcome shape distortions caused by neural processing.
As yet, there has been no electrophysiological examination of how outputs of V1 cells are combined by extrastriate cells to extract orientation flows. The 3D orientation dependence revealed here demonstrates that neural extraction of orientation flows is more than simple combination of oriented energy and involves cross-orientation inhibition, some of which may be attributable to compressive contrast-response functions in the lateral geniculate nucleus (Li et al., 2006; Priebe and Ferster, 2006) or even in ganglion cells.
The decoding model shows that the anisotropic distribution of orientation tuning widths of cells creates orientation-dependent 2D and 3D shape distortions. Population decoding models generally ignore anisotropies in distributions of properties of neurons to simplify the expressions for the likelihood or posterior probability by omitting the equivalent of the last terms of Equations 6 and 9 (Jazayeri and Movshon, 2006). Our results show that it is essential to include distributions of neural properties, and we present a method that links these variations to the decoding of complex 2D and 3D stimuli.
Footnotes

This work was supported by National Eye Institute Grants EY07556 and EY13312 (Q.Z.).
 Correspondence should be addressed to Dr. Qasim Zaidi, Vision Science Department, State University of New York, College of Optometry, 33 West 42nd Street, New York, NY 10036. qz{at}sunyopt.edu