Abstract
Size-invariant object recognition—the ability to recognize objects across transformations of scale—is a fundamental feature of biological and artificial vision. To investigate its basis in the primate cerebral cortex, we measured single neuron responses to stimuli of varying size in visual area V4, a cornerstone of the object-processing pathway, in rhesus monkeys (Macaca mulatta). Leveraging two competing models for how neuronal selectivity for the bounding contours of objects may depend on stimulus size, we show that most V4 neurons (∼70%) encode objects in a size-invariant manner, consistent with selectivity for a size-independent parameter of boundary form: for these neurons, “normalized” curvature, rather than “absolute” curvature, provided a better account of responses. Our results demonstrate the suitability of contour curvature as a basis for size-invariant object representation in the visual cortex, and posit V4 as a foundation for behaviorally relevant object codes.
SIGNIFICANCE STATEMENT Size-invariant object recognition is a bedrock for many perceptual and cognitive functions. Despite growing neurophysiological evidence for invariant object representations in the primate cortex, we still lack a basic understanding of the encoding rules that govern them. Classic work in the field of visual shape theory has long postulated that a representation of objects based on information about their bounding contours is well suited to mediate such an invariant code. In this study, we provide the first empirical support for this hypothesis, and its instantiation in single neurons of visual area V4.
Introduction
The ability to recognize objects regardless of scale is a fundamental feature of primate vision, and is thought to be mediated by neurons at the final stages of cortical form processing that encode objects in a size-invariant manner (Schwartz et al., 1983; Desimone et al., 1984; Gross et al., 1993; Sáry et al., 1993; Ito et al., 1995; Logothetis and Sheinberg, 1996; Tanaka, 1996; Hikosaka, 1999; Brincat and Connor, 2004; Liu et al., 2009; Rust and DiCarlo, 2010). Here, size-invariant coding refers to neuronal tuning that is independent of stimulus size; size may modulate the response magnitudes of neurons but not their stimulus preferences. It remains unclear, however, how this tuning invariance emerges across successive stages of cortical processing. To probe its basis in the primate cortex, we studied the scale dependence of neuronal tuning for object shape in visual area V4, an area enriched with shape-selective neurons, and the major source of feedforward inputs to the inferotemporal cortex (IT), where size-invariant signals have been reported.
Many V4 neurons are selective for the shape of local contour segments along an object's boundary: convex or concave segments at specific positions relative to object center (Pasupathy and Connor, 1999, 2001). To determine their putative contributions to invariant object representation and recognition, we asked whether V4 neurons sensitive to boundary form encode their preferred contour segments in a size-invariant or size-dependent manner. When an object is scaled, its shape is preserved but the curvature of its bounding contour changes; for example, the curvature of a circle halves when its radius doubles. Thus, a neuron that encodes objects in terms of local contour curvature will show systematic changes in stimulus preferences across size and therefore cannot mediate a size-invariant representation of objects. This raises the question of whether V4 neurons that have been previously reported to signal boundary form are indeed selective for absolute curvature, a size-dependent parameter defined as the rate of change of the tangent angle per unit of contour length (Fig. 1A), or whether they might be selective for a size-invariant transform of curvature. For example, curvature can be rendered size-invariant when normalized by object size; curvature defined as the rate of change of the tangent angle per unit of angular length yields a size-invariant representation of boundary form (Fig. 1A). Thus, a neuron that encodes objects in terms of curvature that is normalized by stimulus size, ie, normalized curvature, can mediate a size-invariant representation of objects.
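Written out explicitly (a schematic restatement of the definitions illustrated in Fig. 1A, with θ denoting the tangent angle of the bounding contour, s the contour arc length, and φ the angular position relative to object center):

$$\kappa_{\mathrm{abs}} = \frac{d\theta}{ds}, \qquad \kappa_{\mathrm{norm}} = \frac{d\theta}{d\varphi}$$

Uniformly scaling an object by a factor λ multiplies arc lengths by λ but leaves angular positions unchanged, so absolute curvature is divided by λ whereas normalized curvature is unaffected; for a circle of radius r centered on the object, absolute curvature equals 1/r but normalized curvature equals 1 at every size.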
Previous work has demonstrated that the selectivity of V4 neurons for local contour segments can be explained qualitatively and modeled quantitatively in terms of curvature (Pasupathy and Connor, 2001). However, because these measurements were based on neuronal responses to stimuli at a single scale, their results cannot address whether V4 neurons encode these contours in terms of absolute or normalized curvature. Here, using a set of visual stimuli in which we systematically varied the curvature and scale of object contours within individual neuronal receptive fields (RFs), we determined which of these two coding schemes, absolute or normalized curvature, provided the best quantitative account of contour-based object representation in area V4. In so doing, our work provides new insights into the neural foundations of invariant object recognition in the primate cortex.
Materials and Methods
Neurophysiology
Extracellular neural recordings were performed in two adult male rhesus macaques (Macaca mulatta) using epoxy-coated tungsten microelectrodes (250 μm, FHC) lowered into cortex through an 8-channel acute Microdrive system (Gray Matter Research). Voltage signals were amplified and bandpass filtered (0.1–8 kHz) using a 16-channel recording system (Plexon Systems); the waveforms of single units were isolated manually using spike-sorting software (Plexon Systems). Recording chambers were centered on dorsal area V4 along the prelunate gyrus, extending between the lunate sulcus and the superior temporal sulcus. RFs were in the contralateral, inferior visual quadrant at 3°–12° of eccentricity (median 5°), and were 4°–11° in diameter (median 6°). We recorded data from 186 well isolated neurons in total (82 and 104 from Monkeys 1 and 2, respectively); data from 80 of these neurons are included in this report (50 and 30 from Monkeys 1 and 2, respectively). Our inclusion criteria were as follows: (1) that data from at least five repeats of the stimulus set were collected (5–16 repeats; median 10); and (2) that the neuron was visually responsive and selective for shape stimuli: to be included, a neuron had to respond to one or more stimuli with a firing rate at least four times baseline (across neurons, the maximum response was, on average, 20 times baseline). Most neurons excluded from our analyses were removed due to an insufficient number of stimulus repeats (18 and 69 neurons from Monkeys 1 and 2, respectively, leaving 64 and 35 neurons per subject). Of the remaining neurons, a smaller subset was excluded due to weak shape selectivity (14 and 5 neurons from Monkeys 1 and 2, respectively). All experimental procedures conformed to NIH and USDA guidelines and were approved by the Institutional Animal Care and Use Committee at the University of Washington.
Visual stimulation
Stimuli were presented on a gamma-corrected CRT monitor (ViewSonic VS11135) positioned directly in front of the animal (at 49 and 53 cm for Monkeys 1 and 2, respectively); the display had a resolution of 1600 × 1200 pixels, a refresh rate of 97 Hz and a mean luminance of 5.4 cd/m2. Animals were required to passively view the visual stimuli while maintaining fixation of a 0.1° central target within a window of radius 0.75°; eye position was monitored using an infrared tracking system (Eyelink 1000, SR Research; 1 kHz). Stimuli were presented for 300 ms each with an interstimulus interval of 200 ms, and were randomly interleaved. On a subset of trials, no stimulus was presented to allow for the measurement of baseline activity. Stimulus and behavioral events were controlled by custom, Linux-based software written in Python (Pype, originally developed by J. L. Gallant and J. Mazer).
We used parametric shape stimuli with systematic variations along one portion of the bounding contour (Fig. 1A), some with convex projections and others with concave indentations, herein referred to as “convex” and “concave” shapes, respectively (Fig. 1B). Each convex and concave shape was presented at 8 rotations (0°–315°; 45° steps). Most neurons (65/80) were tested with all 13 shapes, as shown; the remaining neurons (15/80), recorded in our earliest sessions, were tested with a smaller subset of nine shapes that excluded the two most extreme convex and concave shapes. In the “scale test” (Fig. 1C), we presented all the shapes at several scales within the neuron's RF; stimulus scales were expressed as fractions of RF size (typically 0.4, 0.6, 0.8, and 1.0×). Most neurons were tested with four or five stimulus scales (41/80 and 36/80, respectively; additional scale was 0.2×); a few neurons, recorded in our earliest sessions, were tested with three scales (3/80). In the “position test” controls (Fig. 1D), only convex shapes at a single stimulus scale (0.8×) were presented at locations that matched the positional shifts induced by scaling. Trials of the scale test and the position test were randomly interleaved in the same experimental sessions to ensure identical recording conditions.
Initial characterization of units
For each neuron recorded, we first identified the shape, rotation and color of a stimulus that evoked a strong neuronal response. We then used this stimulus to obtain a qualitative and quantitative estimate of the neuron's spatial RF. For most neurons recorded (56/80), we mapped the RF by presenting the chosen stimulus at a grid of locations centered on the qualitative estimate. The number of grid locations tested depended on the qualitative estimate of RF size: it ranged from 7 × 7 to 10 × 10, and was typically 8 × 8. We fit the resulting response surface with a 2D Gaussian function, with the major and minor axes constrained to be the same, and used the fit's center coordinates and SD as quantitative estimates of the RF position and size, respectively. Our estimate of the RF radius was guided largely by the SD from the quantitative mapping procedure (RF radius = 2 × SD), but also by the expected RF radius at the corresponding eccentricity, in accordance with published measurements (Gattass et al., 1988). For neurons for which we did not map the RF quantitatively (24/80), we ensured that there were no visually evoked responses beyond the qualitatively estimated RF. For all neurons recorded, the largest scale tested was always within the RF. After this initial characterization, we proceeded to the main experiment in which we presented shape stimuli at several scales and positions within the RF (Fig. 1C,D).
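As a rough illustration of this fitting step, a minimal Python sketch is shown below (not our analysis code; the grid spacing, simulated responses, and variable names are hypothetical):

```python
# Sketch of the RF-mapping fit: an isotropic 2D Gaussian (equal major and minor
# axes) fit to a grid of responses; RF radius is then taken as 2 x SD.
import numpy as np
from scipy.optimize import curve_fit

def isotropic_gaussian(xy, amp, x0, y0, sd, offset):
    x, y = xy
    return amp * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sd ** 2)) + offset

grid = np.linspace(-3, 3, 8)                       # hypothetical 8 x 8 grid (deg)
xx, yy = np.meshgrid(grid, grid)

# Simulated response map standing in for the measured firing rates
true_map = isotropic_gaussian((xx, yy), 30.0, 0.5, -0.8, 1.2, 2.0)
responses = true_map + np.random.default_rng(0).normal(0, 1.0, true_map.shape)

p0 = [responses.max(), 0.0, 0.0, 1.5, responses.min()]
popt, _ = curve_fit(isotropic_gaussian, (xx.ravel(), yy.ravel()),
                    responses.ravel(), p0=p0)
amp, x0, y0, sd, offset = popt
rf_center = (x0, y0)        # quantitative estimate of RF position
rf_radius = 2 * sd          # RF radius defined as 2 x SD of the fitted Gaussian
```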
Analysis of neural data
For each neuron recorded, and for each stimulus condition, we computed the firing rate over the entire stimulus presentation time (0–300 ms), averaged across stimulus repeats. These responses formed the basis for several analyses as described below. Baseline responses were computed from trials in which the stimulus was absent; because these responses were typically low (median 1.6 ips), we based our results on average neuronal responses without baseline subtraction.
Optimal stimulus rotation.
Given that V4 neurons are selective for the position of a preferred contour segment relative to object center (see Figs. 5, 6), we first identified the optimal stimulus rotation for each neuron as the rotation at which the peak response occurred. For almost all neurons (90%), this rotation also showed the largest dispersion of responses across shapes at a given stimulus size, where dispersion was computed as the variance-to-mean ratio of responses. This procedure allowed us to identify a set of shapes that evoked strong and differential responses from each neuron. We then quantified the change in the neuron's tuning curve as a function of stimulus size by computing the tuning curve centroid.
Tuning curve centroid.
To assess whether neurons changed their stimulus preferences as a function of size, we identified the average preferred stimulus at each scale tested. We computed the centroid of the neuron's tuning curve at the optimal stimulus rotation, and only for the responses to either the convex or concave shapes, whichever had the highest cumulative response. The tuning centroid was computed as follows:
$$\mathrm{centroid} = \frac{\sum_{i=1}^{N} r_i\,x_i}{\sum_{i=1}^{N} r_i}$$

where ri is the response to the ith shape, xi is the position (ie, numerical order) of the ith shape, and N is the number of stimuli. For this analysis, we only included stimulus scales for which the neuron was visually responsive, ie, scales where the average response to at least one shape exceeded the baseline response. For each neuron, we then examined whether the tuning centroids changed across scales (see Fig. 3), performing a linear regression and extracting the slope as a metric for the change in stimulus tuning.
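These two steps can be sketched in a few lines of Python (an illustration with placeholder data and assumed array shapes, not our analysis code; for brevity, the centroid below is computed over all shapes at the optimal rotation rather than over only the preferred convex or concave subset):

```python
# Sketch of the optimal-rotation and tuning-centroid analyses.
import numpy as np
from scipy import stats

# rates[rotation, scale, shape]: trial-averaged firing rates (placeholder values)
rng = np.random.default_rng(0)
rates = rng.uniform(0, 40, size=(8, 4, 13))
scales_tested = [0.4, 0.6, 0.8, 1.0]
baseline = 1.6                                    # ips, from blank trials

# Optimal rotation: the rotation containing the peak response (for most neurons
# this also maximized the variance-to-mean ratio of responses at a given scale)
opt_rot = np.unravel_index(np.argmax(rates), rates.shape)[0]

def tuning_centroid(r):
    """Response-weighted mean of shape position: sum(r_i * x_i) / sum(r_i)."""
    x = np.arange(1, r.size + 1)
    return np.sum(r * x) / np.sum(r)

centroids, scales = [], []
for s, scale in enumerate(scales_tested):
    r = rates[opt_rot, s, :]
    if r.max() > baseline:                        # visually responsive scales only
        centroids.append(tuning_centroid(r))
        scales.append(scale)

# Slope of centroid vs scale: near zero for size-invariant tuning
slope = stats.linregress(scales, centroids).slope
```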
Separability index.
To assess whether neuronal selectivity was independent of stimulus size, we computed a separability index based on singular value decomposition of the tuning curves at all scales (Peña and Konishi, 2001; Mazer et al., 2002), as follows:
$$\mathrm{SI} = \frac{\lambda(1)^2}{\sum_{i=1}^{N}\lambda(i)^2}$$

where λ(1) and λ(i) are the magnitudes of the first and ith singular values, respectively, and N is the total number of scales tested. Index values are bounded between 0 and 1, with 0 indicating a complete lack of a separable component and 1 indicating full separability. We computed the separability index in two ways: for tuning curves at only the optimal stimulus rotation and at all rotations. We also computed a related quantity, the size consistency index, which has been used previously to assess size invariance in IT cortex, and which yielded identical results (Brincat and Connor, 2004; Rust and DiCarlo, 2010).
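A minimal sketch of this computation is shown below (assuming the tuning curves are arranged as a scales × shapes matrix; the normalization by the sum of squared singular values follows the convention of Mazer et al., 2002):

```python
# Sketch of the SVD-based separability index.
import numpy as np

def separability_index(tuning_matrix):
    """tuning_matrix: mean responses arranged as scales x shapes."""
    s = np.linalg.svd(tuning_matrix, compute_uv=False)   # singular values, descending
    return s[0] ** 2 / np.sum(s ** 2)

# A perfectly separable tuning matrix (outer product of a size gain profile and
# a shape tuning curve) yields an index of ~1.0.
size_gain = np.array([0.4, 0.7, 1.0, 0.9])
shape_tuning = np.array([1.0, 3.0, 5.0, 2.0, 0.5])
print(separability_index(np.outer(size_gain, shape_tuning)))    # ~1.0
```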
Model fitting.
We identified each neuron's preferred contour segment by fitting the curvature model, as detailed previously (Pasupathy and Connor, 2001). Briefly, we modeled the neuron's shape preferences at a single stimulus scale (typically 0.8× RF size, except for 8 neurons that responded most strongly to stimuli at the smallest scale) as a function of several parameters of contour shape. In this model, the neuron is operationalized as a filter in a multidimensional shape space defined by curvature and angular position.
Each shape was decomposed into eight contour segments, representing approximately equal proportions of the overall contour length (for details, see Pasupathy and Connor, 2001); these included the different convex projections and concave indentations along the boundary. Each segment was characterized by two parameters: angular position and curvature. Curvature was computed as the rate of change in tangent angle with respect to contour length. Following previous work, we used a squashing function to map raw curvature values onto a scale from −1.0 (sharp concave) to +1.0 (sharp convex), as follows:
$$c = \tanh(\alpha\,c_{\mathrm{raw}})$$

where c and craw represent the magnitudes of squashed and raw curvature, respectively, and α determines the slope of the sigmoidal squashing function. We allowed α to vary across neurons in the current study, instead of keeping it fixed, to account for differences in neuronal sensitivity to curvature.
A neuron's predicted response r to a particular shape described by p boundary segments is given by the product of N-Gaussians, N being the number of stimulus dimensions contributing to the response:
$$r = k \prod_{i=1}^{N} G_i$$

where k is the amplitude of the multidimensional Gaussian, and Gi is the Gaussian along the ith dimension of stimulus space (eg, curvature, adjoining curvature, etc), of the form:
$$G_i = \exp\!\left(-\frac{(x_{ip} - \mu_i)^2}{2\sigma_i^2}\right)$$

where xip represents the value of the stimulus parameter along the ith dimension, and μi and σi are the mean and SD of the Gaussian, respectively. As done previously (Pasupathy and Connor, 2001), we implemented a version of the curvature model, which incorporated four dimensions of shape space, as follows:
$$r = k\,G_a\,G_c\,G_{ccw}\,G_{cccw}$$

where Ga is a von Mises probability density function along the dimension of angular position, and Gc, Gccw, and Gcccw are Gaussians along the dimensions of primary curvature and of adjacent contour curvature in the clockwise and counterclockwise directions, respectively.
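A schematic implementation of this model is sketched below (illustrative Python; the maximum-pooling rule across a shape's boundary segments, the tanh form of the squashing function, and all parameter and variable names are assumptions of this sketch):

```python
# Sketch of the four-dimensional curvature model: angular position (von Mises)
# x primary curvature x clockwise- and counterclockwise-adjacent curvature.
import numpy as np

def von_mises(a, mu, kappa):
    """Tuning for angular position (a, mu in radians)."""
    return np.exp(kappa * np.cos(a - mu)) / (2 * np.pi * np.i0(kappa))

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

def squash(c_raw, alpha):
    """Sigmoidal squashing of raw curvature onto [-1, +1]."""
    return np.tanh(alpha * c_raw)

def predicted_response(segments, params):
    """Predicted response to one shape, described by a list of boundary-segment
    dicts with keys 'ang' (angular position), 'c', 'c_cw', 'c_ccw' (raw curvature
    of the segment and of its clockwise/counterclockwise neighbors)."""
    k, alpha = params['k'], params['alpha']
    vals = []
    for seg in segments:
        g = (von_mises(seg['ang'], params['mu_a'], params['kappa_a'])
             * gauss(squash(seg['c'],     alpha), params['mu_c'],   params['sd_c'])
             * gauss(squash(seg['c_cw'],  alpha), params['mu_cw'],  params['sd_cw'])
             * gauss(squash(seg['c_ccw'], alpha), params['mu_ccw'], params['sd_ccw']))
        vals.append(g)
    return k * max(vals)       # pool over segments (maximum; an assumption here)
```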
We searched for the best fitting model parameters using a nonlinear least-squares optimization routine, implemented in MATLAB (lsqnonlin). The model included 10 parameters (k, α, and several μi and σi terms). Depending on the number of shapes tested (N = 9 or 13), these parameters were used to fit a total of 65 or 97 responses, where the number of responses was equivalent to [(N − 1 shapes × 8 rotations) + (1 circle shape × 1 rotation)]; the circle shape was only shown at a single rotation due to its rotation invariance. To avoid overfitting, we used a cross-validation approach, training the model with 75% of the data and testing the model on the remaining 25%. The model's goodness-of-fit was computed as the average Pearson's correlation coefficient between observed and predicted responses for all iterations of the cross-validation procedure (N = 50 iterations). The best fitting model was chosen to have the highest cross-validated performance on both the training and testing datasets; this was the model most commonly returned by different iterations of the cross-validation procedure.
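Continuing the sketch above, the fitting and cross-validation steps might look as follows (scipy's least_squares stands in for MATLAB's lsqnonlin; the placeholder stimuli, responses, and initial guesses are arbitrary assumptions, and the 10 parameter names follow the model sketch):

```python
# Sketch of model fitting with a 75%/25% cross-validation split, repeated 50 times.
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import pearsonr

param_names = ['k', 'alpha', 'mu_a', 'kappa_a', 'mu_c', 'sd_c',
               'mu_cw', 'sd_cw', 'mu_ccw', 'sd_ccw']           # 10 model parameters
p0 = np.array([20.0, 1.0, 2.4, 2.0, 0.5, 0.3, -0.3, 0.5, -0.3, 0.5])

rng = np.random.default_rng(1)
stimuli = [[{'ang': rng.uniform(0, 2 * np.pi), 'c': rng.uniform(-2, 2),
             'c_cw': rng.uniform(-2, 2), 'c_ccw': rng.uniform(-2, 2)}
            for _ in range(8)] for _ in range(97)]              # 97 stimulus conditions
responses = rng.uniform(0, 40, 97)                              # placeholder firing rates

def fit_params(train_idx):
    def residuals(theta):
        params = dict(zip(param_names, theta))
        pred = np.array([predicted_response(stimuli[i], params) for i in train_idx])
        return pred - responses[train_idx]
    return dict(zip(param_names, least_squares(residuals, p0).x))

test_corrs = []
for _ in range(50):                                             # cross-validation iterations
    idx = rng.permutation(len(stimuli))
    n_train = int(0.75 * len(idx))                              # 75% train / 25% test
    params = fit_params(idx[:n_train])
    pred = np.array([predicted_response(stimuli[i], params) for i in idx[n_train:]])
    test_corrs.append(pearsonr(pred, responses[idx[n_train:]])[0])

goodness_of_fit = np.mean(test_corrs)                           # mean Pearson r across folds
```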
Model predictions.
We used the neuron's best fitting model parameters to generate predicted responses to stimuli at other scales. To generate the absolute curvature model predictions, we first calculated the raw curvature values at each stimulus size by scaling each curvature value by a factor inversely proportional to stimulus size; for example, a convex contour segment with a curvature value of +0.5 at scale 0.8× corresponds to a curvature value of +1.0 at scale 0.4×. To generate the normalized curvature model predictions, we used the same set of raw curvature values to fit the responses to stimuli at each scale; ie, we did not scale the raw curvature values for different stimulus sizes. Importantly, the equations for the curvature model, shown earlier, governed the responses of both competing models (absolute and normalized curvature), with the only difference being how the curvature parameter c changes as a function of stimulus size; c scales inversely with size in the absolute curvature model but it does not scale with size in the normalized curvature model. To account for possible response gain modulations across scale transformations, we fit a scaling parameter to the observed responses at each scale.
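Continuing the same sketch, the two competing predictions differ only in whether the raw curvature values are rescaled before being passed to the fitted model (the scale values and the per-scale gain step below are illustrative):

```python
# Sketch of generating absolute vs normalized curvature predictions from one
# set of parameters fit at the 0.8x scale (predicted_response is defined in the
# model sketch above).
FIT_SCALE = 0.8
TEST_SCALES = [0.4, 0.6, 0.8, 1.0]

def predict_at_scale(segments, params, scale, model):
    """Predicted response to one shape rendered at a new stimulus scale."""
    rescaled = []
    for seg in segments:
        seg = dict(seg)
        if model == 'absolute':
            # raw curvature scales inversely with stimulus size, eg +0.5 at
            # 0.8x becomes +1.0 at 0.4x
            factor = FIT_SCALE / scale
            for key in ('c', 'c_cw', 'c_ccw'):
                seg[key] *= factor
        # 'normalized' model: curvature values are left unchanged across scale
        rescaled.append(seg)
    return predicted_response(rescaled, params)

# A single multiplicative gain term per scale would then be fit to the observed
# responses to absorb overall response modulations across size.
```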
Partial correlations.
To assess which of the two competing models (absolute or normalized curvature) provided a better account of the observed neuronal responses at different scales, we compared the performance of the two models directly using partial correlation analyses, as was done previously to classify the responses of MT neurons to the motion of complex patterned stimuli (Movshon et al., 1985; Smith et al., 2005). For each neuron, we generated two response predictions: an absolute curvature prediction and a normalized curvature prediction, as described earlier. We then computed the partial correlations between the observed responses and each prediction, and transformed these correlations into normal deviates using Fisher's r-to-Z transform. In accordance with previous studies (Movshon et al., 1985; Smith et al., 2005), the r-to-Z transform was normalized by the degrees of freedom (df), equivalent to the number of independent samples that contribute to the neuronal response, minus three. In our case, the number of independent samples was the number of shapes multiplied by the number of stimulus scales tested; ie, df was computed as follows: [(number of shapes × number of scales) −3]. We used data from all stimulus rotations to better constrain the model fitting procedure, but we did not incorporate the number of rotations tested into the calculation of df because most neurons responded primarily to stimuli at one rotation (see Figs. 5, 6); this is a conservative strategy because including all rotations would increase df and thus cause more neurons to achieve statistical significance. We used the difference between the Z-transformed normalized curvature correlation (Zn) and the Z-transformed absolute curvature correlation (Za) as a classification index. To be classified as selective for normalized curvature, the index must be positive; to be classified as selective for absolute curvature, the index must be negative. Furthermore, the magnitude of this index had to exceed a criterion value of 1.28 (equivalent to p < 0.1); otherwise, neurons were unclassified. The main results from these analyses (see Fig. 8) were based on neuronal responses at all stimulus rotations and scales. However, we also implemented several other variants of these analyses in which we: (1) excluded responses to the stimulus scale on which the model fitting was constructed, (2) included responses only at the smallest and largest stimulus scales tested, and (3) included responses at only the optimal stimulus rotation and at all scales tested. All of these variations yielded qualitatively similar results, and we therefore focus on reporting the results based on neuronal responses at all stimulus rotations and scales.
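The classification step can be sketched as follows (a self-contained Python example; scaling the Fisher-transformed partial correlations by the square root of df is our reading of the normalization described above, and all variable names are assumptions):

```python
# Sketch of the partial-correlation classification (after Movshon et al., 1985;
# Smith et al., 2005).
import numpy as np

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialed out."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def classify(observed, pred_norm, pred_abs, n_shapes, n_scales):
    r_n  = np.corrcoef(observed, pred_norm)[0, 1]
    r_a  = np.corrcoef(observed, pred_abs)[0, 1]
    r_na = np.corrcoef(pred_norm, pred_abs)[0, 1]

    R_n = partial_corr(r_n, r_a, r_na)    # normalized model, absolute partialed out
    R_a = partial_corr(r_a, r_n, r_na)    # absolute model, normalized partialed out

    df = n_shapes * n_scales - 3          # rotations deliberately not counted
    Z_n = np.arctanh(R_n) * np.sqrt(df)
    Z_a = np.arctanh(R_a) * np.sqrt(df)

    if Z_n - Z_a > 1.28:                  # criterion equivalent to p < 0.1
        return 'normalized'
    if Z_a - Z_n > 1.28:
        return 'absolute'
    return 'unclassified'

# Usage with uncorrelated placeholder data (13 shapes x 4 scales):
rng = np.random.default_rng(0)
obs, p_norm, p_abs = (rng.random(52) for _ in range(3))
print(classify(obs, p_norm, p_abs, n_shapes=13, n_scales=4))   # most likely 'unclassified'
```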
Results
To determine whether V4 neurons represent objects in a size-dependent or size-invariant manner, we recorded well isolated single units in two awake, fixating macaques (N = 80; see Materials and Methods), characterizing their responses to visual shape stimuli presented at different scales within the RF.
Experimental paradigm
To probe neuronal preferences for the bounding contours of objects, we used a set of parametric shape stimuli with systematic, local variations in boundary form (Fig. 1A; see Materials and Methods). Some shapes had convex projections whereas others had concave indentations (Fig. 1A, red and blue segments); we will herein refer to these as convex and concave shapes, respectively. The convex and concave shapes (Fig. 1B) provided a finer sampling of contour curvature than in previous studies (Pasupathy and Connor, 2001; Bushnell et al., 2011). Importantly, this dense sampling along the dimension of curvature ensured substantial overlap in the contour curvature values tested at different stimulus scales, allowing us to distinguish putative size-dependent and size-invariant codes for object representation. The convex and concave shapes were each presented at eight rotations. In the main experiment, the scale test (Fig. 1C), we presented all stimuli at several scales inside the neuron's RF; scales were expressed as fractions of RF diameter, typically 0.4–1.0× (see Materials and Methods).
Figure 1. Experimental design. A, Parametric shapes with systematic variations in contour curvature. When a shape is scaled, the absolute curvature of a contour segment (convex, red; concave, blue) changes but its normalized curvature does not. Diagonal lines of constant absolute curvature are only illustrative; the empirical slope of these lines depends on the stimulus scales tested and the neuron's curvature sensitivity. Schematics for computing absolute and normalized curvature of a contour segment are shown. B, Overlaid convex and concave shapes (left and right) show that the convex projection and concave indentation at the top were varied systematically to span a range of curvature values while maintaining the rest of the boundary similar across stimuli. C, In the scale test, all shapes were presented at several scales within the RF (dashed circle), typically 0.4–1.0× RF size. D, In the position test, only convex shapes at a single intermediate scale (0.8×) were presented at different locations to control for the positional shifts that were induced by scaling. For concave shapes, the position of the concave indentation did not change as a function of stimulus size (C).
Response predictions
When a shape is scaled, the absolute curvature of the bounding contour changes, but its normalized curvature remains the same (Fig. 1A). If a neuron encodes absolute curvature, we therefore expect that its stimulus preferences will change as a function of size. Alternatively, if a neuron encodes normalized curvature, we expect that its stimulus preferences will not change as a function of size.
The two predictions are illustrated graphically for a neuron that responds preferentially to convex shapes. If the neuron encodes absolute curvature (Fig. 2A), its tuning curve will change systematically as a function of size; the neuron will respond best to different shapes at different scales. Because of how we order convex and concave shapes along the shape identity axis (abscissa), the tuning curve will shift leftward with increasing size for a neuron that prefers a convex projection (referred to as convex-preferring), and rightward for a neuron that prefers a concave indentation (concave-preferring). Alternatively, if the neuron encodes normalized curvature (Fig. 2B), the tuning curve will not change as a function of size; the neuron will respond best to the same shape at all scales. Importantly, although both coding schemes can account equally well for responses measured at a single stimulus scale, they make different predictions for responses to the same stimuli at different scales. Note that neither model precludes nor predicts changes in the overall response magnitude across scale. We therefore focus primarily on assessing systematic changes in neuronal selectivity for the contour segment whose curvature we manipulate across the stimulus set (ie, horizontal shifts along the abscissa, and changes in the shape of the tuning curve). One way to assess tuning invariance analytically is to evaluate the centroid of the tuning curve at each scale, and to examine the slope of the tuning centroids across scale (Fig. 2C; see Materials and Methods). This analysis is most interpretable when applied to neuronal responses to shapes presented at the optimal stimulus rotation, where the contour segment that drives the responses is varied systematically. The tuning centroid is intended to capture any systematic relative changes in the neuron's stimulus preferences across scale, including changes in the position of the tuning curve peak and changes in the shape of the tuning curve that cannot be explained by multiplicative gain modulation. Note that although we graphically illustrate the expected shifts in the stimulus preferences of a neuron that encodes absolute curvature as a slope of a particular value (Figs. 1, 2), the empirical magnitude of the slope will depend on the neuron's sensitivity along the dimension of contour curvature, a fit parameter of the curvature model (see Materials and Methods).
Figure 2. Response predictions. These predictions are illustrated for a hypothetical neuron selective for shapes with a convex projection at the top. A, If the neuron encodes absolute curvature, we expect systematic shifts in stimulus preference across scale, accompanied by changes in the tuning centroid (triangles). B, If the neuron encodes normalized curvature, we do not expect systematic shifts in stimulus preference or the tuning centroid across scale. C, Tuning centroid across scale. The absolute curvature model predicts systematic shifts in the tuning centroid as a function of size, yielding a line with a non-zero slope, whereas the normalized curvature model predicts a line with a near-zero slope.
Example responses
Consistent with previous studies (Pasupathy and Connor, 1999, 2001), the V4 neurons we recorded showed strong and differential responses to shape stimuli presented at some rotations, but not others (see Fig. 5, 6). Thus, we first identified the optimal stimulus rotation for each neuron (see Materials and Methods), and then evaluated whether the neuron's preferences for shapes at that rotation changed as a function of stimulus size. The responses of four example V4 neurons to shapes at the optimal stimulus rotation and at different scales are shown (Fig. 3, left). The responses of the first example neuron (Fig. 3A) resembled the absolute curvature prediction; the neuron's stimulus preferences changed gradually as a function of size. For the largest scale (1.0×), the tuning centroid corresponded to a shape with a sharp convexity (see lightest gray triangle). For the smaller scales (0.4–0.8×), the tuning centroid shifted systematically toward shapes with broader convexities (see darker shaded triangles). The neuron's overall response magnitude also grew with increasing scale, although this observation is not constrained by the absolute curvature prediction. The responses of the second example neuron (Fig. 3B) were qualitatively similar: for larger scales, the tuning centroid shifted systematically toward shapes with sharper convexities, although there was little change in its response magnitude. Thus, the responses of these example neurons were consistent with selectivity for the absolute curvature of the preferred contour segment. In contrast, the responses of the third (Fig. 3C) and fourth (Fig. 3D) example neurons resembled the normalized curvature prediction. The tuning centroid remained unchanged as a function of size (triangles are superimposed and indistinguishable), consistent with selectivity for the normalized curvature of the preferred contour segment.
Figure 3. Example responses to the scale test. A–D, Data from four example neurons. Average firing rates to shapes at the optimal stimulus rotation (left). Responses to stimuli at different scales are color-coded (grayscale; expressed as a fraction of RF size); baseline firing rates are also plotted (dashed lines). Tuning centroids across scale (right; also triangles on the left). The responses of neurons in A and B were consistent with the absolute curvature prediction. The responses of neurons in C and D were consistent with the normalized curvature prediction.
Given that the tuning centroid will shift systematically for a neuron that encodes absolute curvature but not for a neuron that encodes normalized curvature (Fig. 2C), this metric's trend across scale provides a useful way to classify neurons: a neuron sensitive to absolute curvature will have a non-zero slope whereas a neuron sensitive to normalized curvature will have a zero slope. For each neuron, we examined the tuning centroids across scale, and computed the slope of the linear regression (Fig. 3, right). For the first and second example neurons, the slopes were large and negative in sign (−1.11, p ≤ 0.05; −0.94, p ≤ 0.05, respectively), indicating systematic shifts toward sharper convexities (ie, higher curvature values) for larger stimulus scales. The negative slope signs were consistent with selectivity for the convex projection, as described earlier. For the third and fourth example neurons, the slopes were near zero (−0.12, p = 0.40; +0.09, p = 0.34, respectively).
Population data
Across all the neurons recorded (Fig. 4A; N = 80 in total: N = 50 and N = 30 from Monkeys 1 and 2, respectively), we found that most V4 neurons had near-zero slopes (median = −0.06, triangle), indicating that they largely maintained their stimulus preferences across scale transformations, as per the normalized curvature prediction. Only a small subset of neurons had large slopes, indicating that their stimulus preferences changed gradually and systematically across scale transformations, as per the absolute curvature prediction; of these neurons, only a small number had significant regression slopes (N = 13/80; 16%; p ≤ 0.05; black). Consistent with this finding, the responses of neurons in our dataset were also strongly size-invariant when assessed with a more traditional, model-agnostic metric of linear separability based on singular value decomposition (Peña and Konishi, 2001; Mazer et al., 2002; Brincat and Connor, 2004; Rust and DiCarlo, 2010), both for data at the optimal rotation and at all rotations (Fig. 4B,C; see Materials and Methods).
Figure 4. Population analysis of changes in neuronal stimulus preferences across scale. A, Distribution of slopes of the tuning centroids as a function of size for all neurons recorded (N = 80). Only a small subset of neurons showed systematic shifts in their stimulus preferences, as indicated by significant linear regression slopes (N = 13/80; black). B, C, Distribution of the size separability index derived from responses at the optimal stimulus rotation and at all rotations (B and C, respectively). In both cases, neurons showed high separability indices (median = 0.97 in B; 0.95 in C; triangles).
Model comparisons
Several factors determine the magnitude of the expected systematic changes in shape tuning as a function of stimulus size. These include: the exact contour segment preferred by individual neurons, how the curvature of this contour segment varies across the convex and concave shapes at a given stimulus size, and how sensitive each neuron is to curvature (operationalized as the width of its tuning along the curvature dimension). To estimate the expected changes, and to confirm that our stimulus design and quantitative analyses were sufficiently sensitive to reveal systematic changes in stimulus preferences across size, we turned to the absolute curvature model prediction and asked: for example neurons, how much of a shift in the tuning centroid across scale does this model actually predict?
We generated neuron-specific model predictions as follows. First, we fit the curvature model to the neuron's responses to shapes presented at an intermediate scale (0.8×) and at all stimulus rotations, following well established methods (see Materials and Methods). Next, we used the resulting best fit to predict the neuron's responses to the same stimuli at different scales (see Materials and Methods). From the model's predicted tuning curves at the optimal stimulus rotation (the same rotation as for the observed data), we then computed the tuning centroid at each stimulus scale and the linear regression slope, exactly as we had done for the observed data, and compared the predicted and observed slopes.
The observed and predicted responses (symbols and lines, respectively) for an example neuron consistent with sensitivity to absolute curvature are shown (Fig. 5A; same neuron as in Fig. 3B). Here, unlike in previous figures, we show neuronal responses to shapes presented at each of the eight stimulus rotations tested (panels). This neuron, like many others in V4, was exquisitely sensitive to stimulus rotation; only a few rotations elicited strong responses whereas many others elicited near-baseline responses, consistent with position-specific tuning for boundary form (Pasupathy and Connor, 2001). The optimal rotation was 135° (Fig. 5A, row 1, left panel; see Materials and Methods). The shape stimuli that elicited strong responses all contained a convexity pointing to the lower left. This preference was consistent with the optimal contour segment identified by the model fitting procedure: a sharp convexity pointing to the lower left, adjoined by a concavity in the counterclockwise direction (Fig. 5A, inset shape). The neuron's observed responses and the model's predicted responses at an intermediate scale (0.8×) were well matched; the model's performance, computed as the average Pearson's correlation coefficient between observed and predicted responses for all iterations of the cross-validation procedure, was 0.92 (see Materials and Methods). Focusing on the neuron's tuning curves at the optimal stimulus rotation, both the observed and predicted responses shifted systematically as a function of stimulus size. Recall that the curvature model was only fit to the responses to stimuli at one scale (0.8×); nevertheless, it could be used to accurately predict responses to stimuli at other scales by appropriately scaling the curvature values. The model's predicted tuning curves at each scale, normalized by the magnitude of the highest response, are shown (Fig. 5B). For the convex shapes, the predicted tuning curve shifted leftward with increasing stimulus scale, indicating a systematic change in the neuron's stimulus preferences toward sharper convexities. From these predicted tuning curves, we extracted the tuning centroid at each scale. The observed and predicted tuning centroids are shown superimposed (Fig. 5C, black and red, respectively); the observed and predicted slopes were well matched, both in sign and in magnitude.
Figure 5. Absolute curvature model predictions for a neuron consistent with sensitivity to absolute curvature. A, Observed and predicted responses at all stimulus rotations (symbols and lines, respectively; same neuron as in Fig. 3B). Each panel shows the neuron's tuning curves at one of eight stimulus rotations (0°–360°; 45° steps). The preferred contour segment identified by the fitting procedure was a sharp convexity pointing to the lower left, adjoined by a shallow concavity in the counterclockwise direction (see inset shape). B, The model's predicted responses at the optimal stimulus rotation, normalized to the maximum predicted response for each stimulus scale. C, Observed and predicted tuning centroids (black and red); the tuning centroid shifted systematically across scale, for both the observed and predicted responses.
Observed and predicted responses for a second example neuron consistent with sensitivity to normalized curvature are shown (Fig. 6A; same neuron as in Fig. 3C). The optimal rotation was also 135° (Fig. 6A, row 1, left panel). As in the previous example, the observed neuronal responses and the model's predicted responses to stimuli at an intermediate scale (0.8×) were well matched; the average Pearson's correlation coefficient was 0.97. The optimal contour segment identified by the fitting procedure was a sharp convexity pointing to the lower left, adjoined by two concavities on either side (Fig. 6A, inset shape). Focusing on the neuron's tuning curves at the optimal stimulus rotation, the observed responses did not shift appreciably as a function of size, although the model did predict modest systematic shifts in stimulus tuning. To see these shifts more clearly, we normalized the model's predicted tuning curves at each scale (Fig. 6B). For the convex shapes, the predicted tuning curve shifted leftward with increasing stimulus scale, indicating a systematic change in the model's stimulus preferences toward sharper convexities. The observed and predicted tuning centroids are shown superimposed (Fig. 6C); the observed slope was near zero, whereas the predicted slope was larger in magnitude.
Figure 6. Absolute curvature model predictions for a neuron consistent with sensitivity to normalized curvature. A, Observed and predicted responses at all stimulus rotations (symbols and lines, respectively; same neuron as in Fig. 3C; same format as in Fig. 5). The neuron's responses were strongly selective for stimulus rotation. The preferred contour segment identified by the fitting procedure was a sharp convexity pointing to the lower left, adjoined by a shallow concavity on either side (see inset shape). B, The model's predicted responses, normalized to the maximum predicted response for each stimulus scale; the model predicted small yet systematic horizontal shifts of the tuning curve across scale. C, Observed and predicted tuning centroids (black and red); the observed tuning centroids did not shift systematically, whereas those predicted by the model did.
Model comparisons for two additional example neurons are provided (Fig. 7; only tuning curves at the optimal stimulus rotation are shown). Both neurons responded preferentially to the concave shapes in our stimulus set. For the first neuron (Fig. 7A), both the observed and predicted tuning centroids shifted systematically as a function of stimulus size, consistent with sensitivity to absolute curvature. For the second neuron (Fig. 7B), the observed tuning centroids did not shift appreciably as a function of stimulus size, consistent with sensitivity to normalized curvature, although the predicted tuning centroids did shift systematically. Collectively, these results confirm that our stimuli and analyses were sensitive enough to reveal possible systematic changes in neuronal stimulus tuning across size. These data also illustrate that, consistent with our predictions (Fig. 2), neurons that encode absolute curvature and are convex-preferring have negative slopes for the tuning centroids across scale, whereas neurons that are concave-preferring have positive slopes.
Figure 7. Additional example responses to the scale test. A, B, Observed and predicted responses for two neurons (left; symbols and lines); both neurons preferred the concave indentation in our stimulus set. Observed and predicted tuning centroids for the same neurons (right, black and red).
The example neuronal responses and population analyses described thus far suggest that some V4 neurons are sensitive to absolute curvature, whereas others are sensitive to normalized curvature. To quantitatively evaluate which of the two coding schemes (absolute or normalized curvature) best accounted for the observed responses of each V4 neuron recorded, we performed direct comparisons of the two models on a neuron-by-neuron basis. We generated two model predictions for each neuron: an absolute curvature prediction and a normalized curvature prediction (see Materials and Methods). We then computed the partial correlations between the observed responses and each of the two predictions and compared them (Movshon et al., 1985; Smith et al., 2005; see Materials and Methods). For the first example neuron (Fig. 5), the r-to-Z transformed partial correlation of the absolute curvature prediction exceeded that of the normalized curvature prediction (Za = 4.01; Zn = 2.39); this difference was significant and the neuron was therefore classified as selective for absolute curvature (Za − Zn > 1.28; p < 0.1; see Materials and Methods). For the second example neuron (Fig. 6), the r-to-Z transformed partial correlation for the normalized curvature prediction exceeded that of the absolute curvature prediction (Zn = 3.66; Za = 0.83); this difference was significant and the neuron was therefore classified as sensitive to normalized curvature (Zn − Za > 1.28; p < 0.1).
A scatter plot of the r-to-Z transformed partial correlation values for each neuron recorded is shown (Fig. 8; N = 80). The partial correlation value for the normalized curvature prediction (Zn, ordinate) is plotted against that for the absolute curvature prediction (Za, abscissa); dashed lines indicate the statistical bounds used for classification (see Materials and Methods). Across the population, Zn exceeded Za for 66 neurons (∼83%), indicating that the normalized curvature model provided a better account of the responses of most V4 neurons. Of these neurons, the difference between Zn and Za was statistically significant for 58 neurons (∼73%; Zn − Za > 1.28; p < 0.1), and these were classified as selective for normalized curvature (red). Of the remaining 14 neurons for which Za exceeded Zn, the difference between these measures was statistically significant for 10 neurons (∼13%), and these were classified as selective for absolute curvature (black). For the 10 neurons that encode absolute curvature, the observed and predicted tuning centroid slopes were well matched (r = 0.67; p = 0.04). Collectively, these model-driven analyses corroborate our findings based on the tuning centroid analyses and the linear separability indices (Fig. 4): most V4 neurons were selective for normalized curvature and thus consistent with maintaining their preferences for shape stimuli across appreciable transformations of stimulus scale within the RF.
Figure 8. Model comparisons. For each neuron, we plot the Z-transformed partial correlation of the normalized curvature model (Zn, ordinate) against the Z-transformed partial correlation of the absolute curvature model (Za, abscissa). Significance bounds (dashed lines) were used to classify neurons as selective for normalized curvature (red), or selective for absolute curvature (black); some neurons were unclassified (gray). Data from the two example neurons in Figures 5 and 6 are highlighted.
Controls
The schematic of the scale test (see Fig. 1C) demonstrates that the position of the convex projection varies as a function of stimulus size. To be certain that the systematic changes in stimulus preference across scale observed in a small subset of V4 neurons could not be attributed to positional shifts of the preferred contour segment within the RF, we incorporated a control experiment, the position test, in which we presented the same shapes at locations within the RF that matched the positions induced by scaling (Fig. 1D). Note that for concave shapes (see Fig. 1C, right), the position of the concave indentation does not change as a function of stimulus size; we therefore only conducted the position test for convex shapes. The responses of two example neurons (the same neurons as in Fig. 3A,B) to both the scale test and the position test are shown (Fig. 9A,B). For these two convex-preferring neurons, responses to the scale test (left) showed clear horizontal shifts in the tuning centroid across scale. However, their responses to the position test (right) showed no systematic shifts in the tuning centroid (triangles are superimposed and indistinguishable). Thus, these neurons signaled their preferred contour segments in a position-invariant manner.
Figure 9. Position test controls. A, B, Data from two example neurons (same as in Fig. 3A, B). Tuning curves for stimuli at different scales (left; replicated from Fig. 3), and tuning curves for stimuli at different positions (right); stimulus positions were chosen so as to match the positional shifts induced by scaling. Both neurons showed shifts in the tuning centroid across scale, but not across position (triangles). C, Population analysis of changes in neuronal stimulus preferences across position. The distribution of slopes of the tuning centroids across position for all convex-preferring neurons (N = 39). D, E, Distribution of the position separability index, derived from responses at the optimal stimulus rotation and at all rotations (D and E, respectively). In both cases, neurons showed high separability indices (median = 0.98 in D; 0.94 in E; triangles).
To determine whether most V4 neurons encode object contours in a position-dependent or position-invariant manner, we examined the distribution of slopes derived from the tuning centroids across position for all convex-preferring neurons recorded (N = 39). Again, we found that most neurons had near-zero slopes (Fig. 9C; median = +0.10), indicating that they maintained their stimulus preferences as a function of position. Similarly, most neurons had high separability indices, both for data at the optimal rotation and at all rotations (see Fig. 9D,E). The distribution of tuning centroid slopes was more tightly centered on zero for the position test than for the scale test, and most neurons (70%) showed perfect separability for position (separability index of 1), whereas a smaller subset (50%) showed perfect separability for scale.
For neurons that contributed data to both the scale and position tests (N = 39: 24 and 15 from Monkeys 1 and 2, respectively), we observed no correlation between the linear regression slopes for the two test conditions (Fig. 10; r = 0.07, p = 0.66); these measures were also uncorrelated when data from each recording subject were analyzed separately. We also examined whether convex-preferring neurons that showed tuning shifts in the scale test and that were classified as sensitive to absolute curvature based on the partial correlation analyses (Fig. 8; N = 5) also showed shifts in the position test. For these neurons, there was no significant correlation between the slopes in the two testing conditions (r = −0.21; p = 0.74); furthermore, there was no lawful relationship between the sign of the tuning centroid slope in the two testing conditions. Thus, these data suggest that any shifts in neuronal stimulus preferences in the scale test could not be attributed to positional shifts of the preferred contour segment within the RF.
Figure 10. Comparing size and position invariance in neurons tested for both stimulus transformations. For neurons that preferred the convex shapes in our stimulus set (N = 39), we compared the linear regression slopes derived from responses in the scale test (abscissa and marginal distribution, below) to the linear regression slopes derived from responses in the position test (ordinate and marginal distribution, left). There was no correlation between the linear regression slopes for the two test conditions.
Response gain modulation
Thus far, our analyses have focused exclusively on assessing the invariance of tuning curves across stimulus transformations. Nevertheless, stimulus size may also influence neuronal responses by providing an overall gain modulation. Consistent with previous reports in IT cortex (Ito et al., 1995; Tanaka, 1996; Zoccolan et al., 2007), we found that many V4 neurons of either type (ie, sensitive to absolute or normalized curvature) showed response gain modulation across transformations of scale and position (Fig. 3, Fig. 9A). Most neurons (71%; Fig. 11A, top) showed response enhancement with increasing size, generally preferring stimuli at the largest scales; a smaller subset (29%; Fig. 11A, bottom) showed response suppression, preferring stimuli at the smallest scales. Across the population of neurons recorded, the ratio of responses to the largest versus smallest stimulus scales had a broad distribution (Fig. 11B; median = 1.5, range = 24). These response gain modulations demonstrate that V4 neurons carry information about stimulus scale, even when their stimulus preferences are maintained across scale transformations.
Figure 11. Response gain modulation as a function of size. A, Peak neuronal response at each scale, normalized by the maximum response, is plotted as a function of stimulus size. Most neurons (top) showed response enhancement; their responses were strongest for larger stimuli. A smaller subset of neurons (bottom) showed response suppression; their responses were strongest for smaller stimuli. B, Distribution of the ratio of responses to the largest and smallest scales tested, for all neurons; the distribution was broad, suggesting that many neurons carried information about stimulus size.
Discussion
To examine the neural basis of invariant object representation in primate visual cortex, we asked whether the selectivity of single V4 neurons for the bounding contours of objects was maintained across transformations of stimulus size within the RF. We found that most neurons (∼73%) maintained their preferences for shape stimuli across substantial changes in size (>2-fold in linear extent). For these neurons, normalized curvature, rather than absolute curvature, provided the better account of shape selectivity. A smaller subset of neurons (∼13%) faithfully signaled a particular magnitude of contour curvature at all stimulus sizes; these neurons showed systematic shifts in their preferences for shape stimuli as a function of size that were well accounted for by the absolute curvature model. For both types of neurons, shape tuning was independent of stimulus position within the RF (∼30% of the RF). Collectively, these findings posit area V4 as a suitable foundation for invariant object codes that support recognition behavior. This work also reveals, for the first time, the coding scheme used by V4 neurons (and perhaps neurons downstream of V4) to represent objects in a size-invariant manner.
What stimulus features underlie size invariance?
Many studies have demonstrated that neuronal responses in the ultimate stages of visual processing in IT cortex signal object identity across size transformations and can therefore support invariant recognition (Schwartz et al., 1983; Desimone et al., 1984; Gross et al., 1993; Sáry et al., 1993; Ito et al., 1995; Logothetis and Sheinberg, 1996; Tanaka, 1996; Hikosaka, 1999; Brincat and Connor, 2004; Liu et al., 2009; Rust and DiCarlo, 2010). Size invariance has also been demonstrated in V4 (Sawamura et al., 2005; Rust and DiCarlo, 2010). Nevertheless, we still lack an elemental understanding of the mechanisms that underlie invariant object representations. Neurons in V4 and IT are selective for many stimulus features (Gross et al., 1993; Tanaka, 1996; Connor et al., 2007), but we do not know which types of feature selectivity support invariant representation because studies that measure invariance do not also assess the basis of form selectivity in the same neurons. Here, we examined invariance with respect to a well documented type of form selectivity, asking whether V4 neurons that signal local boundary conformation maintained their preferences across size. Our measurements reveal that most V4 neurons signal normalized curvature, providing evidence that they encode objects in a size-invariant manner. Given that V4 is the dominant source of feedforward input to IT (Felleman and Van Essen, 1991), it is possible that the same coding scheme may also underlie size-invariant object representation in IT. Other candidate coding schemes, eg, selectivity for the medial axis of objects (Hung et al., 2012), may also contribute to invariant representation in V4 and IT, and merit future examination.
Conclusions drawn from invariance studies can depend critically on the choice of stimuli used, as well as the spatial and form selectivity of the neurons tested. For example, had we tested neurons with a smaller set of stimuli that coarsely sampled contour curvature, we might have classified fewer neurons as size-dependent. Alternatively, had we not ensured that the preferred contour segment was within the RF for all stimulus sizes, we might have classified many more neurons as size-dependent; indeed, a mismatch between stimulus and RF sizes may account for a previous report of limited position-invariance in V4 compared with IT (Rust and DiCarlo, 2010). We created stimuli that provided dense enough sampling along the contour curvature dimension to reveal any systematic shifts in tuning across scale transformations. We also tailored the stimuli presented to each neuron based on measurements of its RF position and size. Collectively, these strategies allowed us to identify invariant neurons with confidence and to ask whether the tuning curves for different stimulus sizes were scaled versions of each other, thus going beyond assessments of the preservation of stimulus rank order (Li et al., 2009). Our results demonstrate that most V4 neurons are as invariant as IT neurons based on linear separability metrics (Brincat and Connor, 2004; Rust and DiCarlo, 2010), suggesting that although RFs increase in size from V4 to IT, invariance levels are maintained.
Advancing our understanding of object coding in V4
Previous work has demonstrated that the preferences of many V4 neurons for closed shapes can be explained in terms of selectivity for boundary curvature at a specific location relative to object center (Pasupathy and Connor, 2001). However, because these previous findings were based on responses to stimuli at a single scale, they cannot evaluate whether this representation is based on a size-dependent or size-independent form of curvature. Our results advance understanding of object representation in cortex by demonstrating that V4 carries two parallel representations of boundary form: one size-invariant and one size-dependent, signaled by neurons selective for normalized and absolute curvature, respectively. For neurons selective for absolute curvature, the changes in shape tuning across size are not random, but a direct consequence of the geometric relationship between stimulus size and absolute curvature. For both types of neurons, stimulus size may modulate the gain of neuronal responses, suggesting that they each encode size information (Fig. 11). Taking into account these different coding schemes improved our ability to model V4 responses: the average model fit correlation across the dataset was 0.74 when neurons were fit with only the absolute curvature model and 0.80 when they were fit with the better of the two models (either absolute or normalized curvature).
The results of our control experiments also demonstrate that the stimulus preferences of most V4 neurons were independent of position in the RF. Although these findings extend previous work that tested position-invariant form selectivity in V4 neurons using a handful of stimuli (Gallant et al., 1993; Pasupathy and Connor, 2001), they stand in contrast to recent work using form stimuli defined by combinations of line elements, which demonstrated position-dependent stimulus tuning in many V4 neurons (Nandy et al., 2013). This discrepancy is likely due to differences in the spatiotemporal properties of the stimuli used. In the current study, we sought to investigate position- and size-invariant representations of isolated 2D objects. Our stimuli, when viewed parafoveally as in our recording sessions, appear as individual shapes separated by clear blank periods (stimulus duration was 300 ms; interstimulus interval was 200 ms). In contrast, the study by Nandy et al. (2013) used small line-composite shapes presented briefly and in rapid succession (stimulus duration was 16 ms; average interstimulus interval was 16 ms), and assessed invariance by comparing tuning for the individual line-composite shapes across position. Given the integration properties of the visual system, we speculate that their stimuli appear as dynamically morphing textures. If the entire texture patch were considered an object, the position-specific tuning curves thus derived may reflect localized feature preferences in an object-centered reference frame rather than the extent of position invariance. For example, a neuron selective for a circle may show a weak preference for different line orientations at different points along the circle, but this would not imply a lack of position invariance. Given the preponderance of size- and position-invariant neurons we observed, our findings endorse a key, yet underestimated, role for area V4 in invariant object representation and recognition, consistent with evidence that V4 lesions impair invariant recognition behavior (Schiller, 1995). This proposition is further strengthened by our recent demonstrations that V4 selectivity for object contours is also resilient to changes in stimulus color (Bushnell and Pasupathy, 2012) and to occlusions (Bushnell et al., 2011).
From the standpoint of neural computation, if contour curvature is computed piecewise by assaying contour orientation at regular intervals along an object's boundary, then neurons sensitive to normalized curvature may assay orientation at distances that scale proportionally with stimulus size, whereas neurons sensitive to absolute curvature may assay orientation at points separated by a specific linear distance, regardless of stimulus size. The former case is analogous to computing curvature in a polar coordinate system (Cavanagh, 1978). These different coding schemes may have different functional relevance; size-invariant neurons may support the recognition of objects regardless of scale, whereas size-dependent neurons may inform motor plans involved in grasping objects. In line with hierarchical models of object representation (Riesenhuber and Poggio, 1999), size-invariant neurons may be built up from the convergence of size-dependent neurons. A variant of this model was proposed to explain the emergence of neuronal selectivity for object contours in V4 and to account for position-invariant selectivity within the RF (Cadieu et al., 2007). The model achieves selectivity for contour curvature by pooling the responses of many V1-like oriented filters, and achieves position-invariance by repeating the pooling process at different positions within the RF. However, the model in its current form cannot account for the size-invariance of shape tuning, motivating future computational work to account for our main findings. Another implication of our work is that object size is estimated before the encoding of bounding contours, and that this estimation contributes to a representation based on normalized curvature. Estimates of object size could be computed locally as the distance between a given contour segment and object center, or more globally as the average distance of all contour segments to object center. In either case, the object must be segmented first. Additional experiments are therefore needed to determine how objects are segmented and how object center and size are estimated.
Invariant object coding is a distinctive property of high-level vision and is critical for many perceptual and cognitive functions. Theoretical work has long favored parameterizing objects by the curvature of their bounding contours, highlighting the efficiency and compactness of such a code (Attneave, 1954), and its structural stability and invariance with respect to scale, perspective, and occlusion (Asada and Brady, 1984; Marimont, 1984; Besl and Jain, 1985; Verri and Yuille, 1986). Object recognition theory has also demonstrated that the shape of an object's boundary, particularly local contour segments containing “diagnostic features” shared across prototypes of a given object, can be useful for identifying objects across image transformations (Biederman, 1987). Our results provide empirical evidence for the functional suitability of contour-based mechanisms as a foundation for invariant object representation in the primate cortex, and their instantiation in single neurons in area V4.
Footnotes
This work was funded by NEI Grant R01 EY018839 to A.P., NEI Center Core Grant for Vision Research P30 EY01730 to the University of Washington, and NIH/ORIP Grant P51 OD010425 to the Washington National Primate Research Center. We thank Yoshito Kosai for technical assistance and Amber Fyall for expert animal care. We also thank Gregory Horwitz, Wyeth Bair, Raghu Pasupathy, Charles E. Connor, and J. Anthony Movshon for helpful comments and discussions.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr Yasmine El-Shamayleh, Department of Physiology and Biophysics, University of Washington, 1959 Northeast Pacific Street, HSB G-424, Box 357330, Seattle, WA 98195-7330. yasmine1@uw.edu