Abstract
Neurons in primate inferotemporal cortex (IT) are clustered into patches of shared image preferences. Functional imaging has shown that these patches are activated by natural categories (e.g., faces, body parts, and places), artificial categories (numerals, words) and geometric features (curvature and real-world size). These domains develop in the same cortical locations across monkeys and humans, which raises the possibility of common innate mechanisms. Although these commonalities could be high-level template-based categories, it is alternatively possible that the domain locations are constrained by low-level properties such as end-stopping, eccentricity, and the shape of the preferred images. To explore this, we looked for correlations among curvature preference, receptive field (RF) end-stopping, and RF eccentricity in the ventral stream. We recorded from sites in V1, V4, and posterior IT (PIT) from six monkeys using microelectrode arrays. Across all visual areas, we found a tendency for end-stopped sites to prefer curved over straight contours. Further, we found a progression in population curvature preferences along the visual hierarchy, where, on average, V1 sites preferred straight Gabors, V4 sites preferred curved stimuli, and many PIT sites showed a preference for curvature that was concave relative to fixation. Our results provide evidence that high-level functional domains may be mapped according to early rudimentary properties of the visual system.
SIGNIFICANCE STATEMENT The macaque occipitotemporal cortex contains clusters of neurons with preferences for categories such as faces, body parts, and places. One common question is how these clusters (or “domains”) acquire their cortical position along the ventral stream. We and other investigators previously established an fMRI-level correlation among these category domains, retinotopy, and curvature preferences: for example, in inferotemporal cortex, face- and curvature-preferring domains show a central visual field bias whereas place- and rectilinear-preferring domains show a more peripheral visual field bias. Here, we have found an electrophysiological-level explanation for the correlation among domain preference, curvature, and retinotopy based on neuronal preference for short over long contours, also called end-stopping.
Introduction
We are interested in the organizational principles of the object-recognition network in the macaque, which includes areas V1, V2, V4, and inferotemporal cortex (IT). IT is subdivided into domains responsive to naturally occurring categories such as faces and scenes (Kourtzi and Kanwisher, 2000; Tsao et al., 2006; Bell et al., 2011). With experience, IT cortex can develop domains for artificial categories such as buildings or text (Hasson et al., 2002; Cohen and Dehaene, 2004; Srihasam et al., 2014). Interestingly, these domains develop in stereotyped locations in IT. Previously, we investigated the effects of early training on the organization of IT and found that training could induce the development of domains for images never normally experienced by monkeys, such as human symbols (Srihasam et al., 2014). We further found that trained-symbol domain locations seemed to be determined by the shape of the stimuli: selectivity for symbol sets with curved contours was localized along the lower lip of the superior temporal cortex, whereas selectivity for symbol sets with straight edges was localized in more ventral IT. The importance of curvature in occipitotemporal cortex is emphasized by the existence of a distributed network of cortical patches that respond to curved stimuli (Yue et al., 2014). However, this raises the question of why curvature preferences should be distributed along the ventral stream at all. The observation that a selectivity gradient of curved versus straight contours correlated with the retinotopic map in visual cortex suggests that the curvature gradient is determined by the retinotopic map (Srihasam et al., 2014). We hypothesized that this correlation results from a combination of receptive field (RF) size and end-stopping.
Neurons with foveal RFs have smaller RFs than neurons with more peripheral RFs. Because many neurons are selective for stimulus length, that is, “end-stopped” or “hypercomplex” (Hubel and Wiesel, 1965), this size gradient includes not only RF size and spatial frequency, but also selectivity for stimulus length. End-stopped cells respond best to short lines, showing weaker responses when the line is extended into their inhibitory surround; this inhibition is contextual because it is reduced if the orientation in the surround differs from the orientation in the center (Hubel and Wiesel, 1965). Frequently visualized as an arrangement of inhibitory zones along the length of the oriented activating region (which forms the cell's orientation preference), end-stopping has been shown to be approximately symmetrical along the width of the activating zone (Sceniak et al., 2001). End-stopping is a plausible mechanism underlying selectivity for high-curvature features such as end points and curved or bent contours compared with long straight contours proportional to RF size (Hubel and Livingstone, 1987). In the cat, end-stopped V1 neurons respond better to high-curvature contours than non-end-stopped neurons do (Dobbins et al., 1987). These high-curvature features are prevalent in some visual object categories such as faces and can be more salient in foveal compared with peripheral vision (Fig. 1). Therefore, smaller, more central RFs may be better at encoding sharp changes in orientation, and the larger RFs at increasing eccentricities may be better at encoding long, straight edges or gradually curving contours.
Figure 1. Natural scenes and curvature detection. a, Photograph of Torsten Wiesel and David Hubel (courtesy Francis A. Countway Library of Medicine, https://cms.www.countway.harvard.edu/wp/?tag=david-h-hubel). b, Line drawing of Hubel showing the distribution of the RF size of end-stopped cells that would respond best to different parts of the image.
To find out whether curvature selectivity correlates with end-stopping in the primate, we recorded from 870 cortical sites across areas V1, V4, and posterior IT (PIT) in six monkeys. We presented “banana” Gabors (Ibrahim, 2012) with different curvatures, orientations, and diameters and analyzed the responses from single units and multi-units. We found that sites that showed end-stopping preferred curved contours and this correlation was found in every recorded area. Further, we discovered that many sites in PIT prefer concave curvatures (relative to the fixation point) and that there is a small but reliable relationship between cells with concave preferences and face discrimination. These results indicate that an organization for curvature selectivity (and thereby shape selectivity) emerges inherently with the gradation in RF size that characterizes a retinotopic map. We propose that this provides an organizing principle for the functional architecture of the ventral stream, as well as a mechanistic link between the observations that face, body, and scene patch locations are correlated both with retinotopy (Levy et al., 2001; Hasson et al., 2002) and with selectivity for degree of curvature (Nasr and Tootell, 2012; Kornblith et al., 2013; Yue et al., 2014).
Materials and Methods
Procedures.
All procedures were approved by the Harvard Medical School Institutional Animal Care and Use Committee following the National Institutes of Health's Guide for the Care and Use of Laboratory Animals, eighth edition. This article conforms to the ARRIVE Guidelines checklist.
Husbandry.
The animals were socially housed in standard primate caging under a 12 h light/dark cycle with ad libitum access to chow. Water and fruits were available during experimental sessions.
Behavior.
Six adult male macaques (F, G, R, T, U, and V) weighing 8–11 kg were trained to perform a fixation task. They were rewarded for keeping their gaze within ±1° (monkeys F, T, U, and V) or ±1.3° (monkeys G and R) of the fixation spot while images were flashed elsewhere in the visual field. Eye position was monitored using an ISCAN system (www.iscaninc.com).
Electrophysiology hardware.
All animals were implanted with head posts before fixation training. Monkeys F, G, T, U, and V all had multielectrode arrays (96-electrode Utah-arrays or 64-electrode floating microelectrode arrays; Blackrock and Microprobes for Life Sciences). Monkey R had a chronic chamber for acute recordings using bundles of three electrodes. Monkeys U and V had arrays implanted in left/right V1 cortex, monkey R had a chamber over left V1, monkeys F and T over right/left V4 (between the lunate sulcus and superior temporal sulcus, 25 mm dorsal/7 mm posterior to the ear bars) and monkey G over the left PIT (anterior to the inferior occipital sulcus and posterior to middle posterior temporal sulcus). We collected electrophysiological information, including high-frequency (“spike”) events, local field potentials, and other experimental variables such as eye position, reward rate, and photodiode outputs tracking monitor frame display timing using either Blackrock's Cerebus Neural Signal Processor data acquisition system or a Plexon Multichannel Acquisition Processor Data Acquisition System. Each channel was auto-configured daily for the optimal gain and threshold; we collected all electrical events that crossed a threshold of 2.5–3.9 SDs from the mean peak height of the distribution of electrical signal amplitudes per channel. These signals included typical single-unit waveforms, multi-unit waveform bursts, and visually active hash.
Experimental workflow.
Each day, the animals performed their fixation task while we recorded from their arrays/chamber. The trial timeline in all experiments was as follows: at the start of each trial, the fixation target appeared and the animal had several seconds to direct its gaze to it. Within tens to hundreds of milliseconds after fixation onset, images were flashed on and off (200 ms-ON/200 ms-OFF for monkeys R and G; 133 ms-ON/120 ms-OFF for monkeys U, V, and F; and 147 ms-ON, 107 ms-OFF for monkey T). Images were presented on a gray background. If the animals held fixation during a subset of image presentations, they were rewarded with drops of water or juice. The reward size was increased over time for motivation.
Visual stimuli.
The curved (“banana”) Gabors were generated using MATLAB code based on formulas from Mina Ibrahim Samaan Ibrahim's research thesis (Ibrahim, 2012). These images were a composite of a curved complex wavelet and a Gaussian, described by four parameters: spatial frequency, orientation, curvature and size as follows:
where γ is a constant, $G(x, y) = \exp(\ldots)$ is the Gaussian envelope, $F(x, y) = \exp\left(i \cdot f \cdot (x_c + c \cdot x_s^2)\right)$ is the curved complex wavelet, $x_c = x\cos\theta + y\sin\theta$, $x_s = -x\sin\theta + y\cos\theta$, and DC is a bias term. We plugged in 8 orientation values (0 to 7/8*π radians in intervals of 1/8*π radians), 5 curvature values (−8, −6, ∞, +6, +8), and 4 Gaussian size values (0, 1, 2, and 3). The Gaussian sizes were relative to the size of the image frame, with the largest Gaussian size value (3) allowing the wavelet to cover ∼87% of the frame diameter, size value 2 covering 43%, size value 1 22%, and size value 0 11% of the frame diameter. The stimulus dimensions were (2.5° × 2.5°) for V1 experiments, (9° × 9°) for V4 experiments, and (9° × 9°) and (18° × 18°) for PIT experiments. The Gabors kept the same number of cycles per image, so the corresponding spatial frequency values were 5.0 cycles/° for V1, 1.4 cycles/° for V4, and between 0.7 and 1.4 cycles/° in PIT. The images had gray backgrounds. There was a mean of 38 ± 8 repetitions per image across all experiments.
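As a minimal sketch of this parameterization, the Python code below enumerates the stimulus conditions and evaluates the curved carrier F(x, y) at a single point. The function and variable names are illustrative assumptions and not the MATLAB code used in the study. The Gaussian envelope, the constant γ, and the DC term are omitted, curvature is indexed with the simplified −2 to 2 labels, and the straight condition is assumed to correspond to c = 0 in the carrier.

```python
import cmath
import math
from itertools import product

# Eight orientations: 0 to 7/8*pi radians in steps of pi/8.
ORIENTATIONS = [k * math.pi / 8 for k in range(8)]
# Simplified curvature labels (assumption: 0 stands for the straight Gabor).
CURVATURE_LABELS = [-2, -1, 0, 1, 2]
# Nominal Gaussian size values (0 = smallest, 3 = largest).
SIZES = [0, 1, 2, 3]


def banana_carrier(x, y, f, theta, c):
    """Complex carrier F(x, y) = exp(i * f * (x_c + c * x_s**2))."""
    x_c = x * math.cos(theta) + y * math.sin(theta)
    x_s = -x * math.sin(theta) + y * math.cos(theta)
    return cmath.exp(1j * f * (x_c + c * x_s ** 2))


# Full factorial design: orientation x curvature x size.
conditions = list(product(ORIENTATIONS, CURVATURE_LABELS, SIZES))
print(len(conditions))  # 160
```

With c = 0 and θ = 0 the carrier reduces to a plain complex grating, exp(i·f·x), which is the straight Gabor carrier.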
General information analysis.
We collected data in 33 experiments using six different animals (monkey F, two sessions; monkey G, three sessions; monkey R, 24 sessions; monkey T, two sessions; and monkeys U and V, one session each). Data from arrays were largely multi-unit. Data from the acute recordings (monkey R) were collected after isolating at least one single unit in a given channel of the microwire bundle. We used the term “site” to refer to single units and multi-units. This raises the question of whether it was reasonable to treat these signals equally. In primate cortex, there are strong correlations between single-unit and multi-unit responses, especially if the visual stimulus features in question are organized in functional cortical maps (Liu and Newsome, 2003). For example, V1 and V4 are characterized by columns of individual cells with shared orientation preferences (Hubel and Wiesel, 1968; Ghose and Ts'o, 1997) and IT has columns of cells with similar complex shape selectivity (Wang et al., 1996; Tsunoda et al., 2001; Tsao et al., 2006; Sato et al., 2009; Bell et al., 2011); this shape-based organization likely subsumes curvature. Surround suppression (of which end-stopping is one manifestation) is also functionally organized in V4 (Ghose and Ts'o, 1997) and even MT (Born, 2000). Therefore, we find it reasonable that our multi-unit activity data is representative of single-unit properties. In three monkeys (F, G, and T), we recorded from the same array on multiple days. Exploratory analyses showed that image responses recorded from the same channel could be correlated across days with respect to their preferred stimuli, but not reliably enough to be considered identical sites, especially in multivariate analyses, in which the same set of channels represented the same image in different locations of activity space in different days. Because some of our analyses used linear classifiers, we found it preferable to treat the signals recorded from different days as different sites. 
With this caveat in mind, our total site count per monkey was as follows: monkey V, 96 × 1 (channels × days) = 96; monkey U, 96 × 1 = 96; monkey R, 6 × 24 = 144; monkey F, 96 × 2 = 192; monkey T, 96 × 2 = 192; and monkey G, 50 × 3 = 150, for a total of 870 sites. For our linear classification analyses, we also added four experiments (from monkeys T and G) recorded on the same days as the experiments above, with the same experimental parameters except a different stimulus image size.
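The site tally can be checked with a few lines of Python. This is only a bookkeeping sketch; the dictionary layout is an assumption made for illustration.

```python
# (channels, recording days) per monkey, as listed in the text.
sites = {"V": (96, 1), "U": (96, 1), "R": (6, 24),
         "F": (96, 2), "T": (96, 2), "G": (50, 3)}

# Each channel on each day counts as a separate site.
per_monkey = {m: ch * days for m, (ch, days) in sites.items()}
total_sites = sum(per_monkey.values())
print(total_sites)  # 870
```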
RFs analysis.
To estimate the RF of each site, we presented a small image in a grid-like pattern across a large region of visual space. The grid spacing varied from animal to animal, between (0.5–2°) encompassing a region between (2° × 2°) for V1 and (16° × 16°) for PIT. We measured the mean spike rate over each stimulus position and used the MATLAB function griddata.m to interpolate the scattered data into a continuous map. This map was smoothed using a disk filter (0.5° diameter for monkey F, 1° diameter for monkey G, and 0.1° diameter for all other animals). This map represented the aggregate RF of each site in our arrays. We passed this map through a threshold to exclude activity that was 2–3 SDs below the average rate over all positions. We then passed each map through the edge.m function to obtain the perimeter of the estimated RFs and the imdilate.m function to fill the perimeters with a constant value. As measures of statistical strength, we multiplied each RF perimeter by the peak firing rate at its center (effect size) and, for each site, we also conducted a Wilcoxon signed-rank test for zero median at every stimulus location, correcting for multiple comparisons via the false discovery rate algorithm. We then repeated the analysis above using only stimulus locations where the corrected p-value was <0.05 to highlight the most reliable RF estimates. We used regionprops.m to return the RF center locations, which we used to measure the size and eccentricity of each field and its distance from the stimulus center.
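The statistical filtering and RF-center steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the false discovery rate correction is assumed to be the Benjamini-Hochberg procedure (the text does not name the variant), the RF center is approximated by a rate-weighted centroid of the retained grid positions rather than by regionprops.m, and all names and toy values are hypothetical.

```python
import math


def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which tests are rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= alpha * k / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


def rf_center_and_eccentricity(positions, rates, keep):
    """Rate-weighted center of the kept positions and its distance from fixation at (0, 0)."""
    pts = [(x, y, r) for (x, y), r, k in zip(positions, rates, keep) if k and r > 0]
    total = sum(r for _, _, r in pts)
    cx = sum(x * r for x, _, r in pts) / total
    cy = sum(y * r for _, y, r in pts) / total
    return (cx, cy), math.hypot(cx, cy)


# Toy grid: stimulus positions (deg), mean rates, and per-position p-values.
positions = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (2.0, 1.0)]
rates = [5.0, 20.0, 5.0, 1.0]
p_values = [0.01, 0.001, 0.02, 0.6]

keep = benjamini_hochberg(p_values)
center, eccentricity = rf_center_and_eccentricity(positions, rates, keep)
print(keep, center, eccentricity)
```

The sketch assumes at least one position survives the correction; a real analysis would need to handle sites with no reliable positions.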
One-dimensional tuning curves and preferred values analysis.
Each banana Gabor was described by three features: orientation, curvature, and Gaussian envelope size (“size”). Orientation could take eight values, curvature could take five values, and size could take four nominal values (see “Visual stimuli”). All images had the same dimensions within each experiment; only the Gaussian diameter varied across images. We computed each tuning curve as follows. Orientation tuning curves were measured using only responses to straight Gabors (curvature 0) averaged over all sizes (the location of the peak response was taken as the preferred orientation). The size-tuning curve was measured using only responses to Gabors at the preferred orientation averaged across all curvature values (the location of the peak of the curve was taken as the preferred size). The curvature-tuning curve was measured using responses to the preferred orientation averaged across all sizes (the location of the peak of the curve was taken as the preferred curvature).
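The averaging scheme described above can be sketched as follows. The data layout (a dictionary keyed by orientation index, curvature label, and size value) and the toy response function are assumptions made for illustration only.

```python
from statistics import mean

ORIENTS = range(8)
CURVS = [-2, -1, 0, 1, 2]
SIZES = range(4)


def tuning_curves(resp):
    """resp maps (orientation, curvature, size) -> mean firing rate."""
    # Orientation tuning: straight Gabors only, averaged over sizes.
    ori_curve = {o: mean(resp[(o, 0, s)] for s in SIZES) for o in ORIENTS}
    pref_ori = max(ori_curve, key=ori_curve.get)
    # Size tuning: preferred orientation, averaged over curvatures.
    size_curve = {s: mean(resp[(pref_ori, c, s)] for c in CURVS) for s in SIZES}
    # Curvature tuning: preferred orientation, averaged over sizes.
    curv_curve = {c: mean(resp[(pref_ori, c, s)] for s in SIZES) for c in CURVS}
    return {
        "orientation": (ori_curve, pref_ori),
        "size": (size_curve, max(size_curve, key=size_curve.get)),
        "curvature": (curv_curve, max(curv_curve, key=curv_curve.get)),
    }


# Toy site: prefers orientation 3, small sizes, and curved stimuli.
resp = {(o, c, s): (10.0 if o == 3 else 2.0) + abs(c) - s
        for o in ORIENTS for c in CURVS for s in SIZES}
curves = tuning_curves(resp)
print(curves["orientation"][1], curves["size"][1], curves["curvature"][1])
```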
To determine whether a given tuning curve was statistically different from uniformity, we computed a one-way ANOVA using each site's trial responses for each feature value. Tuning was defined as statistically reliable if p < 0.05. We fitted each site's convexity-tuning function using a polynomial of the form $R = A_{quad}x^2 + A_{linear}x + C$, where R is the model value, $A_{quad}$ measures the magnitude of the quadratic component, $A_{linear}$ is the linear component, C is the offset, and x is the stimulus convexity value scaled as (−2, −1, 0, 1, 2). Individual trial responses per site were normalized such that the maximum mean response was 1 and the minimum mean response was 0; due to this transformation, both $A_{quad}$ and $A_{linear}$ were arithmetically bounded to the range −0.25 to 0.25. We also fit each orientation tuning curve with a von Mises model of the form $R = b + a \cdot e^{\sin(\theta - \theta_0)/d}$, where a is the amplitude, b is the offset, $\theta_0$ is the curve center, and d is the inverse dispersion value (1/d is equivalent to tuning width). Goodness-of-fit was described by $R^2$, the coefficient of determination. All fits above were done using the individual trial data with the fit.m MATLAB function.
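A minimal sketch of the quadratic fit is given below. It departs from the procedure above in one labeled respect: it fits the normalized mean responses in closed form rather than the individual trials with fit.m. The closed form relies on the convexity values being symmetric about zero, so that the linear term decouples from the quadratic and offset terms. Function names are hypothetical.

```python
def normalize(means):
    """Rescale mean responses so the minimum is 0 and the maximum is 1."""
    lo, hi = min(means), max(means)
    return [(r - lo) / (hi - lo) for r in means]


def fit_quadratic(x, r):
    """Least-squares fit of r = a_quad*x**2 + a_lin*x + c, for x symmetric about 0."""
    n = len(x)
    sx2 = sum(v ** 2 for v in x)
    sx4 = sum(v ** 4 for v in x)
    sr = sum(r)
    sxr = sum(v * w for v, w in zip(x, r))
    sx2r = sum(v ** 2 * w for v, w in zip(x, r))
    a_lin = sxr / sx2
    a_quad = (n * sx2r - sx2 * sr) / (n * sx4 - sx2 ** 2)
    c = (sr - a_quad * sx2) / n
    return a_quad, a_lin, c


x = [-2, -1, 0, 1, 2]
# Toy site that prefers both curvature extremes over the straight Gabor.
u_shaped = fit_quadratic(x, normalize([12.0, 4.5, 2.0, 4.5, 12.0]))
# Toy site whose response grows steadily from convex to concave.
ramp = fit_quadratic(x, normalize([0.0, 1.0, 2.0, 3.0, 4.0]))
print(u_shaped, ramp)
```

In these two toy cases the quadratic and linear coefficients each reach 0.25, the bound stated above for normalized responses.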
Size and curvature analysis.
We computed the end-stopping index by fitting each site's size tuning curve, measured at the site's preferred orientation and a curvature value of 0 (straight Gabor), with a model of the form $R_i = m s_i + b$, where $R_i$ is the response to each size, $s_i$ is the Gaussian size value, m is the slope, and b is an offset term. The end-stopping index was defined as $-m$. The curvature index (CI) was defined from two quantities: $\overline{(R_{curv=-2}, R_{curv=+2})}$, the mean response to the two highest curvature-value banana Gabors, and $R_{curv=0}$, the response to the straight Gabor at the largest size and at each site's preferred orientation.
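The end-stopping index can be illustrated with an ordinary least-squares line through the size-tuning curve. This is a sketch using toy, already-normalized responses; the function name is hypothetical, and the curvature index is not implemented here.

```python
def end_stopping_index(sizes, responses):
    """Negative slope of the least-squares line through the size-tuning curve."""
    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_r = sum(responses) / n
    slope = (sum((s - mean_s) * (r - mean_r) for s, r in zip(sizes, responses))
             / sum((s - mean_s) ** 2 for s in sizes))
    return -slope


sizes = [0, 1, 2, 3]
# Response falls with size: positive index (end-stopped).
end_stopped = end_stopping_index(sizes, [1.0, 0.8, 0.4, 0.1])
# Response grows with size: negative index (length-summating).
length_summating = end_stopping_index(sizes, [0.1, 0.4, 0.8, 1.0])
print(end_stopped, length_summating)
```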
Linear classification analysis.
We trained support vector machines with a linear kernel using the MATLAB function fitcecoc.m. We used a one-versus-one approach, with support vector machines (SVMs) trained to discriminate between pairs of images using fivefold cross-validation. Each site's spike rates were z-scored using the mean and SD of all its responses for that session. There was an average of 38 ± 8 response vectors in every positive and negative class per comparison. To estimate the chance accuracy for each paired comparison, we concurrently trained SVMs using the same set of data vectors, but with shuffled labels. This analysis was done within each experimental session and accuracy values were averaged across sessions.
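The bookkeeping around the classifiers can be sketched in Python. This example covers only the per-site z-scoring, the enumeration of image pairs for the one-versus-one scheme, and the label shuffling used for the chance estimate; it does not train an SVM. The use of the sample standard deviation and all function names are assumptions.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def zscore(rates):
    """Z-score one site's responses using its own mean and sample SD (assumed)."""
    mu, sd = mean(rates), stdev(rates)
    return [(r - mu) / sd for r in rates]


def shuffled(labels, seed=0):
    """Return a shuffled copy of the labels for the chance-level control."""
    out = list(labels)
    random.Random(seed).shuffle(out)
    return out


n_images = 160
# One binary classifier per unordered pair of images.
pairs = list(combinations(range(n_images), 2))
print(len(pairs))  # 12720

z = zscore([1.0, 2.0, 3.0])
labels = [0] * 5 + [1] * 5
control_labels = shuffled(labels)
```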
Results
RFs
We collected spike rate responses from a total of 870 sites in areas V1 (from three monkeys, R, U, and V), area V4 (monkeys F, T), and PIT (monkey G), using 50- and 96-channel arrays in four monkeys (F, T, U, and V), three-channel electrode bundles in a fifth animal (R) and a 50-channel array in the last monkey (G). The arrays recorded single-unit and multi-unit activity (henceforth collectively referred to as “sites”). The median RF width for each neuronal population ranged up to 5.20 ± 1.05° at median eccentricities from 1.8 ± 0.2 to 3.9 ± 0.1° from the fovea (Fig. 2a,b, Table 1). We placed all stimuli at the center of the aggregate response field of each recording array (the stimulus position was not optimized for any given site). We presented 160 different banana wavelets to each population and recorded the responses of each site to every stimulus. These wavelets varied in three features, size, orientation, and curvature, and were displayed on an average gray background (Fig. 2c). There were eight different orientations (0 to 157.5° in steps of 22.5°) and four sizes, presented in octave steps relative to a maximum area; maximum stimulus dimensions were 2.5° × 2.5° (V1 monkeys U and V), 5° × 5° (V1 monkey R), 9° × 9° (V4 monkeys), and 9° × 9° and 18° × 18° (PIT monkey). There were also five curvature values (−8, −6, ∞, +6, and +8 used in the MATLAB script, henceforth referred to as −2, −1, 0, 1, and 2 for simplicity), defined along a “convex–concave” axis based on each site's orientation preference and stimulus position (“concave” indicates curvature that was concave toward fixation, or concentric, and “convex” indicates curvature that was convex toward fixation; Fig. 2c,d; see Materials and Methods for a more quantitative description).
Figure 2. RFs and stimulus set. a, Response fields for every site recorded per animal. Each panel represents retinotopic space and each blue–yellow circle shows the region of space that elicited the highest responses for every site (2 SDs over the mean firing rate). Each circle is weighted by peak firing rate magnitude. The red square outline shows the position of the image, the small red square is the fixation point, and the dashed white lines show the horizontal and vertical meridians. b, RF width as a function of eccentricity for each monkey/visual area (colors: V1, V4, and PIT). Each point represents one response field in a. Transparent circles highlight RFs that also passed a strict statistical test (median response > 0 for all used positions, Wilcoxon signed-rank test, p < 0.05), and the radius of each transparent circle shows its relative firing rate magnitude. c, Subset of “banana” Gabors showing variations in size and curvature; there were seven more similar subsets at different orientations. The red open square outline in c corresponds to that in a. Small red squares represent the fixation point and illustrate our definitions for “convex” and “concave”: if the inner surface of the curve faced the fixation point (upper right, positive curvature), then the curve was described as “concave.”
Table 1. RF statistics and stimulus information
Tuning for orientation, size, and curvature across visual areas
We investigated whether the banana Gabor stimuli elicited reliable responses from each site as defined by one-way ANOVA conducted for orientation, size, or curvature (each test computed using the trial-by-trial responses of a given site to variations of each modality; Fig. 3a). We found that the majority of sites in V1 and V4 showed reliable response modulation to at least one of these features, in contrast to a minority of PIT sites. Orientation modulated the responses of 68 ± 3% of V1 sites (mean ± SEM; p < 0.05, one-way ANOVA), 59 ± 2% of V4 sites, and 17 ± 3% of PIT sites. Size modulated 60 ± 3% of V1 sites, 66 ± 3% of V4 sites, and 34 ± 4% of all PIT sites. Finally, curvature modulated 57 ± 3% of V1 sites, 66 ± 2% of V4 sites, and 14 ± 3% of PIT sites (Fig. 3b, top). One reason that some sites might not have been tuned to the banana Gabors is that the stimulus could have been centered outside of their aggregate RFs. Because these were ensemble recording experiments, we could not concurrently optimize the stimulus position for all sites. To determine how image placement affected the reliability of size and curvature tuning, we repeated the one-way ANOVA test using a subset of sites that showed modulation to changes in orientation for the smallest straight Gabor (p < 0.05, one-way ANOVA). This set of sites was meant to reveal whether our unbiased-sample population added noise by being off-center. We will refer to this set as the “best-centered” sites. This was a strict selection criterion because the smallest Gabors could be smaller than the median RF size per area (Table 1). This best-centered set was small: only 24% of all our V1 sites (81/334) showed orientation selectivity using the smallest Gabor (p < 0.05, one-way ANOVA with orientation as the sole factor). In V4, 50% of sites (190/382) passed the test, as did only 14% of PIT sites (20/144). 
Having isolated this best-centered group of cells, we investigated again how many of them showed reliable tuning to orientation (using responses to all sizes), size, and curvature. The resulting trends with these best-centered sites were not qualitatively different from the general population (Fig. 3b, bottom, Table 2). We will continue to use all sites in subsequent analyses, sometimes repeating the analyses using the best-centered subset as a control.
Figure 3. Tuning to orientation, size, and curvature. a, Responses of one V4 multi-unit (monkey T) to changes in orientation (left), size (middle), and curvature (right) (all show mean ± SE). Orientation tuning was computed at the preferred size using straight Gabors, size tuning was computed using the preferred orientation and all curvatures, and curvature tuning was computed at the preferred orientation and size. All values are baseline subtracted. b, Percentage of sites within each area with tuning (p < 0.05, one-way ANOVA) for orientation, size, or curvature (top, all sites; bottom, only sites with strong orientation tuning at the smallest Gabor size). Symbols show mean percentage ± SE via bootstrap. c, Distributions of preferred values for orientation (left), size (middle), and curvature (right) within each area using sites with strong tuning (p < 0.05, one-way ANOVA). Top row, Results using all sites; bottom row, results using only the best-centered sites. Each point is a mean percentage ± SE via bootstrap. Circles around each point denote a statistical deviation from a flat distribution (via bootstrap test). The dashed black line shows the distribution of preferred values expected by chance.
Table 2. Percentage of sites with reliable tuning for orientation, size, and curvature
Given that many sites showed significant modulation by orientation, size, and curvature, we then investigated whether some feature values were preferred over others. There were eight different orientations (Norient), four sizes (Nsize), and five curvature values (Ncurva). If the populations showed uniform preferences for all feature values, the percentage of sites preferring each value should be 12.5% for orientation, 25% for size, and 20% for curvature. We calculated the percentage of sites that preferred each stimulus value and used a bootstrap to determine whether each calculated percentage was statistically different from the uniform values (if the 95% confidence intervals of the bootstrap distribution did not include the uniform value, it was designated as statistically different). We found that, on average, sites showed an approximately uniform distribution only for orientation. In V1, V4, and PIT, many orientation values were statistically underpreferred or overpreferred, but the mean absolute deviation from the baseline value of 12.5% was 3.0 ± 0.5% (V1), 3.5 ± 0.5% (V4), and 5.1 ± 0.9% (PIT). This is in contrast to size, where the percentage deviations for some values were 18.9 ± 1.3% (V1), 16.0 ± 1.3% (V4), and 8.3 ± 1.9% (PIT), and curvature, where they were 14.4 ± 1.0% (V1), 13.4 ± 0.6% (V4), and 12.7 ± 1.5% (PIT; Fig. 3c). Nearly 50% of V1 sites showed their highest responses to the largest banana stimulus, whereas V4 sites were mostly end-stopped, as has been noted previously (Desimone and Schein, 1987). When it came to curvature values, most V1 sites showed a bias for straight Gabors, most V4 sites preferred curved stimuli, and PIT sites preferred both straight and positive (concave) curvature. We explore these preferences in detail below.
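The bootstrap comparison against the uniform expectation can be sketched as follows. The resampling unit (sites), the percentile confidence interval, the number of resamples, and the toy population are assumptions made for illustration; the text specifies only that a 95% bootstrap confidence interval was compared with the uniform value.

```python
import random


def bootstrap_percentage_ci(preferred, value, n_boot=2000, seed=0):
    """95% percentile bootstrap CI for the percentage of sites preferring `value`."""
    rng = random.Random(seed)
    n = len(preferred)
    stats = []
    for _ in range(n_boot):
        # Resample sites with replacement and recompute the percentage.
        sample = [preferred[rng.randrange(n)] for _ in range(n)]
        stats.append(100.0 * sum(v == value for v in sample) / n)
    stats.sort()
    lo = stats[int(0.025 * n_boot)]
    hi = stats[int(0.975 * n_boot) - 1]
    return lo, hi


def differs_from_uniform(preferred, value, n_values, **kwargs):
    """True if the bootstrap CI excludes the uniform percentage 100 / n_values."""
    lo, hi = bootstrap_percentage_ci(preferred, value, **kwargs)
    uniform = 100.0 / n_values
    return not (lo <= uniform <= hi)


# Toy population of 100 sites: half prefer the largest size (value 3).
skewed_sizes = [3] * 50 + [0] * 20 + [1] * 15 + [2] * 15
# Toy population with exactly uniform size preferences.
flat_sizes = [0] * 25 + [1] * 25 + [2] * 25 + [3] * 25

skewed_result = differs_from_uniform(skewed_sizes, 3, n_values=4)
flat_result = differs_from_uniform(flat_sizes, 0, n_values=4)
print(skewed_result, flat_result)
```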
Interactions in tuning
Many sites showed tuning for orientation, size, and curvature: how did these preferences interact? There are three possible interactions to consider: curvature preferences versus size preferences (i.e., end-stopping), orientation versus size, and orientation versus curvature. Because curved Gabors contain multiple orientations, orientation and curvature are not separate stimulus features and we will not explore this relationship further. We begin with the interaction between curvature and size because this was our key hypothesis and then we will explore interactions between orientation and size.
Curvature versus end-stopping
For each site, we plotted its curvature tuning at different sizes (and vice versa). Figure 4a shows three typical sites: sites a and c were length summating in that they responded better to large Gabors than to small; these sites responded best to straight Gabors compared with curved ones. The middle site (site b) was end-stopped because it responded better to short Gabors than to large, and this site responded better to either concave or convex stimuli compared with straight stimuli. Based on fMRI findings, we had hypothesized the existence of such an inverse relationship between end-stopping and curvature such that end-stopped neurons should have a preference for curved stimuli. To quantify the relationship between size and curvature tuning across the population, we used an end-stopping index and a curvature index (CI) for each site. The CI is >0 if the site responds more to the curvy Gabors, 0 if the responses are equal to both curvy and straight Gabors, and <0 if the straight Gabor is preferred. The end-stopping index is positive if a cell responds best to the smaller stimuli and negative if the site responds best to the largest stimuli. We found that sites that showed monotonically increasing responses to increasing size (length-summating sites) preferred straight Gabors, whereas sites that showed a smaller response to the longest stimuli (end-stopped sites) tended to prefer curved stimuli. This is apparent as a correlation between curvature and end-stopping indices for all recorded sites: the Pearson correlation coefficients were 0.66 for V1 sites (n = 334), 0.59 in V4 (n = 382), and 0.48 in PIT (n = 144) ($p < 3 \times 10^{-3}$ for all tests, permutation test; Fig. 4b). We also repeated this correlation analysis using only the best-centered sites and the trend was similar (Pearson correlation coefficients were 0.60 for V1 sites, 0.59 in V4, and 0.65 in PIT; $p = 3 \times 10^{-9}$, $2 \times 10^{-19}$, and $2 \times 10^{-3}$, respectively).
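The correlation and its permutation test can be sketched in Python. The two-sided test statistic, the number of permutations, and the toy indices are assumptions; the text states only that a permutation test was applied to the Pearson correlation between the two indices.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (count + 1) / (n_perm + 1)


# Toy end-stopping indices and curvature indices with a positive relationship.
end_stop = [i / 10 for i in range(-10, 11)]
curv_idx = [0.6 * e + 0.1 * math.sin(7 * i) for i, e in enumerate(end_stop)]

r = pearson_r(end_stop, curv_idx)
p = permutation_p(end_stop, curv_idx)
print(r, p)
```

With a finite number of permutations, the smallest reportable p-value is 1 / (n_perm + 1), so very small p-values require correspondingly many shuffles.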
Figure 4. Relationship between end-stopping and curvature. a, Size-tuning in three multi-units (site A: V1, monkey V; sites B and C: V4, monkey T) using Gabors with five different curvatures (colors per legend). EI, End-stopping index. All values are baseline subtracted and normalized. b, CIs for every site as a function of its end-stopping index across all areas (V1 = red, V4 = black, PIT = blue). Each point shows the indices for one site and the transparent circles highlight the sites that were well centered. The colored lines show the total least-squares regression. c, Marginal frequency distributions of CI values across all visual areas.
Part of our original hypothesis was that sites with more eccentric RFs would show a preference for progressively straighter stimuli (for a given size). Therefore, we investigated whether the CI trended toward more negative values as RF eccentricity increased. Because we used chronically implanted arrays for the majority of these experiments, with stimuli scaled to aggregate RF size, we could not sample responses across a wide eccentricity range for each area, so we had no reason to expect a strong correlation. However, when we fitted a regression line to the CI versus eccentricity distribution for each animal's population, most of the individual populations showed a trend toward negative curvature values with increasing eccentricity (Table 3). Therefore, there was a trend for sites with more eccentric RFs to prefer straighter Gabors.
Relationship between curvature and eccentricity
As before, it was important to ensure that the relationship between curvature preferences and end-stopping was not simply due to off-center placement of the stimulus. We already presented one control, which was to examine that relationship only in the best-centered sites. A more comprehensive version of this control is to use the distances from the banana Gabor center to each site's RF center. We used this Gabor-to-RF distance as one predictor of the CI in a linear regression analysis. We fit one linear model per area (V1, V4, or PIT): in each model, the dependent variable was the CI distribution for the given area and the predictor variables comprised the size tuning index per site, each site's RF distance to stimulus center, and a measure of tuning strength (the F-statistic of the site's orientation tuning using the smallest straight Gabor). We found that the end-stopping index was a reliable predictor of CI while controlling for RF-stimulus distance and tuning strength (for area V1, the linear regression weight was 0.22, 95% confidence interval: 0.20 to 0.25; for V4, 0.18, 95% confidence interval, 0.15 to 0.21; and for PIT, 0.08, 95% confidence interval, 0.06 to 0.11; Table 4). Each of these three models was a good fit to the dependent variable (F-statistic vs constant model ranged from 19 to 90 for all three models, p < 9 × 10⁻¹¹). Therefore, we conclude that end-stopped sites tended to prefer curved stimuli even after accounting for RF-stimulus distance and tuning strength.
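The per-area regression described above can be sketched as an ordinary least-squares fit with three predictors. This is an illustrative example on synthetic data only: the variable names, the simulated effect sizes, and the normal-approximation confidence interval (z = 1.96) are assumptions of the sketch, and the study's own fitting routine may have computed its intervals differently.

```python
import numpy as np

def ols_with_ci(X, y, z=1.96):
    """Ordinary least squares with approximate 95% confidence intervals.

    X: (n_sites, n_predictors) matrix WITHOUT the intercept column.
    Returns (weights, lower, upper), with the intercept as the first entry.
    The z = 1.96 normal approximation is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof                    # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)           # covariance of the weights
    se = np.sqrt(np.diag(cov))
    return beta, beta - z * se, beta + z * se

# Synthetic stand-in data for one hypothetical area (NOT the recorded data).
rng = np.random.default_rng(0)
n_sites = 334
end_stopping = rng.uniform(-1, 1, n_sites)          # size tuning index
rf_distance = rng.uniform(0, 2, n_sites)            # Gabor-to-RF distance
tuning_f = rng.gamma(2.0, 5.0, n_sites)             # orientation-tuning F-statistic
ci = 0.22 * end_stopping + rng.normal(0, 0.1, n_sites)

X = np.column_stack([end_stopping, rf_distance, tuning_f])
beta, lo, hi = ols_with_ci(X, ci)
print(f"end-stopping weight = {beta[1]:.2f} (95% CI {lo[1]:.2f} to {hi[1]:.2f})")
```

Including the distance and tuning-strength terms in the same model is what allows the end-stopping weight to be read as an effect that holds with those two variables held constant.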
Curvature index linear regression weights
Imaging studies show that there is a large region in the parafoveal representation of dorsal V4 that responds to curved objects more so than to rectilinear objects (Yue et al., 2014). This posterior curvature patch occupies much of dorsal V4 and has higher sensitivity to simple curvature compared with more anterior curvature patches, which prefer more complex curved features. We quantified the marginal frequency distribution of our CI for each area to determine whether our sampling showed this pattern (Fig. 4c). The CI is >0 if a given site responds more to the curvy Gabors, 0 if the responses are equal to both curvy and straight Gabors, and <0 if the straight Gabor is preferred. The median CI values for V1, V4, and PIT (±SE) were −0.20 ± 0.02, 0.23 ± 0.02, and −0.02 ± 0.02, respectively. Therefore, as a population, V1 sites preferred the straight Gabor, V4 showed preferences for the most curved Gabors, and PIT showed an intermediate preference level for both. Because our curvature stimuli were relatively simple, this is consistent with the imaging data and with previous interpretations of V4 as a processing hub for simple curvature.
Preferences for curvature vary along the visual hierarchy
We found that V1 sites tended to prefer straight Gabors, whereas V4 and PIT sites tended to prefer curved Gabors (Fig. 3c). In previous studies, V4 neurons were found to prefer one direction of curvature to its opposite, especially when that curvature is the convex part of a bounded object (Pasupathy and Connor, 1999). Most of the V4 sites that we tested also showed a dominant preference for one direction of curvature, but did not as a population prefer either concavity or convexity. To further explore convexity/concavity tuning in these areas, we fit each site's curvature tuning with a polynomial of the form R = Aquad·x² + Alinear·x + C, where the linear weight Alinear can be thought of as a convexity-concavity preference value: Alinear ≪ 0 if convex Gabors are preferred, Alinear ≫ 0 if concave Gabors are preferred, and Alinear = 0 if neither is preferred over the other. The quadratic component Aquad can be interpreted as a curvature preference where, given Alinear = 0, Aquad ≫ 0 if both curved stimuli (convex or concave) are preferred, Aquad ≪ 0 if straight stimuli are preferred over curved, and Aquad = 0 if curved and straight stimuli are equally effective. Site 1 in Figure 5a had a negative Aquad; it responded best to straight stimuli and not well to stimuli curved in either direction. Site 2 had a positive Aquad and it responded better to either concave or convex stimuli than to straight. Site 3 had a positive Alinear and a positive Aquad and it preferred concave stimuli to either straight or convex. We plotted the values of Aquad against Alinear for each site and found that most V1 sites showed a negative Aquad (i.e., they preferred straight to curved stimuli) with Alinear centered on zero (they did not prefer either concave or convex stimuli). Most V4 sites showed positive Aquad values (they preferred curved stimuli to straight) and well distributed Alinear values because most preferred either concave or convex Gabors.
Most PIT sites showed positive Alinear values (i.e., they favored stimuli that were concave toward fixation; Fig. 5b,c). To determine the exact percentage of sites with preferences for convex, concave, or symmetric tuning per area, we binned each area's Alinear values into five bins, nominally described as “convex” (Alinear < −0.15), “slightly convex” (−0.15 < Alinear < −0.05), “symmetric” (−0.05 < Alinear < 0.05), “slightly concave” (0.05 < Alinear < 0.15), or “concave” (Alinear > 0.15). Thirty-four percent of all V1 sites had symmetric Alinear values compared with 20% of V4 sites and 20% of PIT sites. In V4, most sites were “slightly convex” or “slightly concave” (24%, 23%), whereas most PIT sites were “slightly concave” (31%).
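The polynomial fit and the five-bin classification of Alinear described above can be sketched as follows. The curvature axis values, the convention that positive x denotes concave stimuli, and the assignment of values lying exactly on a bin edge to the upper bin are assumptions of this example; the text specifies only the strict inequalities.

```python
import numpy as np

# Bin edges and labels for Alinear, as given in the text.
EDGES = np.array([-0.15, -0.05, 0.05, 0.15])
LABELS = ["convex", "slightly convex", "symmetric", "slightly concave", "concave"]

def fit_curvature_tuning(curvature, response):
    """Fit R = Aquad*x**2 + Alinear*x + C and return (Aquad, Alinear, C)."""
    a_quad, a_linear, c = np.polyfit(curvature, response, deg=2)
    return float(a_quad), float(a_linear), float(c)

def label_alinear(a_linear):
    """Map one Alinear value to its nominal bin label."""
    # np.digitize returns i such that EDGES[i-1] <= value < EDGES[i];
    # values exactly on an edge therefore fall in the upper bin (assumption).
    return LABELS[int(np.digitize(a_linear, EDGES))]

def bin_percentages(a_linear_values):
    """Percentage of sites falling in each of the five bins."""
    idx = np.digitize(np.asarray(a_linear_values, dtype=float), EDGES)
    counts = np.bincount(idx, minlength=len(LABELS))
    return dict(zip(LABELS, 100.0 * counts / counts.sum()))

# Hypothetical site: five curvature levels, with x > 0 taken to mean concave.
curvatures = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
responses = 0.3 * curvatures**2 + 0.2 * curvatures + 0.5

a_quad, a_linear, c = fit_curvature_tuning(curvatures, responses)
print(a_quad, a_linear, label_alinear(a_linear))
```

In this hypothetical example the site has positive Aquad and positive Alinear, the same qualitative pattern described for Site 3.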
Convexity–concavity tuning in three visual areas. a, Examples of convexity-concavity tuning of three different sites in V1 (left), V4 (center), and PIT (right) measured at each site's preferred orientation and size. Each point on the black curve shows the mean response and SE. Red lines show the polynomial fit. All values are baseline subtracted. b, Tuning fits for sites in V1 (left), V4 (center), and PIT (right). Each row shows a different site and each column shows a graded curvature continuum. Sites were ordered by their Alinear value. c, Scatterplot showing each site's Alinear value (abscissa, convexity vs concavity preferences) and Aquad values (ordinate, symmetry value), all presented according to area (colors). d, Marginal percentage distribution of Alinear values.
Because we found many sites that preferred concave Gabors, especially in PIT, we looked for evidence that this preference might be correlated with selectivity for real-world bounded objects, whose outline is usually concave relative to fixation. We presented faces (curvy) and artificial gadgets (rectilinear) such as chairs, monkey cages, and monitors (20 faces, 20 gadgets; Fig. 6a). We investigated whether, in each visual area, curvature selectivity correlated with classification accuracy for faces and objects. We sorted all sites by their CI and called the bottom third of all sites the “low-CI” population and the top third the “high-CI” population. Across all V1 sites, the mean CI value of the lower third group was −0.30; for the high-CI group, it was −0.01; across all V4 sites, −0.45 and −0.02; and for the PIT sites, −0.50 and −0.03. We then trained linear classifiers (SVMs) to classify each image against each other (i.e., a one-vs-one scheme) using responses from either the low-CI or high-CI populations with 10-fold cross-validation. We corrected against biases by training/testing SVMs using shuffled labels and subtracted the classification accuracy of each shuffled-label SVM from the classification accuracy obtained using the correct labels. We found that, across all areas, low-CI and high-CI populations generally led to the same performance levels for individual image categorization (V1 low-CI and high-CI groups scored 5.00 ± 0.13% and 5.39 ± 0.15% over baseline; V4 sites, 5.80 ± 0.14% and 6.26 ± 0.15% over baseline; PIT, low-CI group scored at 4.77 ± 0.20% and the high-CI group at 6.27 ± 0.22%, respectively). We obtained these values by averaging all image versus image classification accuracy scores regardless of whether the image was a face or an object. Next, we compared the performance of the classifiers on distinguishing images of the same category (faces vs faces, objects vs objects).
Here, we found that PIT populations with higher mean CIs were better at identifying faces; the populations did not lead to strongly different levels of SVM accuracy, but the trend was statistically reliable, as we describe below. High-CI PIT cells allowed a 6.86 ± 0.27% intracategorical accuracy level for faces and 5.68 ± 0.30% for objects compared with low-CI PIT cells, which scored 5.44 ± 0.26% and 4.10 ± 0.24%. These differences were less evident as we moved lower in the visual hierarchy: high-CI V1 cells allowed a 5.73 ± 0.18% intracategorical accuracy level for faces and 5.05 ± 0.24% for objects compared with low-CI cells, which scored 5.42 ± 0.18% and 4.57 ± 0.12%; high-CI V4 cells allowed a 6.62 ± 0.16% intracategorical accuracy level for faces and 5.90 ± 0.25% for objects compared with low-CI cells, which scored 6.30 ± 0.18% and 5.30 ± 0.14% (Fig. 6b). Because these differences were small, we used a three-factor ANOVA to determine whether these differences were likely to emerge from the same underlying distribution of accuracy scores. This three-way ANOVA was run on a sample of 240 scores to examine the contributions of visual area (V1, V4, and PIT), CI value (low vs high), and image category (face vs object). Overall, the model showed statistical effects for all three factors: there was an area effect (F(2,230) = 16.2, p < 10⁻¹⁰), a CI value effect (F(1,230) = 42.1, p < 10⁻¹⁰), and a category effect (F(1,230) = 63.1, p < 10⁻¹⁰). There was also a CI by area interaction (F(2,230) = 8.8, p < 0.0002). A contrast using the MATLAB multcompare.m function showed that the marginal accuracy mean for the PIT high-CI population, when classifying faces, was statistically higher than its classification accuracy for objects, and was also higher than the low-CI PIT population scores for either category.
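The classification procedure described above (splitting sites into low-CI and high-CI thirds, cross-validated pairwise classification, and subtraction of a shuffled-label baseline) can be illustrated with the sketch below. To keep the example self-contained, a nearest-centroid classifier is used as a plainly named stand-in for the linear SVMs of the study, and only one image pair is classified rather than the full one-vs-one set. The data, trial counts, function names, and number of shuffles are all hypothetical.

```python
import numpy as np

def split_by_ci(ci):
    """Indices of the bottom-third (low-CI) and top-third (high-CI) sites."""
    order = np.argsort(ci)
    k = len(ci) // 3
    return order[:k], order[-k:]

def nearest_centroid_cv(X, y, n_folds=10, rng=None):
    """Cross-validated accuracy of a two-class nearest-centroid classifier.

    X: (n_trials, n_sites) responses; y: (n_trials,) labels in {0, 1}.
    This classifier is a stand-in for the SVMs used in the study.
    """
    rng = np.random.default_rng() if rng is None else rng
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        c0 = X[train & (y == 0)].mean(axis=0)   # class centroids from training trials
        c1 = X[train & (y == 1)].mean(axis=0)
        d0 = np.linalg.norm(X[test_idx] - c0, axis=1)
        d1 = np.linalg.norm(X[test_idx] - c1, axis=1)
        pred = (d1 < d0).astype(int)
        correct += int(np.sum(pred == y[test_idx]))
    return correct / len(y)

def accuracy_over_baseline(X, y, n_shuffles=20, seed=0):
    """Accuracy with true labels minus mean accuracy with shuffled labels."""
    rng = np.random.default_rng(seed)
    acc = nearest_centroid_cv(X, y, rng=rng)
    shuffled = np.mean([nearest_centroid_cv(X, rng.permutation(y), rng=rng)
                        for _ in range(n_shuffles)])
    return float(acc - shuffled)

# Synthetic stand-in: two hypothetical images, 20 trials each, 30 sites.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 30))
X[20:] += 1.0                                   # image 1 evokes larger responses
y = np.repeat([0, 1], 20)
gain = accuracy_over_baseline(X, y)

low, high = split_by_ci(np.array([0.3, -0.5, 0.1, -0.2, 0.6, 0.0]))
print(f"accuracy over shuffled baseline: {gain:.2f}")
```

In the full analysis this computation would be repeated for every image pair, once using only the columns of X indexed by the low-CI sites and once using the high-CI sites, before averaging across pairs.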
Curvature tuning and preferences for faces. a, Images used to test preferences for faces and objects. b, Classification accuracy (minus shuffled-label baseline) within each area using low-CI or high-CI sites (black and white). c, Classification accuracy within each area by low-CI and high-CI sites when classifying faces versus faces or objects versus objects.
In summary, we found that sites at multiple stages along the visual hierarchy showed differences in their preferences for curved Gabors, with V1 sites favoring straight Gabors, V4 sites favoring curved Gabors, and PIT sites showing a bias for concave Gabors. Given the prevalence of face selectivity in IT, this raised the possibility that face selectivity may owe some of its origins to this rudimentary effect. We trained linear classifiers using groups of sites in every area that differed in their CI value and found that the PIT cells with higher CI values had a small but reliable advantage of 1.4% accuracy at face classification.
Size versus orientation tuning
We found that end-stopping and curvature were correlated across the ventral stream, consistent with previous fMRI findings (Srihasam et al., 2014) and one electrophysiology study in the cat (Dobbins et al., 1987). In addition, we collected responses to various combinations of orientation and size values. There are precedents in the literature reporting an association between orientation and size tuning: V1 cells show a reduction in orientation tuning width for Gabors of increasing size without a change in preferred orientation (Chen et al., 2005). To determine whether this association was present beyond V1, we conducted the following analyses. For each site in our dataset, we plotted its orientation tuning as evoked by straight Gabors of different stimulus sizes (Fig. 7a). We fit each orientation tuning with a von Mises function to compute each site's preferred orientation (θ_0^{u,s}, where u is the site and s is the stimulus size) and tuning width (d^{u,s}) and investigated whether orientation tuning changed as a function of size. First, for each site, we subtracted the preferred orientation measured with the largest Gabors from the preferred orientations measured with the smaller Gabors. We found that none of the three areas showed a statistical relationship between orientation preference and stimulus size: V1 sites showed a mean tuning center difference of 3.3° ± 1.8° across sizes (p = 0.84, F(2,223) = 0.17, one-way ANOVA, size as factor), V4 sites 9.6° ± 4.2° (p = 0.53, F(2,238) = 0.63) and PIT sites −1.4° ± 1.1° (p = 0.97, F(2,86) = 0.04; Fig. 7b).
In contrast, both V1 and V4 sites showed a reduction in tuning width with increasing stimulus size: V1 sites showed a reduction in tuning width value from 1.5 to 0.7 (p = 3.6 × 10⁻⁴, F(3,305) = 6.32, one-way ANOVA) and V4 sites from 1.2 to 0.7 (p = 0.046, F(3,339) = 2.7), whereas PIT sites showed a nonlinear relationship between tuning width and stimulus size that was not statistically reliable (p = 0.29, F(3,123) = 1.27; Fig. 7c). This confirmed previous observations that V1 sites' optimum orientation does not change as a function of size, whereas tuning width narrows for larger stimuli (Chen et al., 2005; Liu et al., 2015), and we further extended this observation to area V4.
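The orientation-tuning fits described above can be sketched as follows. The exact parameterization of the von Mises function and the definition of the tuning width d are not given in this section, so the 180°-periodic form below, the grid-search fitting procedure, the non-negative amplitude constraint, and the use of the concentration parameter kappa in place of d are all assumptions of this example. The tuning curve is synthetic.

```python
import numpy as np

def von_mises(theta_deg, baseline, amplitude, kappa, theta0_deg):
    """Orientation tuning curve with a 180-degree period (assumed form)."""
    delta = np.deg2rad(2.0 * (np.asarray(theta_deg, dtype=float) - theta0_deg))
    return baseline + amplitude * np.exp(kappa * (np.cos(delta) - 1.0))

def fit_von_mises(theta_deg, response):
    """Grid search over (theta0, kappa); baseline and amplitude by least squares."""
    theta_deg = np.asarray(theta_deg, dtype=float)
    response = np.asarray(response, dtype=float)
    best = None
    for theta0 in np.arange(0.0, 180.0, 1.0):
        for kappa in np.linspace(0.2, 10.0, 50):
            shape = von_mises(theta_deg, 0.0, 1.0, kappa, theta0)
            A = np.column_stack([np.ones_like(shape), shape])
            coef, *_ = np.linalg.lstsq(A, response, rcond=None)
            if coef[1] < 0:            # keep peaks only, not troughs (assumption)
                continue
            sse = float(np.sum((A @ coef - response) ** 2))
            if best is None or sse < best[0]:
                best = (sse, coef[0], coef[1], kappa, theta0)
    _, baseline, amplitude, kappa, theta0 = best
    return {"baseline": float(baseline), "amplitude": float(amplitude),
            "kappa": float(kappa), "theta0_deg": float(theta0)}

def orientation_difference(a_deg, b_deg):
    """Signed difference a - b between two orientations, wrapped to [-90, 90)."""
    return (a_deg - b_deg + 90.0) % 180.0 - 90.0

# Synthetic tuning curve for one hypothetical site at one stimulus size.
orientations = np.arange(0.0, 180.0, 22.5)
responses = von_mises(orientations, 0.1, 1.0, 2.0, 45.0)

fit = fit_von_mises(orientations, responses)
print(fit["theta0_deg"], fit["kappa"])
```

Because orientation is circular, differences in preferred orientation between stimulus sizes should be wrapped as in `orientation_difference` before averaging; otherwise a shift from 170° to 10° would be counted as 160° rather than 20°.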
Relationship between orientation tuning and size. a, Orientation tuning for three V1 sites (animal V); each plot shows responses to straight Gabors at four different sizes (indicated by colors). b, Mean difference in preferred orientation (±SEM) as a function of Gabor size relative to the biggest Gabor size. Each curve shows a different area. c, Mean Gaussian tuning width as a function of stimulus Gabor size. Asterisks show which tuning curves were statistically different from a flat model (p < 0.050, one-way ANOVA).
Discussion
Category-selective domain locations in IT are correlated with retinotopic organization (Levy et al., 2001) and with maps of curvature selectivity (Nasr and Tootell, 2012; Kornblith et al., 2013; Srihasam et al., 2014; Yue et al., 2014). It has been suggested that the correlation between retinotopy and category arises because of preferred viewing behavior for different categories of objects: faces are mapped to central visual field representations in IT and scenes to more peripheral eccentricities because we foveate faces and view scenes more with the peripheral visual field (Hasson et al., 2002). However, the causality is unclear: when a primate experiences a visual category, does its cortex develop functional domains specialized for this category with a consequent refinement of low-level features (e.g., curvature tuning) that optimize the domain's preference? Or does cortex have multiple retinotopic proto-maps that then localize selectivity for different visual categories based on low-level features? There is a correlation between low-level image features and category domains (Rice et al., 2014; Andrews et al., 2015) but, again, causality could go either way: differences in the average statistics of different categories could bias category-selective domains to prefer low-level features common to their category or a map of low-level feature selectivity could govern the localization of category domains. Here, we provide evidence for a neural mechanism that can link these hypotheses: because of the prevalence of end-stopping, a retinotopic map necessarily generates a curvature selectivity map, and such a low-level feature map could determine the locations of category domains both because of the statistical differences in low-level features of those categories and because of differences in viewing behavior.
In the visual system, RF size varies with eccentricity (Hubel and Wiesel, 1974; Dow et al., 1981). Because many cells in V1 and beyond are end-stopped (Hubel and Livingstone, 1987), RF size should be correlated with selectivity to changes in orientation such as end points, corners, and curvature (Hubel and Wiesel, 1965). In fact, this has been shown in cat V1 (Dobbins et al., 1987) and, in this study, we have shown that neuronal sites with end-stopped RFs are more selective for curvature compared with sites with length-summating RFs across the ventral stream. This result explains the fMRI observation that maps for eccentricity are correlated with maps for curvature (Srihasam et al., 2014). Imaging evidence suggests that patches along the ventral stream (from V4 to anterior IT) have increasing preferences for curvature (Wilkinson et al., 2000) and that curvature patches are located close to face patches (Yue et al., 2014). Our electrophysiology results showed a related elaboration in curvature tuning along the ventral stream, where most V1 sites showed preferences for straight Gabors, whereas most V4 and PIT sites showed preferences for curved Gabors. We further found that V4 and PIT sites differed in the symmetry of their tuning: most V4 sites responded best to either convex or concave curvature, whereas PIT sites showed preferences for contours that were concave toward fixation.
Convexity and concavity have been the focus of many psychophysical and imaging studies of figure–ground segregation. In those studies, convexity and concavity were defined relative to the object center (Hoffman and Richards, 1984; Driver and Baylis, 1996; Pasupathy and Connor, 1999; Haushofer et al., 2008). Our curved Gabors were ambiguous with respect to figure–ground, so our use of the terms “convex” and “concave” is not the same as in previous studies (we defined concave and convex relative to the fixation point). We found that the population of PIT cells preferred curvatures concave toward the fixation point, an image feature that would occur on fixating a round object such as a face. To test this association, we investigated whether PIT populations with higher CIs showed any advantage in intracategorical face/face encoding compared with intracategorical object encoding. The results showed a small but statistically reliable ability of curvature-preferring PIT sites to perform better in intracategorical face discrimination compared with the ability of rectilinear-preferring sites. We were somewhat surprised by the small advantage in accuracy, but the distributed nature of cortical encoding is a good a priori reason for why any randomly sampled set of PIT sites could allow classifiers to perform well (Haxby et al., 2001).
Others have shown that V4 cells prefer curved to rectilinear objects (Gallant et al., 1993) and that these curvature preferences are specific for one given combination of curvature and orientation values (Pasupathy and Connor, 1999). In our sample, although some V4 sites responded well to both convex and concave curvature (relative to fixation), most sites preferred one over the other, consistent with previous studies (Pasupathy and Connor, 1999). However, we also discovered some V4 sites (20%) with convexity/concavity tuning curves that were symmetric. This could reflect one of the following: (1) electrodes recording at sites with columns of single cells with true mirror selectivity for curvature, akin to IT face cells that are view invariant; (2) electrodes recording at the junction of curvature-tuned columns of single cells with opposing preferences; or (3) electrodes recording at sites with single cells that particularly like differences in orientation between the center and surround despite the center stimulus preference. Indeed, several mechanisms could account for curvature selectivity; for example, alignment of inputs along a curved contour, nonspecific surround suppression, or orientation-selective end-stopping. Responsiveness to both signs of curvature would be consistent with the second and third mechanisms above, but not the first. Moreover, it has been shown that many neurons show stronger suppression when the stimulus orientation at the RF center is the same as stimulus orientation at the RF surround (Hubel and Wiesel, 1965; Cavanaugh et al., 2002; Shen et al., 2007; Trott and Born, 2015), which militates against the second mechanism. Therefore, our results and those of others support the idea that end-stopping can give rise to selectivity for curvature over straight contours without requiring curvature to be generated by precise alignment of inputs along a curved trajectory.
Footnotes
*C.R.P. and T.S.H. are co-first authors.
This work was supported by the National Institutes of Health (National Eye Institute Grant NEI EY 16187 to M.S.L. and Core Grant for Vision Research NEI EY12196; Grants T32 NS007484 and NEI F32 EY025523 to T.S.H.; and Grant R01 EY011379 to R.T.B.), the Eric M. Mindich Research Fund for the Foundations of Human Behavior (M.S.L.), and the Burroughs Wellcome Fund (C.R.P.). We thank Richard T. Born for comments on the manuscript and for facilitating some of the experiments; John Maunsell, Bram-Ernst Verhoef, and Thomas Luo for facilitating some of the experiments; and Tim LaFratta and John LeBlanc for machine shop support.
The authors declare no competing financial interests.
Correspondence should be addressed to Carlos R. Ponce, M.D., Ph.D., Department of Neurobiology, Harvard Medical School, 220 Longwood Avenue, WAB 229C, Boston, MA 02115. crponce{at}gmail.com