Abstract
When we view an object, its appearance depends in large part on specific surface reflectance properties; among these is surface gloss, which provides important information about the material composition of the object and the fine structure of its surface. To study how gloss is represented in the visual cortical areas related to object recognition, we examined the responses of neurons in the inferior temporal (IT) cortex of the macaque monkey to a set of object images exhibiting various combinations of specular reflection, diffuse reflection, and roughness, which are important physical parameters of surface gloss. We found that there are neurons in the lower bank of the superior temporal sulcus that selectively respond to specific gloss. This neuronal selectivity was largely maintained when the shape or illumination of the object was modified and perceived glossiness was unchanged. By contrast, neural responses were significantly altered when the pixels of the images were randomly rearranged, and perceived glossiness was dramatically changed. The stimulus preference of these neurons differed from cell to cell, and, as a population, they systematically represented a variety of surface glosses. We conclude that, within the visual cortex, there are mechanisms operating to integrate local image features and extract information about surface gloss and that this information is systematically represented in the IT cortex, an area playing an important role in object recognition.
Introduction
Objects have specific surface reflectance properties that depend on their material composition and the fine structures of their surfaces. Our visual system is able to extract information about these surface reflectance properties from the retinal image, and the resultant perception of surface quality plays an important role in the identification of materials and the recognition of objects (Hunter and Harold, 1987; Adelson, 2001; Maloney and Brainard, 2010). Attempts to understand the neural processing underlying the perception of surface qualities have emerged in recent years (Arcizet et al., 2008; Köteles et al., 2008), and functional imaging studies in human subjects have shown that the ventral higher visual areas are activated when subjects attend to or discriminate materials (Cant and Goodale, 2007, 2011; Cant et al., 2009; Cavina-Pratesi et al., 2010; Hiramatsu et al., 2011).
In the present study, we used a set of stimuli with different reflection properties to examine how surface reflectance properties are represented in the brain. An important component of surface reflectance is gloss, which strongly influences surface appearance and varies with the material composition and smoothness of a surface. Three reflection parameters that have been shown to be particularly important for characterizing surface gloss are specular reflectance, diffuse reflectance, and roughness (Cook and Torrance, 1982; Ward, 1992; Ngan et al., 2005) (Fig. 1A). We manipulated these parameters to generate a set of visual stimuli and recorded single-unit activity in the monkey visual cortex to search for neurons selective for surface gloss and to characterize their response properties.
It is well known that the inferior temporal (IT) cortex plays a key role in the visual recognition of objects. Neurons selectively responsive to complex patterns, such as faces, as well as neurons selective for texture and color, have been shown to reside there (Bruce et al., 1981; Perrett et al., 1982; Desimone et al., 1984; Tanaka et al., 1991; Komatsu et al., 1992; Kobatake and Tanaka, 1994; Eifuku et al., 2004; Tsao et al., 2006; Conway et al., 2007; Yasuda et al., 2010). In addition, activities related to encoding both the three-dimensional (3D) geometry of objects (Janssen et al., 2001; Yamane et al., 2008; Nelissen et al., 2009) and the illumination direction have been recorded in the region within the superior temporal sulcus (STS) in the IT cortex (Vogels and Biederman, 2002; Köteles et al., 2008). Furthermore, a recent functional magnetic resonance imaging (fMRI) experiment in monkeys revealed activity in the STS that distinguished glossy from matte surfaces (Okazawa et al., 2011). These results suggest that a variety of information closely related to encoding surface gloss converges in the STS, making it an ideal area in which to explore the activities of neurons conveying information about the surface gloss of objects. We found that neurons selectively responding to specific glosses are present in the STS and that, as a population, these neurons systematically represent a wide range of glosses.
Materials and Methods
Surgery and recordings of neuron activities.
We recorded neuron activities from three hemispheres of two monkeys (monkeys “AQ” and “TV”; one male and one female; Macaca fuscata; weighing 5.8–6.2 kg). Before the physiological experiments began, a head holder and a recording chamber (rectangular, with an opening of 10 or 15 mm × 10 mm at the edge) were surgically attached to the skull under aseptic conditions and general anesthesia. Neuronal activities were recorded from the posterior bank of the STS, in the central part of the IT cortex (Fig. 2A). We did not explore the lateral convexity. The center of each recording chamber was located 22 mm lateral and 8–10 mm anterior in stereotaxic coordinates. Neurons were recorded extracellularly using tungsten microelectrodes (Frederick Haer) that were inserted vertically from the vertex through guide tubes fixed to a plastic grid with holes spaced at 1 mm intervals. By using two grids shifted 0.5 mm vertically and horizontally with respect to one another, a minimum interval of 0.7 mm between holes was attained. Single neurons were isolated by on-line monitoring during recordings and by off-line spike sorting using a template-matching algorithm. Off-line analysis confirmed that all of the data were single-neuron activities.
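As a check on the 0.7 mm figure, assume each grid is a square lattice with 1 mm pitch and the second grid is offset by 0.5 mm along both axes; the nearest holes of the two grids are then separated by the diagonal of a 0.5 mm square:

$$d_{\min} = \sqrt{(0.5\ \mathrm{mm})^{2} + (0.5\ \mathrm{mm})^{2}} = \frac{\sqrt{2}}{2}\ \mathrm{mm} \approx 0.71\ \mathrm{mm}$$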
During the physiological recordings, we first mapped a wide region of the posterior bank of the STS and assessed the visual responses to stimuli with a variety of glosses. After mapping, guide tubes made of MRI-compatible metal (titanium or gold) were inserted into the brain, targeting the regions where gloss-selective neurons were observed (Fig. 2B). We then sampled the neurons in these regions extensively. The tips of the guide tubes were positioned ∼1 cm above the targeted cortical regions. While the guide tubes remained inserted in the brain, we took MRI images to confirm the recording positions. All procedures for animal care and experimentation were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals (1996) and were approved by our institutional animal experimentation committee.
Experimental apparatus and the task.
During the experiments, the monkeys were seated in a primate chair facing the screen of a CRT monitor (frame rate, 100 Hz; Totoku Electric) placed 85 cm away. Eye position was monitored using an eye coil or an infrared eye camera system (ISCAN). Visual stimuli were generated using a graphics board (VSG; Cambridge Research Systems) and presented on the CRT monitor. Image resolution was 800 × 600 pixels (30 pixels/°). Monkeys were required to fixate on a small white spot (visual angle, <0.1°) at the center of the display. Each trial started with the presentation of the fixation spot, after which five stimuli were presented in sequence, each for 300 ms. The first stimulus was presented 800 ms after the monkey started fixating and was followed by the remaining four, separated by 300 ms interstimulus intervals. Monkeys were rewarded with a drop of juice 300 ms after the last stimulus was turned off. Monkeys had to keep their eye position within a 2.6 × 2.6° window centered on the fixation point. If the eye left this window, the trial was canceled and an intertrial interval (ITI) began. The ITI lasted 1000 ms. When the stimulus was presented on the fovea, the fixation spot was turned off after the first 500 ms of presentation to avoid interference between the fixation spot and the visual stimulus.
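The trial timing can be summarized as an event schedule. The Python sketch below is only an illustration (the constant and function names are ours, not from the experiment software); it lays out stimulus onsets and offsets and the reward time relative to fixation onset, using the durations given above.

```python
# Illustrative event schedule for one fixation trial (times in ms from fixation onset).
# Durations are taken from the text; names are hypothetical.
FIRST_STIM_DELAY = 800   # first stimulus onset after fixation begins
STIM_DURATION = 300      # each stimulus is on for 300 ms
ISI = 300                # blank interval between consecutive stimuli
N_STIMULI = 5            # stimuli per trial
REWARD_DELAY = 300       # juice delivered 300 ms after the last stimulus ends


def trial_schedule():
    """Return a list of (onset, offset) pairs and the reward time."""
    stimuli = []
    onset = FIRST_STIM_DELAY
    for _ in range(N_STIMULI):
        stimuli.append((onset, onset + STIM_DURATION))
        onset += STIM_DURATION + ISI
    reward_time = stimuli[-1][1] + REWARD_DELAY
    return stimuli, reward_time


stimuli, reward = trial_schedule()
print(stimuli)  # [(800, 1100), (1400, 1700), (2000, 2300), (2600, 2900), (3200, 3500)]
print(reward)   # 3800
```

A completed trial therefore lasts 3.8 s of fixation before reward, followed by the 1000 ms ITI.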
Visual stimuli.
To assess the selectivity of STS neurons for surface reflectance, we generated visual stimuli with 33 types of surface reflectance selected from the MERL BRDF dataset (http://www.merl.com/brdf/) (Fig. 1B). The bidirectional reflectance distribution function (BRDF) is one of the most general ways to characterize surface reflectance properties quantitatively. The MERL dataset contains BRDF data for ∼100 materials (Matusik et al., 2003), and we selected 33 surfaces with the aim of producing stimuli that were as dissimilar in appearance as possible. The 33 selected surfaces covered nearly the entire range of the MERL BRDF dataset. The surface reflection of many materials can be represented as a combination of two components (diffuse reflection and specular reflection), and the reflection properties can be characterized by three parameters: diffuse reflectance (ρd), indicating the strength of the diffuse reflection; specular reflectance (ρs), indicating the strength of the specular reflection; and roughness (α), indicating the microscopic unevenness of the surface that causes specular reflection to spread (Fig. 1A). Examples of the appearance changes caused by changing each parameter are shown in Figure 1A. An object with low ρd and ρs is a black matte object (left). As ρd increases, the object becomes lighter (upper middle). As ρs increases, the object becomes shiny, with sharp highlights if α is small or blurred highlights if α is large. To render the stimuli, ρd and ρs were set separately for R, G, and B because the color of the diffuse and specular reflections varied across surfaces. Roughness α did not depend on color. We thus controlled seven parameters (ρd_r, ρd_g, ρd_b, ρs_r, ρs_g, ρs_b, α), using the parameter values for the Ward–Duer model, one of the BRDF models analyzed by Ngan et al. (2005). Figure 1E shows the distribution of the reflection parameters in 3D space, which we refer to as the gloss stimulus space.
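To make the roles of ρd, ρs, and α concrete, the sketch below evaluates an isotropic Ward-type BRDF for a single light and view direction in Python with NumPy. It is an illustration, not the rendering pipeline used here (stimuli were rendered with Radiance). We assume the Dür normalization of the specular lobe, 1/(cos θi cos θo), which is our reading of the Ward–Duer variant; the exact form should be checked against Ngan et al. (2005). The function names are hypothetical.

```python
import numpy as np


def _unit(x):
    """Return x as a float array scaled to unit length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)


def ward_duer_brdf(n, l, v, rho_d, rho_s, alpha):
    """Isotropic Ward-type BRDF for light direction l and view direction v.

    ASSUMPTION: the specular lobe uses Duer's normalization
    1 / (cos(theta_i) * cos(theta_o)) in place of Ward's
    1 / sqrt(cos(theta_i) * cos(theta_o)); check against Ngan et al. (2005).
    rho_d and rho_s are length-3 (R, G, B) arrays; alpha is one roughness
    value shared by all channels, as in the stimulus set.
    """
    n, l, v = _unit(n), _unit(l), _unit(v)
    cos_i, cos_o = n @ l, n @ v
    if cos_i <= 0 or cos_o <= 0:
        return np.zeros(3)  # light or viewer below the surface: no reflection
    h = _unit(l + v)                        # half vector between l and v
    cos_h = n @ h
    tan2_h = (1.0 - cos_h**2) / cos_h**2    # tan^2 of the half-angle
    lobe = np.exp(-tan2_h / alpha**2) / (4.0 * np.pi * alpha**2 * cos_i * cos_o)
    diffuse = np.asarray(rho_d, dtype=float) / np.pi
    return diffuse + np.asarray(rho_s, dtype=float) * lobe


# Same rho_d and rho_s, two roughness values, evaluated at the mirror direction.
n = [0, 0, 1]
sharp = ward_duer_brdf(n, n, n, [0.2] * 3, [0.1] * 3, alpha=0.05)
blurred = ward_duer_brdf(n, n, n, [0.2] * 3, [0.1] * 3, alpha=0.3)
```

At the mirror configuration the specular term scales as 1/α², so lowering roughness concentrates the same specular reflectance into a taller, narrower highlight, which is the sharp-versus-blurred distinction in Figure 1A.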
In this plot, ρd indicates the mean of ρd_r, ρd_g, and ρd_b, and ρs indicates the mean of ρs_r, ρs_g, and ρs_b. Glossy stimuli with strong highlights (large ρs and small α) are located to the back and left, shiny stimuli with blurred highlights (large ρs and large α) to the back and right, and matte stimuli (small ρs) to the front and right. Although this plot ignores the variation of ρd and ρs across RGB channels, it still captures the essential features of gloss-selective neural responses. We use this gloss stimulus space throughout the paper because it is convenient for visualizing the stimuli and the gloss-selective responses of neurons.
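The reduction from seven rendering parameters to the three axes of the gloss stimulus space is a per-surface average over color channels. A minimal sketch, assuming the parameters are stored as rows of a 7-column array in the order listed above (the layout and function name are our assumptions):

```python
import numpy as np

# Assumed column order: rho_d_r, rho_d_g, rho_d_b, rho_s_r, rho_s_g, rho_s_b, alpha
def gloss_space(params):
    """Map (n_surfaces, 7) parameters to (mean rho_d, mean rho_s, alpha) coordinates."""
    p = np.asarray(params, dtype=float)
    return np.column_stack([p[:, 0:3].mean(axis=1),   # mean diffuse reflectance
                            p[:, 3:6].mean(axis=1),   # mean specular reflectance
                            p[:, 6]])                 # roughness (color-independent)
```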
We used LightWave software (NewTek) to generate 10 different 3D shapes (Fig. 1C). For the illumination environment, we used one of the high dynamic range images from the Debevec dataset (http://ict.debevec.org/~debevec/) (Eucalyptus Grove; illumination 1) as the default. We rendered object images using Radiance software (http://radsite.lbl.gov/radiance/) with the image parameters (surface reflectance, shape, illumination environment) described above. Stimuli with shape 3 are shown in Figure 1B, and examples of stimuli with other shapes are shown in Figure 3A. In a control experiment to examine the effect of illumination, we used another illumination environment image from the Debevec dataset (Campus at Sunset; illumination 2) (Figs. 1D, bottom; 3C). The luminance values of the rendered images were linearly mapped to a low dynamic range using a mean value mapping method, in which the mean value, including the background, was mapped to 0.5 and pixels exceeding 1 were clipped. The object images were then cut out at the object contour. In a control experiment to examine selectivity for color and luminance, we used stimuli in which the pixels were randomly rearranged within the object contour (shuffled stimuli; Figs. 1D, top; 3B). The mean luminance of the objects ranged from 3.15 to 78.2 cd/m², and the objects were presented on a gray background (10 cd/m²). The objects subtended ∼5° of visual angle and were usually presented on the fovea. When responses at the fovea were weak and stronger responses were evoked by stimuli presented outside the fovea, stimulus selectivity was examined at that position (27 of 215 neurons recorded; 6 of 57 gloss-selective neurons) (see Results).
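The two image operations described above, mean value mapping and pixel shuffling within the contour, can be sketched in NumPy as follows. This is an illustration under assumptions: a single global scale factor is applied to all pixels and channels, and the object contour is available as a boolean mask. Function names are hypothetical.

```python
import numpy as np


def mean_value_map(hdr):
    """Scale an HDR image linearly so its mean (background included) is 0.5,
    then clip values above 1. ASSUMPTION: one global scale factor."""
    hdr = np.asarray(hdr, dtype=float)
    return np.clip(hdr * (0.5 / hdr.mean()), 0.0, 1.0)


def shuffle_within_contour(image, mask, rng=None):
    """Randomly permute the pixels inside a boolean object mask.

    Works for (H, W) or (H, W, 3) images; an RGB triplet moves as a unit, so the
    pixel histogram inside the contour is unchanged and the background is untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(image, copy=True)
    pixels = out[mask]                       # (N,) or (N, 3) pixels inside the contour
    out[mask] = pixels[rng.permutation(len(pixels))]
    return out
```

Because shuffling only permutes pixels inside the mask, the pixel histograms, and hence the mean luminance and chromaticity of the object, are preserved exactly, while the spatial structure that carries the highlights is destroyed.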
Test of gloss selectivity.
When a single neuron was isolated, we first conducted a preliminary test to assess its responsiveness to visual stimuli. For this test, we used a stimulus set consisting of 15 surface reflectance properties: three sets of gloss parameters (large ρs and small α, large ρs and large α, zero ρs) combined with five colors/lightnesses (red, green, blue, white, black). We tested the neural responses to this preliminary gloss stimulus set with 10 object shapes, and when a neuron responded to at least one of the test stimuli, we determined the optimal shape for that neuron. In the subsequent main experiment, we examined gloss selectivity in detail using object images with the optimal shape and the 33 types of surface reflectance. In the early part of the experiment, we used only two (shapes 3 and 9) or four (shapes 2, 3, 9, and 10) shapes (16 of the 57 gloss-selective neurons described in Results). Neural responses were analyzed only for correct trials, and each stimulus had to be repeated at least five times to be accepted for analysis. Mean firing rates were computed for a 300 ms period beginning 50 ms after stimulus onset. We then subtracted the baseline activity, computed for the 300 ms immediately before the onset of the first stimulus within a trial, and the resulting rate was taken as the neuronal response to the visual stimulus. Only neurons that showed a response of >10 spikes/s and a significant increase in activity to at least one stimulus (p < 0.05, t test) were included in the sample of visually responsive neurons. The presence or absence of selectivity for the 33 types of gloss stimuli was examined using ANOVA, and the strength of the selectivity was quantified by a selectivity index defined as 1 − (minimum response)/(maximum response). This index increases with selectivity; because responses are baseline-subtracted, it exceeds unity when the minimum response falls below the baseline activity.
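The response measure and the selectivity index can be written compactly. The sketch below (hypothetical helper names; spike times in milliseconds, one trial at a time) computes a baseline-subtracted rate for a single presentation; in the analysis described above these values would then be averaged over at least five correct trials per stimulus before the index is computed.

```python
import numpy as np


def spike_rate(spike_times_ms, start_ms, stop_ms):
    """Mean firing rate (spikes/s) in the half-open window [start_ms, stop_ms)."""
    t = np.asarray(spike_times_ms, dtype=float)
    count = np.count_nonzero((t >= start_ms) & (t < stop_ms))
    return count / ((stop_ms - start_ms) / 1000.0)


def visual_response(spike_times_ms, stim_onset_ms, first_onset_ms):
    """Rate from 50 to 350 ms after stimulus onset minus the rate in the
    300 ms before the first stimulus of the trial."""
    evoked = spike_rate(spike_times_ms, stim_onset_ms + 50, stim_onset_ms + 350)
    baseline = spike_rate(spike_times_ms, first_onset_ms - 300, first_onset_ms)
    return evoked - baseline


def selectivity_index(responses):
    """1 - min/max over stimuli; exceeds 1 when the weakest response is below baseline."""
    r = np.asarray(responses, dtype=float)
    return 1.0 - r.min() / r.max()
```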
The sharpness of the selectivity was quantified using two indices: the number of stimuli that elicited responses with amplitudes more than one-half that of the maximum response, and a sparseness index defined as follows:

$$S = \frac{1 - \left( \sum_{i=1}^{n} r_i / n \right)^{2} \Big/ \sum_{i=1}^{n} \left( r_i^{2} / n \right)}{1 - 1/n}$$

where $r_i$ is the firing rate to the $i$th stimulus in a set of $n$ stimuli (Rolls and Tovee, 1995; Vinje and Gallant, 2000). Negative values of $r_i$ were replaced with zero. The sparseness index indicates the degree to which responses are unevenly distributed across the set of stimuli. We used the modified version of the sparseness index introduced by Vinje and Gallant (2000) because we felt the result would be more intuitive if sharper selectivity yielded a larger index value. The index is at its minimum, 0, when responses to all stimuli have the same magnitude, and it increases as stimulus selectivity becomes sharper. If only one stimulus in the set evokes a response, the index reaches its maximum value of 1.
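A direct transcription of the two sharpness measures, under the same conventions (negative rates set to zero before computing sparseness; function names are ours):

```python
import numpy as np


def sparseness_index(responses):
    """Vinje and Gallant (2000) sparseness: 0 for equal responses, 1 when only one stimulus responds."""
    r = np.clip(np.asarray(responses, dtype=float), 0.0, None)  # negative rates -> 0
    n = r.size
    a = (r.sum() / n) ** 2 / (np.sum(r ** 2) / n)              # Rolls-Tovee sparseness
    return (1.0 - a) / (1.0 - 1.0 / n)


def n_above_half_max(responses):
    """Number of stimuli evoking more than half of the maximum response."""
    r = np.asarray(responses, dtype=float)
    return int(np.count_nonzero(r > 0.5 * r.max()))
```

The index is undefined if every response is zero or negative after clipping; such a neuron would not pass the responsiveness criterion (>10 spikes/s to at least one stimulus) in the first place.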
Examination of the effects of shape and illumination.
To examine the effect of shape, we compared the responses to the gloss stimulus set across different object shapes (Figs. 1C, 3A). We compared the responses to the shape that yielded the strongest responses in the preliminary test (optimal shape) with those to the shape that yielded the second-strongest responses (nonoptimal shape) by computing the correlation coefficient between the two sets of responses. We also conducted a two-way ANOVA with gloss and shape as factors to examine their main effects and interaction. In addition, to examine whether the strength of the selectivity was affected by the change in shape, we compared the gloss selectivity index between the responses to the optimal and nonoptimal shapes.
To examine the effect of illumination, we compared the responses to the gloss stimulus set rendered with the optimal shape across different illuminations (Fig. 3, compare A, C). We compared responses under the default illumination (Eucalyptus Grove) with those under another illumination (Campus at Sunset) by computing the correlation coefficient between the two sets of responses. We also conducted a two-way ANOVA with gloss and illumination as factors to examine their main effects and interaction. In addition, to examine whether the strength of the selectivity was affected by the change in illumination, we compared the gloss selectivity index between the responses under the two illuminations.
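The shape and illumination comparisons combine a correlation between two response profiles with a two-way ANOVA. The sketch below uses SciPy for the correlation and the F distribution and computes the ANOVA by hand; it assumes a balanced design (the same number of trials in every gloss × condition cell, e.g. after truncating to the minimum), which the text does not state. The factor label `factor2` stands for shape or illumination, and all names are hypothetical.

```python
import numpy as np
from scipy import stats


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data: (a, b, n) array of single-trial responses, e.g. a = 33 glosses,
    b = 2 shapes (or illuminations), n = trials per cell (equal across cells).
    Returns {effect: (F, p)} for gloss, the second factor, and the interaction.
    """
    x = np.asarray(data, dtype=float)
    a, b, n = x.shape
    grand = x.mean()
    cell = x.mean(axis=2)                 # (a, b) cell means
    mean_a = cell.mean(axis=1)            # gloss marginal means
    mean_b = cell.mean(axis=0)            # shape/illumination marginal means
    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    ss_err = np.sum((x - cell[:, :, None]) ** 2)
    df = {"gloss": a - 1, "factor2": b - 1, "interaction": (a - 1) * (b - 1)}
    df_err = a * b * (n - 1)
    ms_err = ss_err / df_err
    out = {}
    for name, ss in (("gloss", ss_a), ("factor2", ss_b), ("interaction", ss_ab)):
        f = (ss / df[name]) / ms_err
        out[name] = (f, stats.f.sf(f, df[name], df_err))
    return out


def compare_conditions(resp_a, resp_b):
    """Pearson r and p between mean responses to the same stimuli under two conditions."""
    r, p = stats.pearsonr(resp_a, resp_b)
    return r, p
```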
To examine the effects of shape and illumination, we also used a separability index (Mazer et al., 2002; Grunewald and Skoumbourdis, 2004; Yamane et al., 2008) to quantify how well a neuron retained its selectivity for gloss across changes in shape or illumination. To compute the separability index for shape changes, we first tabulated the gross responses of each selective neuron in an m × n response matrix (M), whose m rows and n columns corresponded to the different glosses and shapes, respectively. We then computed the singular value decomposition (M = USV′) of the response matrix. If selectivity for gloss is independent of shape, the responses are fully explained by the first principal component (i.e., the outer product of the first columns of U and V, scaled by the first singular value); otherwise, the second principal component also explains the responses to some extent. The separability index is defined as the squared correlation (r²) between the actual responses and the responses predicted from the first principal component alone. We used a permutation test to determine whether a separability index was significantly larger than chance. We randomly permuted the mean neuronal responses to different glosses within each tested shape and computed a separability index for the reshuffled responses. Permuting the responses within but not across shapes ensured that the mean permuted response averaged across glosses for a given shape was the same as the mean observed response. Permutations were performed 1000 times. If the experimentally obtained separability index exceeded the 95th percentile of the distribution of separability indices for the reshuffled responses, the neuron was deemed to have a separability index significantly larger than chance. We also assessed the extent to which the responses were explained by the second principal component of the singular value decomposition.
If the r² between the actual responses and the responses predicted from the second principal component alone (the second columns of U and V) exceeded the 95th percentile of the distribution of r² for the reshuffled responses, the second principal component was deemed to have made a significant contribution. The separability index for changes in illumination was computed in the same manner.
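The separability analysis can be sketched with NumPy's singular value decomposition. This is an illustration of the procedure as described (permutation of gloss responses within each column, 95th-percentile threshold), not the authors' code; function names are hypothetical.

```python
import numpy as np


def component_r2(M, component=0):
    """Squared correlation between M and its reconstruction from one singular component."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    pred = s[component] * np.outer(U[:, component], Vt[component])
    return np.corrcoef(M.ravel(), pred.ravel())[0, 1] ** 2


def separability_test(M, n_perm=1000, rng=None):
    """M: glosses x shapes (or illuminations) matrix of mean responses."""
    M = np.asarray(M, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    observed = np.array([component_r2(M, 0), component_r2(M, 1)])
    null = np.empty((n_perm, 2))
    for k in range(n_perm):
        # Permute gloss labels independently within each column, so column means are kept.
        P = np.column_stack([rng.permutation(M[:, j]) for j in range(M.shape[1])])
        null[k] = component_r2(P, 0), component_r2(P, 1)
    thresh = np.percentile(null, 95, axis=0)
    return {"separability": observed[0], "sig_first": observed[0] > thresh[0],
            "r2_second": observed[1], "sig_second": observed[1] > thresh[1]}
```

A neuron whose gloss tuning is simply rescaled by the shape change yields a nearly rank-1 M and a separability index close to 1, while the within-column permutations destroy the shared tuning and give lower values.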
Examination of the representation of gloss by the population of neurons.
To better understand how gloss-selective neurons represent gloss, we conducted multidimensional scaling (MDS) analysis. First, we computed Pearson's correlation coefficients (r) between the responses of the population of gloss-selective neurons to all possible stimulus pairs; we then applied nonmetric (nonclassical) MDS using 1 − r as the distance and plotted the result in a two-dimensional space. We also tested other distance metrics, such as Euclidean distance and Spearman's correlation coefficient, and the results of the MDS analyses were similar regardless of the distance metric used.
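The dissimilarity matrix used for MDS is straightforward to compute. In the sketch below, each row of the input holds the population response to one stimulus. The nonmetric MDS step itself is left to a library routine; for completeness we include classical (Torgerson) MDS, which is a different, metric method and is shown only because it is often used as a starting configuration for nonmetric MDS. Names are hypothetical.

```python
import numpy as np


def correlation_distance(pop_responses):
    """1 - Pearson r between stimuli; pop_responses is (n_stimuli, n_neurons)."""
    return 1.0 - np.corrcoef(np.asarray(pop_responses, dtype=float))


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS. Metric method, shown only as a common starting
    configuration; the analysis in the text used nonmetric MDS."""
    d2 = np.asarray(dist, dtype=float) ** 2
    m = d2.shape[0]
    j = np.eye(m) - np.ones((m, m)) / m          # centering matrix
    b = -0.5 * j @ d2 @ j                        # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]             # k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```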
Results
Selective responses to a gloss stimulus set
We examined neural responses to a gloss stimulus set consisting of 33 types of surface reflectance rendered in the optimal shape for each neuron, and we found neurons in the lower bank of the STS that respond selectively to gloss. To map neural responses, we made electrode penetrations at 101 positions (68 in monkey AQ, 33 in monkey TV) in the lower bank of the STS, in the posterior TE (A4–A16, L18–L26 in the lower bank of the STS) (Fig. 2C), and tested the responses using the preliminary gloss stimulus set. For neurons responsive to these stimuli, we examined stimulus selectivity in more detail using the primary gloss stimulus set. Neurons responsive to glossy stimuli appeared to be localized within the mapped region of the IT cortex, and guide tubes were inserted targeting the regions where these neurons were frequently encountered. In total, we recorded the activities of 215 neurons (147 from monkey AQ, 68 from TV) that responded to the gloss stimulus set. Of these, 194 neurons (129 from AQ, 65 from TV) exhibited selectivity (ANOVA, p < 0.05).
Figure 4 shows responses of three representative neurons (cells 1, 2, and 3) that exhibited selectivity for the gloss stimulus set. Cell 1 (Fig. 4A–C) strongly responded to stimuli with sharp highlights (e.g., stimuli 8 and 13) and did not respond to stimuli with weak glossiness (e.g., stimuli 1 and 33). This neuron showed strong and sharp gloss selectivity (gloss selectivity index, 1.08; sparseness index, 0.51). Only six stimuli evoked more than a half-maximal response. Stimuli that induced strong responses in cell 1 were clearly localized in gloss stimulus space (Fig. 4C): strong responses were evoked by stimuli with large specular reflectance (ρs) and small roughness (α).
Cell 2 (Fig. 4D,E) selectively responded to shiny objects with blurred highlights; that is, objects with large specular reflectance and large roughness (e.g., stimuli 21 and 24) (gloss selectivity index, 0.95; sparseness index, 0.46). Only three stimuli evoked more than a half-maximal response in this neuron.
Cell 3 (Fig. 4F,G) exhibited moderately sharp selectivity for the gloss stimulus set, broader than that of cells 1 and 2 (gloss selectivity index, 1.05; sparseness index, 0.32), with nine stimuli evoking more than a half-maximal response. This neuron responded strongly to matte stimuli without clear highlights and to those with small specular reflectance and large roughness.
Effect of object shape and pixel shuffling within the stimulus
The results described above suggest there are neurons that selectively respond to images of objects with a specific gloss. However, images in the gloss stimulus set also varied with respect to their local luminance pattern; that is, glossy stimuli have sharp light spots corresponding to highlights whose patterns are roughly constant as long as the object shape and illumination environment are unchanged. It was therefore possible that the selective response of cell 1 was due to the presence of a specific pattern of highlights in some stimuli. To test this possibility, we recorded the responses of the same neurons to the gloss stimulus set rendered on a different 3D shape and assessed whether the change in shape affected stimulus selectivity. In Figure 5A, the red line indicates the rank order of the responses of cell 1 to the gloss stimulus set when the optimal shape (shape 3) was used. The blue line indicates the responses of the same neuron when a nonoptimal shape (shape 2) was used and the responses were aligned according to the same stimulus order as the red line. This neuron exhibited significant main effects of both surface reflectance and object shape (two-way ANOVA, p < 0.05), as well as a significant interaction between the two. This means that there was some difference in the pattern of gloss selectivity between the two shapes. More importantly, however, the overall pattern of responses to shape 2 was similar to the pattern of responses to the optimal shape, and there was a clear tendency for the responses to gradually decline along the horizontal axis. Responses to the gloss stimulus set showed a strong correlation between the optimal and nonoptimal shapes (r = 0.86; Fig. 5A, inset), which significantly differed from zero (p < 0.05). 
These results indicate that even when the local luminance pattern was changed by changing the object shape, the gloss selectivity of this neuron was largely maintained; thus, stimulus selectivity does not appear to be due to the local luminance pattern.
Images in the gloss stimulus set also varied in mean chromaticity and luminance. To exclude the possibility that the response selectivity was due to differences in the color and luminance of the stimuli, we tested the responses to shuffled stimuli in which the pixels were randomly rearranged within the object contour (Figs. 1D, top; 3B). In the shuffled stimuli, the luminance and color histograms of the pixels, and hence the mean luminance and mean chromaticity, were unchanged, but the glossiness changed dramatically, particularly for the glossy stimuli. In Figure 5A, the black line indicates the responses of cell 1 to the shuffled stimuli, aligned according to the same order as the red and blue lines. Cell 1 did not respond clearly to the shuffled stimuli (maximum, 1.71 spikes/s), which shows that its selective responses to the original stimulus set were not due to the mean color or luminance of these stimuli. In Figure 5B, responses of cell 2 to images rendered on a nonoptimal shape (shape 9) and to the shuffled stimuli are compared with the responses to the optimal shape. As with cell 1, the pattern of selectivity for the gloss stimulus set was highly correlated between the optimal and nonoptimal shapes (red and blue lines; r = 0.82; p < 0.01), and the responses to the shuffled stimuli were very weak (black line; maximum, 6.84 spikes/s).
The results were markedly different with cell 3, however (Fig. 5C). With this neuron, the responses to the gloss stimulus set were highly correlated between the optimal (shape 8) and nonoptimal (shape 4) shapes (red and blue lines; r = 0.87; p < 0.01), but, unlike cells 1 and 2, this neuron also strongly responded to the shuffled stimuli (black line; maximum, 25.6 spikes/s), and those responses also correlated with the responses to the optimal shape (r = 0.71; p < 0.01). This suggests that the activity of cell 3 was strongly influenced by low-level image features such as the mean luminance and chromaticity.
From the neurons that exhibited sufficiently strong (>10 spikes/s) and selective responses to the gloss stimulus set (ANOVA, p < 0.05), we identified neurons likely to be selective for glossiness using two criteria. First, the cell had to respond to a nonoptimal shape, and the patterns of stimulus selectivity obtained with the optimal and nonoptimal shapes had to be significantly correlated (p < 0.05). Second, either the neuron did not show a significant response to the shuffled stimuli (<10 spikes/s and/or p > 0.05, t test), or the correlation between the patterns of stimulus selectivity obtained with the optimal shape and the shuffled stimuli was not significant. Neurons satisfying both criteria were defined as “gloss-selective.” Of the 194 neurons that exhibited selectivity for the gloss stimulus set in the optimal shape, we assessed the responses to more than one shape in 145, to the shuffled stimuli in 169, and to both in 139. The distribution of correlation coefficients obtained under each of these conditions is shown in Figure 6. The abscissa represents the correlation coefficient between the responses to the optimal shape and the shuffled stimuli, and the ordinate represents the correlation coefficient between the responses to the optimal and nonoptimal shapes. The scatter plot includes neurons recorded in both tests (shape change and shuffling), whereas the histograms additionally include, as open bars, neurons tested in only one of these tests. Many neurons (118 of 145; 81%) exhibited a significant correlation between the responses to the optimal and nonoptimal shapes. With regard to the shuffled stimuli, 54 neurons (54 of 169; 32%) did not show significant responses (leftmost bar in the histogram). Of the remaining 115 neurons that showed clear responses, the correlation between the responses to the optimal shape and the shuffled stimuli was not significant in 51 (51 of 115; 44%).
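The two inclusion criteria amount to a simple decision rule. The sketch below encodes it with hypothetical field names, assuming the per-neuron test statistics have already been computed; like the text, it checks only the significance of the correlations and ignores the sign of r.

```python
from dataclasses import dataclass


@dataclass
class ControlTests:
    """Per-neuron summary statistics (field names are hypothetical)."""
    nonopt_responsive: bool    # significant response to the nonoptimal shape
    shape_corr_p: float        # p value of r(optimal shape, nonoptimal shape)
    shuffled_max_rate: float   # strongest response to shuffled stimuli (spikes/s)
    shuffled_resp_p: float     # t-test p value for that response
    shuffle_corr_p: float      # p value of r(optimal shape, shuffled stimuli)


def is_gloss_selective(t: ControlTests, alpha: float = 0.05, min_rate: float = 10.0) -> bool:
    # Criterion 1: the selectivity pattern survives a change in object shape.
    crit1 = t.nonopt_responsive and t.shape_corr_p < alpha
    # Criterion 2: no significant response to shuffled stimuli, or, if there is one,
    # its selectivity pattern does not correlate with the optimal-shape pattern.
    shuffled_responsive = t.shuffled_max_rate >= min_rate and t.shuffled_resp_p < alpha
    crit2 = (not shuffled_responsive) or t.shuffle_corr_p >= alpha
    return crit1 and crit2
```

With illustrative p values, a neuron like cell 1 (weak shuffled response) passes, whereas one like cell 3 (strong, correlated shuffled response) fails criterion 2.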
Of the 139 neurons tested under both control conditions, 57 (31 from monkey AQ; 26 from TV) satisfied the two criteria for gloss-selective neurons listed above (Fig. 6, red circles). Cells 1 and 2 are examples of this group. In contrast, 43 neurons showed a significant correlation both between their responses to the optimal and nonoptimal shapes and between their responses to the optimal shape and the shuffled stimuli (Fig. 6, blue circles). Cell 3 is an example of these neurons, which presumably respond selectively to the specific luminance or color of the stimuli. We also examined the stability of the selectivity of the 57 gloss-selective neurons across a change in shape using a separability measure (Fig. 7). All of these neurons had a significant separability index, and most had a separability index >0.7 (mean ± SD, 0.86 ± 0.08) (Fig. 7A). Representative interaction plots for four neurons are shown in Figure 7C. Moreover, only one neuron showed a significant r² computed from the second principal component (Fig. 7B). Together, these results confirm that gloss selectivity is largely independent of changes in stimulus shape. Most of the gloss-selective neurons showed strong selectivity for the gloss stimulus set, with a selectivity index >0.6 (median, 1.02) (Fig. 8A), and many also showed sharp selectivity, with a sparseness index >0.3 (median, 0.43) (Fig. 8B). There was no significant difference in the selectivity index between the responses to the optimal shape (mean ± SD, 1.03 ± 0.18) and those to the nonoptimal shape (mean ± SD, 1.02 ± 0.18) (p > 0.05, t test). We also examined whether differences in eye movements across stimuli could explain the neural selectivity. When we compared the variance of eye position during presentation of the best stimulus (rank 1) and the worst stimulus (rank 33) for the 57 gloss-selective neurons, only one neuron showed a significant difference (p < 0.05, t test).
Similarly, only two neurons showed significant variation in the variance of eye position across 33 stimuli (p < 0.05, ANOVA). These results clearly indicate that the difference in eye movements cannot explain the neural selectivity to the gloss stimulus set.
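The eye-movement control can be sketched as follows, assuming eye position is sampled during each stimulus presentation and that "variance of eye position" means the sum of horizontal and vertical variances per trial (our assumption; the text does not specify). SciPy's `ttest_ind` and `f_oneway` provide the t test and ANOVA; the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def eye_variance_per_trial(eye_xy):
    """eye_xy: (n_trials, n_samples, 2) eye positions during stimulus presentation.
    Returns the summed x and y positional variance for each trial (assumed measure)."""
    eye_xy = np.asarray(eye_xy, dtype=float)
    return eye_xy.var(axis=1).sum(axis=1)


def eye_movement_checks(var_by_stim, best, worst):
    """var_by_stim: list of per-trial variance arrays, one per stimulus.
    Returns (t-test p, best vs. worst stimulus) and (one-way ANOVA p, all stimuli)."""
    t_p = stats.ttest_ind(var_by_stim[best], var_by_stim[worst]).pvalue
    anova_p = stats.f_oneway(*var_by_stim).pvalue
    return t_p, anova_p
```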
We next examined, at the population level, how the responses of gloss-selective neurons were affected by a change in object shape or by image shuffling, by computing the rank order of the population responses as in Figure 5. That is, we sorted the responses of each neuron to the nonoptimal shape and the shuffled stimuli according to the rank order of its responses to the optimal shape and then averaged the responses across the population (Fig. 9A). Responses to the nonoptimal shape declined monotonically along the horizontal axis, similar to the pattern of responses to the optimal shape, although the slope of the decline was shallower. The difference in the averaged responses between the optimal and nonoptimal shapes for nonpreferred stimuli (rightmost part of the graph) is likely due to nonsystematic differences in rank order between the two shapes. In contrast, the responses to the shuffled stimuli were nearly flat, although the slope of the linear fit (−0.07 spikes/s per rank) was significantly different from zero (p < 0.05, t test), suggesting that the average color or luminance of the stimuli slightly affected the selectivity. However, when we computed the slope of the responses to the shuffled stimuli for each of the 27 gloss-selective neurons that exhibited a significant response to the shuffled stimuli, only 2 had a slope significantly different from zero (p < 0.05, t test). This confirms that little selectivity was retained after the image pixels were shuffled. For neurons that showed clear responses to the shuffled stimuli (Fig. 6, blue circles), responses to both the nonoptimal shape and the shuffled stimuli showed similar monotonically decreasing patterns along the rank order of the optimal shape (Fig. 9B), indicating that the selectivity was maintained under both conditions.
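The rank-order alignment behind the population curves can be sketched as follows (hypothetical names; responses arranged as neurons × stimuli). `scipy.stats.linregress` reports a two-sided t test on the slope, which we take to correspond to the slope test described above.

```python
import numpy as np
from scipy import stats


def rank_aligned_population(opt, other):
    """opt, other: (n_neurons, n_stimuli) responses. Each neuron's responses in
    `other` are reordered by that neuron's rank order under `opt`, then averaged."""
    opt = np.asarray(opt, dtype=float)
    other = np.asarray(other, dtype=float)
    order = np.argsort(-opt, axis=1)                     # rank 1 = strongest response
    aligned_opt = np.take_along_axis(opt, order, axis=1).mean(axis=0)
    aligned_other = np.take_along_axis(other, order, axis=1).mean(axis=0)
    ranks = np.arange(1, opt.shape[1] + 1)
    fit = stats.linregress(ranks, aligned_other)         # slope in spikes/s per rank
    return aligned_opt, aligned_other, fit.slope, fit.pvalue
```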
In the following, we will describe in more detail the response properties of the 57 neurons that satisfied both of the aforementioned criteria for gloss selectivity.
Stimulus preference of gloss-selective neurons
The preferred stimulus of gloss-selective neurons differed from cell to cell. Figure 10 shows two other examples of gloss-selective neurons: one (Fig. 10A,C) responded selectively to stimuli with large specular reflectance (ρs), small roughness (α), and sharp highlights, while the other (Fig. 10B,D) responded selectively to stimuli with large roughness, regardless of the specular reflectance.
To examine how gloss-selective neurons responded as a population to the gloss stimulus set, we computed the population response to each stimulus after normalizing each neuron's responses so that its maximum response was 1. Figure 10E shows the normalized population average response to each stimulus. The population of gloss-selective neurons responded to some degree to all of the stimuli, although response magnitude varied significantly across the stimulus set (ANOVA, p < 0.05). The ratio of the maximum to the minimum normalized response (stimulus 13, 0.47; stimulus 33, 0.21) was 2.18, and glossier stimuli tended to elicit stronger responses. This tendency was seen more clearly in the distribution of the preferred stimuli of individual gloss-selective neurons. Figure 10F shows the number of neurons that showed a peak response to each stimulus in the gloss stimulus set. Peak responses frequently occurred for stimuli with large specular reflectance and little roughness but occurred less frequently for stimuli with small specular reflectance.
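The normalization and preferred-stimulus count described above can be sketched as follows. This assumes a (neurons × stimuli) array of mean responses whose per-neuron maxima are positive, so that dividing by each neuron's maximum sets that maximum to 1; the function names are hypothetical, and this is an illustration rather than the study's analysis code.

```python
import numpy as np


def normalized_population_response(resp):
    """resp: (n_neurons, n_stimuli) mean responses.

    Each neuron's responses are divided by that neuron's maximum (so its peak is 1),
    then averaged across neurons to give one value per stimulus.
    """
    resp = np.asarray(resp, dtype=float)
    normalized = resp / resp.max(axis=1, keepdims=True)
    return normalized.mean(axis=0)


def preferred_stimulus_counts(resp):
    """Number of neurons whose peak response falls on each stimulus."""
    resp = np.asarray(resp, dtype=float)
    peaks = resp.argmax(axis=1)
    return np.bincount(peaks, minlength=resp.shape[1])


# Example: the max/min ratio of the normalized population response.
rng = np.random.default_rng(0)
resp = rng.uniform(1, 40, size=(57, 33))
pop = normalized_population_response(resp)
print(pop.max() / pop.min(), preferred_stimulus_counts(resp))
```

Normalizing before averaging keeps neurons with high firing rates from dominating the population curve, so the average reflects the shape of each neuron's tuning rather than its absolute rate.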
Effects of the illumination environment
In all of the results described so far, object images were rendered under the same illumination environment (illumination 1, Eucalyptus Grove). Changing the illumination environment does not affect apparent glossiness very much, as long as natural illumination is used (Fleming et al., 2003). Therefore, if the responses of gloss-selective neurons are related to encoding glossiness, we would expect selectivity for the gloss stimulus set to be retained even after the illumination environment is changed. To test this idea, we assessed the gloss selectivity of 48 of the 57 gloss-selective neurons using stimuli in which an object with the optimal shape was rendered under a different illumination environment (illumination 2, Campus at Sunset; Figs. 1D, bottom; 3C). In Figure 11A, the red line indicates the responses of cell 1 to the optimal shape under illumination 1 (same as the red line in Fig. 5A), and the blue line indicates the responses of the same neuron to the same stimulus set under illumination 2. Both lines are plotted in the rank order defined by the red line. There was a clear tendency for the responses under illumination 2 to decline gradually along the horizontal axis, and the responses to the stimulus set under the two illumination conditions were highly correlated (r = 0.81; p < 0.05) (Fig. 11A, inset). Figure 11B summarizes the effects of the illumination condition (abscissa) and object shape (ordinate) on the activity of the gloss-selective neurons tested under both illumination conditions. Given our definition of gloss-selective neurons, all of these neurons showed a significant correlation between their responses to the optimal and nonoptimal shapes. Likewise, most of the neurons showed a significant correlation between illuminations (40 of 48; 83.3%; red circles).
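The per-neuron comparison across illuminations rests on a Pearson correlation between two 33-element response vectors. The sketch below computes r directly and, as a swapped-in alternative to a parametric significance test (the test used in the study is not restated here), estimates a two-sided p value by randomly permuting one of the vectors. The function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length response vectors."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.sum(x * y) / np.sqrt(np.sum(x * x) * np.sum(y * y)))


def permutation_p(x, y, n_perm=10000, seed=0):
    """Return (r, two-sided permutation p value) for the correlation of x and y.

    The null distribution is built by shuffling the stimulus labels of y, which
    breaks any stimulus-by-stimulus correspondence between the two conditions.
    """
    rng = np.random.default_rng(seed)
    r_obs = pearson_r(x, y)
    y = np.asarray(y, dtype=float)
    count = 0
    for _ in range(n_perm):
        if abs(pearson_r(x, rng.permutation(y))) >= abs(r_obs):
            count += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return r_obs, (count + 1) / (n_perm + 1)


# Example: responses of one hypothetical neuron under illuminations 1 and 2.
rng = np.random.default_rng(1)
illum1 = rng.uniform(0, 40, size=33)
illum2 = 0.8 * illum1 + rng.normal(0, 4, size=33)
print(permutation_p(illum1, illum2, n_perm=2000))
```

Counting the neurons whose p value falls below 0.05 would give a tally analogous to the 40 of 48 neurons reported above, although the exact count could differ from that of a parametric test.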
This indicates that the gloss selectivity of these neurons was retained across illuminations, which is consistent with the notion that apparent glossiness is relatively stable under different natural illumination conditions. Analysis based on the separability measure also showed that the selectivity of gloss-selective neurons remained mostly stable under different illumination conditions (Fig. 12). All neurons but one showed a significant separability index, and most had separability index values >0.7 (mean ± SD, 0.84 ± 0.1) (Fig. 12A). In addition, only two neurons showed a significant r² computed using the second principal component (Fig. 12B). These results confirm that the gloss selectivity of these neurons is largely independent of changes in illumination.
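A common SVD-based definition of a separability index, assumed here because the exact formula used in the study is not restated in this section, is the fraction of the response matrix's squared singular values carried by the first component. For one neuron, the matrix would hold its responses to the 33 gloss stimuli (rows) under the two illuminations (columns). The function name is hypothetical, and the significance test for the index is not shown.

```python
import numpy as np


def separability_index(resp_matrix):
    """resp_matrix: (n_stimuli, n_conditions), e.g. 33 gloss stimuli x 2 illuminations.

    Returns s1**2 / sum(s_i**2), where s_i are the singular values of the matrix:
    the share of the matrix's energy explained by a single rank-1 product of a
    stimulus-tuning vector and a condition-gain vector.
    """
    s = np.linalg.svd(np.asarray(resp_matrix, dtype=float), compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))


# Example: identical tuning under both illuminations, with a gain change and noise.
rng = np.random.default_rng(0)
tuning = rng.uniform(0, 40, size=33)
m = np.column_stack([tuning, 0.7 * tuning + rng.normal(0, 2, size=33)])
print(separability_index(m))
```

With two conditions, this index lies between 0.5 (two equally strong components) and 1 (the responses are exactly a stimulus tuning curve scaled by a per-illumination gain).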
To further examine how the population of gloss-selective neurons was affected by the illumination condition, we compared the rank order of the responses obtained under illumination 2 with that obtained under illumination 1 (Fig. 12C, red and blue lines, respectively). The average responses under illumination 2 decreased gradually along the rank order of the responses under illumination 1 (abscissa), indicating that selectivity was largely maintained at the population level. As under illumination 1, gloss-selective neurons showed strong selectivity for the gloss stimulus set under illumination 2. The selectivity index was slightly, but significantly, higher under illumination 2 (mean ± SD, 1.08 ± 0.20) than under illumination 1 (mean ± SD, 1.04 ± 0.19) (p = 0.032, t test).
Population encoding of gloss
How are different glosses encoded by the activity of gloss-selective neurons? Knowing which pairs of stimuli were well differentiated and which were not should provide a clue as to how the population of gloss-selective neurons encodes different glosses. To examine this question, we computed the correlation coefficient (r) between the responses of the 57 gloss-selective neurons for every possible pair of the 33 stimuli in the gloss stimulus set. We then treated 1 − r as the neural distance between two stimuli and applied multidimensional scaling (MDS) to the resulting distance matrix, which contained the neural distances for all stimulus pairs. Figure 13, A and B, depicts the relationships between the responses of the 57 gloss-selective neurons for two example stimulus pairs. Stimuli 3 and 8 (Fig. 13A) differ considerably in color and luminance, but both have sharp highlights and similar glossiness. The population of gloss-selective neurons responded to these two stimuli in a highly correlated manner (r = 0.92), indicating that the neural distance between them was small (1 − r = 0.08). Stimuli 3 and 31 (Fig. 13B) are very different in appearance: stimulus 3 is highly glossy, whereas stimulus 31 is matte. The response patterns to these two stimuli were quite different, and the correlation between them was very weak (r = 0.22), indicating that the neural distance was large (1 − r = 0.78). After computing the neural distances for all stimulus pairs in this way, we arranged the stimuli on a two-dimensional plane so that their relative positions preserved the neural distances as closely as possible.
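The neural distance matrix described above can be sketched in a few lines, assuming the responses are held in a (neurons × stimuli) array; the function name `neural_distance_matrix` is hypothetical. Each column is the population response vector to one stimulus, and the distance between two stimuli is one minus the Pearson correlation of their columns.

```python
import numpy as np


def neural_distance_matrix(resp):
    """resp: (n_neurons, n_stimuli) mean responses.

    Returns an (n_stimuli, n_stimuli) matrix whose entry (i, j) is 1 - r, with r the
    Pearson correlation between the population response vectors to stimuli i and j.
    """
    # np.corrcoef treats rows as variables, so transpose to make stimuli the rows.
    r = np.corrcoef(np.asarray(resp, dtype=float).T)
    return 1.0 - r


# Example: 57 neurons x 33 stimuli of synthetic data.
rng = np.random.default_rng(1)
resp = rng.normal(size=(57, 33))
dist = neural_distance_matrix(resp)
print(dist.shape, dist[0, 1])
```

Because the distance is based on correlation, it ignores overall differences in response magnitude between two stimuli and reflects only whether the same neurons respond relatively strongly or weakly to both, which is what allows stimuli 3 and 8 to be close despite differing in color and luminance.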
Figure 13C shows the resulting MDS diagram. The scree plot in Figure 13C (inset) shows that two dimensions are sufficient to capture most of the variance in the neural distances (stress, 0.12) and to understand the basic aspects of the neural encoding of the stimulus set. In this two-dimensional diagram, stimuli that yielded similar response patterns in the neural population are plotted near one another, and those that yielded different response patterns are plotted farther apart. Highly specular stimuli cluster on the left side of the diagram, and glossy stimuli with blurred highlights cluster at the bottom right; glossiness decreases toward the top right, where matte stimuli cluster. The results of the MDS analysis show that the population responses of gloss-selective neurons systematically represent a variety of glosses and suggest that these neurons carry information closely associated with the surface gloss of objects.
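The embedding and scree analysis can be illustrated with classical (Torgerson) MDS and a metric, Kruskal-style stress. This is a sketch under assumptions: the MDS variant used in the study is not specified in this section (a reported stress value suggests an iterative, possibly nonmetric, procedure), so classical MDS is swapped in here, and the stress normalization shown is one of several conventions. All function names are hypothetical.

```python
import numpy as np


def classical_mds(dist, n_dims):
    """Torgerson MDS: double-center the squared distances and keep the top
    eigenvectors. Returns (n_points, n_dims) coordinates."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    evals, evecs = np.linalg.eigh(b)
    idx = np.argsort(evals)[::-1][:n_dims]
    # Negative eigenvalues (from non-Euclidean distances such as 1 - r) are clipped to 0.
    return evecs[:, idx] * np.sqrt(np.clip(evals[idx], 0.0, None))


def pairwise_euclidean(coords):
    """Euclidean distances between all rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress_1(dist, coords):
    """sqrt(sum (embedded_ij - dist_ij)^2 / sum dist_ij^2) over pairs i < j."""
    dist = np.asarray(dist, dtype=float)
    emb = pairwise_euclidean(coords)
    iu = np.triu_indices_from(dist, k=1)
    return float(np.sqrt(np.sum((emb[iu] - dist[iu]) ** 2) / np.sum(dist[iu] ** 2)))


def scree(dist, max_dims=5):
    """Stress of the classical-MDS embedding for 1..max_dims dimensions."""
    return [stress_1(dist, classical_mds(dist, k)) for k in range(1, max_dims + 1)]
```

Applied to a 1 − r distance matrix, `scree(dist)` gives a curve of stress against dimensionality; an elbow at two dimensions would correspond to the observation that a two-dimensional plane captures most of the structure in the neural distances.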
Discussion
Surface reflectance properties in object recognition
We are very sensitive to the surface properties, such as gloss, of both natural and man-made objects, and our behavioral decisions often depend on them. For example, the surface reflectance of food changes markedly depending on whether the food is fresh or old, and the reflectance of an animal's skin changes depending on its health. This type of information is closely related to the optical and surface reflectance properties of an object, and understanding how these properties are represented in the brain is essential for understanding the neural mechanisms involved in object recognition.
In the present study, we found neurons in the lower bank of STS that selectively respond to specific glosses. As a population, these neurons systematically represented a variety of glosses. This finding provides strong evidence that IT cortex, which plays an important role in object recognition, is involved in processing information about gloss.
Comparison with previous studies
Two previous studies examined the selectivity of neurons for various materials with different surface reflectance properties. These studies described material-selective neurons in area V4 (Arcizet et al., 2008) and IT cortex (Köteles et al., 2008) and reported that material selectivity was largely unaffected by changes in illumination direction. Both studies used visual stimuli depicting materials taken from the CUReT BRDF dataset (http://www.cs.columbia.edu/CAVE/software/curet/), which contains measured BRDF data for various materials (Dana et al., 1999). However, materials in this dataset generally have a 3D mesostructure, that is, macroscopic geometric detail on the surface characteristic of each material, which yields complex shading texture patterns [Köteles et al. (2008), their Fig. 1A]. It is therefore likely that the material-selective activity observed in those studies reflected the complex shading textures specific to each material. By contrast, the materials sampled in the MERL BRDF dataset used in the present study do not have such 3D mesostructures, which eliminated the influence of shading texture patterns and enabled us to study neural selectivity for surface reflectance properties in isolation.
Classification of gloss-selective neurons
Classifying neurons as gloss-selective in the present study does not imply that these neurons form a distinct group clearly separated from other nearby neurons. As can be seen in Figure 6, gloss-selective neurons form a continuous distribution with other neurons, and some of those other cells may also be involved in encoding gloss. In particular, the neurons represented by blue circles in Figure 6 retained selectivity for the gloss stimulus set even when the object shape was changed. Although these neurons may be responding to low-level image statistics such as mean luminance or chromaticity, they may also be involved in encoding glossiness. Pixel shuffling causes large changes in the apparent gloss of specular stimuli, whereas the changes are small for matte stimuli. It is therefore possible that these neurons selectively encode stimuli with low specularity, and the neuron depicted in Figure 5C may be a good example. Nonetheless, to identify gloss-selective neurons, we opted to apply rather conservative criteria.
Encoding 3D shapes and surface reflectances in the IT cortex
When we view an object, three factors, namely shape, surface reflectance, and illumination environment, interact to form the retinal image and are therefore intermingled within it. Isolating each factor from the retinal image is a fundamental task of the visual system. In the lower bank of STS, within area TE, neurons that encode 3D information based on stereo disparity and texture gradients have previously been reported (Janssen et al., 2000a,b, 2001; Liu et al., 2004; Yamane et al., 2008). However, the visual features related to stereo and texture differ in an important way from those related to shading and specular highlights (Blake and Bülthoff, 1991; Norman et al., 2004): features related to stereo and texture correspond to fixed locations on the surface of an object, whereas shading and highlights change their locations depending on the position of the light source and the viewpoint. The TE neurons in the present study retained their selectivity when the pattern of shading and highlights was altered by a change in illumination. Thus, their responses fit the properties of the visual features generated by the interaction of illumination with surface reflectance: the responses tolerated a change in illumination and consistently encoded the surface reflectance properties of objects.
Imaging of gloss-selective activity using fMRI in monkeys has revealed activation in the ventral visual areas, including areas TEO and TE, but not in the parietal cortex (Okazawa et al., 2011). These results are analogous to those of human and monkey fMRI studies showing that activity related to deriving shape from shading is observed only in ventral higher areas (Georgieva et al., 2008; Nelissen et al., 2009). By contrast, activity related to deriving 3D shape from stereo and texture features can also be observed in the parietal cortex (Durand et al., 2007; Joly et al., 2009). These results suggest that visual features related to surface reflectance and illumination are mainly processed in the ventral visual pathway. Presumably, two separate mechanisms exist in the lower bank of STS: one that encodes the 3D shapes of objects by processing stereo and texture information, and another that encodes the surface reflectance of objects by processing shading and highlight information. This does not necessarily mean that the two mechanisms coexist at the same location within STS. For example, the recording site in the present study is clearly more posterior than the 3D-shape-selective region described by Janssen et al. (2000a, 2001). The locations of the gloss-selective neurons also differ from those of the three regions in IT cortex where concentrations of color-selective neurons have been reported. Two of these regions are located in the anterior and posterior parts of the lateral convexity of IT cortex (Harada et al., 2009; Yasuda et al., 2010; Banno et al., 2011), and the third (Conway and Tsao, 2006; Conway et al., 2007) is located in a more posterior region of the lower bank of STS (posterior 1–4 mm in stereotaxic coordinates).
There is an interesting relationship between surface gloss and 3D shape. Specular highlights cling to regions of an object's surface that have high curvature (Koenderink and van Doorn, 1980) and thereby facilitate 3D shape discrimination (Todd et al., 1997, 2004). Because the gloss selectivity of the TE neurons in the present study was invariant across a change in object shape, this selectivity is more likely related to coding surface reflectance than to serving as a cue for coding 3D shape. It would be interesting to know how such gloss information interacts with 3D shape information within STS.
Relationship with gloss perception
In the present study, we examined the gloss selectivity of neurons using stimuli defined by combinations of physical gloss parameters (ρs, ρd, α). An important question is how the activity of these neurons relates to gloss perception. The relationship between the physical parameters of gloss and perceived gloss was examined in a previous psychophysical study, which derived a perceptually uniform gloss space (Ferwerda et al., 2001). This perceptual gloss space is two-dimensional, with a c (contrast gloss) axis and a d (distinctness of gloss) axis. Our MDS analysis shows that a variety of glosses can be systematically represented in a two-dimensional space, and c and d appear to vary systematically within this plane. This suggests that the activity of gloss-selective neurons is closely associated with gloss perception.
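To make the link between the physical parameters and the perceptual axes concrete, the sketch below maps (ρs, ρd, α) to (c, d) using the formulas as we understand them from the Ward-model-based gloss space of Ferwerda et al. (2001): c = ∛(ρs + ρd/2) − ∛(ρd/2) and d = 1 − α. Treating the present stimulus parameters as inputs to these formulas is an assumption made for illustration, and the function name is hypothetical.

```python
import numpy as np


def ferwerda_gloss(rho_s, rho_d, alpha):
    """Map specular reflectance, diffuse reflectance, and roughness to the
    perceptual gloss coordinates (c, d).

    c (contrast gloss) grows with specular energy relative to the diffuse
    background; d (distinctness of gloss) falls as roughness blurs highlights.
    Accepts scalars or arrays and returns NumPy values.
    """
    rho_s = np.asarray(rho_s, dtype=float)
    rho_d = np.asarray(rho_d, dtype=float)
    c = np.cbrt(rho_s + rho_d / 2.0) - np.cbrt(rho_d / 2.0)
    d = 1.0 - np.asarray(alpha, dtype=float)
    return c, d


# Example: a sharp, strong highlight versus a rough, weak one on the same diffuse base.
print(ferwerda_gloss([0.08, 0.02], [0.3, 0.3], [0.04, 0.12]))
```

Computing (c, d) for each stimulus in this way would allow the positions of the stimuli in the MDS plane to be compared quantitatively with the perceptual gloss space, for example by correlating each MDS coordinate with c and d.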
Neural processes related to generating gloss selectivity
How is the gloss selectivity of IT neurons generated from neural processing in early visual areas? Detection of complex shapes is thought to be achieved through integration of local features such as local contrast, orientation, spatial frequency, and contour curvature (Riesenhuber and Poggio, 1999; Kourtzi and Connor, 2011). The visual features related to gloss perception are not yet well understood, although the importance of highlights has long been recognized (Beck and Prazdny, 1981; Hunter and Harold, 1987; Blake and Bülthoff, 1990) and the importance of image statistics has been suggested more recently (Nishida and Shinya, 1998; Motoyoshi et al., 2007). Pixel shuffling preserves the luminance and chromaticity histograms of an image while destroying its spatial structure. That the responses of the gloss-selective neurons in the present study were significantly diminished by pixel shuffling therefore indicates that their selectivity is not simply due to low-level image statistics, such as differences in the luminance and chromaticity histograms of the stimuli. Furthermore, when we analyzed the correlation between the luminance contrast of the stimuli and the responses, most neurons did not show a significant correlation, indicating that their selectivity is also not due to image contrast. How the responses of gloss-selective neurons are determined by combinations of simple image features is an important question for future research, and answering it should enhance our understanding of the visual features involved in gloss perception.
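The logic of the shuffling control can be illustrated with a short sketch: permuting pixel positions leaves every histogram-based statistic unchanged, so a neuron whose responses change after shuffling cannot be driven by those statistics alone. The optional `mask` (restricting the shuffle to, say, the object's silhouette), the use of the RGB mean as a stand-in for luminance, and RMS contrast (standard deviation divided by mean luminance) are all illustrative assumptions rather than the study's procedures; the function names are hypothetical.

```python
import numpy as np


def shuffle_pixels(image, mask=None, seed=None):
    """Randomly permute pixel positions of an (H, W) or (H, W, C) image.

    Each pixel's value (or color triplet) moves as a unit, so the luminance and
    chromaticity histograms of the shuffled region are unchanged. If mask (an
    (H, W) boolean array) is given, only pixels inside it are shuffled.
    """
    rng = np.random.default_rng(seed)
    out = np.array(image, copy=True)
    if mask is None:
        mask = np.ones(out.shape[:2], dtype=bool)
    pixels = out[mask]  # shape (n_pixels,) or (n_pixels, C)
    out[mask] = pixels[rng.permutation(len(pixels))]
    return out


def rms_contrast(lum):
    """RMS contrast: standard deviation of luminance divided by its mean."""
    lum = np.asarray(lum, dtype=float)
    return float(lum.std() / lum.mean())


# Example: the shuffled image has the same RMS contrast but no spatial structure.
rng = np.random.default_rng(0)
img = rng.uniform(0, 1, size=(16, 16, 3))
shuffled = shuffle_pixels(img, seed=1)
print(rms_contrast(img.mean(axis=2)), rms_contrast(shuffled.mean(axis=2)))
```

Because any contrast measure computed from the histogram alone is likewise preserved, the shuffled stimuli also serve as a control for global image contrast, complementing the direct correlation analysis between contrast and responses.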
Footnotes
This work was supported by Grant-in-Aid for Scientific Research on Innovative Areas 22135007 from the Ministry of Education, Culture, Sports, Science and Technology, Japan.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Hidehiko Komatsu, Division of Sensory and Cognitive Information, National Institute for Physiological Sciences, Myoudaiji, Okazaki 444-8585, Aichi, Japan. komatsu{at}nips.ac.jp