Functional magnetic resonance imaging (fMRI) has been used extensively to identify regions in the inferior temporal (IT) cortex that are selective for categories of visual stimuli. However, comparatively little is known about the neuronal responses relative to these fMRI-defined regions. Here, we compared in nonhuman primates the distribution and response properties of IT neurons recorded within versus outside fMRI regions selective for four different visual categories: faces, body parts, objects, and places. Although individual neurons that preferred each of the four categories were found throughout the sampled regions, they were most concentrated within the corresponding fMRI region, decreasing significantly within 1–4 mm from the edge of these regions. Furthermore, the correspondence between fMRI and neuronal distributions was specific to neurons that increased their firing rates in response to the visual stimuli but not to neurons suppressed by visual stimuli, suggesting that the processes associated with inhibiting neuronal activity did not contribute strongly to the fMRI signal in this experiment.
Over the past 15 years, numerous studies have used functional magnetic resonance imaging (fMRI) to identify the neural structures involved in the perception and recognition of complex visual stimuli (for review, see Reddy and Kanwisher, 2006; Op de Beeck et al., 2008; Mahon and Caramazza, 2009). These studies have revealed a number of category-selective regions located throughout the inferior temporal (IT) cortex in both humans (Puce et al., 1995; Kanwisher et al., 1997; Epstein and Kanwisher, 1998; Downing et al., 2001; Haxby et al., 2001; Schwarzlose et al., 2005) and monkeys (Tsao et al., 2003; Pinsk et al., 2005, 2009; Bell et al., 2009; Rajimehr et al., 2009). Such regions provide a neuroanatomical basis for the behavioral (Rosch and Mervis, 1975; Rosch et al., 1976; Murphy and Brownell, 1985) and neurological (Bodamer, 1947; Damasio et al., 1982; Epstein et al., 2001) evidence suggesting that object recognition includes a stage at which stimuli are classified according to their semantic category. However, relatively little is known about the neuronal characteristics of these fMRI-identified category-selective regions because the relationship between the fMRI signal and the firing of individual neurons is still under investigation (Logothetis, 2002; Goense and Logothetis, 2008; Lee et al., 2010).
Recently, Tsao et al. (2006) sampled a small number of sites within an fMRI-identified face-selective region in the superior temporal sulcus (STS) of monkeys and reported that virtually all visually responsive neurons (97%) at those sites were strongly face selective. This study provided a critical piece of evidence linking the fMRI signal with the spiking activity of individual neurons. However, a number of crucial questions remain. First, are face-selective neurons also found outside of fMRI-identified face-selective regions, and, if so, in what concentrations? Second, does the relationship between neuronal preferences and fMRI selectivity extend to categories other than faces? Third, does the relationship differ for neurons that increase their spiking activity in response to visual stimuli compared with those that decrease their activity? Finally, does the degree of category selectivity differ between neurons found inside versus outside these fMRI-identified category-selective regions?
Here we address these questions by contrasting the distribution and response properties of visually responsive neurons in IT cortex in two monkeys, both inside and outside fMRI regions selective for four stimulus categories: faces, body parts, objects, and places.
Materials and Methods
All procedures were approved by the National Institute of Mental Health (NIMH) Animal Care and Use Committee and conform to all NIH guidelines. Two adult male rhesus monkeys (Macaca mulatta) were used in this study (10–12 kg). An MR-compatible head post was implanted for training and imaging experiments. After identification of the regions of interest using fMRI (see below), a second surgery was performed during which an MR-compatible recording chamber was implanted (centered 12–14 mm anterior to the interaural axis, left hemisphere for monkey S, right hemisphere for monkey W) over a 19 mm craniotomy. Centering the chamber thus provided access to a portion of the relevant category-selective regions identified by fMRI. Confirmation of the placement of the recording chamber and electrodes was accomplished by collecting additional anatomical scans after surgery, with and without electrode(s) in place (see Fig. 2A; see below for details).
Summary of neuroimaging techniques.
Functional neuroimaging data described in this study were collected at the Martinos Center for Biomedical Imaging at the Massachusetts General Hospital (MGH) (monkey S, single-loop send/receive coil, TR = 2.5 s, TE = 28 ms, 35 coronal slices) and at the NIH FMRI Core Facility (monkey W, eight-channel phased-array coil, TR = 2 s, TE = 17.9 ms, 27 coronal slices). All procedures conformed to guidelines set by the MGH Center for Comparative Medicine and the NIMH Animal Care and Use Committee. Details regarding the collection of functional neuroimaging data have been described previously (Bell et al., 2009). Briefly, functional scans (1.25 × 1.25 × 1.90 mm for monkey S, 1.56 × 1.56 × 1.55 mm for monkey W) covering the temporal lobe were collected at 3 T while monkeys performed a passive fixation task. Monkeys were required to maintain stable fixation within 3° of a central fixation point while stimuli from one of four visual categories (monkey faces, monkey body parts, familiar objects, and familiar places) were presented foveally (Fig. 1A). Stimuli were converted to grayscale, matched for overall luminance, and resized without altering the aspect ratio, such that the largest dimension (height and/or width) was 22° in size. Stimuli were presented on a random-dot pattern (Fig. 1A; matching the control condition, see below) to produce an overall stimulus width of 40° across. The random-dot pattern was produced using a custom Matlab (MathWorks) program designed to match the overall luminance of the stimuli. The stimuli were presented in blocks of 15–16 images each (15 for monkey W, 16 for monkey S), selected from a set of 48 (45 in the case of monkey W, taken from the same pool) possible exemplars per category (stimulus presentation time: 2 s = 30 or 32 s blocks for monkey S and monkey W, respectively). 
Except in the case of aborted sessions, each of the exemplars was presented an equal number of times within a given session (for additional details regarding the stimuli, including an analysis of stimulus similarity, see Bell et al., 2009). Runs also included blocks of baseline fixation (blank screen + fixation point) as well as a random-dot pattern control block (i.e., the “scrambled” condition; see Fig. 1A). Runs lasted either 5 min 10 s or 5 min 40 s (for monkeys S and W, respectively), and monkeys completed 18–32 runs per session. Eye position was monitored throughout, and runs in which the monkey failed to maintain fixation for at least 85% of the run were rejected. Short breaks in fixation lasting <200 ms (e.g., as a result of blinks) were ignored. To improve the contrast-to-noise ratio, monkeys were injected with an iron-based contrast agent [mono-crystalline iron nanoparticle (MION)] before every scan session (Vanduffel et al., 2001; Leite et al., 2002).
To define the category-selective regions in the fMRI data, we contrasted each category to the other three (thresholded to at least p < 0.05, corrected for multiple comparisons). To identify voxels selective for complex images, the output was masked with a contrast of all four categories versus the random-dot pattern (thresholded to at least p < 10^−10). Note that this method of defining category-selective (in particular, face-selective) regions in the monkey brain is subtly different from others in the literature and likely affects the number of regions observed. For example, in the pioneering study by Tsao et al. (2003), contrasting faces to non-face objects produced three face-selective regions per hemisphere. Several years later, using a modified approach (i.e., MION and a multi-echo sequence), they increased this number to six (Moeller et al., 2008). Other laboratories, including our own, have reported between two and six regions per hemisphere (Pinsk et al., 2005, 2009; Bell et al., 2009). With numerous methodological advances (e.g., the use of MION, phased-array coils, and multi-echo sequences), it is likely that the ability to tease apart subdivisions of the original three larger regions has been enhanced. In the context of the current study, the exact number of patches is not as important as their relationship to the underlying neuronal distributions.
Monkeys performed a “rapid serial visual presentation” task (Földiák et al., 2004) while single-unit data were recorded from the IT cortex. Monkeys were required to maintain stable fixation within 3° of a central fixation point while images from five semantic categories (monkey faces, monkey body parts, fruit, familiar objects, and familiar places) were presented in rapid succession. Responses to fruit stimuli were collected as part of another study and have therefore been omitted from all additional analyses. The remaining test stimuli consisted of 80 grayscale images selected from the same stimulus set used for the fMRI experiments (see above). Stimuli were controlled for overall luminance, presented on a black background, and organized into four semantic categories of interest with 20 individual stimuli each (Fig. 1A). Face stimuli were cropped (oval) to further control for differences in shape and aspect ratio. Each stimulus was presented three to five times per neuron (i.e., 60–100 repetitions per category). Each trial began with an initial fixation period of 100–300 ms. The stimulus was then presented foveally for 300 ms, followed by an additional 100 ms fixation interval after which time a liquid reward was given. Fixation performance of the animal was often >90%, and trials in which the monkey failed to maintain fixation throughout the entire trial were rejected. In contrast to the fMRI experiments, stimuli were smaller (∼5 × 5°) and were presented unblocked in random order to characterize neuronal responses to individual exemplars. Given the repeated demonstrations of size invariance in IT cortex (Sato et al., 1980; Ito et al., 1995; Rust and DiCarlo, 2010), it is unlikely that the differences in stimulus size between the two experiments had a significant impact on the observed category selectivity.
During recording sessions, one to four electrodes were lowered into IT cortex, guided by transdural guide tubes held in place by a delrin grid (Crist Instruments). Accurate and reliable targeting of the fMRI-identified category-selective regions in IT cortex was essential to this study. To avoid/minimize possible bending of the electrodes, which could have compromised the alignment between electrode penetrations and the fMRI data, we used larger-diameter electrodes (250 μm) and passed them through rigid guide tubes (55–65 mm in length) that terminated within the superior temporal gyrus, greatly reducing the distance the electrodes had to travel without guidance. To localize our electrode penetrations relative to the fMRI data, we collected several additional anatomical scans (at 4.7 T, fast, low-angle shot sequence and/or T2-weighted 2D sequences, 0.5 × 0.5 × 1.0 mm voxels) with electrodes positioned at strategic depths (based on stereotaxic coordinates and activity landmarks) and locations relative to the recording grid (e.g., cardinal positions; see Fig. 2A). These scans confirmed that there was minimal bending or deviation of the trajectory of the electrode after exiting the guide tube. The anatomical scans were first aligned to the functional data, using both automated and manual alignment procedures (for details, see Bell et al., 2009) such that the area of the fMRI-identified category-selective maps accessible from the recording chamber could be identified. Using both the electrode scans and fixed landmarks (e.g., edges of the recording chamber) as guides, we then extrapolated the position of each grid hole relative to the functional data (see Fig. 2B).
The accuracy of this registration procedure depends on both the alignment between the functional and anatomical scans and the ability to precisely localize the electrode tip on the MR images. In the case of the former, using well-established automated alignment procedures can minimize errors. In the case of the latter, there is an unavoidable margin of error given the limited resolution of the in situ anatomical scans relative to the diameter of the electrode. To account for both of these sources of alignment error, we did not compare the neuronal distributions or properties within individual penetrations to individual voxels. Instead, we compared the distributions of all penetrations targeting a particular category-selective region. Furthermore, we grouped the data from penetrations outside an area of interest into those within 1–4 mm (i.e., Near) versus those >4 mm from the given region (Out) (see Results).
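The grouping of penetrations into zones can be illustrated with a minimal sketch. The function name `classify_zone` and the convention that a distance of 0 means "inside the region" are assumptions made for illustration only; the text specifies just the 1–4 mm (Near) and >4 mm (Out) bands.

```python
def classify_zone(distance_mm: float) -> str:
    """Assign a penetration to In, Near, or Out.

    distance_mm is the distance from the penetration to the nearest edge of
    the fMRI-defined region; 0 is used here (by assumption) for penetrations
    that fall inside the region.
    """
    if distance_mm < 0:
        raise ValueError("distance must be non-negative")
    if distance_mm == 0:
        return "In"
    # Assumption: every site outside the region but no more than 4 mm from
    # its edge is treated as Near; anything farther away is Out.
    if distance_mm <= 4:
        return "Near"
    return "Out"


# Hypothetical distances for five penetrations
for d in (0, 1, 3, 4, 6):
    print(d, classify_zone(d))
```

Pooling sites into coarse zones rather than matching them to single voxels is what makes the comparison tolerant of small registration errors.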
Based on the previous fMRI, we assigned each grid hole a color corresponding to the fMRI-identified category selectivity found at the level of the inferior bank of the STS, in which all neuronal recordings were performed. The grid had holes spaced 1 mm apart and allowed for recordings over ∼154 mm² of IT cortex (5–19 mm anterior to the interaural axis in monkey S, 7–21 mm in monkey W). Waveform data were sampled at 40 kHz and later sorted into individual units using Offline-Sorter (Plexon Systems).
Spike trains were first converted to spike–density functions using a normal Gaussian kernel. Each action potential was converted to an individual Gaussian pulse having a total area of 1 (spike) and an SD (σ) of 10 ms. The individual pulses were summed together to yield a single spike–density function for each trial. The magnitude of the visual response for each trial was defined as the mean spike density 50–300 ms after stimulus onset.
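The conversion described above can be sketched as follows. This is an illustrative reimplementation, not the code used in the study: the function names, the 1 ms sampling step, and the choice to express the density in spikes per second are assumptions.

```python
import math


def spike_density(spike_times_ms, t_start_ms, t_stop_ms, sigma_ms=10.0, step_ms=1.0):
    """Sum one unit-area Gaussian per spike; returns (times, density in spikes/s)."""
    n = int(round((t_stop_ms - t_start_ms) / step_ms)) + 1
    times = [t_start_ms + i * step_ms for i in range(n)]
    # Peak height of a Gaussian whose total area is 1 spike (units: spikes/ms)
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))
    density = []
    for t in times:
        total = sum(norm * math.exp(-0.5 * ((t - s) / sigma_ms) ** 2)
                    for s in spike_times_ms)
        density.append(total * 1000.0)  # spikes/ms -> spikes/s
    return times, density


def response_magnitude(times, density, window=(50.0, 300.0)):
    """Mean spike density in the 50-300 ms window after stimulus onset (t = 0)."""
    values = [d for t, d in zip(times, density) if window[0] <= t <= window[1]]
    return sum(values) / len(values)


# Hypothetical trial: three spikes, times in ms relative to stimulus onset
times, density = spike_density([120.0, 150.0, 210.0], -200.0, 500.0)
print(response_magnitude(times, density))
```

Because each pulse has unit area, integrating the density over time recovers the spike count, so the window mean behaves like a smoothed firing rate.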
Each neuron was classified according to the following criteria: visually responsive versus nonresponsive, category-selective versus not category-selective, and stimulus-selective versus not stimulus-selective (within their preferred category). A neuron was classified as visually responsive if the mean response to any of the four categories of interest (faces, body parts, objects, or places) was significantly (Wilcoxon's rank-sum test; p < 0.05) different from baseline (defined as the average activity from 200 ms before to 50 ms after stimulus onset). If the average response for a given category was less than the average baseline response, the category response was defined as “suppressed.” If the response was greater than the average baseline response, the response was defined as “excitatory.” Note that it was possible for a single neuron to show both excitatory and suppressed responses (see Fig. 3C).
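The responsiveness criterion can be sketched as below. This is a hedged illustration: the rank-sum p value uses the large-sample normal approximation with a tie correction (the text does not say whether an exact or approximate test was used), and the function names and the idea of passing per-trial firing rates as plain lists are assumptions.

```python
import math
from statistics import NormalDist, mean


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p value (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for this tie group
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    expected = n1 * (n + 1) / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 1.0
    z = (rank_sum_x - expected) / math.sqrt(variance)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def classify_category_response(baseline_rates, evoked_rates, alpha=0.05):
    """Label one category response as excitatory, suppressed, or nonresponsive."""
    if rank_sum_p(evoked_rates, baseline_rates) >= alpha:
        return "nonresponsive"
    return "excitatory" if mean(evoked_rates) > mean(baseline_rates) else "suppressed"


def classify_neuron(baseline_rates, evoked_by_category, alpha=0.05):
    """Summarize a neuron as Excitatory, Suppressed, Both, or nonresponsive."""
    kinds = {classify_category_response(baseline_rates, rates, alpha)
             for rates in evoked_by_category.values()} - {"nonresponsive"}
    if not kinds:
        return "nonresponsive"
    if len(kinds) == 2:
        return "Both"
    return kinds.pop().capitalize()
```

A neuron is labeled Both when at least one category drives it significantly above baseline and at least one other drives it significantly below.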
To determine whether a given visually responsive neuron was category selective, we compared the average responses to each of the four categories using a one-way ANOVA. Neurons with a main effect of category (p < 0.05) were defined as “category-selective.” The degree to which a neuron was selective for a given category was calculated using the following formula:
$$\mathrm{CSI}_a = \frac{R_a - \tfrac{1}{3}(R_b + R_c + R_d)}{R_a + \tfrac{1}{3}(R_b + R_c + R_d)}$$
where $\mathrm{CSI}_a$ is the category selectivity index for category $a$, $R_a$ is the response to category $a$, and $R_b$–$R_d$ are the responses to the remaining categories. Note that all CSI calculations were done using the raw firing rates of each neuron (not the normalized response magnitudes) to avoid artificially inflating values attributable to the inclusion of negative responses. A high absolute CSI value indicates a category response that is very different from the average response to the remaining three categories (i.e., strong category selectivity); a low absolute CSI value indicates a category response that is only marginally different from the average response to the remaining three categories (i.e., weak category selectivity). To determine whether a given visually responsive neuron was sensitive to stimulus identity (i.e., “stimulus-selective” neuron), we compared the average responses to each of the 20 exemplars within a given category using a one-way ANOVA. Neurons with a main effect of stimulus identity (p < 0.05) were defined as stimulus selective. Response latency was defined as the point at which the average activity exceeded baseline + 2 SDs for a minimum of 20 ms. Values <50 ms were discarded.
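The selectivity index and the latency criterion can be sketched in a few lines. Two assumptions are worth flagging: the index is written as a contrast ratio between the response to one category and the mean response to the other three, a form that matches the definitions above and reproduces the percentages reported in Results; and the latency search assumes that time 0 is stimulus onset and that activity is sampled on a regular grid. The function names are illustrative.

```python
def category_selectivity_index(r_a, r_others):
    """Contrast between the response to category a and the mean of the others.

    Inputs are raw firing rates (spikes/s). Assumed form:
    (R_a - mean(R_b..R_d)) / (R_a + mean(R_b..R_d)).
    """
    mean_other = sum(r_others) / len(r_others)
    denom = r_a + mean_other
    if denom == 0:
        raise ValueError("index is undefined when both responses are zero")
    return (r_a - mean_other) / denom


def response_latency(times_ms, density, baseline_mean, baseline_sd,
                     min_run_ms=20.0, min_latency_ms=50.0):
    """First time activity exceeds baseline + 2 SD for at least min_run_ms.

    Returns None when no such run exists or when the onset falls before
    min_latency_ms (such values are discarded).
    """
    threshold = baseline_mean + 2.0 * baseline_sd
    run_start = None
    for t, d in zip(times_ms, density):
        if d > threshold:
            if run_start is None:
                run_start = t
            if t - run_start >= min_run_ms:
                return run_start if run_start >= min_latency_ms else None
        else:
            run_start = None  # run broken; start over
    return None


# Hypothetical neuron: 30 spikes/s to faces, 10 spikes/s on average to the rest
print(category_selectivity_index(30.0, [12.0, 10.0, 8.0]))  # 0.5
```

Requiring a sustained threshold crossing keeps brief noise excursions from being mistaken for the onset of the visual response.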
Summary of fMRI findings
Figure 1B shows maps of the different category-selective regions identified using fMRI for two monkey subjects: the left hemisphere for monkey S (based on 9916 functional volumes, 74 blocks per condition) and the right hemisphere for monkey W (based on 12,580 functional volumes, 74 blocks per condition). As described previously (Bell et al., 2009), we identified regions selective for each of four categories tested: faces, body parts, objects, and places. These category-selective regions were concentrated in the inferior bank of the STS but extended to the ventral surface of IT cortex. In the left hemisphere of monkey S (Fig. 1B, left), we identified two face-selective regions: one was located anteriorly in temporal cortex area TE (centered at +17–18 mm anterior to the interaural axis), and the other was located posteriorly in/near temporal–occipital area TEO (centered at +5–6 mm). These regions correspond to the “anterior” and “middle” face-selective regions previously identified by Tsao et al. (2003, 2006) and Pinsk et al. (2005, 2009) and to the “anterior” and “posterior” face-selective regions of others (Hadj-Bouziane et al., 2008; Bell et al., 2009; Rajimehr et al., 2009). Also consistent with previous studies (Tsao et al., 2003; Pinsk et al., 2005; Bell et al., 2009), two regions selective for body parts were located immediately adjacent to the face-selective regions, centered at +17 and +6 mm anterior to the interaural axis, respectively. Occupying the majority of the cortex between the two face/body-part-selective regions was a single, large object-selective region, spanning regions between +6 to +14 mm. A single place-selective region (data not shown) was located along the ventral surface of IT cortex lateral to the occipitotemporal sulcus (OTS), centered at ∼5 mm anterior to the interaural axis.
The data obtained from the second monkey (monkey W, Fig. 1B, right) were more variable, likely because of the increased movement observed in this subject. However, we were nonetheless able to identify several statistically significant category-selective regions within its right hemisphere. We identified both an anterior and a posterior face-selective region within the inferior bank of the STS, centered at +14–17 and +7–9 mm anterior to the interaural axis, respectively. In addition, several smaller face-selective regions were identified, including one on the ventral surface of IT cortex (centered at +11 mm anterior to the interaural axis, lateral to the OTS) and another within the STS close to the temporal pole (centered at +22–23 mm anterior to the interaural axis). It is probable that some of these smaller regions represent subregions of the anterior and posterior face-selective regions that became isolated as a result of statistical thresholding. Two body-part-selective regions were identified on the lateral edge of the STS; one was located anteriorly (centered at +12–14 mm anterior to the interaural axis) and the other was located posteriorly (centered at +5–6 mm anterior to the interaural axis). Surrounding the anterior face- and body-part-selective regions was an object-selective region, spanning +10–16 mm anterior to the interaural axis. Two anterior place-selective regions were identified, one located within the STS (centered at +20–22 mm anterior to the interaural axis) and the other located immediately ventral to this, near the anterior middle temporal sulcus.
To localize our electrode penetrations relative to the fMRI data, we collected several anatomical scans with electrodes positioned at strategic depths (Fig. 2A; for details, see Materials and Methods). The approximate area accessible by our recording grid is indicated on the flattened fMRI category maps in Figure 1B as dashed ovals. Figure 2B shows a top-down view of the recording grid for each monkey. The colors indicate the fMRI-identified category selectivity in the inferior bank of the STS, in which all recordings were performed. The white circles indicate the grid holes from which we sampled neuronal data (26 for monkey S; 23 for monkey W). Given the shape and location of our recording grids, we were able to access the following fMRI-identified regions: the complete anterior face-selective region (in both monkeys), the anterior portion of the posterior face-selective region (in both monkeys), the anterior body-part selective region (in both monkeys), a large portion of the object-selective region (in both monkeys), and an anterior place-selective region (in monkey W).
Properties of category-selective neurons in IT cortex
We recorded activity from 1272 individual neurons in the inferior bank of the STS (areas TE/TEO; von Bonin and Bailey, 1947) from two monkeys (609 from monkey S; 663 from monkey W). Of these, 77% (975 of 1272) showed a significant response to stimuli from at least one of the four visual categories tested; only these 975 visually responsive neurons were considered for additional analysis. Three types of response profile were identified (Fig. 3; Table 1). Most neurons significantly increased their firing rate in response to the visual stimuli tested (Fig. 3A, Excitatory; 529 of 975, 54%). The second group decreased their firing rate in response to stimuli from at least one category tested (Fig. 3B, Suppressed; 250 of 975, 26%). The remaining 20% of neurons (196 of 975) increased their firing in response to stimuli from one (or more) category and decreased their firing in response to stimuli from at least one other category (Fig. 3C, Both).
Figure 4 shows the neuronal populations sampled from the two monkeys, separated according to those neurons that exhibited significant excitatory (left) and suppressed (right) responses. Neurons exhibiting both excitatory and suppressed responses, such as that shown in Figure 3C, appear in both panels. Individual rows show the responses to each of the 80 different stimuli presented (20 exemplars per category) for a single neuron. Responses are shown relative to the baseline firing rate of each neuron to reveal excitatory versus suppressed responses. Individual neurons are sorted according to which category evoked the strongest response (or, in the case of suppressed responses, the weakest response). Below, all visually responsive neurons are referred to by their preferred category (i.e., that which evoked the strongest average response across all exemplars within that category): those that responded most strongly to faces are referred to as “face neurons,” neurons that responded most strongly to body parts are referred to as “body-part neurons,” and so forth. Note that this does not imply that a given neuron is ultimately “selective” for that particular category, merely that of the four categories tested, this category evoked the strongest response.
To evaluate whether a given neuron exhibited significant across-category selectivity, we performed a one-way ANOVA on each neuron, with stimulus category as the factor of interest (i.e., faces, objects, etc.). By this criterion, 73% (713 of 975) of visually responsive neurons were category selective (main effect of category, p < 0.05; Table 1). Figure 3A illustrates one such neuron, showing a clear bias for face stimuli. In comparison, the neuron in Figure 3B was not category selective; it showed approximately the same level of response suppression to stimuli from all four categories. A second ANOVA, using stimulus identity as the main factor of interest (i.e., face1, face2, etc.), revealed that only 23% of neurons exhibited stimulus selectivity within their preferred category (main effect of stimulus identity, p < 0.05; Table 2). In other words, if a given neuron responded robustly to a face stimulus (for example), it was very likely to respond robustly to all face stimuli. Object and body-part neurons showed the greatest proportion of neurons with significant stimulus selectivity (Table 2; 45 and 41%, respectively vs 17 and 23% for faces and places, respectively), which may be attributable to the greater variation in visual appearance across the different exemplars within these two categories compared with faces and places (Bell et al., 2009).
We also observed significant differences in the degree to which individual neurons were selective for their preferred category. Figure 5, A and C, shows the average normalized responses for stimulus category for each subpopulation of neuron (i.e., face neurons, body-part neurons, etc.). In the case of the excitatory responses, face neurons had relatively weak responses (or none at all) to stimuli from the remaining three categories (Fig. 5A; for an example, see Fig. 3A). A similar trend was observed for place neurons (Fig. 5A). In contrast, body-part and object neurons showed relatively robust responses to certain nonpreferred categories (in particular, objects and body parts, respectively).
To quantify these differences in category selectivity across the four subpopulations of neurons, we calculated a CSI on the raw responses for each neuron, which expresses the ratio of the average excitatory response to all stimuli from the preferred category of the neuron to those from the remaining three categories (see Materials and Methods). Figure 5B shows the average CSI values for all neurons that responded preferentially to each of the four categories (i.e., face neurons, body-part neurons, etc.). Face neurons were the most selective (for their preferred category of faces), with an average CSI of 0.29 ± 0.01 (indicating that the average response to face stimuli was ∼82% greater than the average response to the remaining three categories). Body-part neurons were the next most selective (for body parts), with an average CSI of 0.24 ± 0.01 (∼63%). Place neurons had an average CSI of 0.22 ± 0.02 (∼56%), and object neurons were the least selective, with an average CSI of 0.19 ± 0.01 (∼47%). In other words, a neuron that responded most strongly to faces was not likely to respond strongly to stimuli from any other category, whereas a neuron that responded most strongly to objects was likely to show robust responses to stimuli from other categories.
Similar disparities were observed for the suppressed responses (Fig. 5C,D). Neurons whose activity was most suppressed by faces or places showed little suppression to stimuli from the remaining categories (average CSI: −0.36 ± 0.01, approximately −53% and −0.30 ± 0.02, approximately −46%, respectively). Thus, these suppressive effects were strongly category specific for these two categories; in fact, the responses to the other categories were often above baseline. Conversely, neurons most suppressed by body parts or objects tended to also show decreases in activity in response to stimuli from other categories (objects and body parts, respectively) (average CSI: −0.28 ± 0.02, approximately −44% and −0.24 ± 0.02, approximately −39%, respectively).
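The percentages quoted alongside the CSI values in the two preceding paragraphs can be checked with a short calculation. The sketch assumes that the index is a contrast ratio, CSI = (R_pref − R_other) / (R_pref + R_other), where R_other is the mean response to the other three categories; under that assumption R_pref / R_other = (1 + CSI) / (1 − CSI). The function name is illustrative.

```python
def csi_to_percent_difference(csi):
    """Percent by which the preferred-category response differs from the
    mean response to the remaining categories, given a contrast-ratio CSI."""
    if csi >= 1.0:
        raise ValueError("CSI must be below 1")
    # (1 + c) / (1 - c) - 1 simplifies to 2c / (1 - c)
    return 100.0 * 2.0 * csi / (1.0 - csi)


# Average CSI values reported in the text for excitatory and suppressed responses
for csi in (0.29, 0.24, 0.22, 0.19, -0.36, -0.30, -0.28, -0.24):
    print(f"CSI {csi:+.2f} -> {csi_to_percent_difference(csi):+.0f}%")
```

The conversion is asymmetric: a positive and a negative CSI of equal magnitude do not map to equal and opposite percentages, which is why −0.36 corresponds to roughly −53% whereas +0.29 corresponds to roughly +82%.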
Finally, neurons also exhibited differences in response latency related to their category preferences (Fig. 5E). Face neurons had significantly shorter response latencies (average response latency, 110 ± 1 ms) compared with neurons that preferred each of the other three categories (123 ± 1 ms for body-part neurons; 124 ± 2 ms for object neurons; 125 ± 2 ms for place neurons; p values <0.05).
These data highlight several differences among category-selective neurons in IT cortex of monkeys, independent of their location relative to the fMRI regions. Specifically, face neurons (1) were more selective and (2) had shorter response latencies than the other three neuron types. These observations, together with the disproportionately large number of neurons selectively suppressed by faces, suggest that faces (and face neurons) represent a special category of visual stimuli/neuron (see Discussion).
Spatial distribution of category selectivity in IT cortex
We next compared the distribution of all visually responsive neurons, relative to the location of the individual fMRI-identified category-selective regions. Figure 6 shows the fMRI-identified category maps for monkeys S and W. We subdivided the maps into four and five subdivisions for monkeys S and W, respectively, corresponding to the individual fMRI-identified category-selective regions located in the inferior bank of the STS accessible from our recording chambers. Below these are the corresponding distributions of all visually responsive neurons for each subdivision, separated into excitatory and suppressed responses. We chose to include all visually responsive neurons in this analysis (as opposed to just neurons that exhibited a certain level of selectivity for a particular category) based on the assumption that the MR signal would be correlated with the overall distribution of actively firing neurons and not with the distribution of a select subgroup. We conducted individual χ² tests on each distribution to assess whether neurons that preferred each of the four categories were evenly distributed within each subdivision. These tests revealed that the majority of subdivisions showed a significantly biased distribution (p < 0.05). In the case of the excitatory responses, this bias matched the category selectivity identified by fMRI. For example, in the case of monkey S, the fMRI-identified body-part-selective region (Fig. 6, subdivision 1) contained 56% body-part neurons. Immediately adjacent to this region was the anterior fMRI-identified face-selective region (Fig. 6, subdivision 2), which contained 52% face neurons. No such pattern was observed for the suppressed responses: in almost all cases, neurons suppressed maximally by faces comprised the largest proportion of suppressed responses, regardless of the selectivity predicted by the fMRI data. 
Thus, unlike the excitatory responses, the distribution of category selectivity for the suppressed responses showed little correspondence to the fMRI-identified regions. From these data, we cannot infer that a relationship existed between the fMRI signal and the processes associated with suppressed spiking responses in this experimental context. Therefore, the remaining analyses were restricted to the excitatory responses.
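The test for an even distribution of category preferences within a subdivision can be sketched as a χ² goodness-of-fit test against equal expected counts. The neuron counts below are hypothetical and the function name is illustrative; the p value uses the closed-form survival function of the χ² distribution with 3 degrees of freedom, which applies only to the four-category case.

```python
import math


def chi_square_even_four(counts):
    """Goodness-of-fit test of four category counts against equal proportions.

    Returns (chi-square statistic, p value) with 3 degrees of freedom.
    """
    if len(counts) != 4:
        raise ValueError("expected counts for exactly four categories")
    expected = sum(counts) / 4.0
    stat = sum((c - expected) ** 2 / expected for c in counts)
    # Closed-form upper-tail probability for chi-square with 3 df
    p = (math.erfc(math.sqrt(stat / 2.0))
         + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0))
    return stat, p


# Hypothetical subdivision: face, body-part, object, and place neuron counts
stat, p = chi_square_even_four([14, 56, 20, 10])
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

A small p value says only that the four preferences are unevenly represented; whether the over-represented category matches the fMRI label is a separate comparison.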
Figure 7 compares the proportion of neurons that preferred a given category found within (In), near (Near) (located between 1 and 4 mm from the edge), and outside (Out) (>4 mm from the edge) the corresponding fMRI-identified category-selective region. These three zones are represented in the accompanying grid maps in Figure 7 (In, colored according to the selective category; Near, gray boundaries; Out, all remaining sampled locations). Note that sites defined as Out for the anterior face region do not include those found in the posterior face region and vice versa. In all but one case (the object-selective region for monkey W), the greatest proportion of neurons that preferred a given category was found within recording sites that targeted the corresponding fMRI region. Furthermore, in the case of the face-selective regions, a greater proportion of face neurons were found in the anterior face-selective regions compared with the posterior face-selective regions. In the majority of cases (all but the face-selective regions in both monkeys and the object-selective region in monkey W), the next greatest proportion was found nearby (Near, gray bars), and the lowest proportion was found in recording sites located farthest from the fMRI region. This relationship between fMRI and neuronal distribution was most pronounced for face and body-part regions (in which the proportion of face and body-part neurons ranged from 41 to 69%) and weaker for object and place regions (which contained 20–35% object and place neurons). As illustrated in Figure 6, these biased distributions within the individual fMRI-selective region were significantly different from chance (χ² test, p < 0.05) in all cases except the object-selective and place-selective regions in monkey W.
Overall, these data show that category-selective voxels identified with fMRI correspond to a local increase in the proportion of neurons that prefer that category and that this concentration decreases further from the borders of these regions. This relationship was specific to neurons that increase their firing in response to the relevant stimuli (i.e., excitatory responses) and was more pronounced for faces and body parts and weaker (or absent) for places and objects.
Contrasting fMRI and neuronal responses
Figure 8, A and B, shows the fMRI time series and the corresponding spike–density functions for two different fMRI-identified category-selective regions. In the first example (anterior face-selective region from monkey W), the fMRI activation in response to faces was almost twice that to the next most active category (objects). The corresponding neuronal distribution was strongly biased toward faces (69%), and the population response was highly selective for faces. In contrast, in the second example (object-selective region from monkey S), the fMRI activation was only weakly selective for objects, as was the underlying neuronal distribution. Furthermore, there was very little bias in the population response: all four categories evoked robust responses among the population of neurons found within this region.
To quantify this relationship between the selectivity of the fMRI response profiles within individual category-selective regions and those of the underlying neuronal populations, we correlated the selectivity indices for each category response within each fMRI-identified region for the fMRI and neuronal populations (Fig. 8C). This analysis showed that, as the strength of the neuronal response to faces (as an example) increased relative to the responses to non-faces, so too did the strength of the corresponding fMRI activation. Thus, in this experiment, we might infer on the basis of this analysis that the fMRI signal predicts the preferred/nonpreferred ratio of the responses of the underlying neuronal population. However, note that, although this analysis revealed modest correlation values, it was only marginally significant (p = 0.04) and failed to reach statistical significance when evaluated with a nonparametric analysis method (Spearman's rank correlation coefficient).
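The contrast between a parametric and a rank-based correlation can be illustrated with a short sketch. The code below is not the analysis code used here. It assumes paired selectivity indices (one fMRI value and one neuronal value per category and region) and computes Pearson's r and Spearman's rank coefficient without any external library. The index values are hypothetical, and no p values are computed.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical selectivity indices (assumed values, NOT data from the study).
fmri_si = [0.05, 0.10, 0.08, 0.12, 0.60]
neuronal_si = [0.20, 0.10, 0.25, 0.15, 0.70]

print("Pearson r   :", round(pearson_r(fmri_si, neuronal_si), 3))
print("Spearman rho:", round(spearman_rho(fmri_si, neuronal_si), 3))
```

In this made-up example a single extreme pair dominates Pearson's r, whereas the rank-based coefficient depends only on ordering. That is one reason a result can appear significant with a parametric test yet fail a nonparametric one.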
Nonetheless, based on these examples, it is tempting to conclude that fMRI activations might correlate with the response magnitudes of the neuronal populations. However, caution must be taken when contrasting fMRI activation with spiking activity. For example, although the majority of neurons within a given region might respond most strongly to a particular category, this does not necessarily imply that the remaining neurons respond weakly to stimuli from another category (e.g., consider 100 face neurons each firing 10 spikes/s to faces compared with 10 object neurons each firing 100 spikes/s to objects). Furthermore, peak firing rate is only one method of quantifying a neuronal response. Because the hemodynamic response operates on a much longer timescale than neuronal activity, it is possible that weak but sustained responses might have a greater impact on the fMRI signal. Given these caveats, it was not surprising that the magnitude of the fMRI response to each category within a given region did not correlate significantly with the corresponding population neuronal response (Fig. 8D).
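The hypothetical example in the preceding paragraph can be worked through numerically. The sketch below simply multiplies neuron counts by per-neuron firing rates. Treating summed spiking as a stand-in for population output is an assumption made for illustration only, not a model of the fMRI signal.

```python
def summed_rate(n_neurons, rate_per_neuron):
    """Total spikes/s across a group of neurons that fire at the same rate."""
    return n_neurons * rate_per_neuron


# Values taken from the hypothetical example in the text.
n_face, face_rate = 100, 10.0      # 100 face neurons, 10 spikes/s each
n_object, object_rate = 10, 100.0  # 10 object neurons, 100 spikes/s each

face_total = summed_rate(n_face, face_rate)
object_total = summed_rate(n_object, object_rate)
face_fraction = n_face / (n_face + n_object)

print(f"Face neurons make up {face_fraction:.0%} of the population")
print(f"Summed response to faces:   {face_total:.0f} spikes/s")
print(f"Summed response to objects: {object_total:.0f} spikes/s")
```

Although face neurons outnumber object neurons ten to one in this example, the summed responses to the two categories are equal. A strongly biased neuronal distribution therefore need not imply an equally biased population response magnitude.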
Comparing neuronal properties inside versus outside fMRI-identified category-selective regions
In addition to a correspondence between fMRI selectivity and the spatial distribution of neurons, we also investigated whether neurons found within the fMRI-identified regions are functionally different from those located outside these regions. Specifically, we compared the average CSI for the excitatory responses of neurons found inside (In) versus outside (Out) the corresponding fMRI-identified category-selective regions (Fig. 9). Interestingly, both monkeys showed the same trend: greater selectivity was found among face and body-part neurons within the category-selective regions than among those found outside. This trend was statistically significant in monkey W (p < 0.05) but failed to achieve statistical significance in monkey S (p > 0.05). Thus, in addition to indicating a concentration of category-selective neurons, fMRI-identified category-selective voxels may also reflect an increase in the selectivity of those neurons found within those voxels.
This study addressed four questions about category-selective cortex. First, are face neurons found outside of fMRI-identified face-selective regions? Our data showed that face neurons were located throughout the sampled area of IT cortex but were most concentrated in fMRI-identified face-selective regions. Second, does the relationship between neuronal preferences and fMRI selectivity extend to categories other than faces? We found that regions identified by fMRI as being selective for body parts, objects (in both monkeys), and places (in monkey W) also showed a greater proportion of neurons that respond most strongly to these categories. Third, does the nature of this relationship differ for neurons that increase their activity in response to visual stimuli compared with those that decrease their activity? We found that the fMRI data reflected the distribution of excitatory but not suppressed responses of IT neurons. Fourth, are there differences between neurons found within the fMRI regions and those found outside? We observed a trend for greater selectivity among face and body-part neurons found within the corresponding category-selective regions. Although statistically significant in only one of the two subjects, this result does raise the possibility that the observed fMRI signal might reflect both a change in the local distribution and the selectivity of visually responsive neurons. Below, we discuss the significance of these results to object processing and how they clarify the physiological basis of the fMRI signal.
Linking neuronal distributions with fMRI selectivity
Tsao et al. (2006) provided the first clear demonstration of a significant bias for face neurons within fMRI-identified face-selective regions. They sampled neurons from a single fMRI-identified face-selective region (corresponding to what we define as the posterior face-selective region) and reported that 97% of the visually responsive neurons sampled were strongly face selective, either significantly increasing (90%) or decreasing (7%) their firing rate in response to face stimuli. We found similar trends in both the anterior and posterior fMRI face-selective regions (Figs. 6, 7). However, the proportions of face neurons found within the fMRI regions reported here are markedly reduced compared with those reported by Tsao, Freiwald, and colleagues (Tsao et al., 2006; Freiwald and Tsao, 2010) (e.g., 90 vs 41–69% of the excitatory responses reported here; Figs. 5, 6). The most likely source of this discrepancy is the method by which the regions were sampled. Tsao et al. (2006) sampled from a small number of locations (six penetration sites in two monkeys) that specifically targeted the center of a given face-selective region. We sampled from a much larger number of locations (49 sites in two monkeys) spanning both the center and margins of the fMRI-identified category-selective regions. We did find isolated penetrations that contained very high proportions of face neurons. However, in the case of the posterior face-selective region (the site sampled by Tsao and colleagues), our recording sites primarily targeted the margins of this region because of the placement of our recording chambers. As our data demonstrate (Fig. 7), the relative proportion can drop precipitously along the boundaries of the fMRI regions. Accordingly, the proportion observed was reduced compared with that reported previously.
Nonetheless, the critical observation here remains the same: an fMRI-identified face-selective region includes a substantial increase in the relative proportion of neurons that increase their firing rate in response to face stimuli. Furthermore, we showed that this relationship holds true for other stimulus categories and that the proportions drop when one moves beyond the borders of the fMRI regions. Studies in which the electrode penetrations were not guided by fMRI data typically reported between 15 and 35% face neurons (Perrett et al., 1982; Desimone et al., 1984; Tanaka et al., 1991; Eifuku et al., 2004; Kiani et al., 2005, 2007), which corresponds well to the proportion of face neurons we identified outside the fMRI-identified face-selective regions.
Distribution of category selectivity throughout IT cortex
There are several models of how object representations are organized in the ventral stream. One model proposes that IT cortex contains discrete patches specialized for individual visual categories (e.g., face-processing takes place within the fMRI-identified face-selective regions; see Reddy and Kanwisher, 2006). Another model proposes that complex stimuli are represented by distributed populations of neurons organized according to their feature selectivity (Tanaka et al., 1991; Fujita et al., 1992; Tanaka, 2003; Brincat and Connor, 2004, 2006). Evidence for a modular organization arises primarily from neuroimaging studies (which have a coarse spatial resolution), whereas evidence for a distributed organization derives primarily from physiological studies (but see, for example, Haxby et al., 2001). Here, we bridge the gap between these two techniques, allowing us to gain a better understanding of the organization of IT cortex as well as the relationship between these two methodologies.
Our data confirmed that face neurons are concentrated within fMRI-identified face-selective regions (Tsao et al., 2006) and that this relationship extends to at least one other category, namely body parts. However, our data also showed that neurons preferring a given category are found outside the fMRI-identified regions, a finding that has been demonstrated previously using anatomical tracers (Borra et al., 2010). Thus, although neurons selective for a given category might be clustered into discrete patches, similarly selective neurons can be found throughout IT cortex, supporting a more distributed organizational scheme. As such, we argue that object representations in IT cortex are likely organized according to some hierarchical model, incorporating both modular and distributed elements (Kriegeskorte et al., 2008; Weiner and Grill-Spector, 2010). For example, the neural structures responsible for categorizing stimuli (into faces, body parts, etc.) may be organized into modules, whereas the processes responsible for discriminating among individual stimuli within a category (e.g., one face vs another) may rely on a more distributed, yet finer-scale, organization.
This experiment was not designed to directly address the relative importance of low-level visual features versus semantic relationships, as they relate to the organization of IT cortex. Our stimuli were controlled for overall luminance and color but not for shape or texture. It is therefore possible that the observed neuronal and/or fMRI selectivity may be attributable, at least in part, to systematic variations in shape (e.g., circular faces vs rectangular places) and not high-level categorical distinctions. However, the evidence suggests that differences in low-level features cannot explain all of the observed selectivity. For example, body-part neurons responded to all body-part stimuli, regardless of their shape (Fig. 4). Conversely, body-part and object stimuli showed the greatest variability in terms of visual appearance (Bell et al., 2009), and body-part and object neurons exhibited the greatest proportion of stimulus selectivity (Table 2). Nonetheless, because both experiments used very similar stimuli, the critical observation remains the same: the selectivity observed with fMRI matches the averaged selectivity of the underlying neuronal population.
Face processing in the primate brain
We found three interesting features among face neurons, suggesting that they may represent a special class of IT neuron. First, face neurons exhibited higher selectivity (for faces) compared with other neurons in IT cortex (Fig. 5A). Unlike many other studies that use strict guidelines as to what constitutes a “face neuron” (e.g., response to faces must be at least twice the magnitude of that to non-face stimuli), we did not set any criteria for what we defined as a face neuron (other than that the greatest mean response must be to faces). Despite this liberal definition, the vast majority of excitatory face neurons we encountered were highly selective for faces, showing, on average, a response to faces almost twice that to non-face stimuli (Fig. 5B).
Second, face neurons responded (to faces) with significantly shorter response latencies compared with responses among body-part, object, and place neurons (to body parts, objects, and places; Fig. 5C). Kiani et al. (2005) found a similar result and further demonstrated that the latency changed according to the species of face: non-primate faces evoked longer response latencies compared with primate (human) faces. Similarly, Eifuku et al. (2004) found that the response latency to faces differed according to viewpoint (but see Oram and Perrett, 1992). Thus, it appears that differences in response latency among face neurons might be used to encode both the presence of a face and certain facial features (Tamura and Tanaka, 2001).
Finally, presentation of faces resulted in a disproportionately large number of suppressed responses relative to stimuli from the other three categories. The existence of suppressed responses to specific stimuli among IT neurons has been known for some time (Gross et al., 1972, 1979; Desimone et al., 1984), comprising between 12 and 28% of the total number of visually responsive neurons encountered. Freiwald and Tsao (2010) recently reported many more neurons suppressed by faces in the anterior face-selective regions as opposed to regions farther posterior. Unfortunately, we know relatively little about the function of these suppressed neurons with respect to visual processing. One possibility is that they serve to further enhance the overall signal-to-noise ratio for faces relative to non-face stimuli in IT cortex by minimizing background activity, making faces “stand out” in a cluttered visual environment. Additional study will be necessary to determine the function of these neurons, but they nonetheless serve to highlight the unique nature of face processing in the primate brain.
This work was supported by National Institutes of Health Grants R01 MH67529 and R01 EY017081 (R.B.H.T.), the Athinoula A. Martinos Center for Biomedical Imaging, the National Center for Research Resources, and the National Institute of Mental Health Intramural Research Program (A.H.B., N.J.M., E.L.M., F.H-B., and L.G.U.). We thank the following individuals: Wim Vanduffel, Hauke Kolster, and Leeland Ekstrom for their invaluable assistance with the imaging experiments conducted while at Massachusetts General Hospital, and Lucy Guillory for her assistance with animal training and collection of the imaging data.
Correspondence should be addressed to Andrew H. Bell, MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge, United Kingdom, CB2 7EF.