Abstract
The mechanism underlying the processing of spatially separated multiple local features to form a unique whole object is an important issue in visual object recognition. We tested whether, in behaving monkeys, the spike correlation between pairs of inferior temporal (IT) neurons dynamically changes depending on the spatial configuration of the local features within a whole object. We prepared more than 60,000 face-like objects (FOs) and their corresponding non-face-like objects (NFOs) that consisted of random arrangements of the same set of local features as those in FOs. The spike correlation between a pair of neurons was quantified by the peak height of the shift predictor-subtracted cross-correlogram. For both neurons of the pair, the local features in a whole object were determined so that they elicited as high a response as possible to enable a reliable cross-correlation analysis. We found that the FOs thus constructed elicited neuronal activities that were more strongly correlated than the corresponding NFOs. Firing rates of the same neurons did not show such a consistent bias depending on the feature configuration. Furthermore, receiver operating characteristic analysis revealed that this FO dominance of spike correlation was robust enough to discriminate between different feature configurations at the population level. Spike correlation of the cell pairs exhibited significant FO dominance within 300 ms after stimulus onset. The present results suggest that feature configuration within a unique whole object can be reflected in the rapid modulation of spike correlation among a population of neurons in the IT cortex.
- cross-correlation analysis
- cell assembly
- visual object recognition
- inferior temporal cortex
- macaque monkey
- neurophysiology
Introduction
A fundamental step in visual object recognition is the integration of spatially separated multiple local features into a single whole object. Many imaging studies and single-unit studies suggest that the higher-order association cortex is responsible for this process (Miyashita, 1993; Logothetis and Sheinberg, 1996; Tanaka, 1996; Tootell et al., 1996; Ishai et al., 1999; Rolls, 2000; Orban, 2001; Grill-Spector and Malach, 2004). In the human fusiform and inferior temporal gyri and the monkey inferior temporal (IT) cortex, there are various classes of neurons that respond specifically to familiar or extensively learned complex objects (Miyashita, 1988; Allison et al., 1994; Logothetis et al., 1995; Kobatake et al., 1998; Baker et al., 2002; Palmeri and Gauthier, 2004). In addition, for many complex objects, IT neurons respond equally well to geometrically less complex features within a whole object (Perrett et al., 1987; Desimone, 1991; Tanaka, 1996; Tsunoda et al., 2001). However, even a limited number of discrete features yield, when combined, an enormous number of unique whole objects, in each of which local features are arranged in a particular configuration. The exact encoding schemes of such a unique whole object and of configural information remain essentially unknown despite their importance. One possible mechanism lies in population rate coding, in which many IT neurons with broad but different stimulus selectivities can encode, as a population, any unique whole object (Pouget et al., 2000; Rolls, 2000). Another possible mechanism, although not exclusive of the former possibility, lies in correlated discharges of multiple neurons, each of which encodes local features within a whole object (von der Malsburg, 1981; Singer and Gray, 1995).
Several studies have shown IT neurons to exhibit correlated discharges (Gochin et al., 1991; Gawne and Richmond, 1993; Tamura et al., 2004; Aggelopoulos et al., 2005), but it has not yet been examined whether the spatial configuration of multiple local features in a whole object can be reflected in the spike correlation in the IT cortex and whether such a neuronal correlation will emerge rapidly enough in response to a presented stimulus to mediate recognition.
Our aim, therefore, was to test the hypothesis that more strongly correlated discharges will emerge among IT neurons in response to the presentation of a whole object in which the local features are arranged in a face-like configuration [face-like object (FO)] than to the presentation of a random arrangement of the same local features [non-face-like object (NFO)]. A face-like arrangement of local features leads to more rapid and precise recognition than a random arrangement [“the face superiority effect” (Gorea and Julesz, 1990)]. In the present experiment, an enormous number (>60,000) of FOs were constructed by combining a variety of facial feature-like parts. We designed this procedure so that a single IT neuron could not be easily tuned to a specific FO (Kobatake et al., 1998; Baker et al., 2002). Simultaneous extracellular recordings from pairs of IT neurons were conducted during a visual discrimination task between FOs and NFOs. Neuronal correlation was examined by cross-correlating the neuronal activity elicited by FOs or NFOs.
Figure 1. Whole objects and their constituent parts used in a behavioral task. A, Examples of whole objects: FOs (left) and NFOs (right) composed of the same set of parts. The background was dark blue, and the objects were yellow. B, Constituent parts of the whole objects shown in A. We prepared 120 facial-feature-like parts (40 eye-like parts, 40 nose-like parts, and 40 mouth-like parts) to construct 64,000 (equal to 40³) FOs and NFOs.
Materials and Methods
Behavioral tasks and visual stimuli. Two monkeys (Macaca fuscata) were initially trained to perform a delayed matching-to-sample (DMS) task (Miyashita, 1988) using 120 facial-feature-like parts (40 eye-like parts, 40 nose-like parts, and 40 mouth-like parts) (Fig. 1B) with a delay period of 2 s. IT neurons have been shown to exhibit learning-dependent plasticity (Miyashita, 1993; Logothetis et al., 1995; Logothetis and Sheinberg, 1996; Kobatake et al., 1998; Baker et al., 2002). We aimed, therefore, to tune the IT neurons to the 120 parts through extensive training on the DMS task using these stimuli. After extensive training on this task (at least 600 exposures for each part), monkeys were trained to perform an FO/NFO judgment task, in which neuronal responses to FOs and NFOs were examined. While a monkey fixed its gaze within 1.0-1.4°, an FO or NFO was presented for 1 s (cue period). Eye position was monitored using a scleral search coil (Judge et al., 1980). After a delay period (500 ms), the go signal appeared, which required the monkey to push the right or left button within 1 s depending on the presented stimulus (FO or NFO). The hand used to perform the task was controlled and counterbalanced between the monkeys. Monkeys performed this task with 99.7 ± 0.4% (mean ± SEM) correct responses during recording sessions. The monkeys were also trained to perform a passive viewing task, in which, while the monkey fixated within 1.0-1.4°, five different parts were sequentially presented for 350 ms each, with an interstimulus interval of 600 ms. During recording sessions, neuronal responses to the large set of parts (120) were assessed in a short time using this task. Whole objects were composed of four individual parts (size of each part, <2.3° × 2.3°) arranged in facial (FO) or random (NFO) configurations within a radius of ∼3° against a surrounding contour (7.8° high and 6.1° wide) without spatial overlap.
All animal experiments were performed in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and the regulations of the University of Tokyo School of Medicine.
Recording procedures. Multiple single-unit activities were recorded from the IT cortex in three hemispheres of the two monkeys using Tetrodes (Thomas Recording, Giessen, Germany) (Aronov et al., 2003; Tamura et al., 2004). Neuronal signals were amplified, bandpass filtered (500 Hz to 5 kHz) (Csicsvari et al., 1998; Tamura et al., 2004; Tomita and Eggermont, 2005), and sorted on-line into a pair of single units using the standard window discrimination technique. The optimal stimuli were determined during recording sessions for these on-line-sorted cell pairs (see below). Neuronal signals were also stored and digitized off-line at 25 kHz to sort more precisely into multiple single units by waveform analysis (DataWave Technologies, Longmont, CO) (Alonso et al., 1996; Usrey et al., 2000; Roy and Alloway, 2001; Lee et al., 2005). The presence of refractory period was confirmed in the auto-correlogram (Alonso et al., 1996; Csicsvari et al., 1998; Usrey et al., 2000). If the number of spikes with interspike intervals of shorter than 2 ms exceeded 1% of the total for a given unit, that unit was discarded or reisolated. These off-line-sorted spike data were used in the analyses of both responsiveness and spike correlation in the present study.
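The refractory-period criterion above can be sketched as a simple check on interspike intervals. This is a minimal illustration, not the authors' code: the function names are hypothetical, spike times are assumed to be in milliseconds, and the "1% of the total" is approximated here by the fraction of intervals rather than of spikes.

```python
import numpy as np

def isi_violation_fraction(spike_times_ms, refractory_ms=2.0):
    """Fraction of interspike intervals shorter than refractory_ms."""
    times = np.sort(np.asarray(spike_times_ms, dtype=float))
    isis = np.diff(times)
    if isis.size == 0:
        return 0.0
    return float(np.mean(isis < refractory_ms))

def passes_isolation(spike_times_ms, max_fraction=0.01):
    """True if the unit meets the assumed 1% criterion; otherwise it
    would be discarded or reisolated."""
    return isi_violation_fraction(spike_times_ms) <= max_fraction
```

A unit with spikes at 0, 1, 10, 20, and 30 ms has one short interval out of four, so it would fail this check.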
During a recording session, the parts comprising the optimal whole objects were determined by the minimax algorithm (Baky et al., 1981) for each on-line-sorted neuron pair so that they elicited as high a response as possible from both neurons. We first examined the responses of each neuron of the pair to the 40 eye-like parts during the passive viewing task and determined the one eye-like part to which the smaller of the responses of the two neurons was the largest among all of the 40 eye-like parts. This part was defined as the most effective eye-like part. The second and third most effective eye-like parts were similarly determined, as were the three most effective nose-like parts and mouth-like parts. This set of parts yielded 3³ = 27 possible combinations of parts in a whole object for testing the responsiveness. Using these combinations of parts, we next examined the responses of the two neurons to FO and NFO in the FO/NFO judgment task. The spatial arrangement of parts for NFOs in this procedure was randomly determined before starting each recording session, and the same arrangement was used for all of the 27 combinations of parts in this procedure. A set of parts in a whole object was defined as the best if the smallest response of the two cells to the two arrangements (FO and NFO) was the largest among all the 27 combinations of parts. This best set of parts was used for the optimal FO (pFO) and the optimal NFO (pNFO) of the cell pair. Note that the spatial arrangement of the parts in the optimal NFO was changed in a different recording session, and the same arrangement was never used again. The fourth and fifth most effective parts were also determined (in the passive viewing task), and a similar procedure was applied to determine the five least effective parts. The resultant 30 parts (5 × 3 most effective parts and 5 × 3 least effective parts) were termed pooled parts.
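The selection rule described above (pick the stimulus whose weaker response across the two neurons is the strongest) can be sketched as follows. The function names and array layouts are illustrative assumptions, not taken from the original study.

```python
import numpy as np

def rank_parts(resp_a, resp_b, k=3):
    """Indices of the k parts with the largest min(resp_a, resp_b).

    resp_a, resp_b: responses of the two neurons to each part
    (e.g. the 40 eye-like parts), in the same order.
    """
    worst = np.minimum(np.asarray(resp_a, float), np.asarray(resp_b, float))
    order = np.argsort(-worst, kind="stable")  # descending by weaker response
    return order[:k].tolist()

def best_combination(resp):
    """Index of the best part combination.

    resp: array of shape (n_combinations, 2 cells, 2 arrangements),
    holding responses to the FO and NFO built from each combination.
    The best combination maximizes the smallest of the four responses.
    """
    worst = np.asarray(resp, float).min(axis=(1, 2))
    return int(np.argmax(worst))
```

Applied to the three part classes, `rank_parts` would give the 3 × 3 candidate parts, and `best_combination` would then be evaluated over the resulting 27 combinations.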
Nonoptimal FOs and NFOs were randomly constructed from all of the possible combinations of the pooled parts other than the parts used in the optimal FO/NFO (that is, 9 × 3 parts were available for nonoptimal FOs/NFOs). Optimal and nonoptimal FOs/NFOs were presented to the animal in a pseudorandom order.
During a recording session, we isolated a pair of single cells and determined optimal stimuli for that cell pair. Off-line spike sorting was later conducted to more precisely isolate the recorded multiple single cells to obtain the neuronal data for conducting additional analysis. Neuronal correlation was not examined during recording sessions, ensuring that the recorded cell pairs or their optimal stimuli could not be selected so that the cell pairs revealed spike correlation in response to the optimal whole objects or any other stimuli. For the optimal and nonoptimal stimuli, we calculated the minimum firing rates of single cells in each off-line-sorted cell pair. This minimum firing rate should be higher for optimal stimuli than for nonoptimal stimuli if the firing properties of the on-line-sorted cells were preserved for the off-line-sorted cells, and we confirmed that this was true for all the analyzed off-line-sorted cell pairs.
Data analysis. In the present study, the neuronal data were analyzed only in the correct trials. In the FO/NFO judgment task, we defined a cell as responsive to a whole object if the firing rate during the cue period (assessed in a 450 ms period from 80 ms after cue onset) was significantly (paired t test, p < 0.05) different from that in the corresponding period just before cue onset. We conducted cross-correlation analysis for a cell pair only if both constituent cells showed significant responses to both their optimal FO and NFO that were determined during the recording session before the cross-correlation analysis. We constructed raw cross-correlograms for lag times within 100 ms (1 ms resolution) using spikes recorded during a 1 s period beginning 80 ms after cue onset in the FO/NFO judgment task. The cross-correlogram for the optimal FO or NFO was constructed from spikes recorded in at least 55 trials for each stimulus and was accepted only if the available spikes exceeded 1600 (mean ± SEM, 4357 ± 228 for all of the analyzed cell pairs, at least 600 per cell). A shift predictor, calculated using one-trial-shifted spike trains (Constantinidis et al., 2001, 2002; Tamura et al., 2004; Kohn and Smith, 2005), was smoothed (five-bin boxcar averaging) (Nowak et al., 1995) and then subtracted from the raw cross-correlogram to remove the stimulus-locked component (Perkel et al., 1967; Nowak et al., 1995; de Oliveira et al., 1997; Das and Gilbert, 1999; Steinmetz et al., 2000; Usrey et al., 2000; Bair et al., 2001; Aggelopoulos et al., 2005; Kohn and Smith, 2005), yielding the shift predictor-subtracted cross-correlogram (SSCC). The peak height of the SSCC was then normalized by the SD of the shift predictor to calculate a z-score (de Oliveira et al., 1997; Constantinidis et al., 2001). 
Peaks were identified as significant if the z-score exceeded the level corresponding to p = 0.05 [one-tailed, z > 2.81, detected within the lag time of 10 ms (-10 to +10 ms), corrected for multiple comparisons]. The cross-correlograms for nonoptimal FOs and NFOs were constructed similarly from spikes in at least 75 trials and were accepted only if available spikes exceeded 1300 (3921 ± 277, at least 350 per cell). We adopted these criteria for the minimum number of spikes because nonoptimal stimuli did not drive the cell pairs as strongly as the optimal stimuli did. However, we confirmed, for nonoptimal whole objects, that the feature configuration dependence of the spike correlation remained unchanged when the criteria for the minimum number of spikes were the same as those for the optimal whole objects (data not shown). The number of spikes used to calculate SSCCs was not significantly different between the optimal and nonoptimal whole objects (p > 0.3, paired t test). Because the spike-sorting method used in the present study did not separate multiple spikes fired simultaneously (within 1.28 ms), leading to under-estimation of spike counts on the resultant raw cross-correlograms within this window, counts on SSCCs within the corresponding lag times (within 2 ms) were not included in our analyses (Constantinidis et al., 2002; Tamura et al., 2004). Regarding the type of neuronal interaction, we defined the peak as “one-sided-like” when the bins with significant correlation were detected only at one side of the SSCC. We also defined the “presumable presynaptic and postsynaptic neurons” according to the putative side of the one-sided-like peak in the SSCC. For the neurons that exhibited significant correlation in response to pFOs, stimulus selectivity was defined by calculating the proportion of FO stimuli that elicited over half the maximum response after subtracting the spontaneous firing rate.
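The shift-predictor subtraction and peak detection described above can be sketched as below. This is a simplified illustration under stated assumptions: spike times are per-trial arrays in milliseconds, the shift predictor pairs trial i of one cell with trial i + 1 of the other (wrapping around at the end), and the SD is taken across the bins of the unsmoothed shift predictor. The function names are hypothetical.

```python
import numpy as np

def correlogram(trials_a, trials_b, max_lag=100):
    """Cross-correlogram (1 ms bins) of t_a - t_b, summed over trials."""
    edges = np.arange(-max_lag, max_lag + 2) - 0.5
    counts = np.zeros(2 * max_lag + 1)
    for a, b in zip(trials_a, trials_b):
        diffs = np.subtract.outer(np.asarray(a, float), np.asarray(b, float))
        counts += np.histogram(diffs.ravel(), bins=edges)[0]
    return counts

def sscc_peak(trials_a, trials_b, max_lag=100, peak_lag=10, skip_lag=2):
    """Return (peak lag, z-score, SSCC) for one cell pair."""
    lags = np.arange(-max_lag, max_lag + 1)
    raw = correlogram(trials_a, trials_b, max_lag)
    # One-trial shift: cell A on trial i against cell B on trial i + 1.
    shifted_b = list(trials_b[1:]) + list(trials_b[:1])
    predictor = correlogram(trials_a, shifted_b, max_lag)
    # Five-bin boxcar smoothing of the shift predictor.
    smooth = np.convolve(predictor, np.ones(5) / 5.0, mode="same")
    sscc = raw - smooth
    # Search within +/-10 ms, skipping the central bins (within +/-2 ms).
    window = (np.abs(lags) <= peak_lag) & (np.abs(lags) > skip_lag)
    peak_idx = np.flatnonzero(window)[np.argmax(sscc[window])]
    z = sscc[peak_idx] / predictor.std(ddof=1)  # assumed SD estimate
    return int(lags[peak_idx]), float(z), sscc
```

A peak would then be called significant when the returned z-score exceeds 2.81, the threshold stated in the text.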
Time course of neuronal correlation was analyzed using SSCCs calculated from spikes within a sliding time window (see Fig. 6) (Nowak et al., 1995; deCharms and Merzenich, 1996; de Oliveira et al., 1997; Constantinidis et al., 2002). In this analysis, data from different time windows were not independent of one another, because SSCCs for different time windows were constructed using partially overlapping populations of spikes. Bonferroni's method was therefore applied after paired t tests for each time window to correct for overlapping spike samples and for multiple comparisons. Because the probability of detecting false positives with repeated applications of statistical tests is lower when the data are mutually correlated than when the data are mutually independent, Bonferroni's correction for our data provided statistically conservative results (Frison and Pocock, 1992). In the correlation analysis using a short (200 ms) time window (see Fig. 7), raw cross-correlograms and shift predictors were calculated at a resolution of 2 ms (one cell pair with available spikes of <400 was excluded from the analysis for the period of 100-300 ms), the SSCCs were calculated by subtracting the shift predictor from the raw cross-correlogram, and z-scores were calculated using the square root of the expected correlation strength instead of the SD of the shift predictor (Eggermont, 1992; de Oliveira et al., 1997; Thiele and Stoner, 2003). We used the above method in this analysis, because the spikes were collected in a short period, and thus the truncated spike trains produced artificially higher bin counts around the lag time of 0 ms in the shift predictor, leading to overestimation of the variance of the SSCC bin counts if the SD of the shift predictor was used for estimating the variance.
We also quantified the spike correlation by neural correlation coefficient (NCC) (Abeles, 1982; Eggermont, 1992; Roy and Alloway, 2001; Tomita and Eggermont, 2005). The NCC was calculated as defined in the previous studies, using the peak height and the bin width of the raw cross-correlogram, the number of spikes from each neuron of the pair, and the total recording time (Eggermont, 1992; Tomita and Eggermont, 2005).
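The text lists the inputs to the NCC but does not spell out the formula. The sketch below assumes one common form of the coefficient from the cited literature, in which the peak count is corrected by the count expected for independent trains and normalized by the binned-train variances; treat the exact expression as an assumption rather than the authors' definition.

```python
import math

def neural_correlation_coefficient(peak_count, n_a, n_b, total_time_ms, bin_ms=1.0):
    """Assumed NCC: (R - E) / sqrt(var_a * var_b).

    peak_count: peak height of the raw cross-correlogram (coincidence count)
    n_a, n_b: number of spikes from each neuron
    total_time_ms: total recording time; bin_ms: correlogram bin width
    """
    n_bins = total_time_ms / bin_ms
    expected = n_a * n_b / n_bins                      # chance coincidences
    var_a = n_a - n_a ** 2 / n_bins
    var_b = n_b - n_b ** 2 / n_bins
    return (peak_count - expected) / math.sqrt(var_a * var_b)
```

Under this form, two identical spike trains give a coefficient of 1, and independent trains give a value near 0.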
Receiver operating characteristic (ROC) analysis was conducted to assess the stimulus discriminability of spike correlation at the population level (Macmillan and Creelman, 1991; Palanca and DeAngelis, 2005) (see Fig. 5). The hit rate for detecting pFOs at a given threshold was defined as the proportion of cell pairs that exhibited higher correlation in response to pFOs than the threshold. The false alarm rate for detecting pFOs at the threshold was defined as the proportion of cell pairs that exhibited higher correlation in response to pNFOs than the threshold. The threshold was moved throughout the distribution of SSCC peak height from the highest to the lowest value among the stimuli, and hit/false alarm rates were calculated for each threshold to construct the ROC curve. To quantify the stimulus discriminability, the area under the ROC curve was calculated. For assessing the statistical significance of the area under the ROC curve, 5000 surrogate ROC curves were constructed from the permutated distributions of SSCC peak height among the different feature configurations (pFOs and pNFOs), and the area under the ROC curve was calculated for each surrogate curve to compare with the real value. ROC analysis for the average firing rates of individual cell pairs was also similarly conducted.
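The ROC construction and permutation test described above can be sketched as follows. Two details are assumptions made for this illustration: the comparison uses "greater than or equal to" so that the curve runs from (0, 0) to (1, 1), and surrogates are built by shuffling the pooled values between the two configurations. The function names are hypothetical.

```python
import numpy as np

def roc_area(fo_values, nfo_values):
    """Area under the ROC curve for detecting pFOs from peak heights."""
    fo = np.asarray(fo_values, dtype=float)
    nfo = np.asarray(nfo_values, dtype=float)
    # Sweep the threshold from the highest to the lowest observed value.
    thresholds = np.unique(np.concatenate([fo, nfo]))[::-1]
    hit = np.array([0.0] + [np.mean(fo >= t) for t in thresholds])
    fa = np.array([0.0] + [np.mean(nfo >= t) for t in thresholds])
    # Trapezoidal integration of hit rate over false alarm rate.
    return float(np.sum(np.diff(fa) * (hit[1:] + hit[:-1]) / 2.0))

def permutation_p(fo_values, nfo_values, n_surrogates=5000, seed=0):
    """Observed area and the fraction of surrogate areas at least as large."""
    rng = np.random.default_rng(seed)
    observed = roc_area(fo_values, nfo_values)
    pooled = np.concatenate([fo_values, nfo_values]).astype(float)
    n_fo = len(fo_values)
    exceed = 0
    for _ in range(n_surrogates):
        shuffled = rng.permutation(pooled)
        if roc_area(shuffled[:n_fo], shuffled[n_fo:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_surrogates + 1)
```

An area of 0.5 corresponds to chance, and an area of 1.0 means that every pFO value exceeds every pNFO value.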
The information about the feature configuration carried in a single trial by spike correlation or by firing rate was calculated as in the previous study (Franco et al., 2004). Briefly, the spike correlation of a cell pair in each trial was quantified using Pearson's correlation coefficient. Then, the expected stimulus from the spike correlation for that trial was determined by the vicinity of the correlation coefficient for that trial to the mean value for pFOs or pNFOs, which was calculated with the current trial excluded. The expected stimulus was determined for all trials, and the mutual information between the expected and the actual stimulus was calculated to estimate the information carried by the correlated spikes. The firing rates of the two neurons of a pair in each trial were represented as a two-dimensional vector. The dot product between the vector for each trial and the vector for the mean (pFOs or pNFOs, the current trial was excluded from the calculation of the mean) was computed and was normalized by the product of both of the vector lengths. The expected stimulus for each trial was determined by comparing these normalized dot products for pFOs and for pNFOs, and the information in the firing rates of both neurons of the pair was estimated from the mutual information between the expected and the actual stimulus.
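The single-trial decoding step for spike correlation can be sketched as a leave-one-out nearest-mean classifier followed by a mutual information estimate. This is a minimal illustration of the procedure as described, not the cited implementation; it covers only the scalar (correlation coefficient) case, and the function names are hypothetical.

```python
import numpy as np

def decode_leave_one_out(values, labels):
    """Assign each trial to the class whose mean (computed without
    that trial) is closest to the trial's value.
    Requires at least two trials per class."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    predicted = np.empty_like(labels)
    idx = np.arange(len(values))
    for i in idx:
        others = idx != i
        means = np.array([values[others & (labels == c)].mean() for c in classes])
        predicted[i] = classes[int(np.argmin(np.abs(means - values[i])))]
    return predicted

def mutual_information_bits(actual, predicted):
    """Mutual information (bits) between actual and decoded stimulus."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    info = 0.0
    for a in np.unique(actual):
        for p in np.unique(predicted):
            joint = np.mean((actual == a) & (predicted == p))
            if joint > 0:
                info += joint * np.log2(
                    joint / (np.mean(actual == a) * np.mean(predicted == p)))
    return float(info)
```

For the firing-rate case, the same decoding loop would compare normalized dot products of two-dimensional rate vectors instead of scalar distances.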
All of the statistical tests in the present study were two-tailed unless otherwise stated.
Results
We prepared 120 facial-feature-like parts (40 eye-like parts, 40 nose-like parts, and 40 mouth-like parts) to construct 64,000 (40³) FOs and their corresponding NFOs that consisted of random arrangements of the same constituent parts (Fig. 1). This large number of FO/NFO repertoires enabled us to test the responsiveness and the spike correlation of neuron pairs to a unique whole object, which the monkeys had encountered only a few times before it was used in a recording session for examining neuronal correlation. Simultaneous extracellular recordings from pairs of neurons were conducted in the IT cortex of two monkeys while the animals performed a visual discrimination task in which the configuration of local features (FO or NFO) had to be distinguished. In a recording session, we first examined the responses of a pair of cells to all of the 120 parts and then determined the pFO and the pNFO of the cell pair to elicit as high a response as possible from both neurons. Both the pFO and pNFO were composed of the same constituent parts, the only difference being their configurations (Fig. 1). The spike correlation between a pair of neurons was quantified based on the peak height of the SSCC (Nowak et al., 1995; de Oliveira et al., 1997; Das and Gilbert, 1999; Aggelopoulos et al., 2005).
Figure 2. Feature-configuration-dependent spike correlation of a pair of IT neurons. A, Waveforms, auto-correlograms, and peristimulus time histograms of a pair of IT neurons to their optimal FO and optimal NFO. Fifty traces of action potentials were superimposed. Horizontal bars below each peristimulus time histogram indicate 1 s duration of stimulus presentation (cue period). B, Correlated activity of the cell pair. Top, Raw cross-correlograms of the cell pair obtained with the optimal FO (left, red) and optimal NFO (right, blue). Abscissa, Spike time of cell 1 relative to that of cell 2. Bin width, 1 ms. Gray traces, Shift predictors. Bottom, SSCCs of the cell pair obtained with the optimal FO (left, red) and optimal NFO (right, blue). The cross-correlograms for the bins within ±2 ms were drawn with thin lines and were not included in the analyses (see Materials and Methods). Horizontal gray lines, Confidence limit (p = 0.05, corrected for multiple comparisons).
Neuronal correlation in the IT cortex depends on the feature configuration
The cell pair shown in Figure 2 exhibited differential spike correlation that depended on the feature configuration in a whole object. Both cells in this pair responded significantly to both the pFO and pNFO (Fig. 2A). The raw cross-correlogram for this cell pair showed a prominent peak only with the pFO (Fig. 2B, top), despite the fact that both the pFO and pNFO were composed of the same constituent parts (Fig. 2B, top inset). The peak height of the SSCC was significant for the pFO (p < 0.001) (see Materials and Methods) (Fig. 2B, bottom left) but not for the pNFO (p > 0.3) (Fig. 2B, bottom right). The full-width at half-maximum of the pFO-derived SSCC was 4 ms, and the lag time of the peak was 4 ms.
Population analyses were then performed with a total of 134 cells (from 48 recording sessions) that exhibited significant responses to both of their pFOs and pNFOs. Among these, 30 cell pairs composed of 50 cells showed significant peaks in their SSCCs (z > 2.81; p < 0.05) in response to either the pFO or pNFO of the cell pair and were further analyzed. For the significant majority of cell pairs (23 of 30 pairs, 77%; p < 0.004, χ² test), SSCC peak height was higher for pFOs than for pNFOs (Fig. 3A). In total, the z-score of the peak height was significantly greater for discharges elicited by pFOs than by pNFOs (p < 0.003, paired t test; n = 30). This pFO-dominant spike correlation was consistently observed in both monkeys (supplemental Table 1, available at www.jneurosci.org as supplemental material). It has been shown that the correlated spikes within ∼10 ms could have an impact on the firing probability of the postsynaptic neurons (Usrey et al., 2000; Roy and Alloway, 2001). In the present study, therefore, a peak in an SSCC was detected within a 10-ms-lag window (Thiele and Stoner, 2003; Tamura et al., 2004; Kohn and Smith, 2005; Tomita and Eggermont, 2005), but the result was also significant when this window size was extended to 20 ms (p = 0.011) or 30 ms (p = 0.014). The “correlation strength” was calculated as the number of spikes under the SSCC peak bin divided by half the number of spikes from both neurons of the pair. The proportions of spikes under the SSCC peak bin were 1.6 ± 0.2% for pFOs and 1.1 ± 0.2% for pNFOs (mean ± SEM; n = 30). The difference was significant (p < 0.02, paired t test), consistent with the results regarding the peak height.
Moreover, we also quantified the spike correlation using another index, the NCC (Abeles, 1982; Eggermont, 1992; Roy and Alloway, 2001; Tomita and Eggermont, 2005), and confirmed that it gave results similar to those of the original SSCC-derived index [mean ± SEM, 0.017 ± 0.002 for pFOs and 0.013 ± 0.002 for pNFOs; n = 30; the difference was significant (p < 0.01, paired t test)]. The above FO-dominant spike correlation was not observed when nonoptimal whole objects were analyzed (supplemental Table 1, available at www.jneurosci.org as supplemental material). We next examined the average firing rates of individual cell pairs during the same period as that used for the above correlation analysis. In contrast to the spike correlation, firing rates did not reveal a consistent bias that depended on the feature configuration (pFO vs pNFO; p > 0.8; n = 30) (Fig. 3B). This result also indicates that these neurons do not belong to a population of “face neurons” (Baylis et al., 1985; Desimone, 1991; Perrett et al., 1992; Rolls, 2000).
Figure 3. Population data for spike correlation and firing rate. A, Peak heights of SSCCs in z-scores for all the cell pairs, obtained for optimal FOs (ordinate) and NFOs (abscissa) (n = 30). Horizontal and vertical lines, Confidence limit (p = 0.05). The p value shown in the figure was derived from a paired t test. Triangles represent outliers and were rescaled preserving the ratio of the peak heights for the two conditions. B, Mean firing rates for all the cell pairs in the same period as that for the above correlation analyses, obtained for optimal FOs (ordinate) and NFOs (abscissa) (n = 30).
The scattergram in Figure 4 shows the relationship between spike correlation and firing rate in each cell pair (n = 30). Of 23 cell pairs with higher spike correlation for pFOs than for pNFOs, 12 pairs (52%) showed higher firing rates in response to pNFOs, indicating that there were many cases in which the firing rate to the pFO was lower but the spike correlation was higher. Overall, Spearman's r of the scattergram was 0.16 and was not significant (p > 0.4; n = 30). Thus, the observed pFO dominance of neuronal correlation for each cell pair cannot be explained only by the difference in the firing rate between the pFO and pNFO for the same cell pair.
Figure 4. Relationship between spike correlation and firing rate for each cell pair. The difference in the SSCC peak height between the optimal FO and NFO for each cell pair was normalized by the sum (ordinate) and plotted against the normalized difference in the firing rate for the same cell pair (abscissa). A positive value indicates that the optimal FO evoked higher spike correlation (ordinate) or firing rate (abscissa) than the corresponding optimal NFO.
Figure 5. ROC analysis for assessing the reliability of spike correlation to discriminate between different feature configurations. A, ROC curves for spike correlation (black) and for firing rate (gray). Ordinate and abscissa represent the hit and false alarm rate for detecting pFOs, respectively (see Materials and Methods). B, Area under the ROC curve for spike correlation (black) and for firing rate (gray). Dotted line indicates the chance level. Statistical significance of the area under the ROC curves was assessed by constructing 5000 surrogate curves by permutating the original distributions of SSCC peak height or of firing rate among the different feature configurations (pFOs and pNFOs).
If the differences between the spike correlation for pFOs and pNFOs were smaller than the overall variance of correlation among the population of neurons, spike correlation might not provide a reliable signal to discriminate these feature configurations for that population of neurons. To address this possibility, we conducted ROC analysis, in which stimulus discriminability of spike correlation was assessed after pooling the data across cell pairs (Palanca and DeAngelis, 2005). Spike correlation consistently revealed a higher hit rate than false alarm rate for discriminating pFO from pNFO (Fig. 5A, black trace). We assessed whether the area under the ROC curve, the index of reliability for discriminating the stimuli, significantly exceeded the chance level (0.5) by comparing it with 5000 surrogate curves that were constructed by permutating the real data. The area under the ROC curve for spike correlation was 0.72 and was statistically significant (p < 0.001, permutation test; n = 30) (Fig. 5B, right). This result suggests that configuration dependence of spike correlation can be a reliable signal in a population of neurons to discriminate the presented stimulus. The firing rates did not reveal significant stimulus discriminability (the area under the ROC curve was 0.55; p > 0.2, permutation test; n = 30). We next conducted information theoretic analysis to assess whether the strength of spike correlation carried reliable information about the feature configuration in a single trial. A significant amount of information about the feature configuration was carried by the spike correlation compared with that calculated from the trial-shifted spike trains, even in a single trial (0.020 ± 0.003 vs 0.006 ± 0.001 bits, mean ± SEM; p < 0.005, paired t test; n = 30). In 17% of the neuron pairs (5 of 30 pairs), the information in the correlated spikes exceeded 80% of that in the firing rates of both neurons of the pair. 
However, in most neuron pairs, a substantially larger amount of information was available in the firing rates of both neurons of the pair relative to that in the spike correlation (0.27 ± 0.05 vs 0.02 ± 0.003 bits, mean ± SEM; p < 0.001, paired t test; n = 30), consistent with the previous result (Aggelopoulos et al., 2005). Note that the information analysis did not discriminate which configuration was more effective or whether the effective configuration was consistent among the population of neurons. Consistent bias and robust stimulus discriminability of spike correlation thus raise the possibility that spike correlation also provides a reliable signal for discriminating the stimulus in a population of neurons.
In the present study, the central bins of the SSCC were not observable, and thus it cannot be determined whether the observed peaks indeed straddled the time 0 or were one sided. To examine the type of neuronal interaction between the recorded neurons, we attempted to define the one-sided-like peak in the SSCC (see Materials and Methods). Of 29 (17) pairs of neurons that exhibited significant correlation for pFOs (pNFOs), 20 (9) pairs showed one-sided-like peaks. We did not observe a significant difference in the stimulus selectivity between the presumable presynaptic and postsynaptic neurons (0.20 ± 0.04 vs 0.19 ± 0.05; n = 20; p > 0.8, paired t test).
Temporal dynamics of neuronal correlation
The correlation analyses in the previous section did not provide information about when or how long the neuronal correlation elicited by pFOs exceeded that elicited by pNFOs. To address these issues, we next examined the time course of neuronal correlation (Nowak et al., 1995; deCharms and Merzenich, 1996; de Oliveira et al., 1997; Constantinidis et al., 2002). Figure 6A shows the data of a representative cell pair. The SSCC at the time point of 0 ms was calculated using spikes recorded during the 500 ms period just before cue onset. This 500 ms window was then successively shifted in steps of 100 ms. The resultant surface plot for this cell pair constructed from the SSCCs contained a ridge of peaks at a lag time of 4 ms. The pFO-induced spike correlation first exceeded that elicited by the pNFO at the time point of 500 ms, a time when the SSCC was constructed using spikes that occurred during the period from 0 to 500 ms after cue onset (Fig. 6A). In contrast, there was little difference in the average firing rate of this cell pair elicited by the pFO and pNFO throughout the analyzed period (Fig. 6C).
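The windowing convention described above, in which each time point is labeled by the end of its 500 ms window, can be sketched as follows. The function names and the `last_end_ms` default are hypothetical; the end of the analyzed period is not stated in this section.

```python
def sliding_windows(first_end_ms=0, last_end_ms=1000, width_ms=500, step_ms=100):
    """Return (label, start, end) tuples in ms relative to cue onset.

    The label is the window's end time, so label 0 covers the 500 ms just
    before cue onset and label 500 covers 0 to 500 ms after cue onset.
    last_end_ms is an illustrative assumption, not a value from the study.
    """
    return [(end, end - width_ms, end)
            for end in range(first_end_ms, last_end_ms + 1, step_ms)]


def spikes_in_window(spike_times_ms, start_ms, end_ms):
    """Select spike times falling in the half-open interval [start_ms, end_ms)."""
    return [t for t in spike_times_ms if start_ms <= t < end_ms]
```

Because consecutive windows overlap by 400 ms, the same spikes contribute to several time points, which is why the statistical comparison across time points needs a correction for overlapping sampling.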
Figure 6. Temporal dynamics of spike correlation. A, Time course of spike correlation (z-score) of a pair of neurons in response to the optimal FO (left) and NFO (right). The surface plots were constructed from SSCCs calculated using spikes in a 500 ms window that was successively shifted in steps of 100 ms. The SSCC at time 0 was calculated using spikes in a 500 ms period just before cue onset. B, C, Auto-correlograms of the pair of neurons (B) and time course of normalized firing rates of the cell pair in response to the optimal FO (red) and NFO (blue) (C). D, E, Average time courses (thick lines) of spike correlation (D) and firing rate (E) for optimal FOs (red) and NFOs (blue) across all of the cell pairs (n = 30). Thin lines, Average ± SEM. Seven and six cell pairs were excluded from the analyses for the time points of 0 and 100 ms, respectively, because of low firing rates. Asterisks, Significant differences between optimal FOs and NFOs [paired t test, corrected for multiple comparisons and overlapping sampling of spikes in different windows (Frison and Pocock, 1992); see Materials and Methods].
The population data obtained with the analysis described above are shown in Figure 6, D and E. The SSCC peak height for each cell pair in each period was normalized using its maximum and minimum values among all of the periods and stimuli (pFO and pNFO) for the same cell pair. These normalized values of SSCC peak height were then averaged across all of the cell pairs. The averaged time course of the firing rates was calculated similarly. Neuronal correlation elicited by pFOs was significantly stronger than that elicited by pNFOs at the time points of 500 and 600 ms [p < 0.02 and p < 0.01, respectively, paired t test; the effects of both multiple comparisons and overlapping samplings were corrected (Frison and Pocock, 1992); see Materials and Methods] (Fig. 6D). We confirmed that the NCC also provided similar results (p < 0.02 and p < 0.03 for the time points of 500 and 600 ms, respectively). The firing rates elicited by pFOs and pNFOs were not significantly different at any time examined (p > 0.2) (Fig. 6E).
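The per-pair normalization can be sketched as below, assuming a linear min-max rescaling to the range 0 to 1. The text states only that the maximum and minimum across all periods and both stimuli were used, so the exact formula and the function names are assumptions.

```python
def normalize_pair(pfo_values, pnfo_values):
    """Min-max normalize one cell pair's time courses for pFO and pNFO.

    The minimum and maximum are taken jointly over all periods and both
    stimuli, so the two normalized time courses share one scale.
    """
    pooled = list(pfo_values) + list(pnfo_values)
    lo, hi = min(pooled), max(pooled)
    span = hi - lo
    if span == 0:
        # Degenerate case (all values equal): return zeros by convention.
        return [0.0] * len(pfo_values), [0.0] * len(pnfo_values)
    return ([(v - lo) / span for v in pfo_values],
            [(v - lo) / span for v in pnfo_values])


def average_across_pairs(time_courses):
    """Average equally long time courses point by point across cell pairs."""
    n = len(time_courses)
    return [sum(values) / n for values in zip(*time_courses)]
```

Normalizing within each pair before averaging prevents pairs with large absolute peak heights from dominating the population time course.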
To more precisely determine when the observed difference in the neuronal correlation emerged, we further divided the period from 0 to 500 ms after cue onset, during which the differential correlation was observed (Fig. 6D), into two periods (from 100 to 300 ms and from 300 to 500 ms, after cue onset) and examined the spike correlation in each period. The peak height of the SSCC for pFOs was significantly higher than that for pNFOs as early as 100-300 ms after cue onset (p < 0.04, paired t test, the effect of multiple comparisons for the divided time window was corrected; n = 29, one cell pair was excluded from the analysis because of the small number of spikes within the window; see Materials and Methods) (Fig. 7A, top left, B, left). Firing rates were highest in this period for both pFOs and pNFOs at the population level (data not shown) and did not reveal significant dependence on the feature configuration (p > 0.3) (Fig. 7A, top right, C, left). During the period from 300 to 500 ms after cue onset, feature configuration dependence was significant in neither the spike correlation (p = 0.12) (Fig. 7A, bottom left, B, right) nor the firing rate (p > 0.9) (Fig. 7A, bottom right, C, right), although the spike correlation still revealed weak pFO dominance. Thus, neuronal correlation was modulated by feature configuration in a whole object within 300 ms after stimulus onset, whereas the firing rates did not reveal configuration-dependent bias in the same period during which the majority of recorded neurons fired maximally.
Discussion
In the present study, we found that the discharges of cell pairs elicited by pFOs were more strongly correlated than those elicited by pNFOs. ROC analysis revealed the robustness of correlation difference between pFOs and pNFOs compared with the overall variance among the recorded population of cell pairs, indicating reliable stimulus discriminability by spike correlation. The pFO-dominant spike correlation emerged within 300 ms after stimulus onset, which is rapid enough to mediate recognition of the presented stimulus. Firing rates did not reveal consistent dependence on the feature configuration even in this period, in which the majority of the recorded IT neurons fired maximally. Our findings suggest that the spatial configuration of multiple local features in a unique whole object can be reflected in the temporally correlated activity of a population of neurons in the IT cortex.
Figure 7. Rapid emergence of feature configuration-dependent spike correlation. A, Scattergrams showing peak heights of SSCCs in z-score (left) and mean firing rates (right) for all of the cell pairs, elicited by optimal FOs (ordinate) and NFOs (abscissa) in the periods of 100-300 ms (top; n = 29, one pair with small number of spikes was excluded) and 300-500 ms (bottom; n = 30) after cue onset. B, C, Average peak heights of SSCCs in z-scores (B) and average firing rates (C) across all of the cell pairs elicited by optimal FOs (gray) and NFOs (white) during the periods of 100-300 ms (left; n = 29) and 300-500 ms (right; n = 30) after cue onset. Asterisk, Significant difference assessed by paired t test corrected for multiple comparisons for the divided time window. Error bars represent SEM.
Detection of configuration-dependent spike correlation
We quantified the spike correlation of cell pairs by calculating z-scores of the peak height of SSCCs that were constructed by subtracting a shift predictor from the raw cross-correlogram. We also calculated the NCC (Abeles, 1982; Eggermont, 1992; Roy and Alloway, 2001; Tomita and Eggermont, 2005) as another index of correlation strength and confirmed the consistency of the results obtained with these two different statistical indices. When the spike correlation is assessed during the period of sensory stimulation, as in the present study, the raw cross-correlogram exhibits a sharp peak derived from the neuronal connectivity atop a broad foothill derived from the stimulus-locked covariation of firing rates. The component derived from the neuronal connectivity thus can be extracted from the raw cross-correlogram by subtracting the shift predictor, which reflects only the stimulus-locked component (Perkel et al., 1967; Nowak et al., 1995; de Oliveira et al., 1997; Das and Gilbert, 1999; Steinmetz et al., 2000; Usrey et al., 2000; Bair et al., 2001). This method for quantifying the spike correlation is, however, sensitive to the variance of the bin counts in the shift predictor (Brody, 1999). Conversely, the NCC is not directly affected by the bin count variance of the shift predictor, because the expected value of the correlation, which is subtracted from the raw cross-correlogram, is a single value calculated from the number of spikes, the recorded duration, and the bin width of the raw cross-correlogram. However, because the broad structure in the shift predictor is considered indirectly in calculating the NCC, it might be difficult to extract the component derived from the neuronal connections if the shift predictor has a complex structure. The consistency of our results obtained with those two different methods thus suggests that the present results are robust regardless of the methods for quantifying the spike correlation.
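The two quantities contrasted above can be illustrated with a minimal sketch: a shift predictor-subtracted cross-correlogram, and the single expected coincidence count that an NCC-style index subtracts. The sketch assumes spike times in ms grouped by trial and a shift predictor built by pairing each trial of one neuron with the next trial of the other. The z-score conversion and the study's exact predictor construction are described in Materials and Methods and are not reproduced here; all function names are hypothetical.

```python
import math
from collections import Counter


def cross_correlogram(trials_a, trials_b, max_lag_ms=50, bin_ms=1):
    """Count spike pairs by lag (t_b - t_a), summed over paired trials."""
    counts = Counter()
    for spikes_a, spikes_b in zip(trials_a, trials_b):
        for ta in spikes_a:
            for tb in spikes_b:
                lag = tb - ta
                if -max_lag_ms <= lag <= max_lag_ms:
                    counts[math.floor(lag / bin_ms + 0.5)] += 1
    n_bins = max_lag_ms // bin_ms
    # Index i of the returned list corresponds to lag bin (i - n_bins).
    return [counts[k] for k in range(-n_bins, n_bins + 1)]


def shift_predictor(trials_a, trials_b, max_lag_ms=50, bin_ms=1):
    """Correlogram with neuron B's trials shifted by one (cyclically).

    Only stimulus-locked covariation survives the trial shift.
    """
    shifted_b = list(trials_b[1:]) + list(trials_b[:1])
    return cross_correlogram(trials_a, shifted_b, max_lag_ms, bin_ms)


def sscc(trials_a, trials_b, max_lag_ms=50, bin_ms=1):
    """Raw cross-correlogram minus the shift predictor, bin by bin."""
    raw = cross_correlogram(trials_a, trials_b, max_lag_ms, bin_ms)
    pred = shift_predictor(trials_a, trials_b, max_lag_ms, bin_ms)
    return [r - p for r, p in zip(raw, pred)]


def expected_coincidences(n_a, n_b, duration_ms, bin_ms):
    """Expected count per bin for independent spike trains with constant rates.

    A single value from spike counts, duration, and bin width (assumed form).
    """
    return n_a * n_b * bin_ms / duration_ms
```

In this sketch, a coincidence that recurs at the same post-stimulus time on every trial appears equally in the raw correlogram and the predictor and so cancels, whereas a fixed-lag coincidence occurring at a different time on each trial survives the subtraction.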
The advantage in the spike correlation for pFOs over pNFOs rests on a relatively small fraction of the total spikes fired by each neuron (1.6 vs 1.1% for pFOs and pNFOs; see Results). Note, however, that the pFO dominance of spike correlation was robust against overall variance of spike correlation among the population of neuron pairs (Fig. 5). Correlated spikes have an additional impact on the firing probability of the postsynaptic neurons (Alonso et al., 1996; Usrey et al., 2000; Roy and Alloway, 2001). Thus, a relatively small difference in the spike correlation, such as that observed here, might be effective in a network of IT neurons to discriminate between different feature configurations.
To construct reliable cross-correlograms, a sufficient number of spikes was needed in both pFO and pNFO trials. For that purpose, the minimax algorithm was used in recording sessions to determine the pFOs and pNFOs, and the off-line-sorted cells that responded to the pFOs or pNFOs with low firing rates were not included in the analyses (see Materials and Methods). Our results were thus obtained from this subpopulation of IT neurons, and it is an open question how other subpopulations of IT neurons would have participated in the representation of unique whole objects.
Possible sources of pFO-dominant neuronal correlation
In the present study, pFO-induced neuronal correlation was shown to be stronger than that induced by pNFO. Several possibilities can be raised as the source of this pFO-dominant neuronal correlation. One possibility is the common input from the neurons that selectively respond to a specific pFO but not to the corresponding pNFO. Regarding the possible existence of such a pFO-selective neuron, 64,000 different FOs were presented in a nearly trial-unique manner before they were used for examining the neuronal connectivity. Thus, we expect IT neurons not to have become selective for a specific FO. Although the recorded cell pairs indeed responded well to pFOs, they responded equally well to their corresponding pNFOs (Fig. 3B), suggesting that stimulus selectivities of these neurons might be related to the facial-feature-like parts used in the pFOs/pNFOs. It is thus unlikely that the observed pFO-dominant spike correlation was evoked by common input from neurons that were tuned to a specific pFO.
A second possible source of pFO-dominant spike correlation is the common input from the face neurons that respond to many faces (Perrett et al., 1992). It might be difficult, however, to explain the observed pFO-dominant spike correlation by this mechanism because the face neurons would also respond to nonoptimal FOs, leading to higher spike correlation in response to nonoptimal FOs than to nonoptimal NFOs, which is not consistent with our finding (supplemental Table 1, available at www.jneurosci.org as supplemental material). Instead, the pFO-dominant spike correlation might be explained by considering more complex modulations of local synaptic transmission in the neuronal circuit rather than only a simple common input from the face neurons.
A third possibility is that the pFO-dominant spike correlation was induced by stronger attention to FOs than to NFOs, because neuronal synchronization is enhanced by attention (Steinmetz et al., 2000; Fries et al., 2001). If the spike correlation was enhanced by stronger attention directed to FOs than to NFOs, then nonoptimal FOs should also exhibit stronger correlation than nonoptimal NFOs, which was not the case in the present study (supplemental Table 1, available at www.jneurosci.org as supplemental material). Therefore, it is unlikely that the pFO-dominant neuronal correlation was provided by general enhancement of the spike correlation by stronger attention to FOs. Still, the firing rate of neurons can be modulated by attention in a stimulus-dependent manner (McAdams and Maunsell, 1999; Reynolds et al., 2000). Therefore, the pFO-dominant spike correlation might occur if there exist some mechanisms to transmit such a stimulus-specific attentional modulation to the recorded cell pairs without systematic changes in their firing rates.
Neuronal correlation in response to a face-like pattern
Previous psychophysical studies have revealed the existence of the face superiority effect, in which parts arranged in a face-like pattern can be more rapidly and accurately recognized by human observers than those arranged in a random pattern (Homa et al., 1976; Gorea and Julesz, 1990; Tanaka and Farah, 1993). The pFO-dominant spike correlation we found might be one of the possible mechanisms underlying this face superiority effect. In human EEG experiments, stronger gamma-band coherence was induced in response to an upright Mooney face, which was perceived as a face, than to an inverted one, which was not perceived as a face (Rodriguez et al., 1999). Another EEG study showed that a line-drawn image of a real object evoked stronger gamma oscillation than a moderately scrambled image of the same object (Herrmann et al., 2004). We did not encounter pFO-induced gamma oscillations in the present study (Singer and Gray, 1995). This might be because the cross-correlation between single-unit activities is less sensitive in detecting oscillatory synchronization than the cross-correlation between multiunit activities or local field potentials (Fries et al., 2001; Lee et al., 2005). Regarding the origin of the observed higher spike correlation with the facial configuration of constituent parts, it might have been acquired through the extensive training on the FO/NFO judgment task in which the facial configuration was repeatedly presented with a variety of local features, as discussed in human imaging studies (Palmeri and Gauthier, 2004). Whether the observed spike correlation is also involved in the representation of objects in general by IT neurons is an important issue to be resolved in future studies.
Several studies have been conducted to assess the spike correlation in the IT cortex of anesthetized (Tamura et al., 2004) or awake (Gochin et al., 1991; Gawne and Richmond, 1993; Aggelopoulos et al., 2005) monkeys. However, it has not yet been demonstrated that the spike correlation in the IT cortex is dynamically modulated by the presentation of complex visual objects. The present results suggest that feature configuration within a unique whole object can be reflected in the rapid modulation of spike correlation among a population of neurons in the IT cortex of behaving monkeys. This study also demonstrates that these cross-correlation analyses can be an important tool for shedding light on the local structures within a cell assembly in the IT cortex in which the cognitive computations are implemented.
Footnotes
This work was supported by Grant-in-Aid for Specially Promoted Research 14002005 (Y.M.) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan. We thank Y. Naya, K. Nakahara, M. Takeda, and J. Kishimoto for discussion and H. Morimoto for technical support.
Correspondence should be addressed to Dr. Yasushi Miyashita, Department of Physiology, The University of Tokyo School of Medicine, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan. E-mail: yasushi_miyashita{at}m.u-tokyo.ac.jp.
Copyright © 2005 Society for Neuroscience 0270-6474/05/2510299-09$15.00/0