Abstract
A crucial step in understanding the function of a neural circuit in visual processing is to know what stimulus features are represented in the spiking activity of the neurons. For neurons with complex, nonlinear response properties, characterization of feature representation requires measurement of their responses to a large ensemble of visual stimuli and an analysis technique that allows identification of relevant features in the stimuli. In the present study, we recorded the responses of complex cells in the primary visual cortex of the cat to spatiotemporal random-bar stimuli and applied spike-triggered correlation analysis of the stimulus ensemble. For each complex cell, we were able to isolate a small number of relevant features from a large number of null features in the random-bar stimuli. Using these features as visual stimuli, we found that each relevant feature excited the neuron effectively in isolation and contributed to the response additively when combined with other features. In contrast, the null features evoked little or no response in isolation and divisively suppressed the responses to relevant features. Thus, for each cortical complex cell, visual inputs can be decomposed into two distinct types of features (relevant and null), and additive and divisive interactions between these features may constitute the basic operations in visual cortical processing.
An important goal in studying the receptive-field properties of visual neurons is to understand how they respond to complex spatiotemporal inputs, including those encountered in natural scenes. To analyze the responses to complex stimuli, a useful approach is to decompose the stimuli into a set of basic features (basis set) and to characterize how each feature contributes to the neuronal response. Several methods have been used to define basis sets for the efficient representation of visual stimuli, including principal component analysis (PCA), independent component analysis (Bell and Sejnowski, 1997; van Hateren and Ruderman, 1998), and/or analysis based on sparse coding (Olshausen and Field, 1996). For studying the response properties of a given visual neuron, it is desirable to construct a basis set so that the neuron responds to only a small number of visual features in the set. The segregation between a small number of “relevant” visual features and a large number of “irrelevant” features can greatly facilitate experimental characterization of the visual neuron.
For neurons with a linear stimulus–response relationship, relevant visual features can be identified by estimating their linear receptive fields using a spike-triggered average of the stimulus ensemble (also called “reverse correlation”) (de Boer and Kuyper, 1968). This method has been widely used to measure the spatiotemporal receptive fields of neurons in the early visual pathway (Jones and Palmer, 1987; Reid et al., 1997); the resulting receptive fields can largely account for the neuronal responses to complex spatiotemporal stimuli (Brodie et al., 1978; Dan et al., 1996). However, in the visual cortex most of the neurons are complex cells with nonlinear stimulus–response relationships that cannot be characterized with the spike-triggered average. In the present study, we have used spike-triggered correlation analysis of the stimulus ensemble (de Ruyter van Steveninck and Bialek, 1988; Yamada and Lewis, 1999; Brenner et al., 2000) to construct the basis set for each complex cell. We found that visual features in such a set are clearly segregated into two categories: a small number of relevant features and a large number of null features. Using visual stimuli consisting of either a single feature or a combination of features, we directly measured the contribution of each type of feature to the cortical responses.
MATERIALS AND METHODS
Physiological preparation. Adult cats (2–3 kg) were initially anesthetized with isoflurane (3%, with O₂) followed by sodium pentothal (10 mg/kg, i.v., supplemented as needed). During recording, anesthesia was maintained with sodium pentothal (3 mg · kg⁻¹ · hr⁻¹, i.v.), and paralysis was maintained with pancuronium bromide (0.1–0.2 mg · kg⁻¹ · hr⁻¹, i.v.). The pupils were dilated with 1% atropine sulfate, nictitating membranes were retracted with 2.5% phenylephrine hydrochloride, and the eyes were mechanically stabilized and optimally refracted. End-expiratory CO₂ was maintained at 4%, the core body temperature was kept at 38°C, and the electrocardiogram and EEG were monitored continuously. All experimental procedures were performed as approved by the Animal Care and Use Committee at the University of California, Berkeley.
Recording. Extracellular recordings were made with tungsten electrodes (A-M Systems, Carlsborg, WA). Unit isolation was based on the cluster analysis of waveforms and the presence of a refractory period determined from the autocorrelograms. Cells were classified as simple if their receptive fields had clear on and off subregions (Hubel and Wiesel, 1962) and if the ratio of the first harmonic to the DC component of the response to an optimally oriented drifting grating was >1 (Skottun et al., 1991). All other cells were classified as complex. Among the 61 complex cells recorded, one was excluded from analysis because of its low firing rate in response to random-bar stimuli (<1 spike per second).
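The simple/complex criterion based on the ratio of the first harmonic (F1) to the DC component can be illustrated with a short sketch. The function name `f1_dc_ratio`, the use of an FFT, and the requirement that the PSTH span a whole number of grating cycles are assumptions made here for illustration; they are not taken from the analysis code used in the study.

```python
import numpy as np

def f1_dc_ratio(psth, n_cycles):
    """Ratio of first-harmonic amplitude to mean rate for a drifting-grating PSTH.

    psth     : firing rate in equal time bins (hypothetical input format)
    n_cycles : whole number of grating cycles spanned by the PSTH
    """
    psth = np.asarray(psth, dtype=float)
    spectrum = np.fft.rfft(psth) / psth.size
    dc = spectrum[0].real                  # mean firing rate
    f1 = 2.0 * np.abs(spectrum[n_cycles])  # amplitude at the grating temporal frequency
    return f1 / dc

# Example: a half-wave rectified sinusoid (simple-like) and a weakly
# modulated elevated rate (complex-like), each spanning 4 cycles.
t = np.arange(400) / 100.0
simple_like = np.maximum(np.sin(2 * np.pi * t), 0.0)
complex_like = 10.0 + np.sin(2 * np.pi * t)
print(f1_dc_ratio(simple_like, 4) > 1.0, f1_dc_ratio(complex_like, 4) > 1.0)
```

For an ideal half-wave rectified sinusoid the ratio is π/2, which lies above the criterion of 1, whereas a response that is elevated but only weakly modulated falls below it.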
Visual stimulation. Visual stimuli were generated with a personal computer and presented with a Barco monitor (size, 40 × 30 cm; refresh rate, 120 Hz; maximum luminance, 80 cd/m²). Luminance nonlinearities were corrected using software written in our laboratory. The random-bar stimuli were presented in a rectangular patch covering the receptive field of each cell. This patch was divided into 16 bars aligned to the optimal orientation of the cell; the length of the bars was equal to or slightly longer than the receptive field. The contrast of each bar was temporally modulated according to a pseudorandom binary m-sequence (Sutter, 1987) (luminance, ±39 cd/m² from the mean of 40 cd/m²). The full m-sequence was 32,767 frames long and was updated every other frame, for an effective frame rate of 60 Hz. To measure the contrast–response functions of individual features (see Fig. 5), we randomly interleaved short movies (16 frames per movie) of relevant and null features, each at a range of contrasts (positive and negative; see below for the definition of contrast), with no gap between movies. To measure the interaction between two relevant features (see Fig. 6), we generated a set of short movies containing all possible linear combinations of the two features. The number of repetitions for each short movie varied between 1 and 120 and was proportional to the probability of the corresponding feature contrast in the random-bar stimuli (for example, the probability of a high contrast for a given feature in the random-bar stimuli is generally lower than the probability of a low contrast for the same feature; thus, the movie of the feature at the high contrast was repeated fewer times).
Note that in each movie, which contains either a single feature or a combination of features, the luminance of each bar must be between −1 and 1 (corresponding to 0 and 80 cd/m², respectively), which limits the maximum contrast of each feature that can be presented (the definition of contrast is described below).
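A binary m-sequence of the stated length (32,767 = 2^15 − 1 frames) can be generated with a 15-bit linear-feedback shift register, as in the minimal sketch below. The feedback polynomial x^15 + x^14 + 1 is a standard primitive polynomial chosen here as an assumption; the polynomial, the seed, and the way the sequence was assigned to the 16 bars are not specified in the text. The names `m_sequence` and `hold_frames` are hypothetical.

```python
def m_sequence(order=15, taps=(15, 14), seed=1):
    """Return one period (2**order - 1 values of +1/-1) of a binary m-sequence.

    Fibonacci linear-feedback shift register; taps (15, 14) correspond to the
    polynomial x^15 + x^14 + 1 (an assumed choice, not taken from the paper).
    """
    state = seed  # any nonzero starting state gives the same cycle
    out = []
    for _ in range(2 ** order - 1):
        out.append(1 if state & 1 else -1)  # map output bit to bright/dark
        feedback = 0
        for tap in taps:
            feedback ^= (state >> (order - tap)) & 1
        state = (state >> 1) | (feedback << (order - 1))
    return out

def hold_frames(sequence, repeat=2):
    """Repeat each value so that the pattern updates every `repeat` monitor frames."""
    return [value for value in sequence for _ in range(repeat)]

seq = m_sequence()
monitor_frames = hold_frames(seq)  # 120 Hz refresh, 60 Hz effective update rate
print(len(seq), len(monitor_frames))
```

Over one full period an m-sequence is almost exactly balanced between its two values and has a nearly flat autocorrelation at nonzero lags, which is the property that makes it useful for correlation-based analysis.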
Spike-triggered correlation analysis. In general, if certain features in the visual stimuli affect the firing probability of the cell, the spike-triggered stimulus ensemble should exhibit a different probability distribution from the entire stimulus ensemble (see Fig. 1B; compare the distribution of the filled circles and the distribution of all of the circles). Although a change in the probability distribution can be reflected in a change in the first-order (mean), second-order (variance), or higher-order moments, the correlation analysis aims to identify features with changed variance. Because PCA results in a set of components with their variance ranking from the highest to the lowest, it is ideally suited for the identification of features with outstanding variance. Practically, identification of relevant features was achieved by finding eigenvalues of the spike-triggered correlation matrix that were significantly different from the eigenvalues of the control correlation matrix (computed by randomly sampling the entire stimulus ensemble). For each cell, responses to three to four repeats of the random-bar stimuli (∼9 min) were used for spike-triggered correlation analysis. Each pattern in the stimulus ensemble consisted of luminance at 16 bar positions at 16 frames (assuming that neuronal spiking probability depends only on the immediate stimulus history within 16 frames, lasting for 268 msec), which was uniquely specified by 256 parameters. The spike-triggered correlation matrix, $[C_{m,n}]$ (m,n = 1, 2, … , 256), was computed as follows:
$$C_{m,n} = \frac{1}{N}\sum_{i=1}^{N} S_m(i)\,S_n(i),$$
where $S_m(i)$ and $S_n(i)$ are the mth and nth parameters of the stimulus pattern preceding the ith spike, respectively, and N is the total number of spikes in the response. The resulting matrix is closely related to the second-order Wiener kernel (Wiener, 1958; Marmarelis and Marmarelis, 1978) of the neuron. Eigenvalues and eigenvectors of this spike-triggered correlation matrix were then computed.
To compute each control correlation matrix, we generated a random spike train with the same number of spikes as in the recorded response but with random spike timing; the correlation matrix was computed based on this simulated random spike train. Because subsequent experiments required fast identification of the significant eigenvectors, we computed only five control correlation matrices for each cell during the experiments; the confidence interval for the control eigenvalues was set at mean ± 5.2 SD (corresponding to p < 10⁻⁴). Eigenvectors with eigenvalues outside of the control confidence interval were considered significant. Subsequent offline analyses with 100 control matrices confirmed that the significant eigenvectors were identified reliably using only five control matrices.
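The procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the control spike trains are produced by permuting the spike counts across time bins, and the confidence interval is computed from the control eigenvalues pooled across all ranks and control matrices (the text does not say whether pooling or a per-rank interval was used).

```python
import numpy as np

def stimulus_history(stim, n_frames=16):
    """Stack n_frames consecutive frames into one row vector per time point.

    stim : array (T, n_bars) with values in [-1, 1].
    Row j of the result holds frames j .. j + n_frames - 1 of the stimulus.
    """
    T = stim.shape[0]
    windows = [stim[k:T - n_frames + 1 + k] for k in range(n_frames)]
    return np.concatenate(windows, axis=1)

def correlation_matrix(patterns, counts):
    """(1/N) * sum over spikes of S_m S_n, with each pattern weighted by its spike count."""
    return (patterns * counts[:, None]).T @ patterns / counts.sum()

def significant_eigenvectors(patterns, counts, n_controls=5, n_sd=5.2, rng=None):
    """Eigenvectors whose eigenvalues fall outside mean +/- n_sd SD of the controls."""
    rng = np.random.default_rng() if rng is None else rng
    evals, evecs = np.linalg.eigh(correlation_matrix(patterns, counts))
    control = []
    for _ in range(n_controls):
        shuffled = rng.permutation(counts)  # same number of spikes, random timing
        control.append(np.linalg.eigvalsh(correlation_matrix(patterns, shuffled)))
    control = np.concatenate(control)       # pooled control eigenvalues (assumption)
    lower = control.mean() - n_sd * control.std()
    upper = control.mean() + n_sd * control.std()
    keep = (evals < lower) | (evals > upper)
    order = np.argsort(-np.abs(evals[keep] - control.mean()))
    return evals[keep][order], evecs[:, keep][:, order]
```

Here `counts[j]` is the number of spikes attributed to the stimulus history in row `j` of `patterns`; aligning spike times with the end of each history window is left to the caller. Applied to a simulated cell whose firing rate is the sum of two squared, mutually orthogonal linear filters, a sketch of this kind would be expected to return two eigenvectors spanning the plane of the two filters.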
An important question is how the constraints of the above method affect the outcome of the analysis. For example, the eigenvectors must be orthogonal, which could affect the visual features identified. As shown in Figure 4, the two significant eigenvectors for most complex cells exhibited similar spatial frequencies; one might suspect that the ∼90° spatial phase difference between them resulted from the orthogonality between the two vectors. However, this is not the case. First, both vectors are spatiotemporal patterns. The fact that the dot product of them is 0 (summed over all temporal delays) does not uniquely specify their spatial relationship at each temporal delay. Second, two Gabor functions (which were used to fit the spatial profiles of the vectors in Fig. 4) with a 90° phase difference are generally not orthogonal to each other. Even if the two vectors are orthogonal at each temporal delay (which is not imposed by the method), their phase difference still may not be 90°. Thus, the spatial phase relationship we have shown is not a trivial consequence of the method but is a reflection of the response property of complex cells. Another question is whether this method allows identification of visual features that are not orthogonal to each other. Generally, even if the features are not orthogonal, this method can still be used to identify linear combinations of the features. Subsequently, the relationship between the visual features and the neuronal response may be revealed by measuring the joint contrast–response function of the significant eigenvectors (see Fig. 6A). Finally, although PCA is a linear method for decomposing each stimulus into the sum of multiple eigenvectors, it does not require additive interaction between different eigenvectors in the response of the neuron. Even if the cell does not sum the responses to different features, this method can still identify either the individual features or linear combinations of them. 
The type of interaction between visual features can then be determined through analysis of the joint contrast–response function. These points can be demonstrated using simulated responses of model cells with the feature selectivity described above (data not shown). Finally, it is important to keep in mind that this method does not necessarily identify all of the features that affect the responses of the neuron, especially those that contribute weakly to the response.
Contrast–response function. For measuring the contrast–response functions, the contrast of the kth eigenvector, $V_k$, in the stimulus (denoted here $c_k$) is defined as the dot product between the stimulus vector and the eigenvector, as follows:
$$c_k = \frac{1}{16}\sum_{x=1}^{16}\sum_{t=1}^{16} S(x,t)\,V_k(x,t),$$
where −1 ≤ S(x,t) ≤ 1 represents luminance at the tth temporal frame in the xth bar position of the stimulus pattern. Because the eigenvectors are normalized,
$$\sum_{x=1}^{16}\sum_{t=1}^{16} V_k(x,t)^2 = 1,$$
and the norm of a stimulus pattern with 256 elements in [−1, 1] is at most 16, the scaling factor 1/16 in the definition ensures that the contrast of each stimulus pattern is bounded between −1 and 1. In the joint contrast–response function, the contrasts of both eigenvectors (Figs. 6A,B, 7A, contrast 1 and contrast 2) are defined the same as above.
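The definition of contrast, together with the luminance limit noted under Visual stimulation, can be made concrete with a small sketch. It assumes that a movie presenting a single unit-norm eigenvector at contrast c is 16·c times the eigenvector, which follows from the definition above but is not stated explicitly in the text; the function names are hypothetical.

```python
import numpy as np

def feature_contrast(stimulus, eigenvector):
    """Contrast of a unit-norm eigenvector in a 16 x 16 stimulus: dot product scaled by 1/16."""
    return float(np.sum(stimulus * eigenvector)) / 16.0

def max_presentable_contrast(eigenvector):
    """Largest contrast for which 16 * c * eigenvector stays within [-1, 1] everywhere."""
    return 1.0 / (16.0 * np.max(np.abs(eigenvector)))

def feature_movie(eigenvector, contrast):
    """Movie (bars x frames) presenting a single feature at the requested contrast."""
    movie = 16.0 * contrast * eigenvector
    if np.max(np.abs(movie)) > 1.0 + 1e-12:
        raise ValueError("contrast exceeds the displayable luminance range")
    return movie

# Example with a random unit-norm pattern standing in for an eigenvector.
rng = np.random.default_rng(3)
v = rng.standard_normal((16, 16))
v /= np.linalg.norm(v)
c = 0.5 * max_presentable_contrast(v)
movie = feature_movie(v, c)
print(abs(feature_contrast(movie, v) - c) < 1e-12)
```

The sketch also shows why the maximum presentable contrast differs between features: a feature whose energy is concentrated in a few bars and frames reaches the luminance limit at a lower contrast than one whose energy is spread evenly.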
Estimation of upper limit for correlation coefficient. To estimate the upper limit for the correlation coefficient between the predicted and measured contrast–response functions of relevant visual features (see Fig. 6), we simulated the functions measured from a finite number of repeats using a parametric bootstrap (Efron and Tibshirani, 1993). Briefly, for each stimulus that was repeated L times in the experiment with recorded responses $r_1, r_2, \ldots, r_L$, we simulated the response by drawing random samples ($r_1', r_2', \ldots, r_L'$) from a Gaussian distribution with the same mean and variance as the recorded responses ($r_1, r_2, \ldots, r_L$) and computed the average of the simulated responses, as follows:
$$\bar{r}' = \frac{1}{L}\sum_{j=1}^{L} r_j'.$$
Repeating this step for all of the contrast levels resulted in a simulated contrast–response function with a noise level comparable with that measured experimentally. We then computed the mean ± 95% confidence interval (obtained from 500 simulations) of the correlation coefficient between contrast–response functions obtained in different trials of the simulation. This was used as an estimate of the upper limit for the correlation coefficient between the predicted and measured contrast–response functions set by noise in the measured responses.
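A minimal sketch of this parametric bootstrap is given below. Several details are assumptions made for illustration: each of the 500 correlation coefficients is computed between two independently simulated contrast–response functions, the sample variance uses the unbiased estimator, stimuli presented only once are assigned zero variance, and the 95% interval is taken from the 2.5th and 97.5th percentiles. The function names are hypothetical.

```python
import numpy as np

def simulate_crf(responses_per_stimulus, rng):
    """One simulated contrast-response function.

    responses_per_stimulus : list of 1-D arrays, one per stimulus, each holding
    the L responses recorded for that stimulus.
    """
    simulated = []
    for r in responses_per_stimulus:
        r = np.asarray(r, dtype=float)
        sd = r.std(ddof=1) if r.size > 1 else 0.0  # single repeats: no variance estimate
        draws = rng.normal(r.mean(), sd, size=r.size)
        simulated.append(draws.mean())
    return np.array(simulated)

def correlation_upper_limit(responses_per_stimulus, n_sim=500, rng=None):
    """Mean and 95% interval of the correlation between pairs of simulated functions."""
    rng = np.random.default_rng() if rng is None else rng
    r = np.empty(n_sim)
    for k in range(n_sim):
        a = simulate_crf(responses_per_stimulus, rng)
        b = simulate_crf(responses_per_stimulus, rng)
        r[k] = np.corrcoef(a, b)[0, 1]
    lower, upper = np.percentile(r, [2.5, 97.5])
    return r.mean(), (lower, upper)
```

A measured correlation between predicted and observed responses that falls within the returned interval cannot be distinguished from the ceiling imposed by response variability alone.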
RESULTS
Segregation between two types of visual features
Single-unit recordings were made from complex cells in the striate cortex of anesthetized adult cats. The stimuli consisted of 16 bars along the preferred orientation of the cell, with each bar varying randomly between light and dark at 60 Hz (Fig. 1A). To construct a basis set for each neuron that isolates the relevant visual features, we collected the spatiotemporal visual signals within a window of 268 msec (16 frames) before each spike and performed principal component analysis of this spike-triggered stimulus ensemble (Fig. 1B, filled circle) (see Materials and Methods). Unlike the spike-triggered average, which is the mean of the spike-triggered stimulus ensemble, the present method identifies a set of visual features (represented by eigenvectors of the spike-triggered correlation matrix) that account for different amounts of variance (the corresponding eigenvalues) in the ensemble. A visual feature with an outstanding variance (significantly larger or smaller than the variance of the control ensemble) (Fig. 1B, open circle) is directly relevant to the spiking response of the neuron.
Figure 2A shows the 30 largest eigenvalues of the spike-triggered correlation matrix for a complex cell. Two eigenvalues (filled circles) conspicuously stood out from the rest (open circles), suggesting that the corresponding visual features (eigenvectors) are particularly relevant to the cell. The dashed lines indicate the confidence interval for eigenvalues of the control stimulus ensemble, sampled randomly from the random-bar stimuli (see Materials and Methods). The first two eigenvalues of the spike-triggered ensemble were well above the control, indicating significance of the corresponding eigenvectors. Figure 2B shows three eigenvectors, two corresponding to the significant eigenvalues (first and second) and one to a nonsignificant eigenvalue (nth). Although the spatiotemporal structure of the nonsignificant eigenvector appeared to be random, the significant eigenvectors had spatially separate on and off subregions evolving smoothly over time. To further confirm the distinction between these two types of eigenvectors, we compared both their eigenvalues and the correlation in their structures (which is a measure of nonrandomness) for a population of complex cells (n = 60). Figure 2C shows the distributions of the significant and nonsignificant eigenvalues; Figure 2D shows the distributions of the correlation of the eigenvectors (see legend to Fig. 2). The two types of eigenvectors showed little overlap in both properties, indicating an unambiguous segregation between them.
For most (47 of 60) of the complex cells studied, we found two significant eigenvectors (Fig. 3), corresponding to the two largest eigenvalues. These two eigenvectors exhibited separate on and off spatial subregions (Fig. 2B), resembling the receptive fields of simple cells. The relationship between the two vectors was revealed by fitting their spatial profiles at the peak temporal delay (∼40 msec preceding spike) with Gabor functions (Fig. 4A). In all cases, the Gabor fits for the two vectors exhibited similar spatial frequencies but a difference of ∼90° in phase (Fig. 4B), reminiscent of the relationship between different subunits in the energy model for complex cells (Movshon et al., 1978; Pollen and Ronner, 1981; Adelson and Bergen, 1985; Heeger, 1991). As explained in detail in Materials and Methods, this phase relationship reflects the response property of complex cells and is not a trivial consequence of the orthogonality between eigenvectors, which is imposed by the method. In a few cases (3 of 60), we found only one significant eigenvector for each complex cell; these vectors also exhibited spatiotemporal profiles resembling simple-cell receptive fields. In the remaining cases, more than two eigenvalues reached significance. However, these additional eigenvectors (corresponding to third, fourth, … , largest eigenvalues) tended to exhibit much less spatiotemporal structure than the first two eigenvectors, and their eigenvalues were much smaller, suggesting less functional importance.
Responses of cortical neurons to individual visual features
The clear segregation between the significant and nonsignificant eigenvalues suggests that the corresponding eigenvectors contribute differently to the cortical responses. To test this idea directly, we measured the responses of each complex cell to individual vectors in both categories. Each vector (a 268 msec movie) was presented at a range of positive and negative contrasts (see Materials and Methods for the definition of contrast), and the peristimulus time histograms (PSTHs) of the cell were measured (Fig. 5A). Note that only the last bin (indicated by an arrow) of each PSTH reflects the neuronal response to the complete spatiotemporal visual feature represented by the eigenvector; its amplitude was used to measure the contrast–response function. Figure 5B shows the contrast–response functions of a complex cell for two significant eigenvectors and one nonsignificant eigenvector. For each significant eigenvector, the response increased with the absolute value of the contrast at both positive and negative polarities, consistent with the known polarity invariance of complex cells (Hubel and Wiesel, 1962). The nonsignificant eigenvector, however, evoked no contrast-dependent response. We fitted the left and right sides of each contrast–response function separately with a power function, $y(x) = \beta |x|^{\gamma}$, where x and y represent the vector contrast and the neuronal response, respectively, and β and γ are free parameters. For the significant eigenvectors, the exponent γ was found to be 2.7 ± 0.1 (SEM; n = 34), similar to the exponent of contrast–response functions measured with drifting gratings (Albrecht and Geisler, 1991; Anzai et al., 1999). The ratio between the response at maximal vector contrast and that at zero contrast was 96.6 ± 11.8. Thus, visual features represented by the significant eigenvectors can each drive the cortical neuron effectively in a contrast-dependent manner; they are referred to as relevant features.
For the nonsignificant eigenvectors, the ratio between the responses at maximal and zero contrasts was 3.8 ± 0.9 (n = 24), much lower than that for the significant eigenvectors. Thus, visual features represented by the nonsignificant eigenvectors evoked little contrast-dependent response and were therefore termed null features.
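Fitting each side of a contrast–response function with a power function can be sketched as follows. The fitting procedure is not described in the text; the sketch assumes an ordinary least-squares fit in log–log coordinates, which requires dropping points with zero contrast or nonpositive response. The function names are hypothetical.

```python
import numpy as np

def fit_power_function(contrast, response):
    """Fit y = beta * |x|**gamma by linear regression of log y on log |x|."""
    x = np.abs(np.asarray(contrast, dtype=float))
    y = np.asarray(response, dtype=float)
    usable = (x > 0) & (y > 0)  # logarithms need strictly positive values
    gamma, log_beta = np.polyfit(np.log(x[usable]), np.log(y[usable]), 1)
    return np.exp(log_beta), gamma

def fit_both_sides(contrast, response):
    """Separate (beta, gamma) fits for negative and positive contrasts."""
    contrast = np.asarray(contrast, dtype=float)
    response = np.asarray(response, dtype=float)
    negative, positive = contrast < 0, contrast > 0
    return (fit_power_function(contrast[negative], response[negative]),
            fit_power_function(contrast[positive], response[positive]))

# Example with a noiseless synthetic function (illustrative parameter values).
x = np.linspace(-1.0, 1.0, 21)
y = 0.03 * np.abs(x) ** 2.7
(beta_neg, gamma_neg), (beta_pos, gamma_pos) = fit_both_sides(x, y)
print(round(gamma_neg, 3), round(gamma_pos, 3))
```

With noisy data a log–log fit weights small responses heavily, so a nonlinear least-squares fit on the raw responses would be a reasonable alternative.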
Additive interaction between relevant visual features
Each spatiotemporal random-bar pattern can be decomposed into a combination of relevant and null features in the basis set. To understand cortical responses to arbitrary random-bar stimuli, it is necessary to characterize not only the contrast–response functions for individual features (Fig. 5) but also the interaction between features. First, we measured the responses of each complex cell to combinations of relevant features. For each neuron with two significant eigenvectors, we constructed a set of visual stimuli, each of which was a 268 msec movie consisting of a linear combination of the two significant eigenvectors. Figure 6A shows the responses of a complex cell at various combinations of the two vectors, which is referred to as the joint contrast–response function (see Materials and Methods). The response increased with the absolute value of contrast of either vector independently of their polarities, consistent with the contrast–response functions measured with individual vectors (Fig. 5B). Note that each combination of the two vectors also exhibited spatially separate on and off subregions (small outer plots), with the spatial phase shifting with the relative weights of the two vectors. The approximate circular symmetry of the joint contrast–response function indicates that the response is insensitive to the spatial phase of the stimuli, a well known property of cortical complex cells (Hubel and Wiesel, 1962; Movshon et al., 1978).
The approximate circular symmetry of the joint contrast–response function also suggests additive interaction between the two significant eigenvectors. To examine this idea quantitatively, we predicted the response to each combination of the two vectors by summing the response to each vector at the corresponding contrast. As shown in Figure 6B, this prediction reproduced well the overall profile of the measured contrast–response function. Figure 6C shows the predicted responses (Fig. 6B) plotted against the measured responses (Fig. 6A) at corresponding contrasts; the correlation coefficient between them was found to be 0.84. To determine whether the difference between the predicted and the measured responses was attributable to systematic errors of the additive model or to the noise in the measured responses, we estimated the upper limit of the correlation coefficient set by noise in the responses. A contrast–response function measured in a single experiment was simulated with a Monte Carlo method, taking into consideration the variability of the measurement (see Materials and Methods); the contrast–response functions simulated in different trials were compared. As shown in Figure 6D, the correlation between the responses simulated in different trials (correlation coefficient, 0.87) was comparable with that between the predicted and measured responses, indicating that the additive model is consistent with the experimental results within the limit set by noise. Figure 6E summarizes the correlation coefficients between the measured and the predicted contrast–response functions for the 13 complex cells analyzed. In 12 of the cells, the correlation coefficient was not significantly different from that between simulated responses (p > 0.05), indicating that the model based on additive interaction provides an adequate description of the cortical responses to combinations of relevant visual features.
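The additive prediction and its comparison with the measured joint contrast–response function can be sketched as follows. The array layout (rows indexed by the contrast of the first feature, columns by the contrast of the second) and the use of NaN to mark combinations that could not be presented are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def additive_prediction(resp1, resp2):
    """Predicted joint response: resp1[i] + resp2[j] for every pair of contrasts."""
    return np.add.outer(np.asarray(resp1, dtype=float), np.asarray(resp2, dtype=float))

def prediction_correlation(predicted, measured):
    """Correlation coefficient over the combinations that were actually measured."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    valid = ~np.isnan(measured)
    return np.corrcoef(predicted[valid], measured[valid])[0, 1]

# Example: an idealized cell summing two squared feature contrasts.
c = np.linspace(-1.0, 1.0, 7)
single = 50.0 * c ** 2                        # response to each feature alone
measured = 50.0 * np.add.outer(c ** 2, c ** 2)
predicted = additive_prediction(single, single)
print(prediction_correlation(predicted, measured) > 0.999)
```

Because the correlation coefficient is unaffected by a constant offset, the comparison does not depend on whether the baseline response is counted once or twice in the sum.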
Divisive effect of the null features
Certain visual stimuli that do not evoke spiking responses on their own may nevertheless modulate cortical responses to other stimuli. Such nonlinear effects are well known for stimuli at nonpreferred orientations (Bonds, 1989) or nonclassical receptive fields (Allman et al., 1985; Walker et al., 2000) of cortical neurons. Here we tested whether the null features, which evoked little response when presented in isolation (Fig. 5), can modulate the cortical responses evoked by relevant features. The interaction between the relevant and null features was revealed by comparing the responses of each complex cell to relevant features alone (Fig. 6A) and to the random-bar stimuli (Fig. 1A) that contain both the relevant and the null features. Figure 7A shows the joint contrast–response function of a complex cell for the two relevant features, either in the absence (left) or presence (right) of null features. Although the two contrast–response functions exhibited similar shapes, the amplitude of the response to the random-bar stimuli was much lower (Fig. 7B), indicating a suppressive effect of the null features. Similar suppressive effects were observed for all of the cells examined.
The simplest models for this type of suppression are subtractive and divisive, and we evaluated both models in describing the effects of the null features. First, we fitted the contrast–response function for each relevant feature, either in the presence or in the absence of null features, with power functions (Fig. 5B). The average scaling factor of the fit (β) was found to be 0.01 ± 0.002 (n = 52) in the presence of the null features and 0.03 ± 0.005 in the absence of them. The average exponents (γ) were 2.65 ± 0.13 (n = 52) and 2.95 ± 0.13 in the presence and absence of the null features, respectively. Although the null features reduced the scaling by a factor of ∼3 (p < 0.0005; paired t test), they did not change the exponent systematically (p > 0.10). This is consistent with the observation that the null features changed the amplitude but not the shape of the contrast–response functions (Fig. 7A), suggesting a divisive effect. To compare directly the divisive and the subtractive models, we used both models to predict the joint contrast–response function measured in the presence of null features (Fig. 7A, right plot) from the function in the absence of null features (Fig. 7A, left plot). Each model contained a single free parameter (a scaling factor for the divisive model and a subtractive constant for the subtractive model) to ensure the fairness of the comparison. We found that for all of the cells analyzed (n = 13), the divisive model performed significantly better than the subtractive model (p < 0.02), as measured by the correlation coefficient between the predicted and measured responses (Fig. 8). Finally, we also fitted the predicted response based on the subtractive model with power functions. We found that the mean exponent of the fit was 5.60 ± 1.07 (n = 52), significantly larger than that for the measured responses (p < 0.005; paired t test).
Together, these results support a divisive rather than a subtractive model for the suppressive effect of the null features.
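The comparison between the two single-parameter models can be sketched as follows. Two assumptions are made for illustration and are not stated in the text: the subtractive prediction is rectified at zero (firing rates cannot be negative), and each parameter is chosen to minimize the squared error, in closed form for the divisive factor and by grid search for the subtractive constant. The function names are hypothetical.

```python
import numpy as np

def fit_divisive(alone, with_null):
    """Least-squares divisor k such that with_null is approximated by alone / k."""
    alone = np.asarray(alone, dtype=float).ravel()
    with_null = np.asarray(with_null, dtype=float).ravel()
    gain = np.sum(alone * with_null) / np.sum(alone ** 2)
    return 1.0 / gain, gain * alone

def fit_subtractive(alone, with_null, n_grid=1001):
    """Grid-search constant s such that with_null is approximated by max(alone - s, 0)."""
    alone = np.asarray(alone, dtype=float).ravel()
    with_null = np.asarray(with_null, dtype=float).ravel()
    candidates = np.linspace(0.0, alone.max(), n_grid)
    errors = [np.sum((np.maximum(alone - s, 0.0) - with_null) ** 2) for s in candidates]
    s = candidates[int(np.argmin(errors))]
    return s, np.maximum(alone - s, 0.0)

# Example: synthetic joint contrast-response functions (illustrative values).
rng = np.random.default_rng(2)
c = np.linspace(-1.0, 1.0, 9)
c1, c2 = np.meshgrid(c, c)
alone = 30.0 * (c1 ** 2 + c2 ** 2) ** 1.35
with_null = alone / 3.0 + rng.normal(0.0, 0.2, size=alone.shape)
k, pred_div = fit_divisive(alone, with_null)
s, pred_sub = fit_subtractive(alone, with_null)
r_div = np.corrcoef(pred_div, with_null.ravel())[0, 1]
r_sub = np.corrcoef(pred_sub, with_null.ravel())[0, 1]
print(r_div > r_sub)
```

Without rectification the two models would yield identical correlation coefficients, because both a scaling and a constant shift leave the correlation with the measured responses unchanged; the rectified subtraction is what steepens the predicted function and distinguishes the models.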
DISCUSSION
In the present study, we have found that for each complex cell, visual inputs can be decomposed into two types of visual features, each having a distinct effect on the response of the cell. The two relevant features found for most complex cells resemble the receptive fields of simple cells, with a phase difference of ∼90° in their spatial profiles; their contrast–response functions exhibited contrast polarity invariance and expansive nonlinearity reminiscent of a squaring function. Thus, the additive interaction between these relevant features corresponds closely to the energy model for complex cells (Movshon et al., 1978; Pollen and Ronner, 1981; Adelson and Bergen, 1985; Heeger, 1991). Although the energy model is well known, it has not been tested quantitatively with complex spatiotemporal stimuli in previous studies. The main difficulty in testing this model with complex stimuli comes from the fact that the parameters describing the underlying subunits of the energy model (simple-cell receptive fields) could not be determined easily for each cell. In the present study, relevant visual features were identified with spike-triggered correlation analysis, which allowed us to measure the contribution of each feature to the cortical response directly and to demonstrate the additive interaction between them (Fig. 6). This result is also consistent with the finding that neural networks trained to predict the responses of complex cells to random-bar stimuli contained additive subunits resembling simple cells (Lau et al., 2002).
Divisive interactions have also been used to model the responses of both simple and complex cells (Heeger, 1992; Carandini et al., 1997); they can account for the suppressive effects of visual stimuli at nonpreferred orientations or nonclassical receptive fields of the cortical neurons. Such divisive suppression may reduce the redundancy in information carried by neighboring neurons and enhance the efficiency of coding for natural scenes (Schwartz and Simoncelli, 2001). Here, identification of a small number of relevant features for each cell allows us to specify the additive components in the visual inputs and to predict their contributions to the neuronal response. The number of null features that contribute to the suppression of cortical responses may be considerably larger. A similar spike-triggered analysis technique may be used to identify the null features that contribute maximally to the divisive suppression of the responses (Schwartz et al., 2001).
For sensory neurons with nonlinear stimulus–response relationships, it is often difficult to know a priori what visual stimuli are relevant for probing the response properties (Touryan and Dan, 2001). In the present study, we first isolated relevant features from null features for each cell using spike-triggered correlation analysis of the responses to a large ensemble of random spatiotemporal stimuli. This allowed us to construct new visual stimuli for each cell to measure the contribution of each type of features to the cortical response efficiently. Although this method has been used here to analyze the responses to random-bar stimuli, it is also applicable to studying cortical responses to more complex stimuli that vary in both dimensions of space (although with an increased number of parameters this analysis will require more data). Such a two-step approach may also prove to be useful for understanding the response properties of nonlinear neurons in other cortical areas and other sensory modalities.
Footnotes
This work was supported by National Eye Institute Grant R01 EY12561-01 and Office of Naval Research Grant N00014-00-1-0053. We thank William Bialek, Timothy Kubow, and Gidon Felsen for helpful discussions.
Correspondence should be addressed to Dr. Yang Dan, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720. E-mail: ydan{at}uclink4.berkeley.edu.
B. Lau's present address: Center for Neural Science, New York University, New York, NY 10003.