Abstract
A recent computational theory suggests that visual processing in the retina and the lateral geniculate nucleus (LGN) serves to recode information into an efficient form (Atick and Redlich, 1990). Information theoretic analysis showed that the representation of visual information at the level of the photoreceptors is inefficient, primarily attributable to a high degree of spatial and temporal correlation in natural scenes. It was predicted, therefore, that the retina and the LGN should recode this signal into a decorrelated form or, equivalently, into a signal with a “white” spatial and temporal power spectrum. In the present study, we tested directly the prediction that visual processing at the level of the LGN temporally whitens the natural visual input. We recorded the responses of individual neurons in the LGN of the cat to natural, time-varying images (movies) and, as a control, to white-noise stimuli. Although there is substantial temporal correlation in natural inputs (Dong and Atick, 1995b), we found that the power spectra of LGN responses were essentially white. Between 3 and 15 Hz, the power of the responses had an average variation of only ±10.3%. Thus, the signals that the LGN relays to visual cortex are temporally decorrelated. Furthermore, the responses of X-cells to natural inputs can be well predicted from their responses to white-noise inputs. We therefore conclude that whitening of natural inputs can be explained largely by the linear filtering properties (Enroth-Cugell and Robson, 1966). Our results suggest that the early visual pathway is well adapted for efficient coding of information in the natural visual environment, in agreement with the prediction of the computational theory.
- lateral geniculate nucleus
- coding
- natural scenes
- information theory
- power spectrum
- efficiency
- visual cortex
In natural environments, visual signals are highly redundant, so the representation of the input by the activity of photoreceptors is inefficient. Efficiency of information coding, however, potentially has significant evolutionary and computational advantages (Atick, 1992). It is thus reasonable to assume that an important task of the early stages of the visual pathway is to recode the incoming visual signals to improve efficiency (Barlow, 1961, 1989; Atick and Redlich, 1990; Atick, 1992).
The primary sources of redundancy in the visual signals at the level of the photoreceptors are the temporal and spatial correlations in natural scenes. The activity of photoreceptors is not independent at different times and between different cells. In other words, much information is represented repetitively over time and by different neurons. To improve efficiency, the neuronal signals must be recoded into a decorrelated form. When transformed into the frequency domain, this decorrelation is expressed as the flattening or “whitening” of the temporal and spatial power spectra of the neuronal signals. Previous studies have shown that the power spectrum of light intensity in the natural visual environment obeys a simple statistical rule: it is proportional to 1/k², where k is the spatial frequency and, at low spatial frequencies, 1/ω², where ω is the temporal frequency (Field, 1987; Dong and Atick, 1995a,b).
It has been proposed that the retina and the lateral geniculate nucleus (LGN) are dedicated to recoding and whitening the input signals (Barlow, 1961, 1989; Atick and Redlich, 1990; Atick, 1992). Using information theory (Shannon and Weaver, 1949) to assess the efficiency of information representation, Atick and coworkers performed a series of theoretical studies of retinal and geniculate processing. They derived a theory of retinal processing that successfully explained the spatial and, in the primate, the chromatic receptive fields of retinal ganglion cells for the entire range of adaptation levels. Their only assumptions were that retinal processing serves to spatially whiten natural inputs and that there was a certain level of noise (Atick and Redlich, 1992; Atick et al., 1992). Theoretical analysis of temporal decorrelation led to an explanation of not only the temporal tuning properties of LGN neurons but also the existence of lagged and nonlagged cells (Dong and Atick, 1995a), which have been observed experimentally in the cat (Mastronarde, 1987; Humphrey and Weller, 1988a,b).
The spatial and temporal response properties of the LGN cells of the cat have been well characterized over the past few decades (So and Shapley, 1981; Dawis et al., 1984; Saul and Humphrey, 1990). It is not certain, however, to what extent the responses of LGN neurons to the simple stimuli used in these studies can predict their function in coding natural visual signals. In particular, nonlinearities such as the contrast gain control (Shapley and Victor, 1978, 1981), rectification at zero spikes per second, and saturation can profoundly alter the responses to stimuli with different statistics. Because the visual system develops—and, of course, evolved—in the natural environment, an important step in understanding its function would be to study the input–output relationship with stimuli that resemble natural scenes. The major difficulty in studying the visual system with natural stimuli resides in the complexity of the input signal and the lack of appropriate methods for characterizing it. To overcome this difficulty, we used a statistical approach to study the visual system with a complex input ensemble. In contrast to the conventional, deterministic approach, in which the properties of neurons are studied by correlating their responses to individual, simple stimuli, the statistical approach characterizes both input and output by measuring their ensemble properties. As demonstrated in our studies, this approach can provide new insights into the function of the visual system and may prove to be an important complement to conventional approaches.
In this experimental investigation, we characterized the statistical properties of LGN neurons in response to natural visual input. We tested directly the hypothesis that the representation of natural visual information at the LGN is temporally decorrelated. Movies of natural scenes were used as visual inputs, responses of single LGN neurons were recorded, and their temporal correlations and power spectra were analyzed. Our results largely confirm the prediction based on the assumption of efficient coding and information–theoretic analysis. Further investigation of the mechanism of recoding indicates that the temporal whitening of natural signals is largely attributable to the linear filtering properties of LGN neurons [see Golomb et al. (1994) for a similar relationship between linear response properties and temporal coding by LGN neurons].
MATERIALS AND METHODS
Physiological preparation
Adult cats ranging in weight from 2 to 3 kg were used in all the experiments. The animals were initially anesthetized with ketamine HCl (10 mg/kg, i.m.), followed by sodium pentothal (20 mg/kg, i.v., supplemented as needed). A local anesthetic (lidocaine) was injected before all incisions. Anesthesia was maintained for the duration of the experiment with sodium pentothal at a dosage of 6 mg/hr.
A tracheostomy was performed for artificial ventilation. Then the cat was transferred to a Horsley–Clarke stereotaxic frame. The cat was suspended by clamping the spinous process of one of the lumbar vertebrae to minimize respiratory movements.
Pupils were dilated with a topical application of 1% atropine sulfate, and the nictitating membranes were retracted with 10% phenylephrine. Eyes were refracted, fitted with appropriate contact lenses, and focused on a tangent screen. The positions of the areae centrales were plotted with the aid of a fundus camera. Eye positions were stabilized mechanically by gluing the sclerae to metal posts attached to the stereotaxic apparatus.
A craniotomy (∼0.5 cm in diameter) was made over the LGN, and the underlying dura was removed. The hole was filled with 3% agar in physiological saline to improve the stability of the recordings.
The animal was paralyzed with Norcuron (0.2 mg/kg/hr, i.v.) and artificially ventilated. Ventilation was adjusted so that the end-expiratory CO2 was near 3.5%. Core body temperature was monitored and maintained at 38°C. The electrocardiogram and electroencephalogram were also monitored continuously.
Electrophysiological recording
Individual LGN neurons were recorded with a single tungsten electrode or a multielectrode array (System Eckhorn, Marburg, Germany) (Eckhorn and Thomas, 1993). The array allows seven fiber electrodes to be positioned independently with a vertical accuracy of 1 μm. We used a glass guide tube to restrict the lateral scattering of the electrodes in the array. The inner diameter at the tip of the guide tube was <400 μm. All recordings were made in layer A or A1 of the LGN.
Recorded signals were amplified, filtered, and passed to an 80486 PC running Datawave Discovery software (Broomfield, CO). The system accepts inputs from up to eight single electrodes. Up to eight different waveforms can be discriminated on a single electrode, but two or three is a more realistic limit. The waveforms of the spikes were saved on disk. The spike discrimination was first done roughly during the experiment. The sorting was carried out more rigorously in postprocessing.
Visual stimulation
The data-acquisition PC contained an AT-Vista graphics card (Truevision, Indianapolis, IN), which was used to present a variety of visual stimuli at a frame rate of 128 Hz. All stimuli were programmed using subroutines from a runtime library, YARL, written by Karl Gegenfurtner. Spatiotemporal white-noise stimuli were generated to map the receptive fields of the neurons. The system is well suited for the efficient real-time production of these stimuli using the m-sequence temporal signal (Sutter, 1987; Reid and Shapley, 1992). Spatially, the white-noise stimuli were made up of 16 × 16 grids of square regions (pixels). The pixel sizes were adjusted to map receptive fields with a reasonable level of detail (0.2–0.4° at 5–10° eccentricity). For every frame of the stimulus, the pixels were either black or white according to the m-sequence. The receptive field maps of the neurons were calculated using the reverse correlation method (Jones and Palmer, 1987). For each delay between stimulus onset and action potential, the average spatial stimulus that preceded each impulse was calculated. This calculation was performed with the fast m-transform (Sutter, 1987). Full-field white noise, in which the whole screen was temporally modulated by a single m-sequence signal, was also used to study the dynamics of some neurons in response to low spatial frequency stimuli.
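The reverse correlation step described above can be sketched as a spike-triggered average. This is a minimal illustration only: the function name `reverse_correlation` and the array layout are assumptions, a random binary stimulus stands in for the m-sequence, and the average is computed directly rather than with the fast m-transform used in the study.

```python
import numpy as np

def reverse_correlation(stimulus, spike_counts, n_delays):
    """Average stimulus preceding each spike, for delays of 0..n_delays-1 frames.

    stimulus:     array (n_frames, ny, nx), +1 for white pixels, -1 for black
    spike_counts: array (n_frames,), spikes counted during each stimulus frame
    Returns an array (n_delays, ny, nx).
    """
    stimulus = np.asarray(stimulus, dtype=float)
    spike_counts = np.asarray(spike_counts, dtype=float)
    n_frames = stimulus.shape[0]
    sta = np.zeros((n_delays,) + stimulus.shape[1:])
    for d in range(n_delays):
        # Pair each spike bin t with the frame shown d frames earlier.
        counts = spike_counts[d:]
        frames = stimulus[:n_frames - d]
        total = counts.sum()
        if total > 0:
            sta[d] = np.tensordot(counts, frames, axes=(0, 0)) / total
    return sta

# Toy cell (hypothetical): one spike whenever pixel (0, 0) was white two frames ago.
rng = np.random.default_rng(0)
stim = rng.choice([-1.0, 1.0], size=(2000, 4, 4))
spikes = np.zeros(2000)
spikes[2:] = stim[:-2, 0, 0] > 0
sta = reverse_correlation(stim, spikes, n_delays=4)
```

In this toy case the map at a delay of two frames recovers the driving pixel, while the other pixels and delays average toward zero.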
Drifting gratings of various spatial and temporal frequencies were used to measure the spatial and temporal tuning properties of the neurons. Contrast reversal gratings were used in the null test to make the X/Y classification (Enroth-Cugell and Robson, 1966; Hochstein and Shapley, 1976). Only X-cells were included in the analysis, because this is the type of cells on which the computational theory was based. Few lagged cells were encountered with these electrodes, and none was included in this study.
Video recordings of time-varying natural scenes were used as stimuli to study the statistical properties of the LGN response. It was assumed that all the long sequences of natural, time-varying images have common statistics, i.e., they tend to have the same spatiotemporal power spectra (Field, 1987; Dong and Atick, 1995a,b) regardless of the details of the images. Because we were interested in the coding of natural scenes in general, we chose not to impose any restriction in our selection of movies other than that they were not disproportionally dominated by static scenes. Up to 10 different movies were used. Figure 1a shows an image from Casablanca, one of the movies used in the experiments. The power spectra of the LGN responses to different movies were qualitatively very similar, as long as the movies were longer than several minutes. We therefore pooled all the data in the analysis. In some experiments, a videocassette recorder and a television monitor were used to present movies 20–60 min long. In others, movie clips 2–3 min long were presented repetitively over a similar duration with the computer software Media Player. There was a small 15 Hz artifact in these movies, as can be seen from the small secondary peaks in Figure 2a, cells 2 and 3. The AT-Vista board was not used for studying the statistical properties of the LGN responses, because its limited memory precluded the presentation of long movies with appropriate statistics (see below).
For the linear prediction of the response to natural scenes, we presented eight different short movies with the AT-Vista board. These short movies were digitized segments of video recordings. The use of the Vista board in this study was necessary, because the prediction of the instantaneous firing rate signals requires precise spatiotemporal alignment between the receptive fields, which were measured with white noise, and the movie stimuli. Because of the limited memory of the Vista board, each movie was restricted to 16 sec long. Each frame contained 64 × 64 pixels, with a spatial resolution of 0.2°/pixel. To test the linear prediction, short movies were in fact desirable, because multiple repeats were required to assess the reproducibility of the responses. To obtain an “actual” response, each movie was repeated eight times. A post-stimulus time histogram (PSTH) was obtained for the response to each repeat with a bin width of 7.7 msec (the same as the interframe interval of the short movies and the white-noise stimuli). The instantaneous firing rate of the LGN neuron was calculated as the PSTH averaged over all eight repeats or over interleaved repeats—1, 3, 5, 7 or 2, 4, 6, 8.
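The PSTH averaging over all repeats and over interleaved repeats can be sketched as follows. The function names and the toy spike times are hypothetical; the bin width is left as a parameter, which in the experiments would be the interframe interval.

```python
import numpy as np

def psth(spike_times_per_repeat, duration, bin_width):
    """Firing rate (spikes/sec) in each bin, one row per repeat."""
    n_bins = int(round(duration / bin_width))
    edges = np.arange(n_bins + 1) * bin_width
    counts = np.array([np.histogram(times, bins=edges)[0]
                       for times in spike_times_per_repeat])
    return counts / bin_width

def interleaved_averages(rates):
    """Mean rate over all repeats, over repeats 1, 3, 5, ... and over repeats 2, 4, 6, ..."""
    return rates.mean(axis=0), rates[0::2].mean(axis=0), rates[1::2].mean(axis=0)

# Toy data (hypothetical): four repeats of a 1 sec stimulus, 0.25 sec bins.
repeats = [[0.1, 0.3], [0.1], [0.6, 0.9], [0.1]]
rates = psth(repeats, duration=1.0, bin_width=0.25)
all_mean, odd_mean, even_mean = interleaved_averages(rates)
```

Comparing `odd_mean` with `even_mean` gives a measure of response reproducibility that does not depend on any model of the cell.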
Data analysis
Calculation of autocorrelation function. The recorded spike train was originally represented as a list of times for the occurrence of spikes with a resolution of 0.1 msec. This list was binned with a bin width of 5 msec to yield a spike-rate signal sampled at 200 Hz. The autocorrelation function of this signal was then computed. So that only the contribution from different spikes was considered, the total number of spikes was subtracted from the central bin of the autocorrelation function.
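A minimal sketch of this calculation, assuming an unnormalized autocorrelation of the binned spike counts (the normalization is not specified above); the function name and the `max_lag_bins` parameter are illustrative.

```python
import numpy as np

def spike_autocorrelation(spike_times, duration, bin_width=0.005, max_lag_bins=100):
    """Raw autocorrelation of binned spike counts for lags of -max_lag_bins..+max_lag_bins bins."""
    n_bins = int(round(duration / bin_width))
    counts, _ = np.histogram(spike_times, bins=n_bins, range=(0.0, duration))
    counts = counts.astype(float)
    lags = np.arange(-max_lag_bins, max_lag_bins + 1)
    acf = np.empty(lags.size)
    for i, lag in enumerate(lags):
        k = abs(lag)
        acf[i] = np.dot(counts[:n_bins - k], counts[k:])
    # Remove each spike's correlation with itself from the central bin,
    # leaving only pairs of different spikes.
    acf[max_lag_bins] -= counts.sum()
    return lags * bin_width, acf

# Toy spike train (hypothetical): three spikes in 20 msec.
lag_times, acf = spike_autocorrelation([0.001, 0.002, 0.012], duration=0.02,
                                       max_lag_bins=2)
```

With counts of [2, 0, 1, 0] in the four bins, the central bin drops from 5 to 2, which is the number of ordered pairs of distinct spikes sharing a bin.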
Calculation of power spectrum. We calculated the two-sided power-spectral density functions of the spike trains by Fourier transforming overlapping segments of data and windowing (Press et al., 1988). The spike trains (20–60 min long) were binned with a bin width of 4 msec and divided into 4 sec segments, with 2 sec overlaps between consecutive segments. For each segment, a Welch window was applied to reduce the spectral leakage caused by the finite duration of the segments (Harris, 1978), and a two-sided power spectrum was calculated using the standard fast Fourier transform procedure, with a frequency resolution of 0.25 Hz and a range from −125 to 125 Hz. Finally, the power spectrum of the whole spike train was obtained by averaging all the data segments.
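The segment-averaging procedure can be sketched as below. This is an illustration under stated assumptions, not the analysis code used in the study: the window follows the parabolic Welch form given in Press et al. (1988), the normalization (division by the sampling rate times the summed squared window) is one common convention, and the mean is not subtracted because the text does not say that it was.

```python
import numpy as np

def welch_window(n):
    """Parabolic (Welch) window of length n."""
    j = np.arange(n)
    return 1.0 - ((j - 0.5 * n) / (0.5 * n)) ** 2

def power_spectrum(counts, fs=250.0, seg_sec=4.0, overlap_sec=2.0):
    """Two-sided power spectrum averaged over overlapping, windowed segments."""
    counts = np.asarray(counts, dtype=float)
    seg_len = int(round(seg_sec * fs))             # 1000 samples at 4 msec bins
    step = seg_len - int(round(overlap_sec * fs))  # consecutive segments overlap by 2 sec
    window = welch_window(seg_len)
    norm = fs * np.sum(window ** 2)
    spectra = [np.abs(np.fft.fft(window * counts[s:s + seg_len])) ** 2 / norm
               for s in range(0, counts.size - seg_len + 1, step)]
    freqs = np.fft.fftshift(np.fft.fftfreq(seg_len, d=1.0 / fs))
    return freqs, np.fft.fftshift(np.mean(spectra, axis=0))

# Toy signal (hypothetical): a 10 Hz cosine sampled at 250 Hz for 20 sec.
t = np.arange(5000) / 250.0
freqs, spec = power_spectrum(np.cos(2 * np.pi * 10.0 * t))
```

With 4 sec segments the frequency axis has 0.25 Hz spacing and starts at −125 Hz, matching the resolution and range quoted above.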
Linear prediction of responses to natural movies. We predicted the responses of LGN cells to natural visual inputs by performing a linear convolution of the spatiotemporal receptive fields with the luminance signals of the movies, followed by a rectification procedure. The underlying assumption is that the output, which is the firing rate of the LGN neuron, is the result of a rectifying spike-generation mechanism operating on the intracellular potential, which is linearly related to the visual input (Rodieck, 1965; Enroth-Cugell and Robson, 1966; Brodie et al., 1978). Because, of course, we did not record the intracellular signal, its receptive field was calculated from the spike train recorded extracellularly. The receptive field of the intracellular potential is equivalent to the first-order Wiener kernel (Marmarelis and Marmarelis, 1978) calculated from the spike rate multiplied by a factor of 2, assuming a perfect half-wave rectification of the LGN cells in response to white noise. (This is a reasonable first-order assumption, since the resting-state firing rate or the threshold for spike generation is, in general, much lower than the response to the white-noise stimuli with a 100% contrast.) The linear convolution is given by:
$$R(t) = 2 \sum_{x,y} \sum_{t'} K(x, y, t')\, S(x, y, t - t'),$$
where R(t) is proportional to the estimated intracellular potential but in units of impulses per second; K(x, y, t′) is the first-order Wiener kernel of the neuron measured in units of impulses per second per unit contrast; S(x, y, t − t′) is the luminance of individual pixels in the movie, normalized so that the mean luminance of the entire movie is 0 and the minimum value is −1; and x and y are the positions of the pixels. The white-noise stimuli used for measuring the receptive fields had the same pixel size and frame rate as those of the movies. These two stimuli were spatially aligned so that the correspondence between the pixels in the receptive field K(x, y, t′) and those in the movies S(x, y, t − t′) could be determined unambiguously.
The intracellular potential R(t) thus estimated had both positive and negative values. The output of the neuron O(t) was predicted by applying a simple rectification procedure, which presumably simulates the spike-generation mechanism:
$$O(t) = [R(t) + N]\, H(R(t) + N),$$
where H is the Heaviside step function defined as:
$$H(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0. \end{cases}$$
When N is positive, it represents the resting-state firing rate of the cell; when negative, N represents the threshold for spike generation. The value of N was adjusted so that the predicted mean firing rate 〈O(t)〉 over the duration of the movie was equal to the actual mean rate of the same cell.
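The prediction procedure can be sketched as below, under several assumptions that are not taken from the original analysis code: the convolution is written as a discrete sum over pixels and frames, the factor of 2 follows the half-wave-rectification argument above, frames before the start of the movie are treated as zero, and N is found by bisection. All function names are illustrative.

```python
import numpy as np

def normalize_movie(luminance):
    """Scale luminance so that the movie mean is 0 and the minimum is -1."""
    luminance = np.asarray(luminance, dtype=float)
    mean = luminance.mean()
    return (luminance - mean) / (mean - luminance.min())

def linear_response(kernel, movie):
    """R(t) = 2 * sum over x, y, t' of K(x, y, t') * S(x, y, t - t').

    kernel: array (n_delays, ny, nx); movie: array (n_frames, ny, nx).
    """
    n_frames = movie.shape[0]
    r = np.zeros(n_frames)
    for d in range(kernel.shape[0]):
        # Frame shown d steps before time t, weighted by the kernel at delay d.
        r[d:] += np.tensordot(movie[:n_frames - d], kernel[d], axes=([1, 2], [0, 1]))
    return 2.0 * r

def rectify(r, offset):
    """O(t) = (R(t) + N) * H(R(t) + N), i.e. half-wave rectification."""
    return np.maximum(r + offset, 0.0)

def fit_offset(r, target_mean, tol=1e-9):
    """Bisection for the N that makes the mean predicted rate equal target_mean (> 0)."""
    lo, hi = -r.max(), target_mean - r.min()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rectify(r, mid).mean() < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy example (hypothetical): a one-pixel receptive field with two delays.
kernel = np.array([1.0, -0.5]).reshape(2, 1, 1)
movie = np.array([1.0, 0.0, -1.0, 2.0]).reshape(4, 1, 1)
r = linear_response(kernel, movie)
n_offset = fit_offset(r, target_mean=2.0)
predicted = rectify(r, n_offset)
```

Bisection is adequate here because the mean rectified output increases monotonically with N.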
RESULTS
Responses of LGN neurons to natural scenes and white noise
In the first part of the study, we characterized the statistical properties of the LGN spike trains in response to time-varying natural visual stimuli. A typical image is shown in Figure 1a. The position of the stimulation monitor was adjusted so that the receptive field of the LGN neuron fell within the screen. The movies were presented to the cat, and the spike trains of LGN neurons were recorded for 20–60 min to accumulate a minimum of 10,000 spikes. Autocorrelation functions and power spectra of these spike trains were calculated. Figure 2,a and b, shows the autocorrelation functions and the power spectra, respectively, of three LGN neurons in response to movies. Figure 2c summarizes the power spectra of 51 LGN neurons. For 45 cells, the mean firing rate during the period of movie presentation was 13.1 impulses/sec, whereas that in the absence of visual stimuli was 6.0 impulses/sec. Among these, 33 cells showed an increase in mean firing rate by at least 2 impulses/sec during stimulation by movies. A considerable degree of temporal variation of the instantaneous firing rate was observed during the movie presentations, in apparent correspondence to various movie scenes. These observations suggest that the spiking activity of the LGN neurons was significantly modulated by the natural visual input.
The autocorrelation functions of the responses to movies (Fig. 2a) showed narrow peaks (centered at 0 msec with half widths of 10–20 msec) and were essentially flat beyond the peak. This indicates that the LGN output was temporally decorrelated. The decorrelation is also revealed by the power spectra of the responses (Fig. 2b,c), which are equivalent to the Fourier transforms of the autocorrelation functions. The spectra were largely flat between 3 and 15 Hz, consistent with the theoretical prediction that the natural visual signals at the level of the LGN are white. Thus the redundancy at the level of the photoreceptors is largely eliminated at the LGN. As discussed below, the deviation from whiteness in the power spectra beyond the range of 3–15 Hz can be accounted for by the finite duration of the neuronal impulse response and the requirement of optimal coding in the presence of noise (Atick and Redlich, 1992).
As a comparison with the temporally decorrelated response to natural scenes, we analyzed the autocorrelations and power spectra of LGN neurons in response to a white-noise input (Sutter, 1987). White noise provides a rich input ensemble, the statistical structure of which differs from that of natural input; therefore, it provides an appropriate control stimulus. Figure 3a shows the autocorrelation functions of the white-noise responses of the same LGN neurons as those shown in Figure 2a. In contrast to the responses to natural input, the autocorrelation functions of the white-noise responses exhibited a dip between 10 and 100 msec. This is reflected in their power spectra, which showed a positive slope between 1 and 10 Hz (Fig. 3b). Figure 3c summarizes the power spectra of 75 LGN neurons in response to full-field white noise. The great majority of these spectra showed a positive slope between 3 and 15 Hz and significantly deviated from whiteness.
To quantify the difference between the power spectra in Figures 2c and 3c, each power spectrum was fitted with a quadratic function between 3 and 15 Hz to smooth the data. The average deviation of these smoothed spectra from their midpoint was 10.8 ± 7.3% for the responses to natural stimuli (Fig. 2c) but was 50.7 ± 20.6% for the responses to white noise (Fig. 3c). We presented white-noise stimuli both before and after the movie stimuli and observed a consistent difference between the temporal characteristics of the responses to movies and to white noise. Spatiotemporal white noise (Fig. 1b) and full-field white noise evoked responses with similar power spectra. Thus the LGN cells under study were visually driven, and the power spectra of their responses depended on the nature of the input. As shown below, the LGN responses to white-noise input reflect their temporal-filtering properties, which form the basis of efficient recoding of natural scenes.
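One possible reading of this flatness measure is sketched below. The definition used here is an assumption: the midpoint is taken as the mean of the maximum and minimum of the fitted quadratic, and the deviation as half the range of the fit expressed as a percentage of that midpoint. The text does not spell out the exact formula, so the function is illustrative only.

```python
import numpy as np

def flatness_deviation(freqs, power, f_lo=3.0, f_hi=15.0):
    """Percent deviation of a quadratic-smoothed spectrum from its midpoint (assumed definition)."""
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    coeffs = np.polyfit(freqs[band], power[band], 2)  # quadratic fit within the band
    smooth = np.polyval(coeffs, freqs[band])
    midpoint = 0.5 * (smooth.max() + smooth.min())
    return 100.0 * 0.5 * (smooth.max() - smooth.min()) / midpoint

# Toy spectra (hypothetical): perfectly flat, and rising linearly with frequency.
f = np.arange(0.0, 20.25, 0.25)
flat_dev = flatness_deviation(f, np.ones_like(f))
ramp_dev = flatness_deviation(f, f)
```

Under this definition a flat spectrum scores 0%, whereas a spectrum rising linearly from 3 to 15 Hz scores about 67%.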
Linear prediction of the responses to natural stimuli
To bridge the statistical and the deterministic approaches and to understand the mechanism of recoding at the LGN, we examined whether the temporal whitening of natural visual input can be accounted for by the classical response properties of geniculate cells. It is well known that both retinal and geniculate X-cells behave as approximately linear filters (Enroth-Cugell and Robson, 1966; Hochstein and Shapley, 1976; Derrington and Fuchs, 1979; Dawis et al., 1984), and the temporal-tuning properties of LGN neurons, as reflected by the power spectra of their responses to white noise (see Discussion), are roughly the inverse of the power spectra of natural inputs (Dong and Atick, 1995a,b). It is likely, therefore, that the temporal whitening of natural inputs is largely attributable to the linear filtering properties of X-cells. Figure 4 provides a qualitative demonstration of the sort of arguments used in the theoretical literature. It shows the power spectrum of an X-cell in response to 100% contrast full-field white noise (Fig. 4a), the square of the Fourier transform of its impulse response (Fig. 4b), and the square of its actual temporal tuning function (see legend to Fig. 4c). The temporal tuning function was measured with full-field, temporally modulated sinusoidal stimuli at 25% contrast between 0.5 and 15 Hz. All three functions were approximately proportional to ω², the inverse of the temporal power spectra of natural inputs in the range of low spatial frequencies. It is worth noting, however, that the magnitudes of the response sensitivity measured with these three methods showed a two- to threefold difference. This reflects the existence of nonlinearities such as the contrast gain control (Shapley and Victor, 1978, 1981), rectification, and response saturation.
To investigate in more detail the extent to which the linear-filtering properties contribute to the whitening of natural input, we tested whether the responses to natural scenes can be predicted by the linear convolution of the luminance signals of the movies and the spatiotemporal receptive fields of the neurons (Brodie et al., 1978).
The spatiotemporal receptive fields of the cells were measured with white-noise stimuli and the reverse-correlation method. Figure 5b shows the time evolution of an on-center/off-surround receptive field between 0 and 116 msec. The magnitudes of center and surround components (the impulse responses) of the receptive field are illustrated in Figure 6. Given the spatiotemporal receptive fields, we compared the predicted and the actual responses to eight different short movies, each 16 sec long. Linear convolution of the movie (Fig. 5a) and the receptive field (Fig. 5b), followed by a rectification (see Materials and Methods), was used to obtain the predicted firing rate as a function of time. To measure the actual responses, each movie was presented eight times. The instantaneous firing rate was calculated as the PSTH averaged over multiple repeats. We found that the basic features of the predicted responses closely resemble those of the actual responses. Figure 7a shows a 4 sec sample of the predicted response of one LGN neuron to a movie (top trace), its actual response averaged from repeats 1, 3, 5, 7 (middle trace), and that averaged from repeats 2, 4, 6, 8 (bottom trace). The variability of the actual responses measured in different repeats can be appreciated by comparing the middle and the bottom traces in Figure 7a. This was, in general, comparable to the difference between the predicted (top trace) and the actual responses. A more precise comparison was made by plotting the predicted versus the actual response (Fig. 7b) and the actual response averaged from one set of repeats versus that from another (Fig. 7c), all sampled at 128 Hz. Similar correlation was found in both cases, suggesting that the difference between the predicted and the actual responses can be accounted for largely by the intrinsic variability of the neuronal response.
The correlation coefficients between the predicted and the actual responses for 49 cells, each tested with eight movies, are summarized in Figure 8a. The average correlation coefficient between the predicted and the actual responses for the same movies was 0.48 ± 0.11 (SD). This was significantly higher than the average correlation between the predicted and the actual responses for different movies (0.004 ± 0.05, SD), which represents the correlation by chance. The correlation coefficients between the actual responses averaged from interleaved repeats, i.e., repeats 1, 3, 5, 7 and 2, 4, 6, 8, are summarized in Figure 8b, and the correlation between the actual responses from interleaved repeats versus that between the predicted and the actual responses is shown in Figure 8c for all 49 cells studied. The actual-actual correlation is comparable to but slightly better than the predicted-actual correlation. We believe that this can be accounted for, at least partly, by the fact that the predicted and the actual responses were calculated based on two recordings separated in time and that the condition of the neurons was likely to change over time.
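The comparison between same-movie and different-movie correlations can be sketched as follows. The function names are illustrative, and the toy responses are synthetic noise rather than recorded data.

```python
import numpy as np

def correlation(a, b):
    """Pearson correlation coefficient between two response traces."""
    return float(np.corrcoef(a, b)[0, 1])

def same_and_cross_movie_correlations(predicted, actual):
    """Mean predicted-actual correlation for matching movies and for mismatched movies.

    predicted, actual: arrays (n_movies, n_bins). The mismatched pairs estimate
    the correlation expected by chance.
    """
    n = len(predicted)
    same = [correlation(predicted[i], actual[i]) for i in range(n)]
    cross = [correlation(predicted[i], actual[j])
             for i in range(n) for j in range(n) if i != j]
    return float(np.mean(same)), float(np.mean(cross))

# Synthetic example (hypothetical): "actual" is the prediction plus independent noise.
rng = np.random.default_rng(0)
pred = rng.normal(size=(3, 500))
act = pred + rng.normal(size=(3, 500))
same_r, cross_r = same_and_cross_movie_correlations(pred, act)
```

The same `correlation` function applied to the two interleaved-repeat averages gives the actual-actual value against which the predicted-actual value is judged.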
We calculated the power spectra of the predicted and the actual responses evoked by the 16 sec short movies. Figure 9,a and b, shows the power spectra of the predicted and the actual responses, respectively, of one LGN neuron. They agreed quantitatively. These power spectra, however, were not white. This was attributable to the imperfect statistics of the short movies, because the response of the same cell evoked by a long movie exhibited a power spectrum that was white between 3 and 15 Hz (Fig. 9c). Taken together, these results indicate that the responses of LGN cells to natural stimuli can be well predicted from their linear receptive-field properties. Thus the whitening of natural visual signals at the level of the LGN can be largely, if not entirely, explained by the linear filtering properties of the cells.
DISCUSSION
In the present study, we have directly confirmed the prediction that the representation of natural visual information at the level of the LGN is temporally decorrelated, particularly between 3 and 15 Hz. It is important to note that the power spectrum of the LGN activity was white only in response to natural input but not to our control stimulus (white noise). This suggests that white (i.e., random) patterns of activity are not the intrinsic property of LGN neurons. Rather, the early visual pathway has specifically adapted for efficient coding of natural visual information during evolution and/or development.
We would like to point out that the concept of “efficient coding” has been used with a rather specific definition in this paper; it is only one of several mechanisms that may facilitate sensory processing. The temporally decorrelated signal at the LGN is still a faithful, point-to-point and moment-to-moment representation of natural visual input. The improvement of efficiency at this level is independent of the meaning or importance of particular visual scenes. Another useful strategy in sensory processing is to selectively amplify important signals and/or suppress the unimportant ones. This is likely to be achieved at higher levels of the brain and is distinct from the efficient coding discussed here.
The power spectra of LGN neurons in response to natural input deviate significantly from whiteness beyond the range of 3–15 Hz. The failure of whitening below 3 Hz is not surprising, considering the finite duration of the impulse responses of these cells. For a typical LGN cell, the impulse response function has a duration of less than 200 msec. The finite memory of the system limits its ability to selectively attenuate signals below 2–3 Hz. This deficiency, however, may be alleviated at higher levels of the visual pathway, where the neurons tend to integrate visual information over a longer period (Hamilton et al., 1989; Reid et al., 1991). The failure of whitening above 15 Hz may be related to the theoretical finding that whitening at higher frequencies is not advantageous for optimal coding in the presence of noise (Atick and Redlich, 1992). At high frequencies, noise may dominate in the visual input. The attenuation of high frequency signals could therefore serve to avoid amplification of this noise. As a concrete example, it has been demonstrated that the receptive field properties of visual neurons change at different adaptation levels (Shapley and Enroth-Cugell, 1985; Purpura et al., 1988, 1990). This is consistent with the theory of efficient coding, because at low adaptation levels photon noise begins to dominate at higher frequencies.
The temporal tuning of geniculate cells measured in our experiments seemed somewhat different from those reported by several other investigators. This may be explained by the differences in experimental procedures. It is well known that retinal X-cells resemble low-pass temporal filters for low-contrast input and become more bandpass with high-contrast stimuli (Shapley and Victor, 1978; 1981). The use of relatively high-contrast, suprathreshold input in our studies may explain the prominent bandpass temporal tuning that was not observed in some studies using low-contrast stimuli (Lehmkuhle et al., 1980). In addition, different recording electrodes may result in differences in sampling of cells. This may explain why cells recorded in our experiments have, in general, higher cutoff frequencies than those studied by Saul and Humphrey (1990) and Hamamoto et al. (1994). Our electrodes almost certainly sampled larger cells, since very few lagged cells were encountered. It would be interesting to investigate whether the cells that were not well sampled in our current study also serve to temporally whiten natural inputs.
We have shown that the whitening of natural signals is largely attributable to the linear-filtering properties of LGN neurons. The temporal-tuning functions of LGN cells generally show a bandpass behavior: within the range of 3–15 Hz, the response is roughly proportional to the frequency. This tuning property can explain the power spectra of the LGN responses to both natural scenes and white noise. For a linear neuron:
$$\|O(\omega)\|^2 = \|K(\omega)\|^2 \, \|S(\omega)\|^2.$$
This is an approximate description ignoring the spatial dimension. ‖O(ω)‖² is the temporal power spectrum of the output, K(ω) is the Fourier transform of the receptive field (first-order Wiener kernel), which is equivalent to the temporal-tuning function of the neuron, and ‖S(ω)‖² is the power spectrum of the stimulus. As mentioned above, ‖K(ω)‖² ∝ ω² is a good approximation of the temporal tuning functions of LGN neurons within the range of 3–15 Hz. In natural scenes (particularly at low spatial frequencies), ‖S(ω)‖² ∝ 1/ω²; therefore, the output is white. For white-noise inputs, however, ‖S(ω)‖² ∝ 1; hence, ‖O(ω)‖² ∝ ‖K(ω)‖² ∝ ω².
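The argument can be checked numerically with idealized spectra, assuming that the output power of a linear neuron is the product of the squared kernel gain and the stimulus power. The proportionality constants are set to 1 for illustration and the variable names are hypothetical.

```python
import numpy as np

def output_power(kernel_gain_sq, stimulus_power):
    """Output power spectrum of a linear filter: squared gain times stimulus power."""
    return kernel_gain_sq * stimulus_power

freqs = np.arange(3.0, 15.25, 0.25)    # 3-15 Hz in 0.25 Hz steps
kernel_gain_sq = freqs ** 2            # idealized tuning: squared gain grows as frequency squared
natural_power = 1.0 / freqs ** 2       # idealized natural input: power falls as 1/frequency squared
white_power = np.ones_like(freqs)      # white noise: flat power

natural_out = output_power(kernel_gain_sq, natural_power)  # constant, i.e. white
white_out = output_power(kernel_gain_sq, white_power)      # rises as frequency squared
```

The two outputs reproduce the contrast seen in the data: a flat spectrum for natural input and a spectrum with a positive slope for white-noise input.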
Finally, lateral interactions and feedback could have resulted in responses to natural scenes that are not quantitatively predictable from the individual receptive fields. The agreement between the predicted and the actual responses to short movies argues that this is not the case. This agreement also establishes the possibility of a firm connection between the statistical and the deterministic approaches to studying sensory neurons.
Footnotes
This research is supported by National Institutes of Health Grants EY05253 and EY10115 and by the Klingenstein Fund. Y.D. is a Schering-Plough Fellow of the Life Sciences Research Foundation. We are grateful to Dr. Torsten Wiesel for his support during all phases of this work. We thank Dr. Robert Shapley for his comments on the earlier versions of the manuscript. Karl Gegenfurtner generously allowed us to use his library of subroutines, YARL, to write programs for our visual stimuli.
Correspondence should be addressed to Dr. R. Clay Reid, Department of Neurobiology, Harvard Medical School, 220 Longwood Avenue, Boston, MA 02115.
Drs. Dan and Reid’s current address: Department of Neurobiology, Harvard Medical School, 220 Longwood Avenue, Boston, MA 02115.