Abstract
The responses of simple cells in primary visual cortex to sinusoidal gratings can primarily be predicted from their spatial receptive fields, as mapped using spots or bars. Although this quasilinearity is well documented, it is not clear whether it holds for complex natural stimuli. We recorded from simple cells in the primary visual cortex of anesthetized ferrets while stimulating with flashed digitized photographs of natural scenes. We applied standard reverse-correlation methods to quantify the average natural stimulus that evokes a neuronal response. Although these maps cannot be the receptive fields, we find that they still predict the preferred orientation of grating for each cell very well (r = 0.91); they do not predict the spatial-frequency tuning. Using a novel application of the linear reconstruction method called regularized pseudoinverse, we were able to recover high-resolution receptive-field maps from the responses to a relatively small number of natural scenes. These receptive-field maps not only predict the optimum orientation of each cell (r = 0.96) but also the spatial-frequency optimum (r = 0.89); the maps also predict the tuning bandwidths of many cells. Therefore, our first conclusion is that the tuning preferences of the cells are primarily linear and constant across stimulus type. However, when we used these maps to predict the actual responses of the cells to natural scenes, we did find evidence of expansive output nonlinearity and nonlinear influences from outside the classical receptive fields, orientation tuning, and spatial-frequency tuning.
- receptive fields
- visual cortex
- simple cells
- V1
- area 17
- natural scenes
- natural images
- reverse correlation
- linearity
- linear summation
Introduction
Visual systems have evolved to interpret the complex spatiotemporal structure in natural visual stimuli (Srinivasan et al., 1982; van Hateren, 1992; Dan et al., 1996). However, our understanding of neuronal behavior in the mammalian visual system is primarily based on their responses to simple artificial stimuli such as spots of light and sinusoidal gratings. We have little direct knowledge of how visual neurons respond to naturalistic stimuli (Dan et al., 1996; Baddeley et al., 1997; Gallant et al., 1998). Hubel and Wiesel (1959) described “simple cells” in primary visual cortex (V1), the receptive fields (RFs) of which, when mapped with spots of light, predicted the orientation, width, and position of the optimal stimulus. Quantitative studies confirm that, to a first approximation, the responses of simple cells to one set of simple spatial stimuli can be used in a linear model to predict the selectivity to another set (Movshon et al., 1978; Jones and Palmer, 1987b; DeAngelis et al., 1993). If the integration of stimuli by simple cells really was linear, then responses to natural scenes should be predictable from their responses to simple stimuli (cf. Creutzfeldt and Northdurft, 1978).
However, numerous studies have demonstrated that the responses of simple cells, even to artificial stimuli, are not perfectly linear. The rate of action potential production is a nonlinear function of any underlying linear spatiotemporal stimulus integration (Carandini and Ferster, 2000) and, therefore, linear predictions based on action potential counts often fail (Tolhurst and Dean, 1987; Albrecht and Geisler, 1991; Heeger, 1992; DeAngelis et al., 1993; Tolhurst and Heeger, 1997b; Lampl et al., 2001). Moreover, there are more profound nonlinear behaviors. In particular, the presence of a second stimulus, to which the cell would not normally respond [e.g., a stimulus outside the classical receptive field (CRF)], can modulate the responses of a cell to its preferred stimulus (Blakemore and Tobin 1972; Nelson and Frost, 1985; Bonds 1989; Knierim and Van Essen, 1992; Walker et al., 1999; Kapadia et al., 2000). It has even been reported that the classical orientation tuning of a cell depends on the context in which stimuli are presented (Gilbert and Wiesel, 1990; Shevelev et al., 1994; Sillito et al., 1995).
Natural scenes are spatially extensive and contain features at many orientations, widths, and positions (Ruderman, 1994); thus, nonlinear contextual influences could be particularly evident in simple-cell responses to natural scenes (Rao and Ballard, 1999), contributing to efficient coding of information in the scenes (Vinje and Gallant, 2000). The responses to simplistic stimuli may not allow the prediction of the responses to natural scenes. Consequently, we directly examine how simple cells in ferret V1 respond to natural scenes and ask whether these responses are compatible with the responses to simpler stimuli (sinusoidal gratings). Theunissen et al. (2001) and Ringach et al. (2002) have independently investigated this problem, but we describe a novel application of an analytical method for recovering high-resolution receptive-field maps from the responses to natural scenes, allowing detailed and critical comparison with tuning of the cell to other stimuli.
Materials and Methods
Recordings. Extracellular recordings of action potentials were made from single neurons in V1 (area 17) of 12 anesthetized ferrets using tungsten-in-glass microelectrodes (Merrill and Ainsworth, 1972). Surgery was performed on adult pigmented ferrets under intramuscular anesthesia (2 ml/kg) followed by intravenous injections of Saffan (0.3% alphadolone acetate and 0.9% alphaxalone). During recordings, anesthesia was maintained by artificial respiration with 0.5–1.5% Halothane in a mixture of 75% N2O and 25% O2. End-tidal CO2 concentration was maintained near 4% by an adjustment of respiration rate and stroke volume; rectal temperature was maintained at 37.5°C. The animals were paralyzed by intravenous infusion of gallamine triethiodide (10 mg · kg⁻¹ · hr⁻¹) in a vehicle of saline with 4% glucose at 2.6 ml/hr, and the adequacy of anesthesia was assessed from inspection of the heart rate and the waveform of the EEG [full experimental details are given in Baker et al. (1998)]. At the end of the experiment, each animal was given a barbiturate overdose, perfused through the heart with PBS, and then perfused with 4% buffered paraformaldehyde to fix the brain for later histological verification of the recording sites (electrolytic lesions were made on termination of electrode penetrations). All procedures were approved under license from the United Kingdom Home Office.
The pupils were dilated and the accommodation was paralyzed by topical application of homatropine (1% w/v) to the eyes which were then protected with clear zero-power contact lenses. The small eye of the ferret has a large depth of focus (cf. Green et al., 1980), and auxiliary lenses were not considered necessary to focus the eyes on the stimulus display (Price and Morgan, 1987; Baker et al., 1998). The receptive fields of the neurons were generally within a few degrees of the area centralis, and visual stimulation was applied through the eye contralateral to the recorded cortex while the ipsilateral eye was covered.
Visual stimulation. Monochrome visual stimuli were presented on cathode ray tube monitors under the control of a visual stimulus generator 2/4 graphics card (Cambridge Research Systems, Cambridge, UK); this had a pseudo-15 bit analog output allowing precise control of the luminance of each pixel on the display and correction for expansive luminance nonlinearities. Stimuli could be presented with 256 linearly spaced gray levels. Two different display monitors were used. Initially, we used an Eizo Flexscan T562-T monitor with a viewable area that was 28.5 cm wide × 21.5 cm high, viewed from a distance of 57 cm (or sometimes 28.5 cm) so that it subtended 28.5 × 21.5° of visual angle. Stimuli were presented as 800 × 600 pixels, so that each pixel subtended 0.036° of arc [equivalent to a maximum resolvable spatial frequency of 13.8 cycles per degree (cpd)]. This monitor had a mean luminance of 36 cd/m² and a frame rate of 100 Hz. In later experiments, we used a Sony (Tokyo, Japan) GDM-500PST monitor that measured 39.6 cm wide × 29.7 cm high; it was viewed from a distance of 28.5 cm, giving a viewing angle of 79.2 × 59.4°. Again, the display had 800 × 600 pixels, so that each pixel subtended 0.099° (equivalent to a maximum resolvable spatial frequency of 5.1 cpd). This monitor had a mean luminance of 54 cd/m² and a frame rate of 160 Hz.
We recorded from 148 single neurons with a battery of tests using both moving sinusoidal gratings and sequences of flashed natural scenes. Although 42 cells were classified as simple, we present results from only 25 cells. Seventeen cells were discarded from the analysis: seven cells for responding inconsistently with high response variability across repeats and with receptive-field reconstructions showing no spatial features and 10 cells for responding very sparsely, with receptive-field reconstructions dominated by individual images. The cells were classed as simple because their receptive fields had separate parallel ON and OFF regions (Hubel and Wiesel, 1959), and because their responses to moving gratings were highly modulated in time with the movement of the bars (relative modulation, >1.4) [Movshon et al., 1978; Dean and Tolhurst 1983; Skottun et al., 1991; but see Mechler and Ringach (2002) for a re-examination of cell classification]. OFF is shorthand for the parts of the receptive field in which the presentation of a dark spot of light would cause excitation and where a bright spot of light would be expected to cause inhibition during its presentation and possibly a rebound burst of action potentials at its offset (Hirsch et al., 1998). The orientation tuning and spatial-frequency tuning of each cell were determined with moving sinusoidal gratings of Michelson contrast 0.7. Gratings of up to 16 different orientations and/or directions of movement were presented at a near-optimal spatial frequency. At least 20–30 cycles of the grating (moving at 1–2 Hz) were presented. Next, up to 12 different spatial frequencies were presented at the optimal orientation; again, at least 20–30 cycles of each grating were presented.
Modified Gaussian curves were fitted to the graphs of the average firing rate against orientation and spatial frequency to determine the optimal orientation and frequency and the bandwidths at half height of the two tuning curves (Baker et al., 1998). The Gaussians were modified to have different spreads above and below the maximum of the function.
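As an illustration only (not the fitting code used in this study), a modified Gaussian with separate spreads on either side of its maximum can be sketched as follows. The parameter names `peak`, `x_opt`, `sigma_lo`, and `sigma_hi` are hypothetical, and the choice of axis units (degrees, or linear or logarithmic spatial frequency) is left to the caller.

```python
import math

def modified_gaussian(x, peak, x_opt, sigma_lo, sigma_hi):
    """Gaussian tuning curve with different spreads below and above its maximum.

    sigma_lo applies for x < x_opt and sigma_hi for x >= x_opt
    (illustrative parameterization, not taken from the paper).
    """
    sigma = sigma_lo if x < x_opt else sigma_hi
    return peak * math.exp(-0.5 * ((x - x_opt) / sigma) ** 2)

def bandwidth_at_half_height(sigma_lo, sigma_hi):
    """Full width at half height of the two-sided curve.

    Each flank of a Gaussian falls to half height at sigma * sqrt(2 ln 2)
    from the peak, so the two half-widths simply add.
    """
    return (sigma_lo + sigma_hi) * math.sqrt(2.0 * math.log(2.0))
```

With such a function, the optimum is read directly from `x_opt` and the bandwidth from the two fitted spreads; any standard least-squares routine could be used to fit the four parameters.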
The responses of each simple cell were then determined for monochrome photographs that had been digitized and linearized to >1000 gray levels. Some of these were pictures of animals, people, flowers, trees, and landscapes (Tolhurst et al., 1992), and we also included pictures of ferrets and the “ferret's-eye” view of terrain. Each cell was shown a sequence of ≥5000 flashed presentations. Each static picture was flashed on for 100 msec, and after it was removed, the display screen was held at a spatially uniform gray (36 or 54 cd/m², depending on the monitor) for 170 msec before the next picture was presented (see Fig. 1 A). The digitized pictures were scaled to have 256 equally spaced luminance steps; the brightest pixel in each picture had twice the luminance of the blank display, whereas the darkest had a nominal luminance of zero. The space-averaged mean luminance of most pictures was less than the luminance of the blank display. Although the changes in overall luminance and contrast between stimuli may invoke subtle response nonlinearities, the task of normalizing natural stimuli actually requires previous knowledge of the receptive field, the very property we are trying to infer (Tolhurst and Tadmor, 1997). We feel it is important to stimulate the system with natural stimuli that are likely to include changes in luminance and contrast. The long flash presentation and 170 msec blank interval allowed us to distinguish clearly between the offset response to one picture and the onset response to the next; in three simple cells, however, there was no blank interval between pictures.
The displayed pictures each comprised 150 × 150 pixels, and fragments were drawn at random from a set of 128 larger pictures, each measuring 256 × 256 pixels. The display “zoomed” the displayed fragments by a factor of four to match the 600 pixel screen height. Thus, each effective pixel (0.14 or 0.4°, depending on the monitor) was four times greater than the screen resolution but was still generally smaller than would be resolved by ferret visual neurons, which rarely respond to spatial frequencies as high as 1 cpd (Price and Morgan, 1987; Baker et al., 1998). The pictures measured 21.5 or 59.4° square, compared with a typical “minimum response field” (Barlow et al., 1967) size of ∼4–15° square. For some experiments, the 5000 pictures in the sequence were all different fragments that were cut from 128 digitized photographs. More often, there was one set of just 500 different fragments, and this set was presented ≥10 times, with the 500 fragments presented in a different random order each time.
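The effective pixel and picture sizes quoted above follow from simple arithmetic, sketched here for convenience. The function names are hypothetical; the inputs (screen heights of 21.5° and 59.4°, 600 screen rows, a zoom factor of four, and 150 effective pixels per picture) are the values given in the text.

```python
def effective_pixel_deg(screen_height_deg, screen_rows=600, zoom=4):
    """Visual angle of one effective (zoomed) picture pixel, in degrees."""
    return zoom * screen_height_deg / screen_rows

def picture_size_deg(screen_height_deg, picture_pixels=150, screen_rows=600, zoom=4):
    """Visual angle of one side of the displayed square picture, in degrees."""
    return picture_pixels * effective_pixel_deg(screen_height_deg, screen_rows, zoom)

# The two monitor geometries described in the text.
for height_deg in (21.5, 59.4):
    print(height_deg, effective_pixel_deg(height_deg), picture_size_deg(height_deg))
```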
Although each picture in our experiments was represented as 150 × 150 effective pixels (each of which occupied 4 × 4 screen pixels), the sizes of the arrays were reduced to 50 × 50 to ensure that the computational algorithm was tractable on a personal computer. Usually, this was achieved by averaging the luminance values in groups of 3 × 3 effective pixels. However, in some cases, we took a 100 × 100 region of interest and averaged groups of 2 × 2 pixels or, for very small receptive fields, we took a 50 × 50 region of interest for subsequent analysis.
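The reduction from 150 × 150 to 50 × 50 elements by averaging 3 × 3 groups can be sketched with NumPy as below. This is an illustrative sketch rather than the original analysis code; the function name `block_average` is an assumption.

```python
import numpy as np

def block_average(picture, block):
    """Average non-overlapping block x block groups of pixels.

    picture: 2-D array whose sides are exact multiples of `block`.
    Returns an array whose sides are smaller by a factor of `block`.
    """
    rows, cols = picture.shape
    if rows % block or cols % block:
        raise ValueError("picture sides must be multiples of the block size")
    # Split each axis into (number of blocks, block size), then average
    # over the two block-size axes.
    grouped = picture.reshape(rows // block, block, cols // block, block)
    return grouped.mean(axis=(1, 3))
```

For example, `block_average(picture, 3)` maps a 150 × 150 picture to 50 × 50, and `block_average(region, 2)` does the same for a 100 × 100 region of interest.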
Response analysis. Most of the simple cells responded sparsely to the sequence of flashed natural scenes (i.e., each simple cell responded to relatively few of the pictures presented but responded reliably to repeated presentations of those few effective pictures) (see Fig. 1 B). Furthermore, as might be expected from cells that show quasilinear spatiotemporal summation, the 25 simple cells responded either at the onset of a particular picture flash or at its offset; they did not give both onset and offset responses to any single picture (see Fig. 1 B).
For each picture, we counted the number of action potentials in a 100 msec interval starting 30–50 msec after the onset or the offset of the stimulus. Offset responses were counted as negative, because the offset of a picture might be the same as the onset of its contrast-reversed image, and we presume that an offset response would have followed inhibition during the 100 msec that the picture was present (Hirsch et al., 1998). In experiments in which the picture fragments were each presented ≥10 times, the individual responses were averaged. Poststimulus time histograms of the responses to ≥5000 presentations were generated to extract the latency of the visual response and hence the position of the 100 msec window. The histograms also identified those cells that reliably showed both onset and offset responses to different pictures (see Fig. 1 B). Most of the cells presented in this study had very little, if any, spontaneous activity. Moreover, our method of subtracting offset responses from onset responses should cancel on average any background activity that is underlying the neuronal response train.
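The signed spike count described above can be sketched as follows. This is a minimal illustration under stated assumptions: spike times are given in milliseconds relative to picture onset, the latency is a single value per cell taken from the poststimulus time histogram, and the function names are hypothetical.

```python
def signed_response(spike_times_ms, latency_ms, flash_ms=100.0, window_ms=100.0):
    """Onset spike count minus offset spike count for one picture presentation.

    The onset window starts `latency_ms` after picture onset; the offset
    window starts `latency_ms` after picture offset (onset + flash_ms).
    Offset spikes are counted as negative.
    """
    on_start = latency_ms
    off_start = flash_ms + latency_ms
    onset = sum(on_start <= t < on_start + window_ms for t in spike_times_ms)
    offset = sum(off_start <= t < off_start + window_ms for t in spike_times_ms)
    return onset - offset

def mean_signed_response(trials, latency_ms):
    """Average the signed response over repeated presentations of one picture."""
    return sum(signed_response(t, latency_ms) for t in trials) / len(trials)
```

With a 40 msec latency, for instance, the onset window covers 40–140 msec and the offset window 140–240 msec, both of which fit within the 270 msec between successive picture onsets.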
For the few cells (n = 4) that gave only onset (positive) or offset (negative) responses, those stimuli that failed to give any response are ambiguous. If the cells have no spontaneous activity, it is not clear whether an absence of action potentials implies zero response or hidden inhibition (Movshon et al., 1978; Tolhurst and Dean, 1987). For these cells, the many responses of zero were deemed to be ambiguous and were discarded from subsequent analyses. However, in those cells (the majority) in which both clear onset and offset responses were seen, all responses (positive, negative, and zero) to all pictures were used in subsequent analyses, providing a much greater number of constraints on the receptive-field reconstructions. Including a blank interval between picture presentations is one of the strengths of our experimental design.
Gabor models of receptive-field structure. The field of computational visual neuroscience has traditionally used the Gabor function as a realistic model of receptive-field structure of simple cells in V1 (Marcelja, 1980; Jones and Palmer, 1987a; Ringach, 2002). The Gabor model uses a Gaussian-windowed sinusoidal pattern, thereby fitting comfortably with the use of sinusoidal gratings as the stimulus pattern of choice. Given our experimental protocol, one plausible method of RF estimation would be to fit a Gabor function that predicts the responses of a real neuron to a given set of natural stimuli. We achieved this by applying a simulated annealing algorithm (Press et al., 1992), combined with evolutionary computation methods, to fit a seven parameter Gabor function (Marcelja, 1980) that maximizes the linear correlation between the actual and predicted responses to the natural stimuli.
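For orientation, a Gabor receptive-field model and the correlation objective it is scored by might look like the sketch below. The particular seven parameters chosen here (center `x0`, `y0`, orientation `theta`, spatial frequency `freq`, `phase`, and envelope spreads `sigma_u`, `sigma_v`) are an assumed parameterization, and the simulated annealing search itself is not shown; only the quantity being maximized is illustrated.

```python
import numpy as np

def gabor(size, x0, y0, theta, freq, phase, sigma_u, sigma_v):
    """Gaussian-windowed sinusoid on a size x size pixel grid (assumed parameters)."""
    grid_y, grid_x = np.mgrid[0:size, 0:size].astype(float)
    # Rotate coordinates so that u runs along the direction of modulation.
    u = (grid_x - x0) * np.cos(theta) + (grid_y - y0) * np.sin(theta)
    v = -(grid_x - x0) * np.sin(theta) + (grid_y - y0) * np.cos(theta)
    envelope = np.exp(-0.5 * ((u / sigma_u) ** 2 + (v / sigma_v) ** 2))
    return envelope * np.cos(2.0 * np.pi * freq * u + phase)

def fit_quality(params, pictures, responses):
    """Linear correlation between actual responses and Gabor-model predictions.

    pictures: array of shape (n_pictures, size, size); responses: length n_pictures.
    """
    rf = gabor(pictures.shape[1], *params)
    predicted = pictures.reshape(len(pictures), -1) @ rf.ravel()
    return np.corrcoef(predicted, responses)[0, 1]
```

Because the correlation coefficient is unaffected by overall scaling, no amplitude parameter is needed in this sketch; an optimizer would search over `params` to maximize `fit_quality`.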
However, the whole point of using natural stimuli to reconstruct receptive fields is to avoid making these “single sinusoid” assumptions about the stimulus tuning of a cell. In theory, we do not know the structure of simple-cell receptive fields to natural stimuli and therefore need a reasonably nonparametric approach. Below, we present an approach, evolved from reverse-correlation methods, that we feel satisfies this criterion.
Receptive-field estimation. If a simple cell were to summate the influences within its receptive field linearly, then the scalar response, r, to a two-dimensional stimulus, s, would be given by the dot product of s and the spatial weights within the two-dimensional receptive field, f, as follows:
$$r = s \cdot f \tag{1}$$
We presented 500 stimuli (≥10 times each) or ≥5000 stimuli (once each). The set of scalar responses, r, to all of the stimuli, S, can be written more conveniently as a matrix equation as follows:
$$r = S f \tag{2}$$
where r and f are both column vectors and S is a matrix in which each row represents one stimulus. We wanted to obtain an estimate, f̂, of the receptive-field structure; initially it seems that this can be done simply by rearranging Equation 2 as follows:
$$\hat{f} = S^{-1} r \tag{3}$$
where $S^{-1}$ is the matrix inverse of S. However, this is problematic because the matrix inverse only exists under certain conditions (i.e., when S is square and its rows are linearly independent). These conditions can be met by some stimulus sets (e.g., white-noise stimuli) but not by a randomly chosen set of natural-scene stimuli; the set would typically not be linearly independent because of pixel-to-pixel correlation within scenes, as demonstrated by Field (1987).
Smyth et al. (2000) and Ringach et al. (2002) used iterative methods to find least-squares solutions to Equation 2. These methods provide accurate receptive-field estimates when large numbers of stimuli are presented. However, under circumstances (such as in this study) where the number of available stimulus–response pairs is limited and the responses, r, are subject to response variability, Equation 2 is likely to be underdetermined [fewer equations than unknowns (i.e., fewer stimuli than pixels)] and inconsistent. As a result, any least-squares solution is likely to contain a large amount of high-frequency noise that reflects overfitting of the variable neuronal responses.
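The consequence of having fewer stimuli than pixels can be illustrated with a small synthetic example. This sketch uses Gaussian random "stimuli" and a random "receptive field" purely as stand-ins (they are assumptions, not data from this study): a least-squares solution reproduces the responses exactly yet does not recover the field that generated them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_pixels = 30, 100           # fewer equations than unknowns

S = rng.standard_normal((n_stimuli, n_pixels))   # one stimulus per row
f_true = rng.standard_normal(n_pixels)           # stand-in receptive field
r = S @ f_true                                   # noise-free linear responses

# Minimum-norm least-squares solution of r = S f.
f_hat, _, rank, _ = np.linalg.lstsq(S, r, rcond=None)

fits_responses = np.allclose(S @ f_hat, r)   # the responses are matched
recovers_field = np.allclose(f_hat, f_true)  # but the field is not identified
print(rank, fits_responses, recovers_field)
```

With noisy responses the situation is worse still, because the exact fit then reproduces the noise as well as the signal.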
Reverse correlation. In the special case in which the stimuli, S, are orthogonal (as well as being linearly independent), inversion of S is straightforward, because the inverse, $S^{-1}$, is the same as the transpose, $S^{T}$, which is obtained simply by swapping the rows and columns of S. This is the case when receptive fields are mapped with two-dimensional patterns of random black and white dots or with white noise (Reid et al., 1997). An estimate of the receptive field can then be obtained by reverse correlation; the white noise patterns are simply added together and weighted by the number of action potentials evoked by each pattern as follows:
$$\hat{f} = S^{T} r \tag{4}$$
This is the response-weighted average of the stimuli presented. It is tempting to apply the same response weighting to the digitized pictures of our stimuli (i.e., to add the pictures in the set, weighted by the response evoked by each one). However, the lack of orthogonality in the stimulus set means that the transpose of S is not the same as the inverse, and the procedure gives a receptive-field estimate that is biased (Smyth et al., 1999; Theunissen et al., 2001). It is possible to remove the bias from the receptive-field estimate by removing the pixel-to-pixel correlation from the stimulus set as follows:
$$\hat{f} = C_S^{-1} S^{T} r \tag{5}$$
where $C_S$ is the pixel-to-pixel cross-correlation matrix of S. This is achieved by dividing the Fourier transform of the response-weighted average by the average of the power spectra of the pictures in S (Theunissen et al., 2001; Willmore, 2002).
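The two estimates can be sketched in matrix form as below. This is an illustration under stated assumptions rather than the procedure used here: the correlation matrix is taken as the unnormalized product of the stimulus matrix transpose with itself, the decorrelation is done by a direct linear solve rather than by division in the Fourier domain, and it requires more stimuli than pixels so that the correlation matrix is invertible.

```python
import numpy as np

def reverse_correlation(S, r):
    """Response-weighted sum of the stimuli (one stimulus per row of S)."""
    return S.T @ r

def decorrelated_estimate(S, r):
    """Reverse correlation with the pixel-to-pixel stimulus correlation removed.

    Solves (S^T S) f = S^T r directly; S^T S must be invertible, which
    needs at least as many linearly independent stimuli as pixels.
    """
    correlation = S.T @ S
    return np.linalg.solve(correlation, S.T @ r)
```

For white noise the correlation matrix is close to a multiple of the identity, so the two estimates agree up to scale; for correlated stimuli the plain response-weighted sum is blurred by the stimulus correlations.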
Regularized pseudoinverse. Alternatively, a least-squares solution to Equation 2 can be found using singular value decomposition. This provides a pseudoinverse, which is an approximation to $S^{-1}$. However, for small numbers of stimulus presentations, this is still likely to produce a receptive-field estimate, f̂, which is corrupted by high-frequency noise as a result of overfitting.
To avoid this problem, we propose a method for obtaining a high-resolution estimate of the receptive field, f̂, from relatively few “noisy” responses, r, to a set of pictures, S. We use a regularized pseudoinverse (Press et al., 1992) in which any ambiguities and inconsistencies may be resolved by applying some a priori constraints on the solution. A very simple and plausible constraint is to assume that the sensitivity within the receptive field changes continuously and smoothly with the position. That is, the Laplacian of the field (Ls = ∇²f̂), will be close to zero at all points within the field. We approximate the two-dimensional Laplacian of a receptive field using a 3 × 3 pixel element (which is previously zero) as follows:
$$\begin{pmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{pmatrix} \tag{6}$$
We construct a matrix L of 2500 such “Mexican-hats,” each embedded in a matrix of 50 × 50 zeroes, and one for each of the 50 × 50 locations in the estimated receptive field f̂. The a priori constraint is that the receptive field should be smooth at each point (i.e., that the dot product of the Laplacian at each point in the receptive field should produce a response of zero) as follows:
$$L f = 0 \tag{7}$$
Thus, we have two sets of equations imposing constraints on the receptive-field reconstruction (Eqs. 2 and 7), and these can be combined to form the following single equation that demands solution:
$$\begin{pmatrix} S \\ \lambda L \end{pmatrix} f = \begin{pmatrix} r \\ 0 \end{pmatrix} \tag{8}$$
where λ is a scalar “regularization parameter.” The solution of f is now overdetermined, because the number of Laplacian constraints is equal to the number of pixels, and the inconsistencies are resolved using singular value decomposition to find a least-squares solution (Press et al., 1992). The parameter λ determines the relative weight to be given to the a priori (smoothness) and a posteriori constraints (actual responses to pictures) where they conflict.
Simulation shows that the value of λ needed for a good solution depends on many factors, such as the number of pictures that evoked a response, the magnitudes of those responses, and the variability of response (Willmore, 2002). It is also (arbitrarily) affected by the fact that the Laplacian (Eq. 6) ranges in value from -1 to 4 and has a total sum-of-squares of 20, whereas each picture ranges from 0 to 255 and has a total sum-of-squares of >10⁷. The effects of changing the value of λ are illustrated in Figures 4 and 5, along with a method for choosing a near-optimal value.
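A minimal sketch of the regularized pseudoinverse is given below, assuming NumPy. It stacks the stimulus equations on top of the λ-weighted Laplacian smoothness constraints and finds the least-squares solution with an SVD-based solver. The function names are hypothetical, and the treatment of map edges (neighbours falling outside the map are simply dropped) is an assumption not specified in the text.

```python
import numpy as np

def laplacian_matrix(n):
    """One smoothness constraint per pixel of an n x n map.

    Each row holds the 3 x 3 element (4 at the center, -1 at the four
    nearest neighbours) embedded in zeroes and flattened to length n * n.
    """
    L = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            row = i * n + j
            L[row, row] = 4.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:   # assumed edge handling
                    L[row, ii * n + jj] = -1.0
    return L

def regularized_pseudoinverse(S, r, lam, n):
    """Least-squares solution of the stacked system [S; lam * L] f = [r; 0].

    S: (n_stimuli, n * n) matrix of flattened pictures; r: responses.
    Returns the receptive-field estimate as an n x n map.
    """
    A = np.vstack([S, lam * laplacian_matrix(n)])
    b = np.concatenate([r, np.zeros(n * n)])
    f_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based solver
    return f_hat.reshape(n, n)
```

Larger values of `lam` give smoother maps at the expense of fitting the measured responses less closely; with `n = 50` the dense constraint matrix has 2500 × 2500 elements, which is modest on current hardware.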
Fourier spectral analysis of receptive-field maps. We applied a two-dimensional Fourier transform to each of our receptive-field map estimates to determine the dominant orientation and spatial frequency. First, each map was windowed using the Welch formula to avoid edge effects (Press et al., 1992). To increase resolution, the 50 × 50 element receptive-field map was embedded in a 500 × 500 array of zeroes before the Fourier transform. In most cases, the spectrum contained one major localized feature, and the dominant orientation and spatial frequency were taken as those of the coefficient with greatest magnitude. We fitted single Gaussians centered on the best orientation and spatial frequency separately to estimate the two bandwidths. In some cases, the spectra were not “clean” enough to make such fitting worthwhile.
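The spectral analysis can be sketched as follows. This is an illustrative version under several assumptions: the Welch window is applied separably along both axes, the zero padding is appended after the map rather than surrounding it (which affects only phase, not magnitude), and the function names are hypothetical. The returned angle is that of the direction of modulation; the corresponding grating orientation is perpendicular to it.

```python
import numpy as np

def welch_window(n):
    """One-dimensional Welch (parabolic) window of length n."""
    i = np.arange(n)
    return 1.0 - ((i - 0.5 * (n - 1)) / (0.5 * (n + 1))) ** 2

def dominant_component(rf, pad=500, pixel_deg=1.0):
    """Spatial frequency and modulation angle of the largest Fourier coefficient.

    rf: square receptive-field map; pad: side of the zero-padded array;
    pixel_deg: size of one map element (frequency is in cycles per that unit).
    Returns (frequency, angle in degrees within [0, 180)).
    """
    w = welch_window(rf.shape[0])
    windowed = rf * np.outer(w, w)
    magnitude = np.abs(np.fft.fft2(windowed, s=(pad, pad)))
    ky, kx = np.unravel_index(np.argmax(magnitude), magnitude.shape)
    freqs = np.fft.fftfreq(pad, d=pixel_deg)
    fx, fy = freqs[kx], freqs[ky]
    angle = np.degrees(np.arctan2(fy, fx)) % 180.0
    return np.hypot(fx, fy), angle
```

Zero padding to 500 × 500 does not add information, but it samples the spectrum ten times more finely than the 50 × 50 map alone, so the peak can be located more precisely.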
Evaluation of correlation coefficients between actual and predicted responses to natural scenes. We compare the magnitudes of the measured responses to natural scenes with the values predicted by the estimated receptive field. Graphs of measured against predicted response (see Fig. 7) show considerable scatter, and the correlation coefficients are not always high. We tried to determine how much of the scatter is attributable to the failure of the receptive-field model and how much is attributable to the inherent response variability of simple cells (Tolhurst et al., 1981, 1983; Vogels et al., 1989; Geisler and Albrecht, 1997). For comparison with real experimental data, we simulated the effects of response variability on the expected correlation coefficients. First, a theoretical noise-free response of each neuron to each picture fragment was calculated as the dot product of the estimated receptive field with the picture. The noise-free estimate was then taken as the parameter of a Poisson distribution. For each picture presentation, one instance was chosen from the Poisson distribution to represent the actual noisy response. For pictures that were repeated ≥10 times, ≥10 simulated responses were averaged. The noisy predicted responses were plotted against the noise-free theoretical prediction, and the correlation coefficient was calculated to show the highest coefficient that could reasonably be expected from each neuron.
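The simulation of response variability can be sketched as below. One detail is an assumption of this sketch: because offset responses are counted as negative, the magnitude of each noise-free prediction is used as the Poisson parameter and the sign is restored afterwards; the text does not state how negative predictions were handled. The function name and the fixed random seed are likewise illustrative.

```python
import numpy as np

def ceiling_correlation(noise_free, n_repeats=10, seed=0):
    """Correlation expected between noisy and noise-free responses.

    noise_free: predicted (signed) response to each picture.
    n_repeats: number of simulated presentations averaged per picture.
    """
    rng = np.random.default_rng(seed)
    noise_free = np.asarray(noise_free, dtype=float)
    rates = np.abs(noise_free)                      # assumed handling of sign
    counts = rng.poisson(rates, size=(n_repeats, rates.size))
    noisy = np.sign(noise_free) * counts.mean(axis=0)
    return np.corrcoef(noisy, noise_free)[0, 1]
```

Averaging over more repeats reduces the simulated variability, so the ceiling approaches 1 as `n_repeats` grows; a measured coefficient well below the ceiling for the same number of repeats points to a failure of the linear model rather than to response variability alone.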
Results
We examined the responses of 25 simple cells in ferret V1 to 100 msec flashed presentations of digitized monochrome photographs of natural scenes, including pictures of ferrets and terrain seen from a ferret's viewpoint. Typically, each cell responded to a relatively small proportion (sometimes <5%) of the pictures presented, as found by previous studies (Baddeley et al., 1997; Gallant et al., 1998; Vinje and Gallant, 2000) and as expected from modeling of Gabor-like simple-cell receptive fields (Field, 1994; Willmore and Tolhurst, 2001). The cells responded either at the onset or offset of effective pictures but not both. When the same pictures were presented repeatedly, the cells responded consistently, but the responses were subject to variability (Fig. 1B). From the responses to the pictures, we attempted to deduce the spatial receptive-field structure of the cell and determine whether the responses of the cells to these spatially complex natural scenes were consistent with their responses to simple sinusoidal gratings.
Reverse correlation
A powerful method for determining receptive-field structure is to determine the responses to a large number of presentations of different spatial noise patterns (Marmarelis and Marmarelis, 1978; Reid et al., 1997). The receptive field is then reconstructed by reverse correlation; the stimulus patterns are added together and weighted in proportion to the response that each evoked (Eq. 4). Figure 2 shows some examples when the same procedure is performed, by analogy, on our results: the stimulus pictures were added together and weighted in proportion to the responses that each evoked. Reverse-correlation maps are shown for five simple cells in the top row. They are represented as gray levels; bright areas suggest excitatory (ON) receptive-field regions, whereas dark areas suggest inhibitory (OFF) receptive-field regions. Other maps are shown in Figs. 4 and 6.
In four of the examples (Fig. 2A–D), the reverse correlation (or response-weighted average) of the pictures shows a relatively distinct feature consisting of ON and OFF regions. The features show an obvious elongation and orientation, as would be expected if these features represented the elongated, orientation-specific receptive fields of the simple cells. The second row shows the two-dimensional Fourier amplitude spectra of the reverse-correlation maps as gray-level representations of the magnitudes of the Fourier coefficients (see legend to Fig. 2). For Figure 2A–C, the spectra show a primary pair of features reflected about the origin. These localized spectral features seem to reflect the limited orientation and spatial-frequency response of simple cells (cf. De Valois et al., 1982; Jones et al., 1987). These maps and their spectra show that the cells responded primarily to features in the pictures that had a specific location, specific orientation, and specific bright-dark polarity. In other maps, it is possible to discern some specific elements of just a few of the pictures, as if the reverse correlation has been dominated by the responses to these very few pictures (see Fig. 6C,D).
The result for the cell shown in Figure 2E is different. There is no sign of any discrete excitatory or inhibitory features in the top reverse-correlation spatial map, and the Fourier spectrum is diffuse and featureless. There is no clue in these data as to which features in the pictures may have evoked responses from this cell. The cell had all of the properties of a typical simple cell (see Materials and Methods) in its responses to sinusoidal gratings, and its responses to natural images were selective and reliable. We have no explanation for the failure of reverse correlation to suggest any features in this case. The reverse-correlation map is a very coarse indicator of the linearity of spatial summation; any cell producing no reverse-correlation map must have grossly nonlinear behavior and would not be susceptible to our basically linear method. One other cell similarly failed to give a reverse-correlation map; both of these cells were excluded from additional analysis. Although these cells may prove to be the most interesting to study, by providing a behavioral difference between artificial and natural stimuli, we leave such investigations to additional studies using more advanced experimental and analytical tools.
Thus, in most cases, reverse-correlation maps resemble the oriented receptive fields of simple cells, and the Fourier spectra of the maps resemble the very restricted response spectra of simple cells. The question therefore arises as to how similar these reverse-correlation parameters are to those of the actual cells. The circles in Figure 3A show the responses of the simple cell in Figure 2B to gratings of different orientations; the cell responded to a narrow range of orientations just off horizontal (180°). The dotted curve shows the orientation tuning derived from the reverse-correlation map: the magnitude of the spectrum along a circle drawn through the coefficient with the largest magnitude. The curve peaks within 15° of the grating orientation that evoked the largest response from the cell. However, the bandwidth of the dotted curve is considerably greater than the orientation tuning curve of the cell. Figure 3B plots the orientation of the dominant coefficient in the spectrum of the reverse-correlation map against the optimal orientation of sinusoidal grating for the 23 simple cells for which a reverse-correlation map was obtained. With few exceptions, the dominant orientation of the reverse-correlation map is very close to the preferred grating orientation of the cell, and the least-squares regression (solid line; r = 0.91; n = 23) is very close to the line of equality (dashed line).
The circles in Figure 3D show the responses of the simple cell in Figure 2B to sinusoidal gratings of different spatial frequencies. The dotted curve shows the spatial-frequency bandpass of the reverse-correlation map: the magnitudes of the coefficients in the spectrum along a radius drawn through the coefficient with the greatest magnitude. The dotted curve peaks more than an octave below the preferred spatial frequency of the cell, and its bandwidth (in log units or octaves at half height) is much broader than the actual tuning curve of the cell. Figure 3E plots the dominant spatial frequency in the spectrum of the reverse-correlation map against the preferred spatial frequency of sinusoidal grating for the 23 simple cells. The dominant frequency in the reverse-correlation map is consistently lower than the true preferred grating frequency. The regression (solid line; r = 0.64; n = 23) lies 0.5 log units (1.6 octaves) below the line of equality (dashed line).
Figure 3, C and F, plots the orientation and spatial-frequency bandwidths of the two-dimensional Fourier spectra of the reverse-correlation maps against the bandwidths that were actually measured for gratings, for those cells in which the two-dimensional Fourier transform contained a well defined dominant feature that could be fitted (such as those in Fig. 2A–D). As for the single-cell example in Figure 3A, both the orientation and frequency bandwidths of the reverse-correlation maps are substantially broader than those actually measured with gratings.
Thus, simple reverse correlation reveals the preferred orientations of the cells (cf. Smyth et al., 1999; Theunissen et al., 2001; Ringach et al., 2002) but systematically underestimates the preferred spatial frequencies of the cells and overestimates their tuning bandwidths.
Receptive-field reconstruction with regularized pseudoinverse
In the experiments, we presented a set of picture stimuli to a neuron and recorded a set of noisy responses. We used a regularized pseudoinverse method (see Materials and Methods) to estimate the receptive-field map that describes the linear portion of each response of the cells. This incorporates a simple a priori constraint: the sensitivity of the receptive-field estimate should change smoothly as a function of two-dimensional location within the receptive field (Eq. 8). This constraint reduces the noise in the receptive-field estimates, increasing the spatial resolution possible from the limited number of stimulus presentations.
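As a rough illustration of this kind of estimate, the sketch below assumes that the receptive-field map w minimizes ||Sw - r||^2 + λ||Lw||^2, where S holds one flattened picture per row, r holds the responses, and L is a discrete four-neighbour Laplacian. That objective is an assumption made for the example (Eq. 8 is not reproduced in this section), and the function names are hypothetical.

```python
import numpy as np

def laplacian_operator(height, width):
    """Dense matrix applying a 4-neighbour discrete Laplacian to a flattened map."""
    n = height * width
    lap = np.zeros((n, n))
    for row in range(height):
        for col in range(width):
            i = row * width + col
            for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                r2, c2 = row + d_row, col + d_col
                if 0 <= r2 < height and 0 <= c2 < width:
                    lap[i, r2 * width + c2] = 1.0
                    lap[i, i] -= 1.0
    return lap

def regularized_pseudoinverse(stimuli, responses, shape, lam):
    """Solve min_w ||S w - r||^2 + lam * ||L w||^2 (assumed objective).

    stimuli:   (n_pictures, n_pixels) array, one flattened picture per row
    responses: (n_pictures,) array of responses
    shape:     (height, width) of the receptive-field map
    lam:       weight of the smoothness penalty
    """
    lap = laplacian_operator(*shape)
    lhs = stimuli.T @ stimuli + lam * (lap.T @ lap)
    rhs = stimuli.T @ responses
    # lstsq tolerates a singular system, e.g. when few pictures are available.
    return np.linalg.lstsq(lhs, rhs, rcond=None)[0].reshape(shape)
```

With a very small `lam` the solution is driven almost entirely by the responses; with a very large `lam` the penalty forces the map toward a nearly uniform field.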
Figure 4 compares receptive-field reconstructions and their two-dimensional Fourier spectra obtained with the reverse-correlation and regularized pseudoinverse methods. Fields are shown for three cells and five different values of the regularization parameter λ. This parameter balances the constraints of response prediction and smooth receptive fields. For most values of λ, the major features in the pseudoinverse spectra are at the same orientation as those in the reverse-correlation spectra, but they are farther from the central origin (i.e., the dominant frequencies are higher in the pseudoinverse maps because the ON and OFF subregions are smaller and closer together).
For the cell in Figure 4A (f31205), the regularized pseudoinverse has produced a receptive-field map with distinctly localized parallel ON and OFF regions, and the Fourier spectrum shows a clear, highly localized feature. This is the case for most values of λ illustrated. When λ is low (Fig. 4, column 1 of the Reginv maps), there is obvious spatial noise obscuring the field, and this is reflected as a diffuse pattern in the Fourier spectrum. Here, the solution to the pseudoinverse is dominated by the actual (noisy) responses to the pictures; the system of equations is underdetermined and badly affected by the inconsistencies resulting from response variability and probably from nonlinearities of spatial summation. When λ is higher (Fig. 4, columns 2–4 of the Reginv maps), the smoothness constraint becomes more dominant so that the field seems larger and more blurred. Similar behavior is seen with the cell in Figure 4B (f32610). For the cell in Figure 4C (f39101), localized and credible fields are again seen in the reconstructions, although these tend to be badly obscured by noise at low λ values, and the spectral features are less obviously highly localized. At the three λ values just below the highest, the fields have consistent orientation, and localized spectral features emerge at fixed orientations. If λ is increased much further (Fig. 4, column 5 of the Reginv maps), then the smoothness constraint dominates too much and the reconstructions become homogeneous blobs of only one polarity, obviously bearing little relation to real receptive fields.
It can be seen from the receptive-field maps and the orientation of the paired localized features in the spectra that the dominant orientations of the receptive fields are primarily unaffected by changes in λ, and this confirms that it is relatively easy to extract the orientation of the receptive field from the responses to natural scenes. However, the dominant spatial frequency in the reconstructed fields does depend on λ. Because the parameter λ controls the level of spatial smoothing in the map, the most sensitive tuning measure is the peak spatial frequency. Other measures, such as orientation preference, orientation half-width, and spatial-frequency bandwidth are less sensitive. This dependency is shown for the same three cells in Figure 5 in which the dominant spatial frequency in the reconstructed field is plotted against the value of λ. The dashed lines show the optimal spatial frequency determined from the responses to moving sinusoidal gratings and, for comparison, the dotted lines show the rather lower estimates of optimal spatial frequency derived from reverse-correlation maps. In general, the dominant spatial frequency falls as λ increases, so that there is a choice as to which field we should take as the solution to the problem of deducing the field from the responses to natural scenes. However, in Figure 5, A and B, there is a range of λ in which the dominant spatial frequency changes little (between 10^4 and 10^5 for A and 10^3.1 and 10^5 for B). We have accepted the best field reconstructions as those with λ values in the midpoints of such plateaus. This midpoint is a compromise between the noisier spatial maps at lower values of λ and weaker response predictability at higher λ values. In most cells, the eye could identify a plateau, but we acknowledge that there is a subjective component to this approach. In contrast, the cell in Figure 5C shows no convincing plateau region from which we could choose the receptive-field reconstruction. It also responded optimally to a particularly high spatial frequency of sinusoidal grating, and our failure to find a convincing field may be because the pixelation in the stimulus pictures was too coarse.
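The plateaus in this study were identified by eye. Purely as an illustration of how such a criterion might be made explicit, the sketch below picks the midpoint of the longest run of λ values over which the dominant spatial frequency changes by less than a tolerance between neighbouring samples. The function name, the tolerance of 0.1 octaves, and the minimum run length of three samples are all hypothetical choices, not values taken from the analysis.

```python
import math

def plateau_midpoint(log10_lambdas, peak_freqs, max_step_octaves=0.1, min_points=3):
    """Return log10(lambda) at the middle of the longest plateau, or None.

    A plateau is a run of consecutive samples in which the dominant spatial
    frequency changes by less than max_step_octaves from one sample to the next.
    """
    best_start, best_len = 0, 0
    start = 0
    for i in range(1, len(peak_freqs) + 1):
        run_ends = (
            i == len(peak_freqs)
            or abs(math.log2(peak_freqs[i] / peak_freqs[i - 1])) >= max_step_octaves
        )
        if run_ends:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i
    if best_len < min_points:
        return None  # no convincing plateau
    last = best_start + best_len - 1
    return 0.5 * (log10_lambdas[best_start] + log10_lambdas[last])
```

A curve that falls steadily with λ, with no flat stretch, returns `None`, which corresponds to the cells whose spatial-frequency tuning could not be evaluated.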
Figure 6 illustrates the receptive-field reconstructions and their two-dimensional Fourier spectra for an additional six simple cells. These show that the fields were not always a clean set of parallel ON and OFF regions, and that the spectra did not always consist of a single reflected pair of discrete features. In some cells, the regularized pseudoinverse produced a spectrum with two features representing orientations at right angles (Fig. 6B), and sometimes the spectrum was quite diffuse (Fig. 6C–F), although a dominant feature could usually be discerned. Although the Gabor model is traditionally used to describe simple-cell receptive fields, it is important to point out that only Figure 6A fits this description. The other examples do not fit such a model, because they are either spatially diffuse or their spectrum implies energy at more than one dominant orientation.
The examples in Figure 6, C and D, are interesting in that the cells responded very sparsely, so that the reverse-correlation maps show clear “ghosts” of single pictures (C shows a child's face). The cell in Figure 6C fired only 231 action potentials in the entire experiment of 10 repetitions of 500 pictures; 25% of those action potentials were in response to only 3 of the 500 pictures, and an additional 25% of the action potentials were in response to only 10 additional pictures (Fig. 7D). The regularized pseudoinverse has produced credible fields without ghosts from the same response data. The reverse correlation is based on only the very few pictures in which a response was actually generated, whereas the pseudoinverse must also account for why so many (>400) pictures failed to evoke a response. It is also worth pointing out that although some cells appeared to produce their best RF reconstructions with repeated stimuli, others needed 5000 different stimuli. Although this lack of conformity is undesirable, it can be understood in terms of the compromise between covering a sufficient subspace of natural scenes to trigger all relevant properties of the RF and averaging response variability across repeats to avoid the effects of misleading noisy responses on the reconstruction process.
Of the 25 simple cells that we recorded, two were not considered because their reverse-correlation maps had shown no sign of receptive-field structure (Fig. 2E), suggesting their responses were strongly nonlinear. Two additional cells did not produce regularized pseudoinverse maps, although they had produced reverse-correlation maps. For the remaining 21 cells, the dominant orientation in the regularized pseudoinverse reconstruction was robust over a very wide range of λ values. Eighteen of these cells showed a convincing plateau in the graph of dominant spatial frequency plotted against λ (Fig. 5A); the spatial-frequency tuning of the other three could not be evaluated. For many of the 21 orientation cells and 18 spatial-frequency cells, it was possible to estimate the bandwidths of the features dominating the Fourier spectra of the pseudoinverse maps (see Materials and Methods).
Evaluation of the reconstructed receptive fields
Responses to pictures
Receptive-field reconstructions can be used to predict the relative response to any given picture, and this can then be compared with the actual neuronal response for that picture. For four of the simple cells illustrated in earlier figures, Figure 7 shows how well the best regularized pseudoinverse receptive-field reconstructions account for the responses to the pictures. Note that the value of λ chosen for the best reconstruction was determined from the examination of the relative invariance of their Fourier spectra and not from a desire to provide a good fit to the actual picture responses. Indeed, we would expect better fits in the latter respect from the lowest λ values. For the four cells, the reconstructed fields provide convincing predictions of the responses to individual pictures. There are few data that lie away from the main diagonals. The correlation coefficients are high (Table 1, column 4).
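The comparison between predicted and actual responses can be illustrated with a minimal sketch. It assumes that the predicted response to a picture is simply the inner product of the picture with the reconstructed map; the function names are hypothetical.

```python
import numpy as np

def predict_responses(stimuli, rf_map):
    """Linear prediction: inner product of each picture with the receptive-field map.

    stimuli: (n_pictures, height, width) array; rf_map: (height, width) array.
    """
    return stimuli.reshape(len(stimuli), -1) @ rf_map.ravel()

def pearson_r(predicted, actual):
    """Correlation coefficient between predicted and measured responses."""
    return float(np.corrcoef(predicted, actual)[0, 1])
```

The correlation is unaffected by any scaling or offset of the prediction, so it tests only the relative responses. A non-affine relationship between prediction and response, such as rectification followed by squaring, lowers it below 1 even in the absence of noise.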
The excellent fits may, at first sight, suggest that the simple cell responses to natural scenes have been determined only by linear summation processes. However, the predicted responses do show a nonlinear relationship to the actual responses. This is particularly clear in Figure 7, B and C, in which the actual responses at the extremes of the distribution deviate from the predicted straight line and may reflect the known threshold or expansive response nonlinearity of V1 neurons (Tolhurst et al., 1981; Albrecht and Hamilton, 1982; DeAngelis et al., 1993; Gardner et al., 1999). One consequence of this nonlinearity will be to increase the sparseness of responses from a purely linear model (Baddeley et al., 1997; Vinje and Gallant, 2000). Furthermore, we would not expect perfect correlations because of the well known response variability of cortical neurons. Figure 7E shows the correlation coefficients between actual and predicted responses to the pictures for all 21 cells. Many of the correlation coefficients are high, like those for the four cells illustrated in Figure 7A–D. We expect lower correlation coefficients for some cells, especially those for which picture presentations were not repeated or few pictures contributed to the receptive-field estimates. Figure 7F accounts for this by showing the actual correlation coefficients divided by the highest coefficient that we expected after simulating the experiments (see Materials and Methods). The ratio is high and close to 1 for most cells, implying that our receptive field estimates account for much of the variability in the data. Six cells have a relative correlation of <0.5; it may be that these represent genuinely poor fits, or that the responses of these cells were substantially more variable than the Poisson noise that we modeled.
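One way to obtain such an expected correlation is sketched below. It is an assumption-laden illustration rather than the simulation described in Materials and Methods: it treats the model's predicted mean spike count for each picture as the rate of a Poisson process, averages simulated counts over the number of repeats, and correlates the result with the prediction. The function name and default parameters are hypothetical.

```python
import numpy as np

def poisson_correlation_ceiling(mean_counts, n_repeats, n_simulations=100, seed=0):
    """Mean correlation between predicted spike counts and simulated Poisson
    responses averaged over n_repeats presentations of each picture."""
    rng = np.random.default_rng(seed)
    mean_counts = np.asarray(mean_counts, dtype=float)
    correlations = []
    for _ in range(n_simulations):
        counts = rng.poisson(mean_counts, size=(n_repeats, mean_counts.size))
        correlations.append(np.corrcoef(mean_counts, counts.mean(axis=0))[0, 1])
    return float(np.mean(correlations))
```

The relative correlation for a cell would then be its measured coefficient divided by this ceiling. The ceiling rises with the number of repeats, which is why lower raw coefficients are expected for cells whose pictures were not repeated.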
Our simulations assumed that response noise is Poisson, although many V1 cells have greater variability (Table 1, column 3). Thus, for some cells, the expected correlation between predicted and measured responses should have been less than we found. The picture-response data may have been overfitted, and noise and nonlinear behavior may have been approximated with a linear solution, perhaps by introducing features outside the spatially localized CRF or by introducing spectral features outside the single quasi-Gaussian feature that was expected. Some of the fields and spectra in Figure 6 do show features that would not be expected from a simple CRF and its spectrum. For instance, Figure 6D shows a cell that seems to have two receptive fields, whereas Figure 6B shows a cell that has two pairs of features in its spectrum. However, there are some important kinds of nonlinearity that the method would be unable to approximate with a linear solution (e.g., the identical responses of complex cells to bright and dark stimuli).
Table 1 examines to what extent the picture responses may have been overfitted in four cells that we have used as examples throughout. Column 11 shows what percentage of the total data variance can be explained by the best pseudoinverse reconstruction; the balance must be attributable to response noise and unfitted nonlinearities. Column 12 shows how much of the data were explained when the reconstructed field was windowed to exclude the space outside the localized features that we presume represent the CRF. Column 13 shows how much of the data can be explained when the Fourier spectrum of the field is windowed to include just the single feature that is most like the spectral tuning of a simple cell measured with gratings (De Valois et al., 1982; Jones et al., 1987). Windowing in space and windowing the spectrum might not be exclusive in removing overfitted spatial or spectral features. In all cases, although to different degrees, the quality of the fit is decreased by excluding parts of the reconstruction outside the CRF or the “classical spectral tuning.” It is possible that the regularized pseudoinverse method has used space outside the CRF to fit noise or nonlinearities in the neuronal responses, but the nonlinear processes may actually originate from these specific locations (Walker et al., 1999; Vinje and Gallant, 2000).
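The spatial-windowing comparison can be illustrated as follows. The sketch assumes that "percentage of variance explained" is 100 times the squared correlation between the linear prediction and the measured responses, which may differ from the definition used for Table 1. It covers only windowing in space, and the function name and mask are hypothetical.

```python
import numpy as np

def percent_variance_explained(stimuli, rf_map, responses, mask=None):
    """100 * r^2 between measured responses and the linear prediction from
    rf_map, with the map optionally set to zero outside a boolean spatial mask."""
    field = rf_map if mask is None else np.where(mask, rf_map, 0.0)
    predicted = stimuli.reshape(len(stimuli), -1) @ field.ravel()
    r = np.corrcoef(predicted, responses)[0, 1]
    return float(100.0 * r ** 2)
```

Calling the function once without a mask and once with a mask covering only the presumed CRF gives the two quantities being compared. The same comparison could be made in the Fourier domain by masking the spectrum of the map before transforming back.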
One implication of these results is that the traditional Gabor model of the receptive field should predict the picture responses less well. Although columns 7–10 of Table 1 show that the orientation and spatial-frequency peaks from the Reginv and Gabor kernels are similar, column 6 shows that the correlation between the actual responses and the best Gabor fit to the response data is always lower than that between the actual responses and the Reginv-predicted responses listed in column 4.
Responses to gratings
It may not be a surprise that a field reconstruction on the basis of the responses to pictures is capable of explaining those responses. Therefore, it is important to ask whether the same field reconstructions can explain the responses to a different set of stimuli altogether. The circles in Figure 8A show the responses of the cell in Figure 4A (the same as Fig. 3A) to gratings of different orientations. The dotted curve shows the orientation bandpass of the reverse-correlation map, replotted from Figure 3A. The dashed curve is the bandpass of the receptive field reconstructed by the pseudoinverse method using a λ value in the middle of the plateau range, in which the dominant spatial frequency did not change much. This curve is an excellent fit to the actual grating responses of the cell. Not only is the peak orientation close to the true one, but the bandwidth is also now as narrow as the true orientation tuning curve of the cell. Figure 8B plots the optimal orientation predicted from the best reconstructed receptive-field map against the true optimal orientation for all 21 simple cells. The correlation coefficient is very high (r = 0.96; n = 21), and the regression line is almost identical to the line of equality.
The circles in Figure 8D show the responses of the same cell as a function of the spatial frequency of sinusoidal gratings. The dotted curve is the badly fitting bandpass of the reverse correlation map, replotted from Figure 3D. The dashed curve is the bandpass of the best regularized pseudoinverse field reconstruction. The curve is a near-perfect fit (the fits in other cells were not always as good). In particular, the optimal spatial frequency predicted from the receptive-field reconstruction is very close to the true optimum, and the bandwidth of the curve is close to the true bandwidth. Figure 8E plots the optimal spatial frequency predicted from the best reconstructed receptive-field map against the true optimal spatial frequency for the 18 cells for which we could convincingly choose the best field (Fig. 5). The correlation is again high (r = 0.89; n = 18) and, unlike the case with the reverse-correlation maps (Fig. 3E), the regression line (solid line) is close to the line of equality (dashed line). The spatial-frequency optima derived from natural scene stimulation are lower than those derived from gratings by only 0.072 log units (0.24 octaves). This may reflect a small but genuine difference in tuning under the two conditions; however, it may be an artifact resulting from a slightly conservative choice of λ or from the finite size of the pixels in the stimuli (Tadmor and Tolhurst, 1989).
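The offsets quoted in log units and in octaves are related by a fixed factor, because one octave is a factor of 2 in frequency and one log unit is a factor of 10. A minimal sketch of the conversion (the function name is hypothetical):

```python
import math

def log_units_to_octaves(log10_ratio):
    """Convert a frequency ratio expressed in log10 units into octaves (log2 units)."""
    return log10_ratio / math.log10(2.0)

# Offset between natural-scene and grating optima quoted in the text.
offset_octaves = log_units_to_octaves(0.072)   # about 0.24 octaves
frequency_ratio = 10.0 ** -0.072               # optimum is lower by a factor of about 0.85
```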
Figure 8, C and F, shows the orientation and spatial-frequency bandwidths of the best pseudoinverse maps, plotted against the bandwidths of the tuning curves actually measured with gratings. Compared with reverse correlation (Fig. 3C,F), the predicted bandwidths are now more nearly the same as those actually measured. However, the predicted orientation bandwidths in particular are still systematically a little greater than the true ones measured with gratings; this is the expected effect of an expansive output nonlinearity (Gardner et al., 1999). We might expect a similar effect in the spatial-frequency bandwidths. The fact that the bandwidths are actually consistent with those measured with gratings may reflect a conservative choice of the regularization parameter λ (as was also suggested by the slight underestimation of optimal spatial frequency). In general, the simple-cell receptive fields reconstructed using the regularized pseudoinverse are in excellent agreement with the major aspects of the orientation and spatial-frequency tuning of the responses of the cells to sinusoidal gratings.
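The effect of an expansive output nonlinearity on bandwidth can be seen in a toy calculation. The sketch below assumes a Gaussian linear orientation tuning curve with an arbitrary standard deviation of 20° and a pure squaring nonlinearity; neither value is taken from the data, and the function name is hypothetical.

```python
import math

def half_width_at_half_height(angles, tuning):
    """Half of the angular range over which the curve is at least half its peak."""
    peak = max(tuning)
    above = [a for a, t in zip(angles, tuning) if t >= 0.5 * peak]
    return 0.5 * (max(above) - min(above))

# Orientation axis from -90 to 90 degrees in 0.5 degree steps.
angles = [0.5 * i - 90.0 for i in range(361)]
# Assumed linear tuning: Gaussian with sigma = 20 degrees (arbitrary).
linear = [math.exp(-a * a / (2.0 * 20.0 ** 2)) for a in angles]
# Half-squaring: rectify, then square.
squared = [max(v, 0.0) ** 2 for v in linear]

linear_hw = half_width_at_half_height(angles, linear)
squared_hw = half_width_at_half_height(angles, squared)
```

For a Gaussian curve, squaring narrows the half-width by a factor of about √2. A linear receptive-field map therefore predicts tuning that is broader than the tuning of the spike response measured with gratings.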
Some previous studies have compared the tuning of spatial receptive fields with that derived from sinusoidal gratings by fitting Gabor functions to the RF structure (Jones and Palmer, 1987a; Ringach, 2002). When we fitted Gabors to maximize response predictability, we found that the spatial-frequency optima of the Gabors were indeed well correlated with the spatial-frequency optima derived from sinusoidal gratings (r = 0.89; n = 21), a relationship similar to that found with Reginv. Similarly, the Gabor optimum consistently underestimated the preferred grating frequency, by 0.070 log units (0.23 octaves).
Discussion
We have shown that standard reverse correlation can recover estimates of the receptive-field maps of simple cells in V1 from their responses to relatively small numbers of natural scene presentations. Although these maps are strongly biased toward low spatial frequencies, they do give estimates of the orientation preferences of cells that are in close agreement with the preferences measured with moving sinusoidal gratings.
More importantly, we have presented a novel application of the regularized pseudoinverse that allows recovery of receptive-field estimates with high spatial resolution and is theoretically accurate in both spatial frequency and orientation (Willmore, 2002). Comparison with the tuning curves measured with gratings shows that, in many cases, the tuning of simple cells in response to natural scenes is compatible with their tuning in response to gratings. However, the data cannot be completely described using a linear model, suggesting that nonlinear mechanisms operate during natural vision.
Comparison with other methods
There are two ways of inferring RF structure from neuronal response data. One can apply a parametric model, such as the Gabor, which assumes selectivity to a single orientation bandpass localized in space (Marcelja, 1980; Jones and Palmer, 1987a; Ringach, 2002), or one can use a nonparametric approach, which makes few assumptions about the exact spatial structure of the RF map (Smyth et al., 2000; Ringach et al., 2002). Although the Gabor model appears to predict tuning parameters very well, this is not surprising, because those same parameters are an explicit component of the fitting process. However, our nonparametric approach can generate not only comparable fits to tuning parameters but also better predictions of response variance (Table 1). This suggests that the Gabor model, although good, must in fact be an incomplete description of the responses of the cells to natural images.
The regularized pseudoinverse is an efficient nonparametric method for recovering linear receptive-field estimates (first-order kernels) from limited numbers of neuronal responses. However, other nonparametric methods exist. Theunissen et al. (2001) have shown that it is possible to recover linear kernels by using reverse correlation but then correcting the resulting spike-weighted averages to remove the bias produced by using nonorthogonal stimuli. Smyth et al. (2000) and Ringach et al. (2002) used iterative methods that find least-squares solutions. All of these methods offer accurate receptive-field estimation in principle, given very large numbers of stimulus presentations. They also have the virtue that they make no a priori assumptions about the receptive-field structure.
However, all assumption-free methods suffer because of neuronal response variability, and the methods do not provide a way to separate signal (receptive-field estimate) from noise (overfitting of response variability). This effectively limits the receptive-field resolution that can be produced from a given number of stimuli. Thus, although these methods have quantified the orientation tuning of cortical cells under natural stimulation, they have not been able to reveal the spatial-frequency tuning. A direct comparison of all of these methods was made previously using simulated neurons under controlled conditions (B. Willmore and D. Smyth, unpublished observations). In summary, the regularized inverse method described here produces more efficient reconstructions than existing methods (Smyth et al., 2000; Theunissen et al., 2001; Ringach et al., 2002).
The Laplacian-constrained estimation method that we have developed is an example of a class of regularized solutions to linear problems. Regularization involves incorporating a priori information about the structure of the signal and noise, to reduce noise while minimally corrupting the signal. The Laplacian constraint implements a minimal assumption that the true receptive-field map is smooth on the scale of the stimulus pixels. This assumption means that a more accurate receptive-field map can be obtained from a given number of stimuli. This has enabled us to show that, for many cells, the orientation and spatial-frequency tuning characteristics under natural stimulation are compatible with those for stimulation with drifting gratings.
The regularized pseudoinverse is a general method for recovering linear kernels from arbitrary stimulation and could be applied to many different classes of quasilinear neurons. The high resolution of the method allows receptive fields to be estimated either at a high level of detail over a large area of visual space or at a large number of stimulus dimensions. A limitation of the technique is that it is only appropriate for receptive fields that are approximately smooth; however, other related regularization methods are available (Press et al., 1992) that might provide constraints that are more appropriate for other classes of neurons.
Another limitation of the present method is that it can only recover the first-order kernel, and therefore only describes the linear part of the neuronal response. David et al. (1999), Theunissen et al. (2001), and Ringach et al. (2002) have shown that by applying nonlinear transformations to the stimuli, it is possible to gain insight into the behavior of neurons that have simple nonlinear behavior (e.g., cortical complex cells). Regularization could be incorporated into these methods to improve their efficiency. A more general approach would be to directly recover the second-order Wiener kernels (Marmarelis and Marmarelis, 1978) using a regularized estimation method.
Linearity of responses to natural scenes
For most of the cells we analyzed, we could recover a linear kernel that conformed to our expectations about the structure of simple-cell receptive fields (Hubel and Wiesel, 1959; Jones and Palmer, 1987a; DeAngelis et al., 1993; Ringach, 2002). Moreover, the spatial frequency and orientation tuning predicted by these receptive-field maps is compatible with the tuning measured with drifting sinusoidal gratings. This suggests that most cells were performing approximately linear summation (although this does not exclude output nonlinearities) and they had roughly Gabor-like receptive fields.
Simple cell f31205 (illustrated throughout this study) is a particularly strong example of this. For this cell, we mapped the receptive field conventionally with small spots of light (Fig. 9A) to reveal a single OFF region above a stronger ON region; this is shown by the diagram drawn over the gray-level representation. This cell responded strongly to the onset and offset of different stimuli (Fig. 9B). Figure 9C shows one particular natural stimulus (320). The cell responded very strongly to the offset of this image. Figure 9C shows that one high-contrast dark-bright edge in the picture fragment was almost perfectly oriented and aligned to the border between the ON and OFF regions. The polarity of the edge is complementary to the polarity of the ON and OFF regions, and so an offset response was evoked. The preferred natural trigger feature of the cell seems exactly what would have been predicted from the conventional receptive field in Figure 9A. Figure 9, D and E, shows other natural images that evoked strong responses to the stimulus onset. Again, the stimulus profile in Figure 9D clearly matches the receptive field, whereas that in Figure 9E is less immediately obvious, because the cell is integrating across several small features. Finally, Figure 9F shows the success of our regularized pseudoinverse method in recovering that field. Note how well it matches the conventional field (Fig. 9A) and optimal stimuli (Fig. 9C–E).
Nonlinearities in the responses to natural scenes
In addition to cells that performed broadly linear summation, our sample included some cells with responses that were poorly described by the linear model and two cells that totally failed to produce a reverse-correlation map. This suggests that, despite being simple cells (as defined by their relative modulation), some of these cells had strongly nonlinear behavior. This is consistent with the recent suggestion (Mechler and Ringach, 2002) that the strict classification of simple and complex cells in V1 may need revision (cf. Dean and Tolhurst, 1983).
In addition, all of the cells in our sample show some nonlinear behavior; even the highly linear cell of Figure 9 shows a clear output nonlinearity (Fig. 7C). It is well known that simple-cell responses are subject to thresholding (Movshon et al., 1978; Schumer and Movshon, 1984; Tolhurst and Dean, 1987; Carandini and Ferster, 2000) or half-squaring (Albrecht and Geisler, 1991; Heeger, 1992; Tolhurst and Heeger, 1997), which is evident in our results. Indeed, this output nonlinearity is responsible for mismatches in the predictions of grating responses from conventional receptive-field mapping (Tadmor and Tolhurst, 1989; Heeger, 1992; DeAngelis et al., 1993; Gardner et al., 1999; Lampl et al., 2001). Future studies should compare the output nonlinearities inferred from responses with gratings and natural scenes. As with conventional mapping, the receptive fields that we recovered from the responses to natural scenes systematically overestimate the orientation tuning bandwidths (Fig. 8C). It is surprising that the spatial-frequency bandwidths (Fig. 8F) seem less affected systematically, although we consider an explanation for this in Results.
Our results also demonstrate more profound deviations from the linear Gabor model of simple cells. First, many of the spatial receptive-field maps that we have recovered (Fig. 6) show structure that is far outside the central receptive field of the cells. It is likely that this structure reflects a linear approximation of the effects of nonlinear contextual mechanisms, such as those found using classical stimuli (Blakemore and Tobin, 1972; Nelson and Frost, 1985; Bonds, 1989; Knierim and Van Essen, 1992; Walker et al., 1999; Kapadia et al., 2000). Similarly, some of the Fourier space maps of the receptive fields (Fig. 6) show structure at orthogonal orientations, suggesting that the cells were influenced by stimuli that lay outside their classical spectral tuning (DeAngelis et al., 1992; Shevelev et al., 1994; Sillito et al., 1995). This may primarily be the result of contextual mechanisms that have been revealed by experiments with classical stimuli (Bonds, 1989).
However, it is also possible that these results reflect novel nonlinear mechanisms that operate only under naturalistic stimulation conditions. They may reflect optimizations of V1 for the efficient coding of the information in natural scenes (Rao and Ballard, 1999; Vinje and Gallant, 2000; Schwartz and Simoncelli, 2001), or perhaps specialization for some perceptual process such as figure/ground segregation (Knierim and Van Essen, 1992; Zipser et al., 1996; Northdurft et al., 1999) or contour integration (Nelson and Frost, 1985; Kapadia et al., 2000). To investigate the relative contributions of known and novel nonlinear mechanisms, it will be necessary to investigate more closely the circumstances under which these mechanisms operate during visual stimulation with more naturalistic temporal properties (Smyth et al., 2002).
Footnotes
This research was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) and McDonnell-Pew. D.S. received a Medical Research Council studentship and was later employed by a BBSRC project grant to I.D.T. and D.J.T. B.W. received a BBSRC studentship and was later employed by a BBSRC project grant to D.J.T. and Professor Tom Troscianko. G.E.B. was supported by the Wellcome Trust. We are very grateful for the technical support from Pat Cordery and experimental assistance from Louise Upton.
Correspondence should be addressed to Darragh Smyth, Laboratory of Physiology, Oxford University, Parks Road, Oxford OX1 3PT, UK. E-mail: darragh.smyth{at}physiol.ox.ac.uk.
B. Willmore's present address: Psychology Department, University of California, Berkeley, 3210 Tolman Hall (1650), Berkeley, CA 94720-1650.
G. E. Baker's present address: Department of Optometry and Visual Science, Northampton Square, City University, London EC1V 0HB, UK.
Copyright © 2003 Society for Neuroscience 0270-6474/03/234746-14$15.00/0