The Journal of Neuroscience, November 16, 2005, 25(46):10577-10597; doi:10.1523/JNEUROSCI.3726-05.2005
Symposia and Mini-Symposia
Do We Know What the Early Visual System Does?
Matteo Carandini,1 Jonathan B. Demb,2 Valerio Mante,1 David J. Tolhurst,3 Yang Dan,4 Bruno A. Olshausen,6 Jack L. Gallant,5,6 and Nicole C. Rust7
1Smith-Kettlewell Eye Research Institute, San Francisco, California 94115, 2Departments of Ophthalmology and Visual Sciences, and Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, Michigan 48105, 3Department of Physiology, University of Cambridge, Cambridge CB2 1TN, United Kingdom, Departments of 4Molecular and Cellular Biology and 5Psychology and 6Helen Wills Neuroscience Institute and School of Optometry, University of California, Berkeley, Berkeley, California 94720, and 7Center for Neural Science, New York University, New York, New York 10003
Abstract
We can claim that we know what the visual system does once we can predict neural responses to arbitrary stimuli, including those seen in nature. In the early visual system, models based on one or more linear receptive fields hold promise to achieve this goal as long as the models include nonlinear mechanisms that control responsiveness, based on stimulus context and history, and take into account the nonlinearity of spike generation. These linear and nonlinear mechanisms might be the only essential determinants of the response, or alternatively, there may be additional fundamental determinants yet to be identified. Research is progressing with the goals of defining a single "standard model" for each stage of the visual pathway and testing the predictive power of these models on the responses to movies of natural scenes. These predictive models represent, at a given stage of the visual pathway, a compact description of visual computation. They would be an invaluable guide for understanding the underlying biophysical and anatomical mechanisms and relating neural responses to visual perception.
Key words: contrast; lateral geniculate nucleus; luminance; primary visual cortex; receptive field; retina; visual system; natural images
The ultimate test of our knowledge of the visual system is prediction: we can say that we know what the visual system does when we can predict its response to arbitrary stimuli. How far are we from this end result? Do we have a "standard model" that can predict the responses of at least some early part of the visual system, such as the retina, the lateral geniculate nucleus (LGN), or primary visual cortex (V1)? Does such a model predict responses to stimuli encountered in the real world?
A standard model existed in the early decades of visual neuroscience, until the 1990s: it was given by the linear receptive field. The linear receptive field specifies a set of weights to apply to images to yield a predicted response. A weighted sum is a linear operation, so it is simple and intuitive. Moreover, linearity made the receptive field mathematically tractable, allowing the fruitful marriage of visual neuroscience with image processing (Robson, 1975) and with linear systems analysis (De Valois and De Valois, 1988). It also provided a promising parallel with research in visual perception (Graham, 1989). Because it served as a standard model, the receptive field could be used to decide which findings were surprising and which were not: if a phenomenon was not predictable from the linear receptive field, it was particularly worthy of publication.
Research aimed at testing the linear receptive field led to the discovery of important nonlinear phenomena, which cannot be explained by a linear receptive field alone. These phenomena have been discovered at all stages of the early visual system, including the retina (for review, see Shapley and Enroth-Cugell, 1984; Demb, 2002), the LGN (for review, see Carandini, 2004), and area V1 (for review, see Carandini et al., 1999; Fitzpatrick, 2000; Albright and Stoner, 2002). They have forced a revision of the models based on the linear receptive field. In some cases, the revised models have achieved near standard model status, for example, the model of Shapley and Victor for contrast gain control in retinal ganglion cells (Shapley and Victor, 1978; Victor, 1987) and Heeger's normalization model of V1 responses (Heeger, 1992a). By and large, however, the discovery of failures of the linear receptive field has deprived the field of a simple standard model for each visual stage.
This review aims to help move the field toward the definition of new standard models, bringing the practice of visual neuroscience closer to that of established quantitative fields such as physics. In these fields, there is wide agreement as to what constitutes a standard theory and which results should be the source of surprise.
The review is authored by the speakers and organizers of a mini-symposium at the 2005 Annual Meeting of the Society for Neuroscience. We are all involved in a similar effort: we construct models of neurons and test how accurately they predict the responses to both simple laboratory stimuli and complex stimuli such as those that would be encountered in nature. How accurate are the existing models when held to a rigorous test? By what standards should we judge them? Do they generalize to large classes of stimuli? How should the models be revised?
The review is organized along the lines of the mini-symposium, with each speaker addressing the question "Do we understand visual processing?" at one or more stages of the visual hierarchy. We begin with Background, in which we summarize some notions that are at the basis of most functional models in early vision. Demb follows with an evaluation of standard models of the retina (see below, Understanding the retinal output). Mante formalizes an extension to the linear model of LGN neurons to account for luminance and gain control adaptation effects (see below, Understanding LGN responses). The successes and failures of cortical models are addressed by Tolhurst (see below, Understanding V1 simple cells) and Dan (see below, Understanding V1 complex cells). Gallant discusses novel model characterization techniques and their degree of success in areas V1 and V4 (see below, Evaluating what we know about V1 and beyond). Finally, Olshausen argues that our understanding of V1 is far from complete and proposes future avenues for research (see below, What we don't know about V1). In Conclusion, we isolate some of the common ideas and different viewpoints that have emerged from these contributions.
Background
At the basis of most current models of neurons in the early visual system is the concept of the linear receptive field. The receptive field is commonly used to describe the properties of an image that modulates the responses of a visual neuron. More formally, the concept of a receptive field is captured in a model that includes a linear filter as its first stage. Filtering involves multiplying the intensities at each local region of an image (the value of each pixel) by the values of a filter and summing the weighted image intensities. A linear filter describes the stimulus selectivity for a neuron: images that resemble the filter produce large responses, whereas images that bear little resemblance to the filter produce negligible responses. For example, tuning for the spatial frequency of a drifting grating is described by the center-surround organization of filters in the retina and LGN (Fig. 1A) (Enroth-Cugell and Robson, 1966), whereas orientation tuning in V1 is described by filters that are elongated along one spatial axis (Fig. 1B) (Hubel and Wiesel, 1962).
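As a minimal sketch of the filtering operation just described, the following Python computes the output of a linear filter as the sum of pixel intensities weighted by filter values. The difference-of-Gaussians shape, the filter size, and all parameter values are illustrative assumptions, not values taken from the studies cited here.

```python
import math

def difference_of_gaussians(size, sigma_center, sigma_surround, surround_weight):
    """Build a size x size center-surround filter (hypothetical parameters)."""
    half = size // 2
    filt = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            center = math.exp(-r2 / (2 * sigma_center ** 2))
            surround = math.exp(-r2 / (2 * sigma_surround ** 2))
            row.append(center - surround_weight * surround)
        filt.append(row)
    return filt

def filter_output(image, filt):
    """Weighted sum of image intensities: the response of the linear filter."""
    return sum(p * w
               for img_row, f_row in zip(image, filt)
               for p, w in zip(img_row, f_row))

rf = difference_of_gaussians(size=7, sigma_center=1.0,
                             sigma_surround=2.5, surround_weight=0.3)
matching_image = rf                                  # image identical to the filter
inverse_image = [[-w for w in row] for row in rf]    # contrast-reversed image

print(filter_output(matching_image, rf))   # large positive output
print(filter_output(inverse_image, rf))    # equally large negative output
```

Because the operation is a weighted sum, doubling the image intensities doubles the output, which is the linearity property that the rest of the review builds on.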
Basic models of neurons at the earliest stages of visual processing (retina, LGN, and V1 simple cells) typically include a single linear filter (Enroth-Cugell and Robson, 1966; Movshon et al., 1978b), whereas models of neurons at later stages of processing (V1 complex cells and beyond) require multiple filters (Fig. 1C) (Movshon et al., 1978b; Adelson and Bergen, 1985; Touryan et al., 2002).
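The difference between a single-filter model and a multiple-filter model can be sketched in a few lines of Python. The example below contrasts a simple-cell model (one filter followed by half-wave squaring) with the energy model of a complex cell (two phase-shifted filters whose outputs are squared and summed), as summarized in the legend of Figure 1. The one-dimensional Gabor filters and their parameters are assumptions chosen only for illustration.

```python
import math

def gabor(n, freq, phase, sigma):
    """1D Gabor filter sampled at n points centered on zero (illustrative)."""
    half = n // 2
    return [math.exp(-(x * x) / (2 * sigma ** 2))
            * math.cos(2 * math.pi * freq * x + phase)
            for x in range(-half, half + 1)]

def dot(a, b):
    """Projection of an image onto a filter."""
    return sum(u * v for u, v in zip(a, b))

def simple_cell(image, filt):
    """Single linear filter followed by half-wave squaring."""
    return max(dot(image, filt), 0.0) ** 2

def complex_cell(image, filt_a, filt_b):
    """Energy model: squared outputs of two phase-shifted filters, summed."""
    return dot(image, filt_a) ** 2 + dot(image, filt_b) ** 2

even = gabor(21, 0.1, 0.0, 4.0)
odd = gabor(21, 0.1, math.pi / 2, 4.0)   # 90-degree phase-shifted partner
image = even
inverse = [-v for v in image]

print(simple_cell(image, even), simple_cell(inverse, even))
print(complex_cell(image, even, odd), complex_cell(inverse, even, odd))
```

In this sketch the simple-cell model responds to the image but not to its inverse, whereas the energy model gives the same response to both.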
The second stage of these models describes how the filter outputs are transformed into a firing rate response. This transformation typically takes the form of a static nonlinearity (e.g., half-wave rectification), a function that depends only on its instantaneous input. In addition, many models implicitly assume that firing rate is converted into spike trains via a Poisson process.
Although the receptive field has been described thus far as a set of weights arranged in space (Fig. 1), in reality, the concept of receptive field involves three dimensions: two dimensions of space and the dimension of time. The full spatiotemporal receptive field of a neuron specifies what weight is given to each location in space at each instant in the recent past. When only the temporal evolution of the response is considered for a given spatial position (Fig. 2), the receptive field is commonly referred to as a temporal weighting function.
Whether they are specified in space, time, or jointly in space and time, receptive fields are typically endowed with ON and OFF subfields (Fig. 1, white and black regions). An ON region is one in which a bright light evokes a positive response and a dark light evokes a negative response. An OFF region does the opposite. In the early days, these regions were called "excitatory" and "inhibitory" (Hubel and Wiesel, 1962). However, these names are misleading: their sign has to do with the relative contrast of light, not with the operation of synaptic excitation and inhibition. For instance, an OFF region will deliver substantial excitation in response to a dark stimulus (Hirsch, 2003).

Figure 1. Basic models of neurons involved in early visual processing. In all models, the response of a neuron is described by passing an image through one or more linear filters (by taking the dot product or projection of an image and a filter). The outputs of the linear filters are passed through an instantaneous nonlinear function, plotted here as firing rate on the ordinate and filter output on the abscissa. A, Simple model of a retinal ganglion cell or of an LGN relay neuron. The model includes a linear filter (receptive field) with a center-surround organization and a half-wave rectifying nonlinearity. Images that resemble the filter produce large firing rate responses, whereas images that resemble the inverse of the filter or have no similarity with the filter produce no response. B, Model of a V1 simple cell as a filter elongated along one axis and a half-wave squaring nonlinearity. As in A, only images that resemble the filter produce high firing rate responses. C, The energy model of a V1 complex cell. The model includes two phase-shifted linear filters whose outputs are squared before they are summed. In this model, both images that resemble the filters and their inverses produce high firing rates.
The advantage of assuming an initial linear processing stage is that it enables the experimenter to recover a full model of a neuron within the time constraints of an experiment. Recovering the filter weights involves presenting a sufficiently rich stimulus set to the cell (e.g., white noise, flashed gratings, or natural images) and correlating the response of the neuron with the pixel intensities in the images that immediately preceded spikes. For neurons early in the visual system, a single linear filter is often extracted by presenting a random noise stimulus and computing the mean pixel intensity before each spike, the spike-triggered average (Chichilnisky, 2001). Similar approaches are followed in the sections below on the retina, lateral geniculate nucleus, and V1 simple cells. At later stages of visual processing, the responses of multiple linear filters can be accounted for by looking at the higher-order correlations between random stimuli and the response of a neuron (Simoncelli et al., 2004). This is the approach followed below in Understanding V1 complex cells. Novel nonlinear mapping techniques provide a bridge between these approaches (see below, Evaluating what we know about V1 and beyond).
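The spike-triggered average can be illustrated with a small simulation. The sketch below is not the procedure of any cited study: it assumes a hypothetical model neuron (a known temporal filter, half-wave rectification, and random spiking), drives it with Gaussian white noise, and then averages the stimulus values that preceded each spike. The filter values, gain, and stimulus length are arbitrary choices.

```python
import random

def simulate_ln_spikes(stimulus, kernel, gain, rng):
    """Hypothetical LN neuron: linear filter, rectification, random spiking."""
    spikes = []
    for t in range(len(stimulus)):
        if t < len(kernel) - 1:
            spikes.append(0)          # not enough stimulus history yet
            continue
        drive = sum(kernel[lag] * stimulus[t - lag] for lag in range(len(kernel)))
        p_spike = min(1.0, gain * max(drive, 0.0))
        spikes.append(1 if rng.random() < p_spike else 0)
    return spikes

def spike_triggered_average(stimulus, spikes, n_lags):
    """Mean stimulus value at each lag before a spike."""
    sta = [0.0] * n_lags
    n_spikes = 0
    for t in range(n_lags - 1, len(stimulus)):
        if spikes[t]:
            n_spikes += 1
            for lag in range(n_lags):
                sta[lag] += stimulus[t - lag]
    return [v / n_spikes for v in sta]

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)
true_kernel = [0.1, 0.5, 0.6, 0.2, -0.2, -0.3, -0.15, -0.05]  # assumed biphasic filter
stimulus = [rng.gauss(0.0, 1.0) for _ in range(20000)]
spikes = simulate_ln_spikes(stimulus, true_kernel, gain=0.3, rng=rng)
sta = spike_triggered_average(stimulus, spikes, len(true_kernel))
print(correlation(sta, true_kernel))   # similarity of recovered and true filter shape
```

The recovered average matches the shape of the underlying filter only up to a scale factor, which is why the comparison uses a correlation rather than a direct difference.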
Understanding the retinal output
The retina contains a complex network of cells, divided into an estimated 60-80 cell types: 3-4 photoreceptors, 40-50 interneurons, and 15-20 ganglion cells, whose spike trains transmit visual information to the rest of the brain (Masland, 2001; Sterling and Demb, 2004; Wässle, 2004). No predictive model will suffice for all types of ganglion cell, because some cells have "conventional" center-surround receptive fields (Fig. 1A) (Kuffler, 1953; Enroth-Cugell and Robson, 1966), whereas others have specialized properties, including direction selectivity and intrinsic photosensitivity (Berson, 2003; Taylor and Vaney, 2003; Dacey et al., 2005). As a starting point, predictive models have focused on four ganglion cell types: the ON- and OFF-center versions of sustained (X/parvocellular type) and transient (Y/magnocellular type) cells. These four cell types express relatively simple receptive fields, they project via the LGN to visual cortex, and they can, with some caveats, be modeled in a relatively straightforward way. For the purpose of the predictive models in question, we could ignore all of the complexity of retinal circuitry; the goal is simply to achieve a thorough understanding of how light at the cornea corresponds to spiking responses in the ganglion cell.
Perhaps surprisingly, most retinal studies have not attempted to "go all the way" and predict responses to natural movies, but rather they have focused on a simple dynamic laboratory stimulus: white noise. A white-noise stimulus is created by drawing intensity values from a Gaussian distribution, defined by a mean and an SD of intensity, every 10-20 ms (Fig. 2). White noise contains approximately equal energy over a range of temporal frequencies (Zaghloul et al., 2005). The relatively flat temporal frequency spectrum is a nice feature for characterizing the receptive field, but this flat spectrum differs markedly from natural scenes, in which there is decreasing stimulus energy at higher temporal frequencies (Simoncelli and Olshausen, 2001). Nevertheless, the response to white noise presents a serious challenge for predictive models and reveals several important nonlinearities.
To take an example, we could perform a simple experiment in which we stimulate a cell with a spot of light over the receptive field center and modulate the spot intensity with white noise (Zaghloul et al., 2005). In this case, we build a model of the temporal response of the cell only [although this approach can easily be extended to model the full spatiotemporal-chromatic response (Chichilnisky, 2001)]. The first step is to build a linear model of the response of the cell. To do so, we cross-correlate the spike response with the white-noise stimulus (Sakai and Naka, 1995; Chichilnisky, 2001). The result is a linear filter that represents the weighting function of the cell (see above, Background). Then, at any instant in time, we can generate the linear response by multiplying the stimulus by the temporal weighting function, pointwise, and summing the result (Fig. 2). To generate the linear response at the next moment, we advance the temporal weighting function in time and repeat the process. Under certain conditions, the linear model alone predicts the cone photoreceptor response to a white-noise stimulus (Rieke, 2001; Baccus and Meister, 2002). However, the linear model fails for ganglion cell responses because of several nonlinearities (Shapley and Victor, 1978; Victor, 1987; Chichilnisky, 2001; Kim and Rieke, 2001; Baccus and Meister, 2002; Zaghloul et al., 2003, 2005).
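The multiply-sum-advance procedure for generating the linear response can be written out directly. In this sketch, `weights[lag]` is the weight given to the stimulus `lag` samples in the past; the four weight values are a made-up biphasic example, not a measured temporal weighting function.

```python
def linear_response(stimulus, weights):
    """Slide the temporal weighting function along the stimulus.

    At each time step, multiply the recent stimulus by the weights,
    pointwise, and sum the result.
    """
    response = []
    for t in range(len(stimulus)):
        total = 0.0
        for lag, w in enumerate(weights):
            if t - lag >= 0:              # ignore samples before stimulus onset
                total += w * stimulus[t - lag]
        response.append(total)
    return response

weights = [0.5, 1.0, -0.6, -0.3]          # hypothetical biphasic weighting function
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # a single brief flash
step = [1.0] * 6                          # a sustained increase in intensity

print(linear_response(impulse, weights))  # reproduces the weights themselves
print(linear_response(step, weights))     # running sum of the weights
```

A brief flash returns the weighting function itself, and a sustained step returns its running sum, so a biphasic function yields a transient response to a step.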
One major nonlinearity is the spike threshold. Resting discharge of ganglion cells can be as low as 0 spikes/s or as high as 80 spikes/s, but a value of 10-20 spikes/s is common (Kuffler et al., 1957; Troy and Robson, 1992; Passaglia et al., 2001). A nonoptimal stimulus will reduce the firing rate, but firing rates cannot go negative, and so there is a point at which spiking responses are "clipped." Furthermore, an optimal stimulus will increase the spike rate, but spike rates cannot be infinitely high. In a 10 ms period, a cell could fire at most approximately four spikes (or 400 spikes/s) because of the 1-2 ms refractory period after each spike. Thus, the clipping ("rectification") and the maximum rate ("saturation") represent two notable nonlinearities. These nonlinearities can, to some degree, be modeled as "static," meaning that the linear response can be passed through an input-output function that is invariant over time (Fig. 2) (Chichilnisky, 2001). The combination of a linear filter and a static nonlinearity creates the linear-nonlinear (LN) model of spiking (Figs. 1A, 2). This model predicts the spike rate but not actual spike times; spiking is modeled as a Poisson process, defined by a rate (with equal mean and variance), but spike times are otherwise random. Thus, the model can most properly be termed the linear-nonlinear-Poisson (LNP) model of spiking (Paninski et al., 2004).
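The nonlinear and Poisson stages of the LNP model can be sketched as follows. The 400 spikes/s ceiling and the 10 ms bin come from the paragraph above; the gain and the 15 spikes/s baseline of the input-output function are assumptions chosen for illustration, and the Poisson sampler is a generic textbook method rather than anything specific to the cited work.

```python
import math
import random

MAX_RATE = 400.0   # spikes/s, the approximate ceiling quoted in the text

def static_nonlinearity(linear_response, gain=50.0, baseline=15.0):
    """Clip the rate at zero (rectification) and at MAX_RATE (saturation).

    gain and baseline are hypothetical values, not fitted parameters.
    """
    return min(MAX_RATE, max(0.0, baseline + gain * linear_response))

def poisson_count(expected, rng):
    """Draw a Poisson-distributed spike count (multiplication method)."""
    limit = math.exp(-expected)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def lnp_spike_counts(linear_responses, bin_s, rng):
    """Rate from the static nonlinearity, then a Poisson count per time bin."""
    return [poisson_count(static_nonlinearity(x) * bin_s, rng)
            for x in linear_responses]

rng = random.Random(1)
# A strongly driven cell: the rate saturates at 400 spikes/s, i.e. 4 spikes per 10 ms bin.
counts = lnp_spike_counts([100.0] * 5000, bin_s=0.010, rng=rng)
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean, variance)   # for a Poisson process these two should be similar
```

The equal mean and variance of the simulated counts is the defining property of the Poisson stage.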
Despite its simplicity, the LNP model works rather well at predicting spike rates. In practice, one can generate the linear prediction of the response using the method described above. One can estimate the static nonlinear function by plotting the linear prediction of the response versus the actual response and fitting a smooth function (Chichilnisky, 2001). One way to test the model is to build the linear and nonlinear stages based on one dataset and then test how well the model predicts the response to a novel test stimulus (with the same contrast and mean luminance as the stimulus used to generate the model). On such tests, the LNP model predicts the new dataset nearly as well as does a maximum likelihood "gold standard" (Chichilnisky, 2001; Kim and Rieke, 2001; Zaghloul et al., 2003). Another measure is the amount of variance captured by the model (r²). In Figure 2, the LNP model captured 81% of the variance in the spike response. A similar LN model works equally well on subthreshold membrane voltage or current responses (Kim and Rieke, 2001; Rieke, 2001; Baccus and Meister, 2002; Zaghloul et al., 2003, 2005).
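The fit-then-test procedure can be sketched on synthetic data. The example below estimates the static nonlinearity by averaging the measured response within bins of the linear prediction (a crude stand-in for fitting a smooth function), then scores the model on held-out samples as the squared correlation between predicted and measured responses. The data, the bin count, and the interleaved train/test split are all assumptions made for illustration.

```python
def fit_static_nonlinearity(linear_pred, response, n_bins=10):
    """Average the measured response within bins of the linear prediction."""
    lo, hi = min(linear_pred), max(linear_pred)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, r in zip(linear_pred, response):
        idx = min(n_bins - 1, int((x - lo) / width))
        sums[idx] += r
        counts[idx] += 1
    # Empty bins default to 0.0; a fitted smooth function would avoid this.
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return lo, width, means

def apply_static_nonlinearity(linear_pred, fit):
    """Look up the predicted response for each linear prediction."""
    lo, width, means = fit
    out = []
    for x in linear_pred:
        idx = max(0, min(len(means) - 1, int((x - lo) / width)))
        out.append(means[idx])
    return out

def variance_explained(predicted, actual):
    """Squared Pearson correlation between prediction and measurement."""
    n = len(actual)
    mp, ma = sum(predicted) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_a = sum((a - ma) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)

# Synthetic, noise-free data: a rectifying cell (assumed, not measured).
linear = [i / 50 - 1 for i in range(101)]
rate = [20 * max(x, 0.0) for x in linear]
train_x, train_r = linear[0::2], rate[0::2]   # samples used to build the model
test_x, test_r = linear[1::2], rate[1::2]     # held-out samples

fit = fit_static_nonlinearity(train_x, train_r)
r2 = variance_explained(apply_static_nonlinearity(test_x, fit), test_r)
print(r2)
```

Scoring on samples that were not used for fitting is what makes the number a test of prediction rather than of description.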
We could feel rather satisfied by the ability of the LNP model to predict the response to a novel test stimulus. However, model performance would degrade quickly if we changed almost any aspect of the test stimulus. For example, imagine that we changed the contrast (the SD of the Gaussian distribution of luminance values). Increasing contrast reduces the sensitivity of the linear filter (height) and shortens the integration time (width) (Shapley and Victor, 1978; Smirnakis et al., 1997; Benardete and Kaplan, 1999; Chander and Chichilnisky, 2001; Kim and Rieke, 2001; Zaghloul et al., 2005). Thus, to model the response at the new contrast, we would need to use a new filter. However, in many cases, we can model the response at the new contrast with the same nonlinear function as before (Chander and Chichilnisky, 2001; Kim and Rieke, 2001; Zaghloul et al., 2005). Still, to predict the response to multiple contrasts, we would need to know the linear filter for each contrast.
Even if we knew the linear filter for all contrast levels, we would have another problem. Each of our linear filters was calculated using a white-noise stimulus with a contrast level that remained constant during the filter measurement. As soon as we move to a natural stimulus, we can expect that the contrast would change continuously, and so we would need to know how the linear filter changed dynamically over time with the contrast level. There is some evidence that the filter changes rapidly after a change in contrast, in 10-100 ms (Victor, 1987; Baccus and Meister, 2002); however, other measures suggest a slower change over seconds (Kim and Rieke, 2001). Furthermore, there are cases in which switching contrast to a new level changes not only the filter but also the static nonlinearity (Baccus and Meister, 2002). This introduces a complication for the LNP model because, even if the LNP model is useful at a given, steady contrast level, we must consider that both the linear and nonlinear stages would change dynamically as contrast varied over time in a natural movie.
The above example considers the response to a luminance modulation in time, a one-dimensional problem, but of course a natural movie varies over time in two dimensions of space (plus there is the issue of color). When we consider space, two complications arise. First, transient (Y-type) ganglion cells combine subregions of their receptive field nonlinearly, apparently because of nonlinearities at the output of presynaptic bipolar cells (Demb et al., 2001). Furthermore, there are nonlinear signals passed across the retina from outside the classical (center-surround) receptive field that are not captured by the LNP model (Demb et al., 1999; Roska and Werblin, 2001; Olveczky et al., 2003). Some models have characterized these nonlinear influences using quantitative approaches (Shapley and Victor, 1978; Victor, 1979). However, it is not clear at present how well these models would predict responses to natural stimuli. Furthermore, certain ganglion cells adapt to the pattern of light over space or time, such that the linear filter becomes less sensitive to the most predictable features of the stimulus (Hosoya et al., 2005). For example, this type of adaptation would increase sensitivity to horizontal features after prolonged exposure to vertical features. This pattern adaptation will need to be considered in future predictive models.
One direction to push the LNP model is to generate a more realistic pattern of spiking than Poisson output. In fact, ganglion cells, unlike cortical cells, fire spikes much more reliably than a Poisson process (Berry et al., 1997; Reich et al., 1997; Kara et al., 2000; Demb et al., 2004; Uzzell and Chichilnisky, 2004). For example, a stimulus that evokes, on average, a burst of nine spikes will show an SD (across repeated trials) of approximately one spike rather than the Poisson value of three spikes (i.e., variance of 9, equal to the mean). One recent approach used a novel method for fitting a model that includes a linear filter followed by an integrate-and-fire spike generator (Paninski et al., 2004). To model realistic patterns of spiking, the spike generator includes a recovery function, after each spike, mimicking a refractory period (Keat et al., 2001; Paninski et al., 2004). One result of this approach is that the apparent contrast-dependent change in the linear filter width as measured by the LNP model may be an artifact related to the refractory period in the data (Pillow and Simoncelli, 2003). However, intracellular studies, which measure the continuous subthreshold potential, suggest that some amount of the contrast-dependent change in filter width may be real (Kim and Rieke, 2001; Zaghloul et al., 2005).
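The spike-count comparison in the paragraph above (an SD of approximately one spike versus the Poisson value of three spikes for a nine-spike burst) can be checked with a short calculation. The variance-to-mean ratio computed here, often called the Fano factor, is a common summary of spike-count reliability; it is introduced only for illustration and is not used in the text.

```python
import math

def poisson_sd(mean_count):
    """SD of a Poisson count: the square root of the mean (variance equals mean)."""
    return math.sqrt(mean_count)

def variance_to_mean_ratio(mean_count, sd_count):
    """Variance divided by mean; equals 1 for a Poisson process."""
    return sd_count ** 2 / mean_count

mean_burst = 9        # spikes per burst, from the example in the text
observed_sd = 1       # approximate SD across repeated trials, from the text

print(poisson_sd(mean_burst))                           # 3.0
print(variance_to_mean_ratio(mean_burst, observed_sd))  # about 0.11
print(variance_to_mean_ratio(mean_burst, 3))            # 1.0, the Poisson case
```

On these numbers the trial-to-trial variance is roughly one-ninth of what a Poisson spike generator would produce.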
Even given all of the above complications, it is surprising that more retinal studies have not attempted to predict the response to a natural movie. One group tested their model on a full-field stimulus that was modulated by a natural sequence of light fluctuation, and the model did a reasonable job (van Hateren et al., 2002). The model was based on a linear filter approach and included feedback gain controls, to account for adaptation to the mean intensity and contrast, and a rectifying nonlinearity to model the spike threshold. In fact, many "bursts" of spiking evoked during the stimulus were captured by the model, although there was clearly room for improvement. Furthermore, the study did not test the predictive power of the model on novel datasets. Still, the results were generally encouraging.
What are the next steps for predictive modeling in the retina? Clearly, there are many questions left unanswered by the LNP model. A major question is how we can predict the linear filter at any instant in time, given the previous statistics of the stimulus. A working hypothesis suggests that the retina adapts separately to contrast ("contrast adaptation") and the mean intensity ("light adaptation"). So one advance would be to further understand the rules by which the previous mean intensity and contrast influence the filter (see below, Understanding LGN responses). However, this hypothesis suggests that the mean and contrast are the only relevant parameters and that these parameters control filter adaptation independently; both of these assumptions require additional validation. Also, this theory ignores possible adaptation to higher-order stimulus statistics (Hosoya et al., 2005). Furthermore, once we get past photoreception and into the retinal circuit, cells no longer adapt to light statistics; rather, they adapt based on changes in neurotransmitter release over time as well as intrinsic cellular properties. Thus, it will be important to further understand cellular mechanisms for adaptation. Intracellular recordings suggested that contrast adaptation occurs partly at the level of synaptic input and partly at the level of ganglion cell spike generation, suggesting an adaptive mechanism intrinsic to the ganglion cell (Kim and Rieke, 2001, 2003; Zaghloul et al., 2005). Further understanding cellular mechanisms for adaptation could provide key insights into the optimal architecture of a predictive model. In other words, we should amend the statement at the beginning of this section about ignoring retinal circuitry: knowledge of the circuitry could indeed guide the development of an appropriate predictive model.
In summary, the response to a dynamic laboratory stimulus, white noise, can be predicted fairly well using a simple linear model followed by a static nonlinearity. Predicting the response to rapid changes in stimulus statistics, as would occur in a natural movie, requires additional understanding of the rules by which the linear and nonlinear stages adapt over time. Furthermore, there are advances to be made in modeling spike times, as opposed to rates, and we need to further understand the multiple ganglion cell types beyond the four types considered here. To many in the field of visual neuroscience, predicting responses in the retina seems much simpler than predicting responses in an extrastriate area, such as V4, and there is clearly truth to this notion. Nevertheless, there is still a long way to go before we can predict retinal responses to an arbitrary stimulus.
Understanding LGN responses
The LGN occupies a strategic position, a strait through which most retinal signals must pass to reach visual cortex. The strongest retinal input to the LGN originates from ganglion cells of the X/parvocellular type and of the Y/magnocellular type. These two cell types have been studied extensively (see above, Understanding the retinal output) and together constitute 50% of ganglion cells in the cat and 80% in primates (Rodieck et al., 1993; Masland, 2001; Wässle, 2004). Additional input to LGN relay cells originates from other geniculate neurons, from subcortical structures, and from cortex (Guillery and Sherman, 2002).
It would be highly desirable to obtain a complete description of how LGN neurons respond to visual stimuli. Such a description would summarize the computations performed by the retinal and thalamic circuitry and amount to a full understanding of the visual inputs received by primary visual cortex.
The main determinant of the responses of LGN neurons is the linear receptive field, whose broad attributes are similar to those of the afferent retinal ganglion cells. The receptive field is composed of a center and of a larger surround, whose responses interact subtractively (Fig. 1A). Both center and surround have a biphasic temporal weighting function (Fig. 2), i.e., they weigh contributions from the recent and less recent past with opposite polarity (Cai et al., 1997; Reid et al., 1997). The linear receptive field accurately predicts the basic selectivity of LGN neurons measured with gratings. For instance, the spatial profile of the receptive field predicts the selectivity for spatial frequency (Kaplan et al., 1979; So and Shapley, 1981; Shapley and Lennie, 1985), whereas the temporal weighting function predicts the selectivity for temporal frequency (Saul and Humphrey, 1990; Kremers et al., 1997; Benardete and Kaplan, 1999). The linear receptive field not only describes responses to simple laboratory stimuli but also captures the basic features in the responses to complex video sequences (Dan et al., 1996).
The shape of the temporal weighting function of LGN neurons depends on two strong nonlinear adaptive mechanisms that originate in retina: luminance gain control and contrast gain control (see above, Understanding the retinal output) (Shapley and Enroth-Cugell, 1984). These gain control mechanisms affect the height (i.e., the gain) and width (i.e., the integration time) of the temporal weighting function. Luminance gain control (also known as light adaptation) occurs primarily in retina. It matches the limited dynamic range of neurons to the locally prevalent luminance (light intensity). Gain and integration time are reduced for locations of the visual field where mean luminance is high and increased where mean luminance is low (Dawis et al., 1984; Rodieck, 1998). Contrast gain control begins in retina (Shapley and Enroth-Cugell, 1984; Victor, 1987; Baccus and Meister, 2002) and is strengthened at subsequent stages of the visual pathway (Kaplan et al., 1987; Sclar et al., 1990). It regulates gain and integration time on the basis of the locally prevalent root-mean-square contrast, the SD of the stimulus luminance divided by the mean luminance. Gain and integration time are reduced for locations of the visual field where contrast is high and increased where contrast is low.
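The definition of root-mean-square contrast given above translates directly into code. This short sketch computes it for a list of luminance samples from one local region; the sample values and their units are arbitrary.

```python
import math

def rms_contrast(luminances):
    """SD of the luminance samples divided by their mean."""
    mean = sum(luminances) / len(luminances)
    variance = sum((v - mean) ** 2 for v in luminances) / len(luminances)
    return math.sqrt(variance) / mean

patch = [50.0, 150.0]              # luminance samples from one local region
brighter_patch = [100.0, 300.0]    # same pattern at twice the mean luminance
uniform_patch = [100.0, 100.0]     # no spatial variation

print(rms_contrast(patch))           # 0.5
print(rms_contrast(brighter_patch))  # 0.5
print(rms_contrast(uniform_patch))   # 0.0
```

Because the SD is divided by the mean, scaling all luminances by a common factor leaves the contrast unchanged, which is what allows luminance and contrast to be treated as separate quantities.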
These gain control mechanisms dampen the impact of sudden changes in the mean luminance or contrast of a scene such as those brought about by eye movements. This effect is illustrated in Figure 3, A and B, by the responses of an LGN neuron in an anesthetized, paralyzed cat (Mante et al., 2005). Stimuli were drifting gratings of optimal spatial frequency. Several seconds after the onset of a grating, either mean luminance (at constant contrast) or contrast (at constant mean luminance) was suddenly increased. LGN responses are barely affected by the change in luminance (Fig. 3A) and only weakly affected by the change in contrast (Fig. 3B). Indeed, consider the responses predicted at high luminance from the linear receptive field measured at low luminance (Fig. 3A, red curves) and the responses predicted at high contrast from the linear receptive field measured at low contrast (Fig. 3B, red curves). The linear predictions are larger and slower than the measured responses, indicating that gain and integration time are reduced when luminance or contrast is increased. This reduction in gain and integration time is completed within a cycle of the drifting grating, demonstrating that the gain control mechanisms operate in <100 ms (Enroth-Cugell and Shapley, 1973a; Saito and Fukada, 1986; Victor, 1987; Lankheet et al., 1993a; Yeh et al., 1996; Baccus and Meister, 2002; Lee et al., 2003; Mante et al., 2005). This fast timescale suggests that gain control mechanisms reduce the impact of eye movements, which place the receptive field of neurons in the early visual system over regions of widely different mean luminance and contrast (Mante et al., 2005).
Although the gain control mechanisms are likely to play a major role during natural vision, most efforts to predict the responses of LGN neurons to natural stimuli have been limited to assuming a fixed linear receptive field (Dan et al., 1996) and have omitted the effects of gain control. There are several reasons for this omission. First, existing models of gain control are limited in scope: they operate only on simplified stimuli such as gratings, and they lack a definition of luminance and contrast that applies to arbitrary stimuli. Second, with few exceptions (Troy and Enroth-Cugell, 1993), luminance gain control and contrast gain control were typically only studied in isolation. During natural vision, however, luminance and contrast vary independently of each other (Mante et al., 2005). Thus, a general model of gain control should predict the shape of the temporal weighting function at every possible combination of luminance and contrast.
Nonetheless, many of the components needed to build a general model of gain control have already been described individually. For instance, studies have separately modeled the effects of luminance gain control (Fuortes and Hodgkin, 1964
; Baylor et al., 1974
; Brodie et al., 1978
; Shapley and Enroth-Cugell, 1984
; Purpura et al., 1990
) and of contrast gain control (Shapley and Victor, 1981
; Victor, 1987
; Carandini et al., 1997
; Benardete and Kaplan, 1999
) on the temporal weighting function of neurons in the early visual system. Many models share the same simple design: the temporal weighting function is obtained by convolving a fixed temporal weighting function with a variable filter, whose parameters depend on luminance or contrast. This design can be easily extended to predict the temporal weighting function at any combination of luminance and contrast. Indeed, a recent study of LGN responses demonstrated that the effects of luminance gain control and contrast gain control are independent of each other (Mante et al., 2005
). Thus, the temporal weighting function can be described by convolving the fixed weighting function with two variable filters, one that depends on luminance and the other that depends on contrast.
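The design described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the model of Mante et al. (2005): the exponential low-pass form of the variable filters, the saturating dependence of gain and time constant on luminance or contrast, and all names and parameter values (`variable_filter`, `half_level`, `tau_max`, the biphasic `fixed` kernel) are hypothetical.

```python
import math

DT = 0.001  # time step in seconds (assumed)

def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def lowpass(gain, tau, n=200):
    """Exponential low-pass filter whose samples sum to `gain` (hypothetical form)."""
    k = [math.exp(-i * DT / tau) for i in range(n)]
    total = sum(k)
    return [gain * v / total for v in k]

def variable_filter(level, half_level, tau_max=0.030):
    """Gain and time constant both shrink as luminance or contrast grows.
    The form 1 / (1 + level / half_level) is an illustrative assumption."""
    scale = 1.0 / (1.0 + level / half_level)
    return lowpass(gain=scale, tau=tau_max * scale)

def temporal_weighting(fixed, luminance, contrast):
    """Fixed weighting function convolved with a luminance filter and a contrast filter."""
    lum_filter = variable_filter(luminance, half_level=30.0)  # cd/m2, hypothetical
    con_filter = variable_filter(contrast, half_level=0.3)    # RMS contrast, hypothetical
    return convolve(convolve(fixed, lum_filter), con_filter)

# Hypothetical biphasic fixed weighting function
fixed = [math.exp(-i * DT / 0.010) - 0.5 * math.exp(-i * DT / 0.020) for i in range(100)]
low = temporal_weighting(fixed, luminance=10.0, contrast=0.1)
high = temporal_weighting(fixed, luminance=60.0, contrast=0.8)
```

Because convolution is commutative, the order in which the two variable filters are applied does not matter in this sketch, which is one way to express the independence of the two gain controls.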
These studies provide a number of clues about how retinal or LGN neurons compute the luminance and contrast of an arbitrary stimulus. Luminance and contrast are computed not only very rapidly (Fig. 3A,B) but also very locally. Luminance gain control is driven by the average light intensity falling onto a region that is not larger than the surround of the linear receptive field (Cleland and Enroth-Cugell, 1968
; Enroth-Cugell and Shapley, 1973b
; Enroth-Cugell et al., 1975
; Cleland and Freeman, 1988
; Lankheet et al., 1993b
). Similarly, contrast gain control is driven only by stimuli lying within the linear receptive field (Solomon et al., 2002
; Bonin et al., 2005
). More precisely, contrast seems to be computed from the integrated responses of a pool of small, nonlinear subunits coextensive with the linear receptive field (Shapley and Victor, 1979
; Enroth-Cugell and Jakiela, 1980
; Bonin et al., 2005
).
These insights on gain control can be used to build a nonlinear model of LGN responses that is general enough to predict the responses to arbitrary stimuli (Bonin et al., 2005
; Mante, 2005
). This model predicts a number of nonlinear phenomena in the responses to simple stimuli, none of which would be explained by the linear receptive field alone. (1) Response amplitude is independent of mean luminance at low temporal frequencies, although it is approximately proportional to mean luminance at high frequencies (Shapley and Enroth-Cugell, 1984
; Purpura et al., 1990
). (2) Response amplitude saturates with contrast. As contrast is increased, the gain is decreased, although not so much as to make responses independent of contrast ["contrast saturation" (Derrington and Lennie, 1984
; Cheng et al., 1995
)]. (3) Responses are selective for stimulus size, being maximal for stimuli of intermediate size and being suppressed by larger stimuli ["size tuning" (Jones et al., 2000
; Solomon et al., 2002
; Ozeki et al., 2004
)]. For large stimuli, an increase in size adds only little excitatory drive to the responses, although it strongly reduces gain by recruiting more of the subunits driving contrast gain control. (4) The strength of contrast saturation and size tuning depends on the temporal frequency of the stimulus. Both are strong at low temporal frequencies but absent at high temporal frequencies (Shapley and Victor, 1978
; Sclar, 1987
; Mante et al., 2004
). (5) The response to a test stimulus is reduced by superposition of a mask stimulus ["masking" (Freeman et al., 2002
; Bonin et al., 2005
)].

Figure 3. Predicting responses of LGN neurons to complex video sequences. A, Firing rate responses of an LGN neuron to a drifting grating whose mean luminance is suddenly increased from 32 to 56 cd/m2 (while contrast is kept constant). Red dashed traces indicate the prediction of the linear receptive field fitted to the response before the luminance step, and black solid traces indicate the average response after the step. B, Same, for a stimulus whose contrast suddenly steps from 31 to 100% (while mean luminance is kept constant). C, Responses of an LGN neuron to a sequence from Walt Disney's Tarzan. Red dashed traces indicate the prediction of the linear receptive field alone (measured at optimal luminance and contrast). Black solid traces indicate prediction of a nonlinear model. In the nonlinear model, the gain and integration time of the receptive field are regulated by luminance gain control and contrast gain control. D, Same, for responses to a Cat-cam movie (Kayser et al., 2003; Betsch et al., 2004). In all panels, calibration is 100 ms and 100 spikes/s, and gray histograms are firing rates obtained by convolving the spike trains with a Gaussian window of width 5 ms (SD). A and B are modified from Mante et al. (2005b). C and D are modified from Mante (2005).
The nonlinear model predicts the responses to complex, natural stimuli better than the linear receptive field alone (Mante, 2005
). For example, the gray histograms in Figure 3, C and D, represent the firing rate of an LGN neuron in response to two complex stimuli: movies taken from the head of a cat roaming through a forest [Cat-cam (Kayser et al., 2003
; Betsch et al., 2004
)] and segments of cartoons (Walt Disney's Tarzan). The linear receptive field, which has the same temporal weighting function throughout the movies, predicts the basic features of the response but not the details (red curves). In particular, it captures the timing of the responses but not their amplitude. The predictions of the nonlinear model, in which gain and integration time are adjusted dynamically, are more accurate than those of the linear receptive field (black curves). Because of luminance gain control, the predictions of the nonlinear model tend to be higher than those of the linear model during the dark Tarzan movie and lower during the bright Cat-cam movie. Because of contrast gain control, the two models make different predictions about the relative magnitude of the responses within a movie.
The comparison between the simpler linear model and the more complex nonlinear model is fair because the models were given the same number of free parameters (two: spontaneous firing rate and maximal firing rate). The remaining parameters were estimated from the responses to gratings and then fixed in the predictions of the responses to complex stimuli. Fixing the parameters in advance is necessary to compare the predictions on an equal footing: the more complex nonlinear model does not necessarily have to predict the data better than the simpler linear model. In fact, given the complexity of the nonlinear model, it would have been difficult to estimate its parameters directly from the responses to complex stimuli. This approach might also be useful for characterizing later stages of visual processing, in which neurons exhibit progressively more nonlinear properties.
Even the nonlinear model, however, fails to capture some features in the responses. In particular, the measured responses tend to be more transient than the predicted responses. One factor contributing to the transient responses could be the mechanism that generates bursts of action potentials: after a hyperpolarization lasting 100 ms or longer, LGN neurons are likely to respond to a depolarization with a burst of action potentials that is not predicted by simple rectification of the membrane potential (for review, see Sherman, 2001
). Bursts are a prominent feature of LGN responses in anesthetized or sleeping animals but less so in awake animals (Guido and Weyand, 1995
; Ramcharan et al., 2005
). Another factor contributing to the transient responses lies in the spike generation mechanisms: firing rates are more transient than predicted by a simple rectification of the membrane potential (see above, Understanding the retinal output). Both mechanisms could be easily incorporated into the nonlinear model (Mukherjee and Kaplan, 1995
; Smith et al., 2000
; Keat et al., 2001
; Lesica and Stanley, 2004
; Paninski et al., 2004
). Finally, the nonlinear model might easily be extended to also capture the nonlinear spatial properties of Y-cells. In fact, at least in retina, the output of Y-cells can be thought of as the sum of a pool of nonlinear subunits, similar to the one driving contrast gain control (Hochstein and Shapley, 1976
; Victor and Shapley, 1979
; Enroth-Cugell and Freeman, 1987
; Demb et al., 2001
).
In summary, there is now a fairly good understanding of the linear and nonlinear components required to model responses of the broad majority of LGN neurons. Many nonlinear properties of LGN neurons can be captured by a single model that is general enough to operate on arbitrary stimuli that vary in both space and time. This model will be a useful tool to explore the effects of gain control during natural vision. Once extended with bursting and spiking mechanisms, it promises to provide a tractable description of the responses of LGN neurons and thus, of the input to primary visual cortex.
Understanding V1 simple cells
Simple-cell receptive fields were first described in area V1 of the cat by Hubel and Wiesel (1959
), who defined them as follows: "... these fields were termed 'simple' because like retinal and geniculate fields (1) they were subdivided into distinct excitatory and inhibitory regions; (2) there was summation within the separate excitatory and inhibitory parts; (3) there was antagonism between excitatory and inhibitory regions; and (4) it was possible to predict responses to stationary or moving spots of various shapes from a map of the excitatory and inhibitory areas." (Hubel and Wiesel, 1962
).
If a neuron failed any part of this four-part definition (particularly point 1), then it would be termed a "complex cell." These definitions were qualitative; many subsequent studies have enquired whether successful quantitative definitions are possible.
Point 4 is crucial to the definition: can a straightforward receptive-field map predict how the neuron responds to other visual stimuli? We must first acknowledge that predicting responses to time-varying stimuli (e.g., moving ones) requires knowledge of the time courses of responses in different parts of the receptive field. As explained above in Background, a static receptive field should be replaced by a spatiotemporal receptive field map (McLean and Palmer, 1989
; Reid et al., 1991
; DeAngelis et al., 1993a
), which documents differences in response time course (impulse or step response) in different parts of the field (Movshon et al., 1978a
; Dean and Tolhurst, 1986
). The essence of prediction is the same, but, strictly, the field and stimuli should be considered as functions of time as well as functions of space.
Those who follow Hubel and Wiesel's definitions of simple and complex cells generally find the two classes of neuron in approximately equal numbers in V1. The clear definition of a simple cell has been massively influential in visual science because it offers the promise that, from relatively simple experiments, we may understand how approximately half of the neurons in V1 would respond in more complex situations, such as viewing of natural scenes. The definition says essentially that spatiotemporal summation in simple cells is linear: only very simple arithmetic is needed to calculate how a given simple cell will respond to some arbitrary stimulus. Such is the starting point for modeling human psychophysical experiments (Watson, 1987
) or for hypothesizing how natural information may be most efficiently coded in V1 (Willmore and Tolhurst, 2001
). The simple-cell definition offers so much that we are reluctant to ask whether it really works. This section asks whether simple experiments on simple cells really do allow quantitative predictions about the responses to other, more complicated stimuli.
Movshon et al. (1978a
,b
) examined the linearity of spatial summation in simple and complex cells in cat V1, followed by Andrews and Pollen (1979
). These studies compared spatial receptive-field maps with the tuning for sinusoidal gratings and found that important aspects of summation in simple cells were, indeed, linear when tested quantitatively. Later studies have convincingly shown that the spatiotemporal receptive field of a simple cell precisely predicts the optimal orientation, spatial frequency, and temporal frequency of sinusoidal gratings of the neuron (Movshon et al., 1978a
; Jones and Palmer, 1987
; Tadmor and Tolhurst, 1989
; DeAngelis et al., 1993b
; Gardner et al., 1999
). However, even the 1970s experiments noted nonlinearities in simple-cell behavior. A saving device was to consider the simple cell as a black box with two stages: a linearly summating stage, followed by one or more nonlinearities that do not affect the underlying initial linear sum (Fig. 1B).
Although the spatiotemporal receptive field of a cell predicts the cell's optimal stimulus well, it poorly predicts the relative magnitude of responses to nonoptimal stimuli (e.g., it overestimates the bandwidths of orientation and frequency tuning curves) (Jones and Palmer, 1987
; Tadmor and Tolhurst, 1989
; DeAngelis et al., 1993b
; Gardner et al., 1999
). Linear models also fail to explain the relative magnitudes of response in the two directions of movement orthogonal to the preferred orientation of the neuron (Albrecht and Geisler, 1991
; Reid et al., 1991
). These failures of the linear model have an easy rationalization in the two-stage model (Fig. 1B). Most experiments record neuronal response extracellularly as trains of action potentials, but the membrane potential changes of V1 neurons must exceed a threshold before spiking activity is evident (Carandini and Ferster, 2000
). Many failures of the simple linear model can be accounted for arithmetically, by supposing that a simple linear sum is transformed at the output of the neuron by passage through a nonlinear transducer function; this may simply have a threshold nonlinearity or might be a sigmoidal function of stimulus contrast (Schumer and Movshon, 1984
; Tolhurst and Dean, 1987
, 1991
; Tadmor and Tolhurst, 1989
; Albrecht and Geisler, 1991
; DeAngelis et al., 1993b
). Indeed, intracellular recordings in simple cells (which presumably show the black box before any nonlinear output transform) do suggest that the strength of directional selectivity and the orientation tuning bandwidth can be described by a linear model in the first stage of the black box (Jagadeesh et al., 1993
; Lampl et al., 2001
).
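The effect of a threshold at the output stage can be illustrated with a short Python sketch. It assumes a Gaussian orientation tuning curve for the linear first stage and a threshold-linear transducer; the function names, tuning width, threshold, and gain are hypothetical values chosen for illustration, not estimates from any of the cited studies.

```python
import math

def linear_response(theta_deg, pref=0.0, sigma=30.0, amplitude=10.0):
    """Hypothetical first-stage output (e.g., membrane potential change in mV)."""
    return amplitude * math.exp(-0.5 * ((theta_deg - pref) / sigma) ** 2)

def spike_output(v, threshold=5.0, gain=4.0):
    """Second stage: threshold-linear transducer (spikes/s)."""
    return gain * max(0.0, v - threshold)

def half_width_at_half_height(responses, thetas):
    """Half-width of a unimodal tuning curve at half its peak height."""
    peak = max(responses)
    above = [t for t, r in zip(thetas, responses) if r >= peak / 2]
    return (max(above) - min(above)) / 2

thetas = [t * 0.5 for t in range(-180, 181)]  # -90 to 90 deg in 0.5 deg steps
linear = [linear_response(t) for t in thetas]
spikes = [spike_output(v) for v in linear]

print(half_width_at_half_height(linear, thetas))  # 35.0
print(half_width_at_half_height(spikes, thetas))  # 22.5
```

In this sketch, the preferred orientation is the same before and after the threshold, but the tuning of the spike output is narrower than that of the linear stage, which is the pattern of success and failure described above.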
Post hoc application of a nonlinear transducer to the linear prediction may work arithmetically, but there are inconsistencies between experiments (Tolhurst and Heeger, 1997b
). Furthermore, there are other nonlinear behaviors that cannot be explained in such a way. Notable nonlinearities (shared with complex cells) are response saturation at high contrasts (Albrecht and Hamilton, 1982
), and "nonspecific suppression" (Bonds, 1989
; DeAngelis et al., 1992
; Tolhurst and Heeger, 1997b
) in which the response of a simple cell to its optimal stimulus is suppressed by simultaneously presenting stimuli that evoke no overt response when presented alone. Heeger (1992a
,b
) proposed a neuronal circuit that embraces these and many other nonlinear behaviors of simple cells: essentially, each simple cell performs a first-stage linear sum of its spatiotemporal inputs, "half-squares" that linear sum giving an energy response (half-squaring achieves much the same as a threshold nonlinearity), and is then subject to divisive inhibition from all other neurons whose receptive fields cover the same part of visual field. The divisive inhibition gives rise to "contrast normalization." Application of this model (Heeger, 1993
; Tolhurst and Heeger, 1997a
) resolves subtle failures (Reid et al., 1991
; Tolhurst and Dean, 1991
) in the predictions of the relative magnitudes of response to moving and stationary-modulated gratings, which cannot be resolved by simply running a linear response sum through a nonlinear output transducer.
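The arithmetic of half-squaring followed by divisive normalization can be sketched as follows. This is a simplified illustration rather than Heeger's published formulation: the semisaturation constant `sigma`, the scaling `r_max`, and the unweighted pool that sums over every listed neuron (including the cell itself) are assumptions made here.

```python
def half_square(x):
    """Half-wave rectify, then square."""
    return max(0.0, x) ** 2

def normalized_responses(linear_sums, sigma=1.0, r_max=100.0):
    """Divide each half-squared linear sum by sigma**2 plus the pooled activity."""
    energies = [half_square(x) for x in linear_sums]
    pool = sum(energies)
    return [r_max * e / (sigma ** 2 + pool) for e in energies]

# Two hypothetical neurons sharing a receptive-field location.
# Test stimulus alone drives only the first neuron.
print(normalized_responses([2.0, 0.0])[0])  # 80.0
# Mask alone evokes no response from the first neuron...
print(normalized_responses([0.0, 3.0])[0])  # 0.0
# ...but suppresses its response to the test stimulus.
print(normalized_responses([2.0, 3.0])[0] < 80.0)  # True

# Doubling contrast doubles the linear sums, yet the response saturates.
for contrast in (1.0, 2.0, 4.0):
    print(normalized_responses([2.0 * contrast, 0.0])[0])
```

The same division thus produces both response saturation at high contrast and suppression by a stimulus that is ineffective on its own.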
Elaborations of the contrast-normalization model (Carandini and Heeger, 1994
; Carandini et al., 1997
) have embraced additional nonlinear behaviors (previously unaccounted for), such as "phase advance" at high contrasts (Dean and Tolhurst, 1986
). The contrast-normalization model has been influential in psychophysical modeling (Watson and Solomon, 1997
) as well as in understanding the details of simple-cell (and complex-cell) responses; it is ironic that its proponents now suggest a very different neurophysiological mechanism (Carandini et al., 2002
; Freeman et al., 2002
), although the arithmetic remains more or less the same.
Nonspecific suppression results from stimuli within the receptive field. There is another nonlinearity, sometimes confused with it: stimuli outside the "classical receptive field" of a simple cell may also suppress or facilitate its responses to its preferred stimuli, as first described by Blakemore and Tobin (1972
) and Maffei and Fiorentini (1976
). It is becoming clearer that different mechanisms of suppression are involved within the classical receptive field and outside (Sengpiel et al., 1998
; Freeman et al., 2002
; Li et al., 2005
; Sengpiel and Vorobyov, 2005
), but we do not yet have simple arithmetic rules to describe these nonlinearities. Suppression or facilitation from outside the classical receptive field may result from local connections within V1 or from feedback from more anterior visual areas, perhaps subserving selective attention or perceptual grouping. There is a large literature on this topic that is beyond the present scope (for review, see Fitzpatrick, 2000
; Freeman et al., 2001
; Chisum and Fitzpatrick, 2004
).

Figure 4. Predicting responses of V1 simple cells to complex images. A, The receptive field of a ferret simple cell (see Fig. 1B). The dashed lines outline the field for comparison in B and C. B, The digitized photograph that evoked the largest OFF response in this simple cell. C, A Gabor patch that approximately matches the disposition of the actual receptive field in A. D, The ordinate plots the actual responses of the simple cell to 500 different photographs (average of 10 presentations each, measured in 100 ms bins). Positive values are ON responses: spikes generated during the 100 ms presentation. Negative values are OFF responses: spikes generated on removal of the photograph. The abscissa shows the responses predicted on a totally linear model, which includes solely the receptive field shown in C and not the nonlinearity of the output (see Fig. 1B).
Thus, many nonlinearities of simple-cell behavior are not evident from the receptive-field structure, but they can be accommodated neatly into the two-stage black box: the linearly summing first stage is followed by half-squaring and contrast normalization. The arithmetic is fairly easy and it works well. However, how important are all of these nonlinearities in the overall behavior of the neurons? Smyth et al. (2003
) examined how simple cells in anesthetized, paralyzed ferret V1 respond to 100 ms flashes of digitized photographs of natural scenes to understand how neurons might respond under natural vision. Others have also sought to understand how the receptive field structure or grating responses of simple cells relate to their responses to complex, natural scene stimuli (Ringach et al., 2002
; Vinje and Gallant, 2002
; Weliky et al., 2003
; David et al., 2004
). Figure 4A shows the spatial receptive field of one simple cell recorded by Smyth et al. (2003
), mapped with small bright and dark squares. According to point 4 of the simple-cell definition, we expect this neuron to respond particularly well to a bright-dark border, slightly off horizontal and toward the top right of the stimulus area. Figure 4B shows the photograph that elicited (by far) the most activity from this neuron. There is, indeed, just the border predicted, although its polarity is reversed compared with the ON and OFF regions of the field: the photograph evoked strong OFF responses. Figure 4C shows a Gabor function model of the receptive field (Field and Tolhurst, 1986
; Jones and Palmer, 1987
), fitted by eye. It was used to estimate how a totally linear field might respond to the 500 photographs presented; no nonlinearities are modeled in the calculation. Figure 4D plots the actual response of the simple cell to the photographs against the responses predicted from the stylized linearly summating field. The linear model conveys the gist of the actual responses (r = 0.73); there are no astonishing outliers. In truth, the simple cell of Figure 4 is the one whose responses to natural scenes were best predicted by linear modeling (Smyth et al., 2003
). However, the results for this and other simple cells suggest that, although output nonlinearities may reduce response magnitudes below the linear prediction, there is little evidence here that nonlinear effects could fundamentally alter the basic "trigger features" for activating a simple cell.
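The kind of calculation described here, a purely linear prediction from a Gabor model followed by a correlation with the observed responses, can be sketched with NumPy. Everything specific in the sketch is an assumption: the Gabor parameters, the random arrays standing in for photographs, and the simulated "measured" responses (a compressive nonlinearity plus noise) are synthetic and are not the data of Smyth et al. (2003).

```python
import numpy as np

def gabor(size, wavelength, theta_deg, sigma, phase_deg=0.0):
    """Gabor patch: sinusoidal carrier under a circular Gaussian envelope."""
    coords = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    theta = np.deg2rad(theta_deg)
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength + np.deg2rad(phase_deg))
    return envelope * carrier

def linear_prediction(rf, images):
    """Dot product of the receptive field with each mean-subtracted image."""
    centered = images - images.mean(axis=(1, 2), keepdims=True)
    return np.tensordot(centered, rf, axes=([1, 2], [0, 1]))

rng = np.random.default_rng(0)
images = rng.random((500, 21, 21))  # synthetic stand-ins for 500 photographs
rf = gabor(size=21, wavelength=8.0, theta_deg=10.0, sigma=4.0)
predicted = linear_prediction(rf, images)

# Simulated "measured" responses (not real data): compressive output plus noise
simulated = np.sign(predicted) * np.abs(predicted) ** 0.7 + rng.normal(0.0, 0.5, size=500)
r = np.corrcoef(predicted, simulated)[0, 1]
print(f"correlation between linear prediction and simulated response: {r:.2f}")
```

Signed predictions play the role of the ON (positive) and OFF (negative) responses plotted in Figure 4D; no output nonlinearity is applied to the prediction itself.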
It is important also to recognize that simple cells are heterogeneous; some simple cells may differ little from complex cells (Dean and Tolhurst, 1983
; Mechler and Ringach, 2002
). Movshon et al. (1978a
) described some "nonlinear simple cells" in which the ON and OFF receptive-field regions overlap; Dean and Tolhurst (1983
) found that receptive-field structure was continuously graded from simple cells exactly fitting the Hubel and Wiesel definition, through nonlinear simple cells and "discrete complex cells" to frank complex cells (Priebe et al., 2004
; Mata and Ringach, 2005
). Indeed, simple and complex cells may not form a dichotomy at all. Of course, this is not to say that all cells are the same; for instance, it is clearly understood that cells at the "simple end" of the continuum are found in different cortical layers than those at the "complex end" (Martinez et al., 2005
). Receptive-field mapping techniques typically subtract the responses, say, to dark stimuli from those to bright stimuli so that the receptive field seems to be single valued at each point. The resulting linear receptive field is an incomplete reflection of the overall responses of the neuron. For the varied population of simple cells, it is unclear what proportion of response is dependent only on the idealized linear receptive field and what has been obscured by ignoring the inherent nonlinearities of summation.
The two-stage model of Figure 1B is inaccurate; its simplicity and convenience may be misleading. Geniculate inputs are inherently nonlinear and may be subject to depression leading to the nonspecific suppression noted above (Carandini et al., 2002
). Nonlinear inputs would result in inherent nonlinearities of simple-cell summation unless, say, there were "push-pull" inhibition (Glezer et al., 1982
). The role of such inhibition has been explored further (Palmer and Davis, 1981
; Tolhurst and Dean, 1987
; Ferster, 1988
; Tolhurst and Dean, 1990
; Hirsch et al., 1998
). In particular, Tolhurst and Dean (1987
, 1990
) and Atick and Redlich (1990
) proposed that breakdown of push-pull inhibition underlies the appearance of nonlinearities of spatial summation even in the first stage of the two-stage model (Fig. 5). Indeed, all that may distinguish many complex cells from simple cells might just be the strength of the inhibitory signals that mask inherently nonlinear summation (Wielaard et al., 2001
; Mechler and Ringach, 2002
).

Figure 5. Toward a complete model of V1 simple cells. A popular conception of simple-cell response behavior involves a first stage that shows strictly linear spatiotemporal summation and a subsequent stage that may be subject to a variety of nonlinear phenomena, which do not impinge on the fundamental linearity of the first stage (see Fig. 1B). This schematic shows a revision of this convenient model, which includes a number of nonlinear mechanisms. Some of these mechanisms (those depicted as affecting the output nonlinearity) spare the fundamental linearity of summation. The remaining ones, however, cause nonlinear summation, which differs only in degree from the obvious nonlinear behavior of complex cells.
Literal adherence to point 4 of the definition of a simple cell means that any significant failure of prediction would require the neuron to be reclassified as a complex cell. Thus, we would be bound to understand simple cells; any problematic neuron must be a complex cell. Failure of the linear receptive field model is not a problem for neurophysiologists, but it is for those computational modelers who would like all neurons in the visual cortex to be described in a few lines of elegant code with their receptive-field parameters neatly spaced along a theorist's dimensions. We need not confuse failure of the attractive linear receptive field model of the simple cell with failure to understand the visual cortex as a whole (compare with below, What we don't know about V1). Many of the bolt-on nonlinearities of simple cell behavior can be parameterized coherently. The failures of push-pull are an irritation to modelers, but there is no need to believe that they portend any dramatic change in neuronal behavior over the linear model. Most significantly, much progress has been made in understanding how (frankly nonlinear) complex cells respond to naturalistic stimuli (compare with below, Understanding V1 complex cells).
Understanding V1 complex cells
Although the orientation and spatial frequency selectivity of each simple cell is directly related to the spatial profile of its receptive field, which consists of elongated ON and OFF subregions (see above, Understanding V1 simple cells), for complex cells this relationship is far less obvious. A complex cell usually exhibits mixed ON and OFF responses throughout its receptive field (Hubel and Wiesel, 1962
). For example, the response of the cell to a bar stimulus depends on both the orientation and width of the bar in a manner similar to simple cells, but the cell responds indiscriminately to light and dark bars, as long as the bar stands out from the gray background. Such insensitivity to contrast polarity is an important form of nonlinearity that renders spike-triggered averaging ineffective for measuring complex-cell receptive fields.
Significant progress in understanding complex-cell receptive fields was first made by measuring the nonlinear interactions between a pair of bars at the preferred orientation of the cell (Movshon et al., 1978b
; Emerson et al., 1987
). These studies have revealed the existence of "subunits" of the complex-cell receptive field, whose spatial structure can predict the frequency tuning of the cell. More recently, an alternative method has been used to characterize complex-cell receptive fields: it presents large ensembles (tens of thousands) of complex visual stimuli, such as white noise or natural images, and applies a spike-triggered covariance (STC) analysis to estimate the receptive field.
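A minimal sketch of an STC analysis on simulated data may help fix ideas. The simulated neuron below is a hypothetical polarity-insensitive cell whose firing rate is the sum of the squared outputs of two orthogonal filters driven by Gaussian white noise; the filter shapes, rate scaling, stimulus dimensions, and the shuffled-spike control are all assumptions made for illustration and do not reproduce the procedure of any cited study.

```python
import numpy as np

def spike_triggered_covariance(stimuli, spike_counts):
    """Eigen-decomposition of the covariance of the spike-triggered ensemble.

    stimuli: (n_frames, n_dims) array; each row is the stimulus preceding a bin.
    spike_counts: (n_frames,) array of spikes evoked in each bin.
    Returns the spike-triggered average, eigenvalues (ascending), eigenvectors (columns).
    """
    weights = spike_counts / spike_counts.sum()
    sta = weights @ stimuli
    centered = stimuli - sta
    cov = (centered * weights[:, None]).T @ centered
    eigvals, eigvecs = np.linalg.eigh(cov)
    return sta, eigvals, eigvecs

rng = np.random.default_rng(0)
n_frames, n_dims = 50000, 16
stimuli = rng.standard_normal((n_frames, n_dims))  # Gaussian white noise

# Two hypothetical orthonormal filters (a cosine/sine pair)
pos = np.arange(n_dims)
f1 = np.cos(2 * np.pi * 2 * pos / n_dims)
f2 = np.sin(2 * np.pi * 2 * pos / n_dims)
f1 /= np.linalg.norm(f1)
f2 /= np.linalg.norm(f2)

# Polarity-insensitive response: sum of squared filter outputs, Poisson spiking
rate = 0.1 * ((stimuli @ f1) ** 2 + (stimuli @ f2) ** 2)
spikes = rng.poisson(rate)

sta, eigvals, eigvecs = spike_triggered_covariance(stimuli, spikes)
top_two = eigvecs[:, -2:]  # eigenvectors with the two largest eigenvalues

# Control eigenvalues from a shuffled (random) spike train
_, control_eigvals, _ = spike_triggered_covariance(stimuli, rng.permutation(spikes))
print("largest control eigenvalue:", control_eigvals.max())
print("two largest eigenvalues:", eigvals[-2:])
```

In this simulation the spike-triggered average is close to zero, whereas the two eigenvectors whose eigenvalues exceed the control range span approximately the same subspace as the two filters.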

Figure 6. Models of V1 complex cells recovered from a covariance analysis. A, Experimental protocol. Top, A segment of the natural image ensemble; white box indicates area shown in experiments. Bottom, Spike train. Spike-triggered ensemble was generated by collecting the image preceding each spike by a single frame (42 ms). B, Two significant eigenvectors of a complex cell. Scale bar, 2°. Solid line, Spatial profiles of each eigenvector along the axis perpendicular to the preferred orientation. Dashed line, Gabor fit. The Gabor fits of the two eigenvectors had a phase difference of 85°. C, Contrast-response functions of the two eigenvectors. Average firing rate is plotted against the contrast of each eigenvector shown in B. Error bar indicates ±SEM. Dashed lines, Fits of the data with the function r(x) = x + r0, where r is the firing rate, x is contrast, and , , and r0 are free parameters. D, E, Prediction of cortical responses to natural images. Correlation coefficients between the predicted and measured responses based on the eigenvectors were plotted against those based on the linear receptive field (D) and against the estimated upper bound (E). Each symbol represents one complex cell.
As a first step in STC, the stimulus preceding each recorded spike is collected to form the spike-triggered stimulus ensemble, just as in spike-triggered averaging. Then, the covariance matrix, instead of the mean, of this spike-triggered ensemble is computed. Eigenvectors of this matrix with "significant eigenvalues" (those significantly different from the control eigenvalues calculated based on random spike trains) represent visual features that directly affect the neuronal response. This method has been used effectively to analyze the nonlinear response properties of fly visual neurons (De Ruyter Van Steveninck and Bialek, 1988
; Brenner et al., 2000
) and the receptive fields of mammalian V1 complex cells, with either random-bar stimuli aligned to the preferred orientation of the cell (Touryan et al., 2002