Abstract
We studied the changes that neuronal receptive field (RF) models undergo when the statistics of the stimulus are changed from those of white Gaussian noise (WGN) to those of natural scenes (NSs), by fitting the models to multielectrode data recorded from primary visual cortex (V1) of female cats. This allowed the estimation of both a cascade of linear filters on the stimulus, as well as the static nonlinearities that map the output of the filters to the neuronal spike rates. We found that cells respond differently to these two classes of stimuli, with mostly higher spike rates and shorter response latencies to NSs than to WGN. The most striking finding was that NSs resulted in RFs that had additional uncovered filters compared with WGN. This finding was not an artifact of the higher spike rates observed for NSs relative to WGN, but rather was related to a change in coding. Our results reveal a greater extent of nonlinear processing in V1 neurons when stimulated using NSs compared with WGN. Our findings indicate the existence of nonlinear mechanisms that endow V1 neurons with context-dependent transmission of visual information.
SIGNIFICANCE STATEMENT This study addresses a fundamental question about the concept of the receptive field (RF): does the encoding of information depend on the context or statistical regularities of the stimulus type? We applied state-of-the-art RF modeling techniques to data collected from multielectrode recordings from cat visual cortex in response to two statistically distinct stimulus types: white Gaussian noise and natural scenes. We find significant differences between the RFs that emerge from our data-driven modeling. Natural scenes result in far more complex RFs that combine multiple features in the visual input. Our findings reveal that different regimes or modes of operation are at work in visual cortical processing depending on the information present in the visual input. The complexity of V1 neural coding appears to be dependent on the complexity of the stimulus. We believe this new finding will have interesting implications for our understanding of the efficient transmission of information in sensory systems, which is an integral assumption of many computational theories (e.g., efficient and predictive coding of sensory processing in the brain).
- adaptation
- data-driven modeling
- primary visual cortex
- receptive field
- stimulus statistics
- visual information processing
Introduction
Our understanding of sensory coding in the visual system is largely based on the stimulus–response characterization of neurons. Traditionally, a basic set of stimuli (e.g., bars or gratings) was used to parameterize neuronal responses in terms of a restricted choice of stimulus parameters [e.g., orientation, spatial frequency (SF)]. This analysis allowed the measurement of tuning functions. However, such techniques provide only partial understanding of the neuronal response function and are particularly limited when the processing is nonlinear.
Later methodological improvements enabled the recorded responses of neurons to be characterized using statistically richer stimuli (Van Steveninck and Bialek, 1988; Schwartz et al., 2001; Ringach et al., 2002; David et al., 2004; Sharpee et al., 2004; Touryan et al., 2005; Chen et al., 2007). Recently, appropriate mathematical tools for a comprehensive neuronal characterization under arbitrary stimulus regimes have emerged (Rapela et al., 2010; Fitzgerald et al., 2011; Williamson et al., 2015; Liu et al., 2016). A prime example is the probabilistic framework wherein model estimation is performed by maximizing the likelihood of the model given the recorded responses and stimuli (Kouh and Sharpee, 2009; Park and Pillow, 2011; Park et al., 2013). An example of this framework, which we have adopted, is the nonlinear input model (NIM; McFarland et al., 2013; Almasi et al., 2020). In this case, characterization is achieved by estimating the parameters of a receptive field (RF) model. Typically, the model first applies a cascade of linear filters on the stimulus. In the second stage, static nonlinearities map the output of the linear filters to neuronal spike rates.
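The two-stage structure described above can be sketched in a few lines. This is a minimal illustration only: it assumes the additive form used for the NIM by McFarland et al. (2013), in which the outputs of the per-filter nonlinearities are summed and passed through a spiking nonlinearity. The function names, the toy filters, and the choice of a softplus spiking nonlinearity are assumptions made for the example, not the fitted model.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softplus(x):
    """Numerically stable log(1 + exp(x)); an assumed spiking nonlinearity."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def cascade_rate(stimulus, filters, nonlinearities, spiking_nl=softplus):
    """Stage 1: project the stimulus onto each linear filter.
    Stage 2: pass each projection through its static nonlinearity,
    sum the results, and map the sum to a non-negative rate."""
    generator = sum(f(dot(h, stimulus)) for h, f in zip(filters, nonlinearities))
    return spiking_nl(generator)

# Toy example: one rectified-linear and one quadratic (energy-like) subunit.
filters = [[1.0, 0.0], [0.0, 1.0]]
nonlinearities = [lambda g: max(g, 0.0), lambda g: g * g]
rate = cascade_rate([0.5, -2.0], filters, nonlinearities)
print(rate)
```

In this sketch, adding a filter to the model simply appends one more (filter, nonlinearity) pair.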
How do these characterizations depend on the choice of the stimulus? To answer this question, first it is essential to control for artifactual fits, because the fitting method may extract the statistical regularity inherent in the stimulus rather than the stimulus–response relationship. This can be overcome by using maximum likelihood estimation as it provides an unbiased and consistent estimate. Second, it is possible that the system adapts to the stimulus statistics such that the stimulus–response relationship is altered. In this case, it is the system that changes over some finite period of time because of the ongoing relative presence of different types of features. Third, stimuli with different statistics may simply allow us to sample different modes of operation of a neural circuit (Butts, 2019). For example, some modes of operation are effectively unobservable if the stimuli that drive these modes are not included in the set. In this case, including the missing stimuli would immediately engage the new mode of the circuit, revealing new types of processing of these stimuli, while leaving the processing of the previous stimuli, via other modes, unaltered. Here, our experiments aim to explore the last two points.
We studied the changes that the cortical RFs undergo when presented with different image statistics. We applied the NIM to the recordings of single cells in cat primary visual cortex (V1; Fig. 1a). Cells were stimulated with two types of stimuli with distinct statistical properties: white Gaussian noise (WGN) and natural scenes (NSs), with the same global root mean square (rms) contrast. WGN has a Gaussian distribution of contrasts, with a heavy overemphasis on low contrasts. NSs have more high contrasts and tend to be sparser. The NIM makes minimal assumptions about the kind of underlying neuronal processing and fits the RF filters as well as nonlinearities that neurons use to pool the output of their filters (McFarland et al., 2013; Almasi et al., 2020). We estimated for each cell the spatial filters constituting the neuronal RF and their corresponding nonlinearities. The number of spatial filters for each cell is determined by a validation technique over a test dataset.
We found that V1 cells respond differently to the two stimulus types, with mostly higher spike rates and shorter response latencies to NSs than to WGN. Responses of V1 cells to NSs revealed the presence of more RF filters compared with WGN. This difference was not related to the higher spike rates of cells to NSs. Instead, we found that specific feature-contrasts attain much higher values in NSs compared with WGN and are believed to be responsible for the differences in the number of RF filters. Our findings imply that V1 cells adapt to the statistics of the visual stimulus to improve their coding efficiency, which enhances their capacity for information transmission.
Materials and Methods
Experimental design
Preparation and surgery.
Extracellular recordings were made primarily from area 17 in cat cortex, but some recording locations were on the border with area 18, making unequivocal identification difficult. As both areas are retinorecipient, we refer to our recordings as being from V1. Recordings were made from V1 in six anesthetized cats using methods described previously (Meffin et al., 2015; Almasi et al., 2020; Sun et al., 2021). Experiments were conducted according to the National Health and Medical Research Council Australian Code of Practice for the Care and Use of Animals for Scientific Purposes. All experimental procedures were approved by the Animal Care Ethics Committee at the University of Melbourne (ethics ID 1413312).
Anesthesia was induced in three adult female cats (2–6 kg) with an intramuscular injection of ketamine hydrochloride (20 mg/kg) and xylazine (1 mg/kg). The cats were intubated, cannulated, and placed in a stereotaxic frame. Once the cats were intubated, oxygen and isoflurane (1–2%) were used to maintain deep anesthesia during all surgical procedures. A craniotomy was performed to expose cortical areas 17 and 18. Isoflurane was used during the surgery because it is safe for humans. Anesthesia was switched to gaseous halothane (0.5–0.7%) in a fully closed system during data recording, and the depth of anesthesia was determined by monitoring a variety of standard indicators (Sun et al., 2021). Halothane was used during recordings because it has been shown to maintain anesthesia while having a less suppressive impact on cortical responses (Villeneuve and Casanova, 2003). To avoid eye movements during recordings, muscular blockade was induced and maintained with an intravenous infusion of vecuronium bromide at a rate of 0.1 mg/kg/h. Mechanical ventilation was used to maintain end-tidal CO2 between 3.5% and 4.5%. After an experiment, the animal was humanely killed without regaining consciousness with an intravenous injection of an overdose of barbiturate (pentobarbital sodium, 150 mg/kg). Animals were then perfused immediately through the left ventricle of the heart with 0.9% saline followed by 10% formol saline, and the brain was extracted.
Visual stimuli and data recording.
Visual stimuli were generated using a ViSaGe visual stimulus generator (Cambridge Research Systems) on a calibrated, γ-corrected LCD monitor (1920 × 1080 pixels; refresh rate, 60 Hz; response time, 1 ms; model VG248QE, ASUS) at a viewing distance of 57 cm. WGN and NS stimuli comprising 90 × 90 pixels over 30° of the visual field were used to estimate the neuronal receptive fields of V1 cells using the NIM framework (see Model definition and parameters estimation below). The WGN and NS images used in the stimuli had a mean value equal to the mid-luminance of the display monitor. The WGN images had an SD chosen to result in a 10% saturation rate for individual pixels (i.e., the mean had a normalized intensity of 0.5, and 10% of pixels had a value of either 0 or 1, corresponding to the lowest and highest luminance of the monitor). The NS stimuli were image patches randomly extracted from a database of natural images (Van Hateren and Van der Schaaf, 1998). Each NS stimulus block contained patches that were drawn from 100 randomly chosen images in the database, with each image 1536 × 1024 pixels. The global rms contrast of the WGN and NS stimuli was matched and set to ∼0.3. WGN and NS stimuli were presented in separate blocks of 12,000 images, with blocks interleaved to ensure the physiological comparability of the recordings. Each image frame was presented for 1/30 s, followed by a blank screen of mean luminance (intensity, 0.5) displayed for the same duration. The blank period aimed to increase the overall response of the cell to the stimuli by increasing the temporal contrast. The total duration of a block was ∼14 min.
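The SD that produces a given saturation rate can be worked out from the Gaussian quantile function. The sketch below assumes that intensities are normalized to [0, 1] with mean 0.5 and that the 10% saturation rate counts both tails together (pixels clipped at 0 or at 1); the function name is hypothetical.

```python
from statistics import NormalDist

def wgn_sd_for_saturation(saturation_rate=0.10, mean=0.5):
    """SD of a Gaussian centred at `mean` (normalized intensity) such that
    a fraction `saturation_rate` of samples falls outside [0, 1].
    Assumes mean = 0.5, so the two tails are symmetric."""
    half_range = 0.5
    # Each tail holds saturation_rate / 2 of the probability mass.
    z = NormalDist().inv_cdf(1.0 - saturation_rate / 2.0)
    return half_range / z

sd = wgn_sd_for_saturation()

# Check: fraction of pixels that would be clipped at 0 or 1 for this SD.
pixel = NormalDist(mu=0.5, sigma=sd)
clipped = pixel.cdf(0.0) + (1.0 - pixel.cdf(1.0))
print(f"SD = {sd:.3f}, clipped fraction = {clipped:.3f}")
```

Under these assumptions the SD before clipping comes out at about 0.30 in normalized intensity units, which is of the same order as the stated global rms contrast of ∼0.3.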
Extracellular recordings were made with single shank probes with iridium electrodes (linear 32-electrode arrays, 6 mm length, 100 µm electrode site spacing; NeuroNexus), which were inserted vertically using a piezoelectric drive (Burleigh inchworm and 6000 controller, Burleigh Instruments). Extracellular signals were acquired from 32 channels simultaneously using a CerePlex acquisition system and Central software (Blackrock Microsystems) sampled at 30 kHz and 16 bit resolution on each channel. Filtering was performed by postprocessing.
Postprocessing and spike sorting.
Spike sorting of recordings was performed using KiloSort (Pachitariu et al., 2016) and the graphical user interface phy (Rossant et al., 2016). Single units were identified as previously described in the study by Almasi et al. (2020).
Statistical analyses
Model definition and parameters estimation.
We have used an adapted version (Almasi et al., 2020) of the NIM, originally introduced by McFarland et al. (2013). The model is depicted in Figure 1a and describes the firing rate of the cell as a nonlinear function of the input stimulus, as follows:
The model assumes that the responses
Significance test to determine the number of RF filters.
We determined the number of spatial filters within the receptive field of each cell using cross-validation. The number of filters for each cell was systematically varied, and the statistical significance of each added filter was evaluated by bootstrapping. For this, we divided the data into a training set, which comprised four-fifths of the data, and a test set, which comprised the remaining one-fifth. For each specified number of filters, we used the training set to estimate the filters and then assessed the performance of the model by computing its log-likelihood on resamples of the test set (this was repeated 500 times). Thus, for each number of filters, we obtained a distribution of the log-likelihood computed on the test set. The optimal number of RF filters was taken as the number of filters for which the model gave the significantly highest log-likelihood on the test set (z score > 2; Hastie et al., 2008). We rejected additional filters that did not lead to a significant improvement in the log-likelihood for two reasons. First, accepting any filter that increased the log-likelihood, without checking whether the increase was significant, could add a filter that improved the log-likelihood only by chance. Second, we found that beyond a certain point, additional filters tended to lie in the same subspace (up to noise) as the filters of the model with fewer filters. This typically coincided with the point at which adding more filters did not lead to significant improvements in the log-likelihood. Technical details of implementing this test are given in the study by Almasi et al. (2020).
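The selection procedure can be illustrated with a short sketch. It is not the implementation of Almasi et al. (2020): it assumes that each candidate model supplies one log-likelihood value per test image, that the test set is resampled independently for each model, and that the z score is the difference of bootstrap means divided by the pooled bootstrap SD. All names and the synthetic data are hypothetical.

```python
import random
from statistics import stdev

def bootstrap_ll(per_image_ll, n_boot, rng):
    """Mean test log-likelihood over n_boot resamples (with replacement)."""
    n = len(per_image_ll)
    return [sum(rng.choices(per_image_ll, k=n)) / n for _ in range(n_boot)]

def select_n_filters(ll_by_n, z_threshold=2.0, n_boot=500, seed=0):
    """ll_by_n maps a filter count to the per-image test log-likelihoods of
    the model fitted with that many filters. A larger model is accepted
    only if it beats the current best by more than z_threshold."""
    rng = random.Random(seed)
    counts = sorted(ll_by_n)
    best = counts[0]
    best_dist = bootstrap_ll(ll_by_n[best], n_boot, rng)
    for n in counts[1:]:
        dist = bootstrap_ll(ll_by_n[n], n_boot, rng)
        pooled_sd = (stdev(dist) ** 2 + stdev(best_dist) ** 2) ** 0.5
        z = (sum(dist) / n_boot - sum(best_dist) / n_boot) / pooled_sd
        if z > z_threshold:
            best, best_dist = n, dist
    return best

# Synthetic example: a second filter helps, a third does not.
data_rng = random.Random(42)
base = [data_rng.gauss(-1.0, 0.5) for _ in range(1000)]
ll_by_n = {1: base,
           2: [x + 0.2 for x in base],
           3: [x + 0.199 for x in base]}
n_selected = select_n_filters(ll_by_n)
print(n_selected)
```

In this toy case the third filter is rejected because it does not improve the test log-likelihood beyond the bootstrap variability, which mirrors the first of the two reasons given above.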
Estimation of response latency and strength.
The latency in the response to the stimulus was obtained by fitting a von Mises distribution defined as follows:
For the response latency analysis, we defined the latency
As described previously in the Model definition and parameters estimation section, the response of V1 cells to each image was obtained by counting the number of spikes that occurred during a window of the same duration as the presentation of the image on the screen (i.e., T/2 = 1/30 s). Knowing when the maximum response occurred, and assuming that the response curve is symmetric around its maximum (as it is in a von Mises distribution), we aimed to collect the maximal number of spikes for model estimation that fell within a 1/30 s window after stimulus onset. This can be done by collecting all spikes that fall within 1/60 s of the maximum response. Therefore, only for the purpose of RF estimation, we obtained the responses of cells by (1) finding when the maximum response occurred in the response PSTH, which is equivalent to t0 in the von Mises function, and (2) going back and forth T/4 = 1/60 s in time and collecting the spikes within [t0 – T/4, t0 + T/4].
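The windowing step can be sketched as follows. The sketch assumes that t0 is measured from the onset of each image, that spike times and onsets share one clock in seconds, and that T = 2/30 s is the full image-plus-blank period; the function name and the example spike times are made up for illustration.

```python
from bisect import bisect_left, bisect_right

def count_spikes_around_peak(spike_times, onsets, t0, T=2.0 / 30.0):
    """For each image onset, count the spikes that fall within
    [onset + t0 - T/4, onset + t0 + T/4], a window of total width T/2."""
    spike_times = sorted(spike_times)
    quarter = T / 4.0
    counts = []
    for onset in onsets:
        lo = bisect_left(spike_times, onset + t0 - quarter)
        hi = bisect_right(spike_times, onset + t0 + quarter)
        counts.append(hi - lo)
    return counts

# Two consecutive images, peak response 30 ms after onset (hypothetical).
spikes = [0.020, 0.031, 0.050, 0.095, 0.200]
onsets = [0.0, 2.0 / 30.0]
counts = count_spikes_around_peak(spikes, onsets, t0=0.030)
print(counts)  # [2, 1]
```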
The parameters of the von Mises fit were obtained by minimizing the mean squared error between the fitted and actual response PSTHs of each stimulus recording. For each ∼14 min block of WGN or NS stimuli, a separate PSTH fit was performed. The quality of the fitted curves was assessed using an
Feature-contrast.
The output of the linear filtering stage in the model indicates the similarity between the visual input and the spatial structure of the filter. The authors have previously demonstrated that the output of the filter can be described in terms of rms or Michelson contrast of the spatial structure (the feature) of that filter embedded in the visual stimulus, hence defined as the feature-contrast (Almasi et al., 2020). Furthermore, the feature-contrast corresponding to the RF filter of a cell can be interpreted as the local contrast of the stimulus when projected onto that filter.
Feature-contrast range.
Feature-contrast is defined as the output of the spatial RF filters. If the cell has only one RF filter uncovered by WGN, the feature-contrast of RF filters on WGN follows a univariate normal distribution with an SD of
To find the distribution of the feature-contrast for NS RF filters when using NS stimuli, we performed the same procedure that was explained above. After obtaining the canonical (decorrelated) distribution of NS feature-contrast, we found NS stimuli that lie inside a ball (or hyperball) with a radius equal to
Neuron feature space.
The RF filters of a neuron identified using the NIM framework will span a space that is termed the feature space of a neuron (Fig. 1b). Mathematically, this feature space is equal to the column space (H) of the matrix collecting all the RF filters as its columns
Results
We studied the spatial structure of V1 receptive fields in anesthetized cats identified using both WGN and NSs as stimuli. The recorded neurons were visually stimulated with interleaved blocks of WGN and NSs to increase the biological comparability of the recordings made with both stimuli (six blocks of 12,000 stimuli per block for each stimulus type). The spatial RFs were uncovered using the NIM framework by maximizing the log-likelihood of the model given the pairs of stimuli and responses, which was done over the set of all model parameters simultaneously. The temporal RF of cells was left out of the modeling framework because (1) we wanted to investigate the changes in the visual feature selectivity of cells, and (2) the stimuli used did not have enough temporal correlation to allow reliable estimation of the temporal RF.
Differences between the V1 response profiles to WGN and NSs
Although we did not study here the temporal aspects of the neuronal RFs, we observed major differences in the way that neurons responded to WGN and NS stimuli in terms of their response strength and latency (Fig. 2a). Generally, V1 cells responded more strongly to NS stimuli (∼78% higher spike rates) than to WGN (Fig. 2b; mean ± SD: 7.8 ± 4.9 vs 4.4 ± 3.3 ips). Furthermore, almost all cells showed longer response latencies to WGN than to NS stimuli, with an average difference of ∼14.2 ± 6.0 ms (Fig. 2c; mean ± SD: WGN = 28.2 ± 6.3 ms vs NSs = 13.9 ± 4.7 ms).
V1 RFs unveiled by NSs are typically higher dimensional compared with those with WGN
We successfully uncovered spatial RFs for 92 orientation-selective V1 cells using either NSs or WGN. Among these 92 cells, 87 cells had RFs uncovered using NSs, 58 had RFs uncovered using WGN, and 53 cells had RFs uncovered successfully using both stimulus types (of 70 cells that were driven by both stimulus types). In addition to the 92 cells with oriented RFs, we also uncovered RFs for 58 other cells whose RFs were nonoriented. According to a recent study (Sun et al., 2021), these units might be of thalamic origin, with axons that terminate in V1. Hence, since we were not certain about the cortical origin of these additional 58 cells with nonoriented RFs, we excluded them from our RF analysis.
Figure 3a presents the spatial RFs of an example V1 cell uncovered using the NIM framework under both NSs and WGN. The most striking difference between these two model fits emerges in the number of spatial filters. Using the same number of stimulus images, the RF characterization using WGN and NSs identified one versus three spatial filters, respectively. This was typical in our population of V1 cells, with a majority of cells (78%) having more spatial filters identified using NSs than WGN (Fig. 3b,c). In contrast, only a small fraction (8%) of cells had more uncovered spatial RF filters using WGN than NSs. The difference in the number of uncovered spatial filters between NSs and WGN (no. of NS filters – no. of WGN filters) for the same cells varied from −1 to 4 (Fig. 3c, inset), and its distribution is asymmetric and heavily skewed toward positive values. We investigate possible causes for having identified more spatial filters for RFs using NSs than WGN in the following sections.
Comparison of the feature spaces uncovered with WGN and NSs
How does the identified spatial feature sensitivity of each cell depend on the type of stimulus (WGN or NSs) used to estimate the model? For a given model, the set of spatial features to which the cell is sensitive is determined by its RF filters. The diagram in Figure 1a represents the RF model of a cell with multiple spatial filters. This cell is sensitive to any feature in an image corresponding to one of its RF filters, but also any feature that is a linear weighted sum of its RF filters. This can be understood because such a summed feature would drive each RF filter and hence neural response. The set containing all possible linear weighted sums (i.e., linear combinations) of the RF filters is referred to as the feature space of the cell (see Materials and Methods). Figure 1b shows how we represent the feature space of the model cell. Different linear combinations of the features, corresponding to RF filters, are represented as distinct points in the feature space. (Note that, mathematically, the original RF filters could, in principle, be replaced by any other filters that span this space, provided that the nonlinear function that is applied to the subspace to predict spike rate remains unchanged. The new filters will be weighted sums of the original filters. In this sense, the choice of RF filters is not unique, but the space they span is.)
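The notion of a feature space, and the fact that it does not depend on which spanning filters are chosen, can be made concrete with a small sketch. It builds an orthonormal basis for the span of a set of filters by Gram–Schmidt orthogonalization and splits another filter into the component inside that span and the residual outside it. The three-pixel "filters" are toy values chosen for readability, and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(filters, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the given filters."""
    basis = []
    for h in filters:
        r = list(h)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:  # skip filters already contained in the span
            basis.append([ri / norm for ri in r])
    return basis

def project(v, basis):
    """Split v into its projection onto span(basis) and the residual."""
    proj = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        proj = [p + c * bi for p, bi in zip(proj, b)]
    residual = [vi - pi for vi, pi in zip(v, proj)]
    return proj, residual

# Two filters spanning a plane, and a third filter lying mostly in that plane.
h1, h2 = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]
g = [2.0, 3.0, 0.1]
proj, residual = project(g, orthonormal_basis([h1, h2]))

# Replacing the filters by weighted sums of themselves leaves the space unchanged.
alt = [[a + b for a, b in zip(h1, h2)], [a - b for a, b in zip(h1, h2)]]
proj_alt, _ = project(g, orthonormal_basis(alt))
print(proj, residual, proj_alt)
```

Both sets of filters yield the same projection, which is the sense in which the space is unique even though the filters are not.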
As most cells had at least as many RF filters uncovered using NSs compared with WGN, we investigated whether the WGN feature space (
Our visual inspection of the above components showed that in most cases the residual components of WGN RF filters, while nonzero, were noisy and had no meaningful structures (Fig. 4d). This suggests that they may not contribute significantly to model predictions. If this was so, then
Different dynamic range of V1 RF filters on NSs and WGN
The spatial filter's output quantifies the contrast level of the corresponding feature as it appears embedded in the WGN or NS stimuli, which is termed the feature-contrast (see Materials and Methods). It is a way of quantifying the contrast of those features within an image that drive a particular cell. While the overall rms contrast of the WGN and NSs was matched for our stimuli, the distribution of contrast of particular features could differ between WGN and NSs because of their inherent statistical structure. For those features to which cells were sensitive, we often noticed considerable differences between the levels of feature-contrast between WGN and NSs. This is evident from Figure 5a, in which graphs show the distributions of feature-contrast of a typical V1 RF filter (inset) for WGN (magenta) and NSs (green). The abscissa indicates the feature-contrast that is computed as the output of a normalized (unit norm) filter when applied on the stimuli. The presented distributions differ significantly in their spread. Defining a range of feature-contrast to span ±3 SDs (see Materials and Methods), the range of feature-contrast of the filter in Figure 5a on WGN and NS stimuli was measured as 1.7 and 5.9, respectively. It can be shown that these amounts correspond to ∼10% and ∼35% of the Michelson contrast of the feature.
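The computation of feature-contrast and its range can be sketched on synthetic data. The sketch assumes that the range is the full width of the ±3 SD interval (i.e., 6 SDs) and uses two toy stimulus sets with the same mean per-pixel variance: unstructured Gaussian noise, and a "structured" set in which half of the stimulus power is concentrated along the filter's feature. The structured set is only a stand-in for stimuli that contain the feature at high contrast; it is not a model of natural scenes, and all names and values are assumptions.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit_norm(h):
    norm = math.sqrt(dot(h, h))
    return [x / norm for x in h]

def feature_contrast(rf_filter, stimuli):
    """Output of the unit-norm filter applied to each (mean-subtracted) stimulus."""
    h = unit_norm(rf_filter)
    return [dot(h, s) for s in stimuli]

def contrast_range(values, n_sd=3.0):
    """Full width of the +/- n_sd interval around the mean (assumed definition)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return 2.0 * n_sd * math.sqrt(var)

rng = random.Random(1)
n_pix, n_stim, pixel_sd = 16, 5000, 0.3
rf_filter = [math.sin(2.0 * math.pi * k / n_pix) for k in range(n_pix)]
h = unit_norm(rf_filter)

# Unstructured noise: every pixel independent with SD pixel_sd.
noise = [[rng.gauss(0.0, pixel_sd) for _ in range(n_pix)] for _ in range(n_stim)]

# Structured toy set: same total power, half of it along the feature h.
feat_sd = math.sqrt(0.5 * n_pix) * pixel_sd
bg_sd = math.sqrt(0.5) * pixel_sd
structured = []
for _ in range(n_stim):
    a = rng.gauss(0.0, feat_sd)
    structured.append([a * hk + rng.gauss(0.0, bg_sd) for hk in h])

noise_range = contrast_range(feature_contrast(rf_filter, noise))
structured_range = contrast_range(feature_contrast(rf_filter, structured))
print(round(noise_range, 2), round(structured_range, 2))
```

Even though the two toy sets have matched overall contrast, the range of feature-contrast is several times larger for the structured set, which is the qualitative effect described above.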
The distribution of the range of feature-contrasts for the population of V1 RF filters uncovered using WGN and NSs is given in Figure 5b, which demonstrates significant differences between these two stimulus types across the V1 population. Here, feature-contrast is computed as the output of normalized (unit norm) RF filters when applied on the stimuli. Such a pronounced difference arises because of the nature of the stimuli in relation to the feature sensitivity of V1 cortical neurons. As stated before, the differences between WGN and NS stimuli are in terms of second-order and higher-order statistical dependencies between pixels, which are ubiquitous in NSs but do not exist in WGN. The higher-order statistics of NSs mainly account for structures like edges, curves, and contours in these images. Of course, many V1 cells are highly responsive to these features because they have Gabor-like-oriented RF filters. Accordingly, numerous occasions can occur in natural scenes in which a Gabor-like RF filter of a V1 cell is partially or highly matched with a feature in the scene (like the ones illustrated in Fig. 5c). This results in large values at the output stage of V1 RF filters, namely, the feature-contrast of the filter. This provides an intuitive explanation for the higher range of feature-contrasts of V1 RF filters on NSs than on WGN (Fig. 5b). Further, this considerable difference between the distributions of feature-contrast for V1 RF filters on WGN versus NSs implies that the dynamic range of RF filter outputs that drive V1 cells is significantly larger when operating on NSs than on WGN.
Difference in the feature-contrast response functions
We examined how the nonlinear response of each cell to feature-contrast depended on stimulus type (i.e., WGN vs NSs). In the case of cells with multiple spatial filters, this feature-contrast response function determines how the cell pools those corresponding features in the visual input to give a response. For cells with equal numbers of RF filters between WGN and NSs, we use the feature directions that are shared between the uncovered WGN and NS feature spaces to compare the response functions between the two stimulus regimes. The shared dimensions between the two feature spaces correspond to the WGN RF filters projected onto the NS feature subspace (i.e.,
For simplicity, we begin by presenting the comparison of the response functions of cells that have a single RF filter using both WGN and NSs. For these cells
Figure 7a–c presents the response functions for a cell that had multiple but equal feature dimensions on both WGN and NSs. Here, the comparison is similarly performed on the WGN-projected filter dimensions. Figure 7d–f shows the response functions for a cell that had more NS feature dimensions than WGN. In this case, the comparison is performed on the feature dimensions that were shared between the two feature spaces (WGN-projected filter dimensions; Fig. 7e) and the NS feature dimensions that were orthogonal to the shared dimensions (NS-orthogonal dimensions; Fig. 7f).
We quantified these changes using an index of normalized difference between area under the curve (nD-AUC) of the WGN and NS feature-contrast response functions (Fig. 8a). This index was calculated for the polarity of feature-contrast to which the cell showed the strongest response dependency. The difference between the area under the curve of WGN and NS response functions is depicted using the yellow highlighted area in Figure 8a, which indicates how different the WGN response function is from the NS response function within the WGN feature-contrast range (Σwgn). To obtain a normalized index (nD-AUC) that varies between −1 and 1, we divided the yellow highlighted area by the gray-shaded area, which is determined by the WGN feature-contrast range (Σwgn), the maximum response of the cell within this range to either WGN or NSs (Rmax), and the response of the cell at zero feature-contrast (R0). We computed this normalized index for the cells whose feature-contrast response functions featured a significant difference between WGN and NSs (unpaired t test; p < 0.001). Positive and negative values of nD-AUC indicate that the WGN response function sits significantly above or below the NS response function, respectively, within the WGN feature-contrast range. Across our population of V1 cells, 52 of 53 cells (98%) showed a positive value of nD-AUC, indicating that for them the NS feature-contrast response functions sit below the WGN response functions across the shared (projected) feature directions between NSs and WGN (Fig. 8b, blue bars). For only one cell, the NS and WGN response functions were statistically indistinguishable (at the p = 0.001 significance level). For cells whose dimensionality of WGN feature space was less than that of NS feature space, we considered and compared the response nonlinearities on the shared dimensions as well as the NS feature dimensions that were orthogonal to the shared dimensions.
In contrast to the shared feature dimension, for orthogonal feature dimensions the nD-AUC indices were negative (Fig. 8b, gray bar), which is related to the trivial (i.e., constant) dependency of WGN response functions. Most positive nD-AUC indices vary between 0.1 and 0.4, with a peak at 0.3. Most negative nD-AUC indices vary between −0.2 and −0.4.
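A minimal sketch of the nD-AUC index follows. The exact geometry of the gray-shaded normalizing area is given only by Figure 8a, so the sketch assumes it is the rectangle Σwgn × (Rmax − R0), takes R0 as the smaller of the two responses at zero feature-contrast, and integrates with the trapezoidal rule over one polarity of feature-contrast starting at zero. The function names and response values are hypothetical.

```python
def trapezoid(x, y):
    """Trapezoidal-rule integral of y over x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

def nd_auc(contrast, r_wgn, r_ns):
    """Normalized difference in area under the WGN and NS response functions.
    contrast[0] is assumed to be zero feature-contrast and contrast[-1] the
    edge of the WGN feature-contrast range for the chosen polarity."""
    diff = [a - b for a, b in zip(r_wgn, r_ns)]
    area_between = trapezoid(contrast, diff)      # "yellow" area
    sigma_wgn = contrast[-1] - contrast[0]
    r_max = max(max(r_wgn), max(r_ns))
    r_0 = min(r_wgn[0], r_ns[0])
    normalizer = sigma_wgn * (r_max - r_0)        # assumed "gray" rectangle
    if normalizer == 0:
        raise ValueError("flat response functions: index undefined")
    return area_between / normalizer

# Toy response functions: the WGN curve rises faster than the NS curve.
contrast = [0.0, 0.25, 0.5, 0.75, 1.0]
r_wgn = [1.0, 2.0, 3.0, 4.0, 5.0]
r_ns = [1.0, 1.5, 2.0, 2.5, 3.0]
index = nd_auc(contrast, r_wgn, r_ns)
print(index)  # 0.25
```

With this normalization, swapping the two curves flips the sign of the index, so a constant WGN response function lying below a rising NS response function gives a negative value.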
Similar nonlinear phenomena in response functions have been previously reported in motion-sensitive cells in the visual system of a fly in response to statistical changes in the stimulus (Fairhall et al., 2001).
Explaining the observed changes between the two stimulus regimes
For most V1 cells, the uncovered RF using NS stimuli reveals a larger repertoire of feature sensitivity compared with the RF structure revealed with WGN (Figs. 3a, 4d). The larger repertoire can be an indication of more complex feature selectivity and also the capacity for invariance (Almasi et al., 2020).
Here, we postulate different hypotheses that might explain the observed differences between the identified RFs of V1 cells using NS and WGN stimulus regimes.
The observed changes are neither artifactual nor methodological
One may suspect that the observed changes are methodologically related to our RF identification technique, because the fitting procedure may extract statistical regularities inherent in the stimulus, rather than the stimulus–response relationship. This has long been considered a strong possibility for some RF identification techniques, such as spike-triggered average and covariance, when used in conjunction with statistically rich stimuli such as NSs (Paninski, 2003; Sharpee et al., 2004; Schwartz et al., 2006; Kouh and Sharpee, 2009). However, this is not an issue here since we used maximum likelihood estimation, which is an unbiased and consistent estimation method, thereby minimizing the possibility of artifactual RF filters obtained using NSs (Paninski, 2004).
The greater number of RF filters using NSs is not because of higher spike rates
We found that neurons in V1 were usually more responsive to NSs than to WGN. One possible explanation is that the larger number of uncovered RF filters with NSs is related to the higher spike rates, because more spike data are available for the NIM fitting. The logic behind this suggestion is that having more spikes for fitting might lead to better fits, which might identify more significant filters. We assessed this hypothesis by performing a control analysis to understand the effect that spike count (number of spikes per stimulus image) might have on the number of uncovered spatial filters. This analysis was performed for the cells that met the following three criteria (n = 32): (1) they responded to NSs with a higher spike rate than to WGN stimuli; (2) they had their RF uncovered using both WGN and NSs; and (3) they had more RF filters uncovered using NSs than using WGN stimuli. This control analysis corrected for the higher spike rates on NSs by matching the NS spike rate to the spike rate recorded on WGN stimuli. This was done by randomly sampling spikes from the spike train of the cell in response to NS stimuli until the specified spike count for WGN was reached. After matching the spike counts of the cells between WGN and NSs, we performed our statistical significance test (z score > 2; see Materials and Methods) to determine the number of spatial filters within the RF with the matched (controlled) spike count for NSs. Most cells (83%) still had more filters within the RF identified using spike count-matched NSs than using WGN (Fig. 9a), indicating that, for most cells, identifying a larger number of RF filters using NSs cannot be attributed to there being more spikes for these stimuli compared with WGN.
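The spike-count matching step can be sketched as a random thinning of the NS responses. The sketch assumes that responses are stored as a spike count per image and that spikes are drawn without replacement, uniformly over all NS spikes, until the WGN total is reached; the function name and the example counts are hypothetical.

```python
import random

def match_spike_count(ns_counts, target_total, seed=0):
    """Randomly keep `target_total` of the NS spikes and return the thinned
    spike count per image. Each spike is equally likely to be kept."""
    total = sum(ns_counts)
    if target_total >= total:
        return list(ns_counts)  # nothing to thin
    rng = random.Random(seed)
    # One entry per spike, labelled with the index of the image that evoked it.
    spike_labels = [i for i, c in enumerate(ns_counts) for _ in range(c)]
    kept = rng.sample(spike_labels, target_total)
    matched = [0] * len(ns_counts)
    for i in kept:
        matched[i] += 1
    return matched

ns_counts = [3, 0, 1, 4, 2, 0, 5, 1]   # spikes per NS image (toy values)
wgn_total = 9                          # total spikes recorded on WGN (toy value)
matched = match_spike_count(ns_counts, wgn_total)
print(matched, sum(matched))
```

The thinned counts can then be passed to the same model-fitting and significance test as the original responses.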
As an additional check that our model fitting procedure correctly estimated the number of filters for WGN compared with NSs, we refitted the model nonlinear functions to WGN, but provided the larger set of RF filters estimated for NSs as fixed parameters in the model fitting. This test considered the possibility that a flaw in our model-fitting procedure might cause it to miss, for WGN, some of the filters obtained for NSs. If this were true, the refitted model with all the NS RF filters should have better predictive ability on a withheld WGN dataset, compared with the original WGN model with fewer RF filters. However, our analysis showed that the models with the full set of NS RF filters never provided a better predictive ability than the simpler WGN models when tested on a withheld WGN test set. This supports the conclusion that a model fitted to NS data typically had a higher dimensional feature space compared with those fitted to WGN.
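A comparison of predictive ability on withheld data can be sketched as follows. The metric used here, Poisson log-likelihood per spike relative to a constant-rate null model, is an assumption chosen for illustration and is not stated in this section; the function name and the synthetic data are likewise hypothetical.

```python
import numpy as np

def poisson_ll_per_spike(pred_rate, counts, eps=1e-12):
    """Poisson log-likelihood of `counts` under `pred_rate`, per spike.

    The log(k!) term is omitted because it is identical for every model
    evaluated on the same spike counts and cancels in comparisons.
    """
    pred_rate = np.asarray(pred_rate, dtype=float)
    counts = np.asarray(counts, dtype=float)
    ll = np.sum(counts * np.log(pred_rate + eps) - pred_rate)
    return ll / counts.sum()

rng = np.random.default_rng(1)
g = rng.standard_normal(20000)             # hypothetical filter output on a test set
rate_true = np.logaddexp(0.0, 2.0 * g)     # hypothetical stimulus-dependent rate
counts = rng.poisson(rate_true)            # simulated withheld spike counts

ll_model = poisson_ll_per_spike(rate_true, counts)
ll_null = poisson_ll_per_spike(np.full_like(rate_true, counts.mean()), counts)
improvement = ll_model - ll_null           # positive if the model beats the null
```

Under this kind of metric, two candidate models (for example, one with the WGN filters and one with the full set of NS filters) would each produce a predicted rate on the withheld WGN data, and the model with the higher value predicts the responses better.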
The observed changes cannot be fully explained by a small signal effect
Another possible explanation for the smaller number of RF filters typically found for WGN compared with NSs could be a small signal effect. In summary, when the stimulus is changed from NSs to WGN, the small signal effect hypothesis proposes the following: (1) there is a restriction in the range of feature-contrast; (2) there is a reduction in the number of RF filters; and (3) the response functions of WGN and NSs in the restricted range of feature-contrast present in WGN are statistically indistinguishable.
The argument in support of this effect is that stimuli with different statistics allow us to sample different operational regimes or modes of the visual system. Some modes might be effectively unobservable or cannot be estimated using some stimulus types because there are insufficient stimuli containing particular features to allow reliable estimation. Recall from the “Different dynamic range of V1 RF filters on NSs and WGN” section that the range of feature-contrast with WGN was significantly smaller than that for NSs. It could be that, within the more restricted range of feature-contrast present in WGN, the two models (NSs and WGN) are not significantly different from each other. In particular, if this were the case for the extra dimensions of a model fitted to NSs, the feature-contrast response functions should be statistically indistinguishable from the feature-contrast response functions of the model fitted to WGN, which are equal to zero in these dimensions. In this hypothesis, the additional filter dimensions present in the model fitted to NSs can be identified only because the feature-contrast in NSs becomes sufficiently large to identify a response that departs significantly from zero.
In our population of V1 data, we found no cell that showed any such small signal effect based on the criteria set out by the small signal effect hypothesis. For every cell where we found a reduction in the number of RF filters when the stimulus was changed from NSs to WGN, we also found a significant change in their feature-contrast response functions within the restricted range of feature-contrast of WGN.
To further address the small signal effect, we performed an additional set of control analyses as follows. If the additional filters recovered with NSs relative to WGN are the result of adaptive (or other physiological) changes, and not simply a small signal effect, then it should be possible to recover the full set of filters from the simulated responses of the NSs-fitted model, even when the feature-contrast signal is small, as per WGN. Thus, for each cell, we stimulated the NSs-fitted model with the same WGN stimuli presented to the cell and repeated the RF recovery process for that model neuron. This recovery process typically resulted in fewer RF filters than the original NSs-fitted model, most frequently giving no additional filters or just one additional filter. This might be indicative of a small signal effect for some cells (Fig. 9b). However, we also noticed that for most cells, the simulated responses to WGN with the NSs-fitted model gave far fewer spikes than those elicited during the actual WGN stimulation in our experiment. These significantly reduced spike counts can bias the RF recovery process to give fewer RF filters than the true number. The number of spikes in the simulated WGN stimulation was reduced because, within the range of feature-contrast of WGN, the feature-contrast response functions of the NSs-fitted models were significantly smaller than those of the models fitted to WGN (Fig. 6b, compare green and magenta curves). This demonstrates that the difference between the models uncovered using NSs and WGN is not only in the number of RF filters, but also in significant changes between the feature-contrast response functions. To control for this, we repeated the same analysis by fixing the spike count to that obtained for WGN during the experiment.
For most cells, the number of uncovered RF filters using WGN stimulation of the NSs-fitted model was found to be greater than the number of filters uncovered during the actual WGN stimulation in the experiment (Fig. 9c), although the number of filters was sometimes not as great as the number in the original NSs-fitted model. Together, these analyses indicate there is a substantial increase in the number of RF filters uncovered with NSs relative to WGN, even after small signal effects are accounted for.
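The simulation control can be illustrated with a minimal forward-model sketch. It assumes a model in which the spike rate is a spiking nonlinearity applied to a sum of nonlinear functions of the filter outputs. The softplus spiking nonlinearity, the two example upstream nonlinearities, the random orthonormal filters, and all names (`simulate_model`, `n_pixels`, and so on) are hypothetical placeholders, not the fitted models from this study.

```python
import numpy as np

def simulate_model(stimuli, filters, nonlinearities, rng, offset=0.0):
    """Simulate spike counts from a cascade model driven by `stimuli`.

    stimuli        : (T, P) array, one flattened image per row
    filters        : (M, P) array of RF filters
    nonlinearities : list of M callables, one per filter
    """
    # Feature-contrast: projection of each stimulus onto each filter.
    feature_contrast = stimuli @ filters.T                      # shape (T, M)
    # Sum the nonlinearly transformed filter outputs.
    drive = offset + sum(f(feature_contrast[:, i])
                         for i, f in enumerate(nonlinearities))
    # Softplus keeps the rate positive (an assumed spiking nonlinearity).
    rate = np.logaddexp(0.0, drive)
    # Poisson spike generation, one count per stimulus frame.
    return rng.poisson(rate), rate, feature_contrast

rng = np.random.default_rng(2)
n_frames, n_pixels = 10000, 64
# Two hypothetical orthonormal filters standing in for an NSs-fitted model.
filters = np.linalg.qr(rng.standard_normal((n_pixels, 2)))[0].T
nonlinearities = [lambda g: np.maximum(g, 0.0),   # rectified-linear feature
                  lambda g: g ** 2]               # quadratic (energy-like) feature
wgn = rng.standard_normal((n_frames, n_pixels))   # WGN stimulus frames
sim_counts, sim_rate, fc = simulate_model(wgn, filters, nonlinearities, rng)
```

The simulated spike counts would then be treated like recorded data: the RF recovery procedure is rerun on them, optionally after fixing the total spike count to that obtained experimentally for WGN.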
Adaptation among other nonlinear phenomena to explain the changes
Based on our analyses, it appears that the changes in the RFs of V1 cells as a result of changes in the stimulus from NSs to WGN are brought about by nonlinear phenomena. A possible explanation for the changes that we have observed under WGN and NS stimulation is the adaptation of cells to stimulus statistics. That said, our analyses cannot entirely rule out the possibility of other effects influencing the RFs, such as surround suppression or end-stopping, which have been found for V1 cells when probing with basic visual stimuli such as bars and gratings.
In our V1 data, all cells underwent nonlinear effects in their response functions manifested as a significant amplification in their feature-contrast response functions across the restricted, shared feature dimensions when the stimulus changed from NSs to WGN. However, across the feature dimensions that were orthogonal to the shared space (i.e., NS-orthogonal), WGN feature-contrast response functions exhibited trivial dependencies, resulting in a significant reduction in the response function when the stimulus was switched from NSs to WGN (Fig. 8c).
Changes in the characteristics of RF filters
A thorough comparison between the WGN and NS feature spaces can be achieved by sampling features from these spaces and calculating the range of characteristics (e.g., orientation, spatial frequency, and spatial phase) of those sampled features (Almasi et al., 2020). However, this is a very significant project and is beyond the scope of the present article. One issue that hinders direct comparison of the characteristics of RF filters between WGN and NSs is the change in the number of RF filters. This can be particularly problematic when comparing the orientation characteristics between filters as we occasionally observed cells whose extra NS RF filters showed misalignment in their orientation preferences. To allow comparison with previous work (Sharpee et al., 2006), here we will consider spatial frequency. We found that changes in spatial frequency were less problematic. We computed and compared the peak spatial frequency and spatial frequency bandwidth of filters within the same cell between the WGN and NS RFs. Preferred spatial frequency was preserved (Fig. 10a;
Discussion
We studied cat V1 RFs, estimated using the NIM, when V1 cells were stimulated with WGN and NSs. For most V1 cells, we found that the RFs uncovered using NSs contained more filters than those uncovered using WGN. The use of NSs led to an increase in spike rate, but this was not the cause of the increased number of filters. Rather, we believe it is attributable to the wider range of feature-contrasts generated by RF filters when using NSs.
Contrast
The WGN and NSs used in our experiments, though matched in global rms contrast, are different in their local contrast because of their different statistical regularities (Frazor and Geisler, 2006). WGN lacks any second-order and higher-order spatial correlations and is spatially stationary (i.e., local statistics of random patches are independent of where in the scene the patches are sampled). Unlike WGN, NSs have strong second-order and higher-order spatial correlations and are highly spatially nonstationary. Our findings indicate that the larger range of feature-contrasts of V1 RF filters using NSs, compared with WGN, is because of the differences in the statistics of the two stimulus types that are generated by differences in their local contrasts. In this context, feature-contrast can serve as an indication of local contrast in a scene.
Adaptation
Our results suggest the existence of nonlinear phenomena affecting V1 RFs (e.g., adaptation in the visual system), which may occur because of changes in stimulus statistics. Additional nonlinear phenomena such as surround suppression (Jones et al., 2001; Webb et al., 2005; Wissig and Kohn, 2012) or end-stopping (Bolz and Gilbert, 1986; DeAngelis et al., 1994) outside the classical RF may also contribute to the observed changes. Although our RF characterization technique does have the ability to reveal the latter effects, it is possible that they were very subtle or not localized enough to be identified.
Adaptation is often defined and characterized using a model-dependent approach (Baccus and Meister, 2002). In general, any change in the model parameters describing neuronal RFs, or any change in the description of the model is considered to be an adaptation effect. In the latter case, the model is no longer capable of describing the neuronal responses to the original scene after adaptation to a new scene. Since merely adjusting the parameters of our model when transitioning from WGN to NSs, or vice versa, is inadequate for describing the neural data, our results reveal that a change in stimulus statistics necessitates using a different model.
A number of studies have investigated the adaptation caused by image statistics in the visual system (David et al., 2004; Felsen et al., 2005; Sharpee et al., 2006; Lesica et al., 2007; Tkačik et al., 2014). Pronounced changes to RF structure can occur, particularly to inhibition, when the temporal statistics of the stimuli are altered from naturalistic to random (David et al., 2004). However, our stimuli differed primarily in spatial, not temporal, statistics. Shapley (1997) argued that contrast and scene statistics for a given spatiotemporal pattern are tightly related, suggesting that adaptation to scene statistics is equivalent to contrast adaptation. This is pertinent to the present study in which differences between the statistics of WGN and NSs led to differences in the contrasts of the corresponding features in the RF of each cell, which accordingly altered the feature-contrast response function of the cell (but see also Lesica et al., 2007). David et al. (2004) also noted changes in suppressive components of spatial RFs, consistent with the role of contrast gain control when presenting NSs compared with WGN.
The dimensionality of the feature space of a cell directly relates to the complexity of its RF. The RFs of simple cells often comprise a single filter (Almasi et al., 2020). A shift from multiple to single filters has been inferred experimentally using grating stimuli when reducing contrast (Crowder et al., 2007; Henry and Hawken, 2013; Cloherty and Ibbotson, 2015; Meffin et al., 2015; Yunzab et al., 2019). Cell responses shifted from those expected from multifilter RFs (complex-like) to those expected from a single filter (simple-like). A possible explanation for this switch is adaptation to image contrast (or statistics). When presented with WGN, retinal ganglion cells (RGCs) adapt to the low feature-contrasts of their RF filters. This adaptation introduces a change in the feature-contrast response functions of RGCs, known as contrast gain control (Smirnakis et al., 1997). This leads to a reduction in the contrast gain and consequently the threshold of the feature-contrast response function (Fig. 11). The effect is to increase the sensitivity of cells to the low feature-contrast of the stimulus by adapting the dynamic range of the feature-contrast response function to the stimulus contrast regime. However, this effect might be insignificant in some RGCs and, as a result, the change in the feature-contrast threshold may be inadequate to improve the sensitivity of these cells to the low feature-contrast of WGN. Hence, some RGCs (or possibly the subsequent cells in the lateral geniculate nucleus) might not be activated during WGN stimulation. These cells would, accordingly, drop out from the feedforward visual stream. The aggregate effect on visual cortical cells could be a reduction in the number of RF filters, which indicates the effective dimensionality of the feature space of the cell.
An alternative explanation for the increased number of filters revealed with NSs is that these may reflect the context dependence between features that occur in natural scenes. The present model is limited to capturing such dependencies as a sum of nonlinear functions of the filters' outputs [
Strictly for V1 simple cells, Sharpee et al. (2006) reported an amplification in the higher SF components of the uncovered RF filters when stimulus statistics changed from WGN to NSs. They interpreted their results to mean that the visual system was compensating for the under-represented higher SF components in NSs to improve the efficiency of neuronal information transmission. Our results corroborate the findings of Sharpee et al. (2006); with NS stimulation, our V1 cells tend to have RF filters with higher SF preferences and narrower SF bandwidths (Fig. 10). However, we extend beyond simple cells as we recorded from cells with multiple filters. Our data unveil increased numbers of filters and changed response functions as well as increased SF tuning. These modifications optimize the dynamic range of most recorded V1 cells for the given image statistics.
Such modifications may be explained by contrast adaptation. The effects of contrast gain control for ON cells are more pronounced than the effects for OFF cells in the early visual pathway (Chander and Chichilnisky, 2001; Felsen et al., 2005; Zaghloul et al., 2005; Ratliff et al., 2010). When switching from high to low contrast, the response functions of OFF cells show little adaptation to changes in contrast, so the shift toward sensitivity for low contrasts is less pronounced than in ON cells (Bonin et al., 2006). Furthermore, OFF cells are more selective to high SF features (perhaps because of their small dendritic fields) in the visual scene than ON cells (Ratliff et al., 2010; but see Chichilnisky and Kalmar, 2002). The less pronounced contrast adaptation effect in OFF cells indicates that, when switching from high to low contrast (e.g., from NSs to WGN), some OFF cells likely exhibit little contrast gain control. Therefore, the reduction in their contrast gain is insufficient to improve their sensitivity to the low-contrast stimulus. The reduction in the activity of such OFF cells would propagate in the feedforward stream of the visual system. Therefore, the reduced OFF cell input might explain the significant reduction in the SF preferences and broadening of SF bandwidths of cortical RF filters during WGN, because OFF cells tend to be selective for high SFs.
Function
In most V1 cells in our study, the changes in the RFs between WGN and NSs can be interpreted as a mechanism to increase the amount of information encoded by the cell. Cells adapt their feature-contrast response functions to the dynamic range of the stimulus, and this change seems to be related to the range of feature-contrasts in the stimulus. There are changes that appear in the shape of the response functions. Higher response gains are allocated to the stimuli that are rare in the image, which belong to both the positive and negative tails of the feature-contrast distributions. Such an encoding mechanism carries more information about the stimulus (Felsen et al., 2005).
The changes in feature-contrast response functions reported here are consistent with the contrast adaptation phenomena reported in V1, described as both contrast gain and response gain control (Albrecht et al., 1984; Ohzawa et al., 1985; Fig. 11). Both effects result in a change in the response functions to match the prevailing visual environment.
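The distinction between contrast gain control and response gain control can be illustrated with a textbook hyperbolic-ratio (Naka-Rushton) contrast response function. This is a generic parameterization with hypothetical parameter values, offered only as a sketch; it is not the feature-contrast response function fitted in this study.

```python
import numpy as np

def naka_rushton(c, r_max, c50, n=2.0):
    """Hyperbolic-ratio contrast response: r_max * c^n / (c^n + c50^n)."""
    c = np.asarray(c, dtype=float)
    return r_max * c ** n / (c ** n + c50 ** n)

contrast = np.linspace(0.0, 1.0, 101)

# Baseline curve (hypothetical parameters).
baseline = naka_rushton(contrast, r_max=50.0, c50=0.2)
# Contrast gain change: the semisaturation contrast c50 moves, which
# shifts the curve along the contrast axis without changing its ceiling.
contrast_gain = naka_rushton(contrast, r_max=50.0, c50=0.4)
# Response gain change: the maximum response r_max is scaled, which
# multiplies the whole curve without changing its position.
response_gain = naka_rushton(contrast, r_max=25.0, c50=0.2)
```

In this parameterization, a contrast gain change alters the contrast at which the half-maximal response is reached, whereas a response gain change rescales the output at every contrast.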
Overall, our findings indicate that V1 extracts image features from the input in a flexible manner based on the nature of the stimulus.
Footnotes
This work was supported by the Australian Research Council Centre of Excellence for Integrative Brain Function (Grant CE140100007), the National Health and Medical Research Council (Grant GNT1106390), and the Lions Club of Victoria.
The authors declare no competing financial interests.
Correspondence should be addressed to Hamish Meffin at hmeffin{at}unimelb.edu.au