Abstract
Neurons in the central auditory system are often described by the spectrotemporal receptive field (STRF), conventionally defined as the best linear fit between the spectrogram of a sound and the spike rate it evokes. An STRF is often assumed to provide an estimate of the receptive field of a neuron, i.e., the spectral and temporal range of stimuli that affect the response. However, when the true stimulus–response function is nonlinear, the STRF will be stimulus dependent, and changes in the stimulus properties can alter estimates of the sign and spectrotemporal extent of receptive field components. We demonstrate analytically and in simulations that, even when uncorrelated stimuli are used, interactions between simple neuronal nonlinearities and higher-order structure in the stimulus can produce STRFs that show contributions from time–frequency combinations to which the neuron is actually insensitive. Only when spectrotemporally independent stimuli are used does the STRF reliably indicate features of the underlying receptive field, and even then it provides only a conservative estimate. One consequence of these observations, illustrated using natural stimuli, is that a stimulus-induced change in an STRF could arise from a consistent but nonlinear neuronal response to stimulus ensembles with differing higher-order dependencies. Thus, although the responses of higher auditory neurons may well involve adaptation to the statistics of different stimulus ensembles, stimulus dependence of STRFs alone, or indeed of any overly constrained stimulus–response mapping, cannot demonstrate the nature or magnitude of such effects.
Introduction
A common goal in sensory neuroscience is to characterize a neuron in terms of a function that it computes on its input or on the input to the organism as a whole. This goal has often been pursued using a system identification approach based on reverse correlation (de Boer, 1967; de Boer and de Jongh, 1978). In higher auditory centers, reverse correlation analysis typically involves estimation of the spectrotemporal receptive field (STRF), which can be defined as the best-fit linear model between the spectrogram of the sound and the neuronal response it evokes (Aertsen et al., 1981; Eggermont et al., 1983; Palm and Pöpel, 1985; Eggermont, 1993; Theunissen et al., 2000).
Early research using reverse correlation methods focused on estimation of the Wiener–Volterra kernels using white noise as the driving stimulus (Marmarelis and Marmarelis, 1978). However, in higher levels of the auditory system, white noise tends to be ineffective at eliciting neuronal responses (Wang et al., 2005). This has led auditory researchers to use other stimuli, such as dynamic random chords (DRCs) (deCharms et al., 1998; Schnupp et al., 2001; Rutkowski et al., 2002; Linden et al., 2003), natural sounds (Aertsen et al., 1981; Theunissen et al., 2000; Machens et al., 2004), and a family of stimuli whose basic element is a ripple, a sound modulated sinusoidally in both the temporal and spectral domains (Kowalski et al., 1996a,b; Calhoun and Schreiner, 1998; Klein et al., 2000; Escabí and Schreiner, 2002; Miller et al., 2002; Qiu et al., 2003; Fritz et al., 2005). For many of these stimulus classes, the power at any two points in spectrotemporal space is uncorrelated by design, which simplifies estimation of the STRF. For the others, such as natural sounds, the impact of the correlations is removed at the analysis stage (Theunissen et al., 2000). However, in contrast to white noise, these stimuli are, with the exception of the DRC, not spectrotemporally independent; they contain nonzero third- or higher-order cross-central moments between points in spectrotemporal space.
This lack of full spectrotemporal independence is not an issue when the true underlying response function (RF) is linear. However, neuronal firing, and hence the computation performed by neurons, involves rectifying and saturating nonlinearities dictated by spiking mechanisms. Other significant nonlinearities have been directly demonstrated in the auditory cortex of bats (Suga et al., 1978), rodents (Sahani and Linden, 2003; Ahrens et al., 2008), cats (Calhoun and Schreiner, 1998), songbirds (Nagel and Doupe, 2006), and primates (Barbour and Wang, 2003) and in the inferior colliculus of barn owls (Peña and Konishi, 2001). When the RF is nonlinear, the linear fit between nonindependent stimuli and the neuronal response can reflect statistical properties of the stimuli used in the fit rather than properties of the RF. This fact is well known in theory; in practice, its consequences for STRF analysis are not always fully appreciated. In particular, it is often assumed that STRFs always provide a reliable estimate of the receptive field of a neuron, but this is not necessarily the case.
We show here in simulation that simple, biologically plausible nonlinearities can interact with higher-order central moments in nonindependent stimuli to produce STRFs with spurious receptive field elements. Moreover, we illustrate that even STRFs estimated with spectrotemporally independent stimuli depend on the power of the stimulus. Finally, we demonstrate using natural sounds that these effects can lead to STRFs that appear to adapt to reflect stimulus structure, without any actual change in the underlying response function. Thus, the STRF of a nonlinear neuron does not necessarily reflect excitatory and inhibitory components of the underlying RF, and the structure and extent of the STRF may be stimulus dependent even when the true response function of the neuron is not.
Materials and Methods
Stimuli.
All stimuli were created as spectrogram “frames” of 80 frequency bins by 30 time bins. For each type of stimulus, 75,000 frames were created.
Dynamic random chord stimuli.
As an example of spectrotemporally independent (and therefore also uncorrelated) stimuli, we used a DRC stimulus, one frame of which is shown in Figure 1a. DRC frames were generated directly in spectrotemporal space by randomly selecting 20% of the bins of the spectrogram to have zero intensity and assigning each remaining bin one of five evenly spaced nonzero intensities with uniform probability. A DRC stimulus is spectrotemporally independent, in that the power in any given bin of the spectrogram is statistically independent of the power in every other bin. In other words, knowing the power in any number of the spectrogram bins does not allow prediction of the power in any other spectrogram bin.
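A DRC-like ensemble with these properties can be sketched in Python with NumPy. This is a minimal illustration, not the generator used in the study: the function name `make_drc_frames` and the intensity levels 1 to 5 (arbitrary units) are assumptions. Each bin is also silenced independently with probability 0.2, so roughly rather than exactly 20% of bins are silent; selecting an exact count per frame would make the bins weakly dependent through that fixed count.

```python
import numpy as np

def make_drc_frames(n_frames, n_freq=80, n_time=30, p_zero=0.2, n_levels=5, seed=0):
    """DRC-like frames in which every spectrogram bin is drawn independently.

    Assumptions (not from the text): nonzero intensities are the integers 1..n_levels
    in arbitrary units, and each bin is silent with probability p_zero.
    """
    rng = np.random.default_rng(seed)
    shape = (n_frames, n_freq, n_time)
    frames = rng.integers(1, n_levels + 1, size=shape).astype(float)  # uniform over levels
    frames[rng.random(shape) < p_zero] = 0.0                          # independent silences
    return frames

frames = make_drc_frames(1000)
S = frames.reshape(len(frames), -1)   # one vector-recast frame per row (1000 x 2400)
S = S - S.mean(axis=0)                # mean-subtract before regression (assumption)
```

Flattening each frame into a row gives the stimulus-matrix layout described under Simulation of response; because DRC intensities have a nonzero mean, the sketch assumes they are centered before any regression.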
Ripple stimuli.
As an example of uncorrelated but not independent stimuli, we used an ensemble of ripples. Each ripple spectrogram in the ensemble was assigned 1 of 128 temporal modulations (with frequencies evenly distributed from 0 up to the maximum possible) and 1 of 255 frequency modulations (again evenly sampled between 0 and the maximum possible), and was multiplied by a randomly assigned sign; an example of one such spectrogram is shown in Figure 1b. Such ensembles of ripples are spectrotemporally uncorrelated; that is, the power in any given spectrogram bin cannot be linearly predicted from the power in any other single bin. However, these stimuli are not independent: because ripples are periodic, the power in a given spectrogram bin can be predicted from the power in two other bins along the same line through the spectrogram.
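The sketch below generates ripple frames and illustrates the periodic structure behind this dependence. Assumptions not taken from the text: modulations are integer numbers of cycles per frame drawn uniformly over every value the frame can represent, each ripple gets a uniformly random phase (which subsumes the random sign), and the function names are hypothetical. For any sinusoid, three equally spaced collinear bins satisfy s(p + 2d) = 2 cos(Δ) s(p + d) − s(p), where Δ is the phase advance per step d.

```python
import numpy as np

def ripple_frame(k_f, k_t, phase, n_freq=80, n_time=30):
    """One ripple: sinusoidal in frequency (k_f cycles/frame) and time (k_t cycles/frame)."""
    f = np.arange(n_freq)[:, None]
    t = np.arange(n_time)[None, :]
    return np.cos(2 * np.pi * (k_f * f / n_freq + k_t * t / n_time) + phase)

def make_ripple_frames(n_frames, n_freq=80, n_time=30, seed=0):
    """Ensemble of ripples with uniformly drawn integer modulations and random phase."""
    rng = np.random.default_rng(seed)
    k_f = rng.integers(0, n_freq, size=n_frames)
    k_t = rng.integers(0, n_time, size=n_frames)
    phase = rng.uniform(0.0, 2 * np.pi, size=n_frames)
    return np.stack([ripple_frame(a, b, c, n_freq, n_time)
                     for a, b, c in zip(k_f, k_t, phase)])

# Three equally spaced bins on one line (step: 5 frequency bins, 2 time bins).
frame = ripple_frame(7, 3, 0.4)
delta = 2 * np.pi * (7 * 5 / 80 + 3 * 2 / 30)    # phase advance per step along the line
s0, s1, s2 = frame[10, 4], frame[15, 6], frame[20, 8]
print(np.isclose(s2, 2 * np.cos(delta) * s1 - s0))  # True
```

With all integer modulations equally likely and a uniform phase, the expected product of two distinct bins is zero, so the ensemble is uncorrelated even though each frame is completely determined by a few of its bins.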
Natural stimuli.
Four classes of natural sounds were used in this study: environmental sounds from the Pittsburgh database (Smith and Lewicki, 2006), speech sounds from the TIMIT (for Texas Instruments and Massachusetts Institute of Technology) speech database (Garofolo et al., 1993), a selection of tamarin vocalizations (all either contact calls or combination long calls) provided by R. Egnor and M. Hauser (Harvard University, Cambridge, MA), and Bengalese finch songs provided by C. Hampton and M. Brainard (University of California at San Francisco, San Francisco, CA). All sounds were resampled to a sampling rate of 16 kHz and passed through a filter bank consisting of 80 gammatone bandpass filters with center frequencies linearly distributed between 100 and 7000 Hz. The spectrogram was then given by the Hilbert envelopes of the filter-bank outputs, decimated to a sampling rate of 1 kHz. The spectrograms were subdivided into segments of 80 frequency bins by 30 time bins, and a random subset of 75,000 segments was chosen for use in the study.
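The preprocessing chain just described (gammatone filter bank, Hilbert envelope, decimation to 1 kHz) can be sketched with NumPy alone. Several details are assumptions because the text does not give them: a fourth-order gammatone impulse response with bandwidth 1.019 times the equivalent rectangular bandwidth (ERB) of Glasberg and Moore (1990), truncation to 1024 taps, each filter scaled to unit gain at its center frequency, and decimation by averaging non-overlapping 16-sample blocks rather than with a dedicated anti-aliasing filter. All function names are hypothetical.

```python
import numpy as np

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg and Moore, 1990), in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammatone_bank(centers_hz, fs, n_taps=1024, order=4):
    """FIR gammatone approximations, each scaled to unit gain at its center frequency."""
    t = np.arange(n_taps) / fs
    bank = np.empty((len(centers_hz), n_taps))
    for i, fc in enumerate(centers_hz):
        b = 1.019 * erb_hz(fc)
        g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        gain = np.abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))   # |DTFT| at fc
        bank[i] = g / gain
    return bank

def hilbert_envelope(x):
    """Magnitude of the analytic signal along the last axis, computed with the FFT."""
    n = x.shape[-1]
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x, axis=-1) * h, axis=-1))

def spectrogram(sound, fs=16000, n_channels=80, f_lo=100.0, f_hi=7000.0, fs_out=1000):
    centers = np.linspace(f_lo, f_hi, n_channels)       # linear spacing, as in the text
    bank = gammatone_bank(centers, fs)
    n_fft = len(sound) + bank.shape[1] - 1
    # Causal FIR filtering of all channels at once (linear convolution via the FFT).
    prod = np.fft.rfft(sound, n_fft)[None, :] * np.fft.rfft(bank, n_fft, axis=1)
    filtered = np.fft.irfft(prod, n_fft, axis=1)[:, :len(sound)]
    env = hilbert_envelope(filtered)
    step = fs // fs_out
    n_out = env.shape[1] // step
    # Crude decimation to fs_out: average each non-overlapping block of `step` samples.
    return env[:, :n_out * step].reshape(n_channels, n_out, step).mean(axis=2), centers
```

A pure tone at one channel's center frequency should then dominate that channel, with an envelope near 1 away from the signal edges.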
Natural stimuli, unlike DRC stimuli and ensembles of ripple stimuli, may have (second-order) correlations. The problem of robustly compensating for the effects of these correlations on STRF estimation has been addressed in previous studies (Theunissen et al., 2000; Woolley et al., 2006). Because our primary interest was in the effects of higher-order statistics on STRF analysis, we chose to avoid the issues associated with second-order structure by numerically whitening natural stimuli before use. After this process, the off-diagonal elements of the autocorrelation matrix were all five or more orders of magnitude smaller than the diagonal elements, although any higher-order statistical structure was preserved.
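Numerical whitening of this kind can be sketched as ZCA (zero-phase) whitening, one of several linear transforms that make the sample covariance the identity; the text does not say which transform was used, so this choice and the name `zca_whiten` are assumptions. Because the transform is linear, it removes second-order correlations but does not remove higher-order dependencies. The final check mirrors the criterion quoted above, comparing off-diagonal with diagonal covariance elements.

```python
import numpy as np

def zca_whiten(X, eps=1e-12):
    """Return mean-subtracted rows of X (n_samples x n_dims) with identity sample covariance."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(C)
    W = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T   # C^(-1/2); eps guards tiny eigenvalues
    return Xc @ W

rng = np.random.default_rng(0)
mix = np.eye(20) + 0.3 * rng.normal(size=(20, 20)) / np.sqrt(20)
X = rng.laplace(size=(5000, 20)) @ mix        # correlated, non-Gaussian toy data
Xw = zca_whiten(X)
C = Xw.T @ Xw / len(Xw)
off_diag = np.abs(C - np.diag(np.diag(C))).max()
print(off_diag / np.diag(C).min())            # many orders of magnitude below 1
```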
Simulation of response.
Spectrograms were each recast as a vector and became rows of a 75,000 × 2400 stimulus matrix S. This matrix was then multiplied by one or more similarly vector-recast RF vectors w⃗ (2400 × 1), and the results were combined according to the rules below to give a 75,000 × 1 response vector r⃗. In Figures 9 and 10, this was taken to be the response of the neuron. In other simulations, a final response ρ⃗ (75,000 × 1) was obtained by drawing 20 samples from an inhomogeneous Poisson distribution with mean parameter r⃗ and then averaging across the samples:
$$\vec{\rho} = \frac{1}{20}\sum_{n=1}^{20} \vec{x}^{(n)}, \qquad x_{t}^{(n)} \sim \mathrm{Poisson}\left(r_{t}\right),$$
where t indexes the 75,000 stimulus frames.
For simulation of a linear RF neuron, S was multiplied by a single RF w⃗ (2400 × 1): r⃗ = Sw⃗.
For the linear model in Figure 9, r⃗ was taken to represent the response, even if some entries were negative, thus preserving true linearity. In simulations with noise, the stimuli were offset so that r⃗ was never negative, and Poisson noise was added, as above.
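The noisy response described above can be sketched as follows; `poisson_response` is a hypothetical helper, and the rates are assumed to be non-negative already (for the linear model, after the stimulus offset just mentioned). Averaging 20 independent Poisson draws keeps the mean at r while reducing the variance from r to r/20.

```python
import numpy as np

def poisson_response(r, n_trials=20, seed=0):
    """Average of n_trials independent Poisson draws with per-frame mean rate r."""
    r = np.asarray(r, dtype=float)
    if np.any(r < 0):
        raise ValueError("Poisson means must be non-negative; offset the stimuli first.")
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam=r, size=(n_trials,) + r.shape)   # one row of draws per trial
    return counts.mean(axis=0)

# Example: a linear RF with stimuli offset so that every rate is non-negative.
rng = np.random.default_rng(1)
S = rng.uniform(0.0, 1.0, size=(1000, 50))
w = rng.uniform(0.0, 0.1, size=50)
rho = poisson_response(S @ w)
```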
Three basic nonlinear RFs were modeled. A “multiplicative RF” was modeled using the linear responses to two distinct Gaussian receptive fields in spectrotemporal space. The outputs of the linear projections were rectified and multiplied pointwise (indicated by the Schur product ○) to give the response vector r⃗:
$$\vec{r} = \lfloor S\vec{w}_{1} \rfloor \circ \lfloor S\vec{w}_{2} \rfloor,$$
where ⌊·⌋ denotes elementwise half-wave rectification, ⌊x⌋ = max(x, 0).
A “divisive inhibition RF” was modeled using the linear responses to two distinct receptive fields, an excitatory field w⃗_{E} and an inhibitory field w⃗_{I}. Both receptive fields were Gaussian in temporal extent. The excitatory receptive field was a squared Gaussian in spectral extent, whereas the inhibitory receptive field was quadratic (see Fig. 4a). As with the multiplicative model, the output of each projection was rectified and then combined pointwise (with pointwise division indicated by the symbol ÷):
$$\vec{r} = \lfloor S\vec{w}_{E} \rfloor \div \left(a + \lfloor S\vec{w}_{I} \rfloor\right).$$
The constant a was set to 7.5. Last, a “threshold RF” neuron was simulated by subtracting 0.3 times the maximum value of the linear projection and then rectifying to give the mean rate:
$$\vec{r} = \left\lfloor S\vec{w} - 0.3\,\max_{t}\,(S\vec{w})_{t} \right\rfloor,$$
where the maximum is taken over all stimulus frames t.
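The three nonlinear models can be written compactly with NumPy. This is a sketch: the function names are hypothetical, `gaussian_rf` only illustrates a Gaussian RF component (not the exact shapes of Fig. 4a), and the divisive form, with the constant a added to the rectified inhibitory drive, is an assumed reading of the description above.

```python
import numpy as np

def rect(x):
    """Half-wave rectification, max(x, 0), applied elementwise."""
    return np.maximum(x, 0.0)

def gaussian_rf(n_freq, n_time, f0, t0, sd_f, sd_t):
    """Vector-recast Gaussian RF component centered at (f0, t0) (hypothetical helper)."""
    f = np.arange(n_freq)[:, None]
    t = np.arange(n_time)[None, :]
    return np.exp(-0.5 * ((f - f0) / sd_f) ** 2 - 0.5 * ((t - t0) / sd_t) ** 2).ravel()

def multiplicative_rf(S, w1, w2):
    # Rectified projections combined by a pointwise (Schur) product.
    return rect(S @ w1) * rect(S @ w2)

def divisive_rf(S, w_exc, w_inh, a=7.5):
    # Assumed form: rectified excitation divided by (a + rectified inhibition).
    return rect(S @ w_exc) / (a + rect(S @ w_inh))

def threshold_rf(S, w, frac=0.3):
    # Subtract frac times the largest projection over the stimulus set, then rectify.
    u = S @ w
    return rect(u - frac * u.max())

S = np.array([[1.0, 2.0], [-1.0, 3.0], [2.0, -1.0]])
w1, w2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(multiplicative_rf(S, w1, w2))   # [2. 0. 0.]
```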
Estimation of STRFs.
All simulations used stimuli with well-conditioned autocorrelation matrices, and so STRFs could be estimated by simple linear regression:
$$\hat{w} = \left(S^{T}S\right)^{-1}S^{T}\vec{\rho},$$
with r⃗ in place of ρ⃗ in simulations without noise. The autocorrelation matrix S^{T}S differed from a scaled identity matrix only because of finite-sample effects.
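In code, this regression amounts to a least-squares solve. The sketch below (hypothetical `estimate_strf`) uses `numpy.linalg.lstsq` rather than forming the inverse of S^{T}S explicitly, which is numerically safer, and subtracts the means of S and the response first; the centering is an assumption so that stimuli with a nonzero mean, such as DRC intensities, can be passed in directly.

```python
import numpy as np

def estimate_strf(S, rho):
    """Least-squares linear fit of responses rho (n,) to stimulus rows S (n, d)."""
    S_c = S - S.mean(axis=0)
    rho_c = rho - rho.mean()
    w_hat, *_ = np.linalg.lstsq(S_c, rho_c, rcond=None)
    return w_hat

# For a purely linear, noise-free response the fit recovers the RF exactly.
rng = np.random.default_rng(0)
S = rng.choice([-1.0, 1.0], size=(4000, 12))
w_true = rng.normal(size=12)
w_hat = estimate_strf(S, S @ w_true + 5.0)
print(np.allclose(w_hat, w_true))   # True
```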
In plots of RFs and STRFs, black denotes the minimum value, and white denotes the maximum value.
Analytic form of the STRF for multiplicative RF.
The effects of RF nonlinearity are much simpler to describe analytically for the multiplicative model than for the other models used in our simulations. Therefore, we use the multiplicative RF here to illustrate quantitatively how overestimation of the true receptive field may arise from dependence of the STRF on higher-order statistics of the stimulus. Note that the multiplicative model can be viewed as a component of the second-order term in a Volterra expansion, and thus this derivation is partially relevant to any analysis of nonlinear data.
Let s⃗ be a single N × 1 stimulus frame (corresponding to a row of the stimulus matrix S defined above), which evokes a (scalar) mean response r in a multiplicative model as defined above, with N × 1 RF components w⃗_{1} and w⃗_{2}. To avoid effects attributable to finite sampling and noise, we consider expectations with respect to both response variability and the stimulus ensemble. These expectations will be denoted by angle brackets 〈·〉. Without loss of generality, we assume the stimulus set has been normalized such that 〈s⃗ s⃗^{T}〉 = I and 〈s⃗〉 = 0⃗.
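In the multiplicative model, the expected value of the STRF estimate is given by
$$\hat{w} = \langle \vec{s}\vec{s}^{T} \rangle^{-1}\langle \vec{s}\,r \rangle = \left\langle \vec{s}\,\lfloor \vec{s}^{T}\vec{w}_{1} \rfloor\,\lfloor \vec{s}^{T}\vec{w}_{2} \rfloor \right\rangle, \tag{1}$$
where ⌊x⌋ = max(x, 0) denotes half-wave rectification and the first factor is the identity under the normalization above. This expectation over all stimuli can be replaced by the average over only those stimuli for which s⃗^{T}w⃗_{1} > 0 and s⃗^{T}w⃗_{2} > 0 (that is, those stimuli that evoke nonzero response under the multiplicative model), multiplied by a coefficient α giving the fraction of such stimuli in the overall ensemble. Writing 〈·〉_{+} for the restricted average and focusing on the ith element of ŵ, we have
$$\hat{w}_{i} = \alpha\left\langle s_{i}\,(\vec{s}^{T}\vec{w}_{1})(\vec{s}^{T}\vec{w}_{2}) \right\rangle_{+} = \alpha\sum_{j}\sum_{k} w_{1j}\,w_{2k}\,\langle s_{i}s_{j}s_{k} \rangle_{+}. \tag{2}$$
Thus, each element ŵ_{i} of the STRF estimate ŵ depends on the third-order conditional statistic 〈s_{i}s_{j}s_{k}〉_{+}, as well as on w⃗_{1} and w⃗_{2}.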
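We are interested in determining under what conditions ŵ might overestimate the receptive field of a neuron. Formally, by receptive field, we mean the “dimensional support” of the model (which we will abbreviate to “support”): the stimulus dimensions that can influence the response of the model. In this context, when we say i is within the support of the model, we mean that at least one of w_{1i} and w_{2i} is nonzero, and so we want to determine the conditions under which both w_{1i} and w_{2i} are 0, but ŵ_{i} is nonzero. Accordingly, we partition each stimulus into two components:
$$s_{i}^{I} = \begin{cases} s_{i} & \text{if } i \text{ is in the support},\\ 0 & \text{otherwise}, \end{cases} \qquad s_{i}^{O} = \begin{cases} 0 & \text{if } i \text{ is in the support},\\ s_{i} & \text{otherwise}. \end{cases} \tag{3}$$
Substituting s_{i} = s_{i}^{I} + s_{i}^{O} into Equation 2, and noting that s_{j} and s_{k} may be replaced by s_{j}^{I} and s_{k}^{I} because w_{1j} and w_{2k} vanish outside the support, we obtain the following:
$$\hat{w}_{i} = \alpha\sum_{j,k} w_{1j}\,w_{2k}\,\langle (s_{i}^{I}+s_{i}^{O})\,s_{j}s_{k} \rangle_{+} = \alpha\sum_{j,k} w_{1j}\,w_{2k}\,\langle s_{i}^{I}s_{j}^{I}s_{k}^{I} \rangle_{+} + \alpha\sum_{j,k} w_{1j}\,w_{2k}\,\langle s_{i}^{O}s_{j}^{I}s_{k}^{I} \rangle_{+}. \tag{4}$$
By construction, s_{i}^{I}, and therefore the first term in the final expression of Equation 4, is zero when i is not in the dimensional support of the model. Therefore, for points outside the support, we have
$$\hat{w}_{i} = \alpha\sum_{j,k} w_{1j}\,w_{2k}\,\langle s_{i}^{O}s_{j}^{I}s_{k}^{I} \rangle_{+}. \tag{5}$$
Thus, unless the restricted statistic 〈s_{i}s_{j}s_{k}〉_{+} is zero for all i outside and j, k inside the support, estimated weights outside the support may be nonzero. If the stimulus is not independent, the statistic cannot be guaranteed to vanish.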
Note that the definition of 〈·〉_{+} means that nonzero elements of s⃗^{I} are not independent of one another within the restricted set of stimuli, even if they were in the overall ensemble. Intuitively, knowledge of one point in the support and the fact that a response was elicited puts constraints on the possible values of other points in the support. One might therefore be concerned that, even for independent stimuli, the restricted statistic might not vanish. However, as long as it is true that s_{i}^{O} for i outside the support is independent of s_{j}^{I} for j inside the support, the value of s_{i}^{O} will also be independent of the response (which depends only on s⃗^{I}) and thus independent of s_{j}^{I} even after restriction to stimuli that evoke responses. Thus, for weights ŵ_{i} outside the support,
$$\hat{w}_{i} = \alpha\sum_{j,k} w_{1j}\,w_{2k}\,\langle s_{i}^{O} \rangle_{+}\,\langle s_{j}^{I}s_{k}^{I} \rangle_{+} = \alpha\sum_{j,k} w_{1j}\,w_{2k}\,\langle s_{i}^{O} \rangle\,\langle s_{j}^{I}s_{k}^{I} \rangle_{+} = 0, \tag{6}$$
because 〈s⃗〉 = 0⃗.
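Equation 5 and the independence argument above can be checked on a three-dimensional toy ensemble small enough to enumerate exactly. This example is our own illustration, not one of the simulations: the support is dimensions 0 and 1 (w⃗_{1} and w⃗_{2} are unit vectors), and dimension 2 lies outside it. In the independent ensemble s_{2} is an independent random sign; in the dependent ensemble s_{2} = s_{0}s_{1}, which is uncorrelated with s_{0} and s_{1} (the covariance is still the identity) but has 〈s_{0}s_{1}s_{2}〉 = 1. Here α = 1/4 and the restricted statistic is 1, so Equation 5 predicts ŵ_{2} = 1/4 for the dependent ensemble, whereas independence predicts ŵ_{2} = 0.

```python
import itertools
import numpy as np

def strf(S, r):
    w_hat, *_ = np.linalg.lstsq(S, r, rcond=None)
    return w_hat

def multiplicative_response(S):
    # Support = {0, 1}: product of the rectified values of s0 and s1.
    return np.maximum(S[:, 0], 0.0) * np.maximum(S[:, 1], 0.0)

signs = [-1.0, 1.0]
# Independent ensemble: all 8 equiprobable sign patterns of (s0, s1, s2).
S_indep = np.array(list(itertools.product(signs, repeat=3)))
# Uncorrelated but dependent ensemble: s2 = s0 * s1 (4 equiprobable patterns).
S_dep = np.array([[a, b, a * b] for a, b in itertools.product(signs, repeat=2)])

print(strf(S_indep, multiplicative_response(S_indep)))   # approx. [0.25, 0.25, 0.0]
print(strf(S_dep, multiplicative_response(S_dep)))       # approx. [0.25, 0.25, 0.25]
```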
Results
Multiplicative interactions
The STRF is often taken to represent the receptive field of a neuron or, in other words, is often considered to be an accurate estimate of the dimensional support (see Materials and Methods), herein referred to simply as the support, of the RF of the neuron. More precisely, it is assumed that significant weights in the STRF appear only at those time–frequency points that contribute causally to the output of the response function. For a linear response function (Fig. 2a), both spectrotemporally independent (DRC) (Fig. 2b) and nonindependent but uncorrelated (ripple) (Fig. 2c) stimuli provide a consistent estimate of the support; that is, no spectrotemporal points appear in the STRFs that do not contribute to the RF in the large data limit. In the case of a nonlinear RF, for example with a multiplicative interaction (Fig. 2d), independent stimuli can still be used to obtain a conservative estimate of the RF support (Fig. 2e). This is a corollary of independence; if one point is independent of another point, then it will also be independent of any function of that other point. Thus, regardless of the nature of the RF, points outside the support of the RF will be uncorrelated with the response of the neuron, which is, by definition, a function only of points within the support. Therefore, in the large data limit, the STRF will have zero values outside the support when estimated with a spectrotemporally independent stimulus.
However, this is not the case for ensembles of ripple stimuli, which are spectrotemporally uncorrelated but not independent. The STRF estimated using ripple stimuli shows significant sidebands of excitation (Fig. 2f) that do not correspond to any feature in the true RF. Although the uncorrelated nature of the ripple ensemble guarantees that points outside the support are not correlated with any single point within the support, it does not prevent points outside the support of the RF from effectively being correlated with higher-order cross-moments of the stimulus within the support. Thus, if the RF depends on these higher-order moments, the stimulus outside the support may be correlated with the response of the neuron. Intuitively, in the case of this particular nonlinearity, the RF selects for stimuli that have power present simultaneously in both RF elements. Because the individual ripples are periodic in structure, the ripple ensemble possesses nonzero third-order moments; specifying that two points are peaks, for example, means that peaks will repeat along the line drawn through the two points at integer multiples of the distance between them. The net result is that points outside the support of the RF then become correlated with combinations of points inside the support, resulting in the appearance of the sidebands in this case. This intuition is formalized in Materials and Methods, in which the analytic form of the STRF for a multiplicative model is derived, showing its dependence on a third-order statistic of the stimulus (Eq. 5).
A closer arrangement of the RF components may produce a single STRF feature with exaggerated support. A physiologically plausible example is shown in Figure 3a, in which two closely spaced RF components staggered in time and frequency are multiplied together. Compared with that estimated using the DRC (Fig. 3b), the STRF estimated using an ensemble of ripple stimuli is elongated in both the temporal and the spectral domains (Fig. 3c). The same principle as underlies the effect in Figure 2f causes this phenomenon, but the closer spacing of the RF components means that the sidebands are not visibly separated from the central region.
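The sideband effect of Figure 2f can be reproduced in a one-dimensional toy analog. In this sketch (our illustration; names and sizes are hypothetical), a "ripple" is a sinusoid along 32 positions, and the ensemble enumerates every integer modulation at 16 evenly spaced phases, which makes it exactly uncorrelated; the independent comparison uses random ±1 values. The multiplicative RF has support at positions 10 and 14, and position 18 lies outside the support on the line through them.

```python
import numpy as np

N, M = 32, 16                       # positions; phases per modulation frequency
n = np.arange(N)
# Every integer modulation k = 0..N-1 at M evenly spaced phases: covariance is exactly (1/2) I.
ripples = np.array([np.cos(2 * np.pi * k * n / N + 2 * np.pi * m / M)
                    for k in range(N) for m in range(M)])
rng = np.random.default_rng(0)
independent = rng.choice([-1.0, 1.0], size=(100_000, N))

def multiplicative(S, i=10, j=14):
    # RF support = {i, j}: product of the rectified values at the two positions.
    return np.maximum(S[:, i], 0.0) * np.maximum(S[:, j], 0.0)

def strf(S, r):
    w_hat, *_ = np.linalg.lstsq(S, r, rcond=None)
    return w_hat

w_rip = strf(ripples, multiplicative(ripples))
w_ind = strf(independent, multiplicative(independent))
# Position 18 = 14 + (14 - 10): outside the support, on the line through 10 and 14.
print(w_rip[18] / w_rip[10], w_ind[18] / w_ind[10])
```

Under these assumptions the ripple-estimated weight at position 18 is positive and a sizable fraction of the in-support weight, whereas the independent-stimulus weight there is close to zero, as expected from finite-sample noise alone.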
Divisive inhibition
Similar effects can be observed using an RF model intended to simulate the inhibitory sidebands observed in two-tone mapping studies [e.g., in auditory cortex (Sutter et al., 1999; Kadia and Wang, 2003)]. Rather than explicitly modeling two-tone interaction nonlinearities, we use coextensive fields of excitation and inhibition (inspired by the results of Wehr and Zador, 2003), with the excitation dropping off more sharply than the inhibition (Fig. 4a). The inhibition acts divisively; similar divisive interactions are thought to play a critical role in regulating activity in visual cortex networks (Heeger, 1992; Chance and Abbott, 2000). The STRF estimated for this model using the DRC stimulus has inhibitory regions on either side of an excitatory main peak (Fig. 4b), which resemble the inhibitory sidebands described in two-tone mapping studies of the central auditory system (Sutter et al., 1999; Kadia and Wang, 2003). In the STRF estimated using an ensemble of ripples, however, additional banding is visible (Fig. 4c), extending well beyond the support of the model. Intuitively, the lateral-inhibition-like function of the model selects for those ripples that have a frequency modulation pattern matching the spacing of the surrounds, and the resulting dominance of ripples with this frequency modulation causes additional banding to emerge in the STRF.
Thresholding
Multiplicative nonlinearities are not required to produce this sort of effect. Even a thresholding output nonlinearity (Fig. 5a), fundamental to the spiking response, can lead to an overestimation of support in the STRF when the stimulus is not spectrotemporally independent (Fig. 5c). The STRF for the rectified bimodal linear RF in Figure 5c is very similar to the STRF for the multiplicative nonlinearity of Figure 2f, and for essentially the same reason: the rectification ensures that the majority of spectrograms that actually elicit responses are those that have power corresponding to both peaks in the linear RF. In addition to the sidebands, another feature of the ripple-estimated STRF shared between the two cases, but more obvious in Figure 5c, is the increased size of the STRF elements corresponding to the true RF components. This again arises from structure in the ripple stimulus. A high threshold requires that the majority of the receptive field be stimulated to elicit a response. Because individual ripples are continuous, any stimulus that elicits a response and hence has power throughout the extent of the receptive field will also have power immediately outside the border of the receptive field. Thus, the set of all response-eliciting ripples has power extending beyond the support of the RF, and this is reflected in the STRF.
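The same one-dimensional toy analog can be used for the threshold model. In this sketch (our illustration; sizes and names are hypothetical), the linear RF has unit weights at positions 10 and 14, the threshold is 0.3 times the largest projection over the ensemble, as described in Materials and Methods, and the ripple ensemble is the exactly uncorrelated set of all integer modulations at 16 evenly spaced phases. Because the linear stage alone gives zero weight at position 18, any weight there after rectification comes from the interaction between the threshold and the ripple statistics.

```python
import numpy as np

N, M = 32, 16
n = np.arange(N)
# Exactly uncorrelated toy ripple ensemble: all integer modulations, M evenly spaced phases.
ripples = np.array([np.cos(2 * np.pi * k * n / N + 2 * np.pi * m / M)
                    for k in range(N) for m in range(M)])

w = np.zeros(N)
w[[10, 14]] = 1.0                                   # bimodal linear RF (two single-position peaks)

u = ripples @ w                                     # linear projection
r_threshold = np.maximum(u - 0.3 * u.max(), 0.0)    # threshold RF

w_lin, *_ = np.linalg.lstsq(ripples, u, rcond=None)
w_thr, *_ = np.linalg.lstsq(ripples, r_threshold, rcond=None)
print(w_lin[18], w_thr[18] / w_thr[10])   # ~0 for the linear RF; clearly positive once rectified
```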
General effects of nonlinearities
In each of Figures 2–5, other patterns can be seen in the STRFs, in addition to the specific features we have described. (This is true not only for the ripple-estimated STRFs but also for the DRC-estimated STRFs, which show an apparently noisy background.) Such effects are not entirely attributable to noise in the simulated responses; they remain present even when noise is excluded from the simulations. Rather, these patterns, like the specific features described previously for each model, arise from an interaction between the RF nonlinearity and the statistics of the stimulus used to estimate the STRF. (In the case of the DRC-estimated STRFs, some part of the noisy-looking background derives from nonzero moments in the stimulus that occur because the stimulus is finite in length.) These patterns are generally sensitive to minor changes in the model parameters, and their origin is difficult to describe more intuitively than with reference to the interaction between nonlinearities and stimulus statistics (e.g., see the analytic form of the STRF for the multiplicative RF in Materials and Methods).
Nonintuitive consequences of linear regression in high-dimensional spaces
The fact that nonlinearities in response functions can lead to differences in STRFs estimated using different stimuli has long been acknowledged in the literature (Marmarelis and Marmarelis, 1978; Aertsen and Johannesma, 1981; Theunissen et al., 2000; Escabí and Schreiner, 2002). Indeed, for one-dimensional regression, the point is obvious; because the linear fit is only an approximation to the true nonlinear generating function, the fit will depend on the range and distribution of data to be fit (Fig. 6).
Related and equally intuitive observations apply to response prediction. Again, in one dimension, the slope of a line fit to a nonlinear function over a set of points that fall within a particular data range is generally more useful for predicting the value of the function at other points within the same range than the slope of a line fit to data in a different range would be. Likewise, in our simulations, ripple-estimated STRFs always predicted responses to novel instances of ripple stimuli better than did DRC-estimated STRFs, and DRC-estimated STRFs always predicted responses to novel instances of the DRC stimulus better than did ripple-estimated STRFs.
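Both one-dimensional observations can be checked numerically. In this sketch, f(x) = x² is a hypothetical stand-in for a nonlinear generating function, and straight lines are fitted over two input ranges with `numpy.polyfit`. For an input grid symmetric about its midpoint m, the least-squares slope for x² is exactly 2m, so the two fits have slopes 1 and 3; each fit also predicts its own range better than the other fit does, because it is by construction the least-squares line for that range.

```python
import numpy as np

def f(x):
    return x ** 2   # hypothetical nonlinear "response function"

x_low = np.linspace(0.0, 1.0, 101)
x_high = np.linspace(1.0, 2.0, 101)

# Least-squares lines fitted over two different input ranges.
slope_low, icpt_low = np.polyfit(x_low, f(x_low), 1)
slope_high, icpt_high = np.polyfit(x_high, f(x_high), 1)
print(slope_low, slope_high)   # approx. 1.0 and 3.0: the slope depends on the range

def rmse(slope, icpt, x):
    return np.sqrt(np.mean((slope * x + icpt - f(x)) ** 2))

# Each fit predicts better within its own range than the fit from the other range.
print(rmse(slope_low, icpt_low, x_low) < rmse(slope_high, icpt_high, x_low))     # True
print(rmse(slope_high, icpt_high, x_high) < rmse(slope_low, icpt_low, x_high))   # True
```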
However, the most important implication of our simulations has no analogy in one-dimensional regression and is therefore less intuitive. Suppose a linear function (such as an STRF) is fit to nonlinear data (such as nonlinear neuronal responses) in a very high-dimensional space (such as spectrotemporal space, in which each time–frequency element represents a different dimension). Suppose also that the points at which the data have been measured (e.g., stimuli used to elicit neuronal responses) are such that there are third- or higher-order statistical relationships between the values of the different input vector elements at those points (such as there are between spectrotemporal elements within the ensemble of ripple stimuli). Then the optimal linear fit may have nonzero weights even in dimensions to which the true data-generating function is insensitive; in the particular case of the nonlinear neuron, this means that the STRF may have nonzero weights outside the support of the RF of the neuron (Figs. 2–5). This overestimation of RF support will not impair the power of the STRF to predict responses to novel stimuli, as long as those novel stimuli fall within the same region of stimulus space in which the fit was performed.
Representation dependence
Coordinate independence is a property not simply of the stimulus itself but also of its representation. An ensemble of ripple stimuli, for example, although not independent in spectrotemporal space (Fig. 7a), becomes only a set of discrete points in modulation transfer function space (Fig. 7b), in which signals are represented by the modulation of their envelopes in the temporal and spectral domains (Kowalski et al., 1996a,b; Calhoun and Schreiner, 1998). A linear combination of ripples with independently chosen temporal and spectral frequencies is thus independent in modulation transfer function space, so although an ensemble of ripples overestimates the true support of a nonlinear RF in the spectrotemporal domain (Fig. 7c,e), when the STRF is transformed into modulation transfer function space, it provides a conservative estimate of the support in the modulation transfer function domain (Fig. 7d,f). As many previous authors have noted, ensembles of ripples are therefore ideal stimuli for studies of neuronal response properties in modulation transfer function space (Kowalski et al., 1996a,b; Calhoun and Schreiner, 1998).
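The discreteness of a ripple in the modulation domain can be verified with a two-dimensional discrete Fourier transform, used here as a simple stand-in for the modulation transfer function representation (an assumption; published modulation transfer functions usually also express the axes in physical units). A single ripple with integer modulations occupies exactly two points: its modulation pair and the conjugate-symmetric partner.

```python
import numpy as np

n_freq, n_time = 80, 30
f = np.arange(n_freq)[:, None]
t = np.arange(n_time)[None, :]
k_f, k_t = 6, 4          # hypothetical spectral and temporal modulations, in cycles per frame
ripple = np.cos(2 * np.pi * (k_f * f / n_freq + k_t * t / n_time) + 0.7)

power = np.abs(np.fft.fft2(ripple))
peaks = np.argwhere(power > 1e-6 * power.max())
print(peaks.tolist())    # [[6, 4], [74, 26]]: (k_f, k_t) and its mirror (-k_f, -k_t)
```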
In contrast, unless the powers of individual elements in a DRC stimulus are chosen from a Gaussian distribution, the modulation content of the DRC stimulus at different modulation frequencies is not independent and shows periodic structure much as the ripples do in spectrotemporal space. Thus, a modulation transfer function estimate derived from a DRC is not guaranteed to provide a conservative estimate of the true modulation transfer function for a nonlinear neuron.
Power dependence
Even when the stimulus used is independent for the coordinates chosen, nonlinearities in the RF can affect the estimated STRF in a stimulus-dependent manner (Marmarelis and Marmarelis, 1978; Eggermont, 1993). This effect can confound the interpretation of apparent excitatory and inhibitory components in the STRF. Figure 8a illustrates a model neuron that has overlapping excitatory and inhibitory regions, in which the excitatory region is generated by a multiplicative nonlinear component and the inhibitory one is generated by a linear component. At low levels of stimulus power (Fig. 8b), the linear term dominates, and the STRF is inhibitory. At intermediate values, the region can essentially disappear entirely from the STRF as the two terms cancel each other out (Fig. 8c). Eventually, the multiplicative term dominates, and the STRF is entirely excitatory (Fig. 8d). In the case of the multiplicative model, this effect arises from the fact (as seen in Eq. 4 in Materials and Methods) that, even for independent stimuli, there is a dependence of the STRF on non-negligible third-order statistics between points within the support. Although here we illustrate this effect with differing levels of stimulus power, it is also theoretically possible to see similar effects using different classes of independent stimuli that have the same power. For example, while recording responses of owl monkey auditory cortical neurons, Blake and Merzenich (2002) varied the power and spectrotemporal density of DRC stimuli and thus manipulated a combination of stimulus power and stimulus statistics. They observed emergent inhibition overlapping an excitatory region in their STRFs, which our simulations closely resemble.
Thus, even when estimated using independent stimuli, STRFs are stimulus dependent, although independence does guarantee that the changes resulting from differences in the power or the nature of the stimulus used will still be constrained to a conservative estimate of the dimensional support.
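The sign change with stimulus power can be reproduced exactly in a two-point toy model (our illustration, not the model of Fig. 8): the response is a linear inhibitory term at point 0 plus a multiplicative excitatory interaction between points 0 and 1, r = −λ s_{0} + ⌊s_{0}⌋⌊s_{1}⌋ with ⌊x⌋ = max(x, 0), and each point independently takes the values −σ, 0, or σ with equal probability. Working through the expectations gives an STRF weight at point 0 of −λ + σ/6: inhibitory at low power, zero at σ = 6λ, and excitatory above that. Enumerating all nine stimulus combinations reproduces this without sampling noise; the names `response` and `strf_at_power` are hypothetical.

```python
import itertools
import numpy as np

def response(S, lam=1.0):
    # Linear inhibition at point 0 plus a multiplicative excitatory interaction of points 0 and 1.
    return -lam * S[:, 0] + np.maximum(S[:, 0], 0.0) * np.maximum(S[:, 1], 0.0)

def strf_at_power(sigma, lam=1.0):
    # Independent stimulus: each point is sigma * {-1, 0, 1} with equal probability;
    # enumerating all 9 combinations gives the exact ensemble average.
    Z = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))
    S = sigma * Z
    X = np.column_stack([np.ones(len(S)), S])      # intercept absorbs the mean response
    coef, *_ = np.linalg.lstsq(X, response(S, lam), rcond=None)
    return coef[1:]                                # STRF weights for points 0 and 1

for sigma in (1.0, 6.0, 12.0):
    print(sigma, strf_at_power(sigma))   # weight at point 0 is -5/6, then 0, then 1
```

Adding a constant offset to r, for example to keep rates non-negative, would be absorbed by the intercept and leave the weights unchanged.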
Natural stimuli
It is not immediately clear that the above results hold in the case of natural stimuli, which, although not spectrotemporally independent, typically possess more complicated higher-order statistics than those considered thus far. To address this issue, we estimated STRFs for all of our model RFs using a variety of natural stimulus ensembles. The highly correlated structure of natural stimuli itself presents problems for STRF estimation (Theunissen et al., 2000), so some form of regularization is typically used during estimation of the STRFs for real neuronal responses. In simulations, however, we could avoid issues associated with regularization completely by using whitened versions of the natural stimuli that were in fact uncorrelated. This whitening allowed us to focus specifically on the effects of interactions between response nonlinearities and higher-order statistical dependencies in the stimuli. To avoid confounding the effect of signal statistics with the effect of noise, no noise was included in the model responses for the simulations shown, although inclusion of noise did not have a major influence on the results.
As expected, STRFs estimated for the linear model using whitened natural stimuli show complete recovery of the underlying RF, regardless of the stimulus ensemble used (Fig. 9a–e). However, for the multiplicative model, differences between the STRFs estimated with the different stimuli become apparent (Fig. 9f–j). In particular, the STRF estimated using ambient environmental sounds (Fig. 9g) displays smearing in the frequency dimension, which contrasts with the temporally elongated features of the STRFs estimated using Bengalese finch song (Fig. 9h) and tamarin calls (Fig. 9i). (These effects are consistent with the dominant features of the sounds: ambient environmental sounds include many brief spectral stacks, and Bengalese finch song and tamarin calls are composed of more elongated narrowband sounds.) Similar trends are observed in the STRFs estimated for the threshold model (Fig. 9k–o). For the divisive inhibition model (Fig. 9p–t), the inhibitory sidebands select against activation by spectrally elongated sounds, and so the frequency smearing vanishes in the case of the ambient environmental sounds (Fig. 9q). However, temporal elongation is still apparent near 4 kHz for the STRF estimated with finch song (Fig. 9r) and at lower frequencies for the STRF estimated using tamarin calls (Fig. 9s). In general, speech STRFs show fewer effects, although the speech STRF for the divisive inhibition model has a pronounced and temporally elongated low-frequency RF component (Fig. 9t).
The effects illustrated in Figure 9 are less dramatic than those shown previously using ripple stimuli. It must be reiterated, however, that even these more subtle effects are attributable entirely to higher-order stimulus statistics and are not a result of either noise or regularization. Moreover, the effects depend on an interaction between the model and higher-order stimulus statistics. Changes in the RF model parameters can lead to dramatic results (Fig. 10). Results with ripple stimuli provide an intuitive explanation for this strong dependence on the RF nonlinearity. In ripple STRFs, structure outside the RF support is far more pronounced for the divisive inhibition model (Fig. 4c) than for the multiplicative model (Fig. 2f). This is because the inhibitory sidebands of the divisive inhibition RF are strongly selective for a particular category of ripples (those with frequency modulations fitting the off–on–off spacing of the model) and hence for those higher-order statistics. The RF models used in this study were chosen for their simplicity and were not designed to select for any features present in the natural stimuli, so it is not surprising that interactions between these RF nonlinearities and higher-order statistics of natural stimuli are relatively subtle (Fig. 9). However, recordings in the auditory cortex of marmosets (Wang et al., 2005) suggest that neurons in the central auditory system can be highly selective for complex stimuli. In fact, the observation that independent stimuli can drive auditory neurons poorly is one of the primary motivations for the use of nonindependent stimuli in reverse correlation studies (Klein et al., 2000). Thus, it is a real concern that the effects observed in STRFs estimated using natural stimuli would become much more pronounced in the presence of neuronal nonlinearities that selected strongly for natural stimulus statistics.
Discussion
We have demonstrated through simulation that even simple, biologically plausible nonlinearities can have powerful effects on STRF structure. For a neuron with a linear RF, the STRF will be an accurate estimate of the support of the RF (i.e., the receptive field of the neuron). However, when nonlinearities are present in the RF, decorrelation of the stimulus will no longer ensure accurate estimation of the support. Higher-order statistics of an uncorrelated but not independent (or neither uncorrelated nor independent) stimulus can interact with the RF nonlinearities to produce features in the STRF that do not correspond to any actual component of the RF. This effect, in which the support of the STRF can exceed that of the true RF, does not impair the performance of the STRF on prediction of responses to stimuli similar to those used for STRF estimation (although it may impair prediction of responses to stimuli with distinctly different characteristics). Rather, it impairs our ability to interpret the STRF to derive parameters relating to the true RF support. Thus, when estimated with stimuli that are not independent, STRFs cannot reliably be used to determine the spectrotemporal extent of receptive fields for nonlinear neurons.
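How higher-order structure in an uncorrelated stimulus can place STRF weight outside the RF can be seen in a three-dimensional toy case. The sketch below (Python/NumPy) uses assumptions chosen purely for illustration and is not one of the RF models simulated above. The hypothetical neuron multiplies dimensions `x1` and `x2` and ignores `x3`. In the nonindependent ensemble, `x3` is the product `x1*x2` plus noise, so it is uncorrelated with `x1` and `x2` yet statistically dependent on them. In the control ensemble, `x3` is independent noise of the same variance.

```python
import numpy as np

def strf(X, r):
    """Least-squares linear fit of r on the columns of X (offset dropped)."""
    design = np.column_stack([np.ones(len(r)), X])
    return np.linalg.lstsq(design, r, rcond=None)[0][1:]

rng = np.random.default_rng(1)
n = 200_000
x1, x2, noise = rng.normal(size=(3, n))

# Toy multiplicative neuron: depends on x1 and x2 only.
r = x1 * x2

# Nonindependent ensemble: x3 is uncorrelated with x1 and x2 (E[x1**2 * x2] = 0)
# but depends on them through their product.
x3_dependent = x1 * x2 + noise
# Independent control with the same variance (2) as x3_dependent.
x3_independent = np.sqrt(2.0) * rng.normal(size=n)

w_dependent = strf(np.column_stack([x1, x2, x3_dependent]), r)
w_independent = strf(np.column_stack([x1, x2, x3_independent]), r)
print("nonindependent:", np.round(w_dependent, 2))    # close to [0, 0, 0.5]
print("independent:   ", np.round(w_independent, 2))  # close to [0, 0, 0]
```

Analytically, the least-squares weights for the nonindependent ensemble are (0, 0, 0.5), because cov(r, x3) = E[x1^2 x2^2] = 1 and var(x3) = 2. The STRF therefore places all of its weight on the one dimension to which the neuron is insensitive, whereas the independent control returns weights near zero everywhere.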
STRFs estimated with spectrotemporally independent stimuli will not have features that lie outside the support of the RF; however, the estimate of RF support will be conservative. Therefore, it is possible for the extent of the receptive field to be underestimated (Fig. 8). Moreover, for STRFs estimated with either independent or nonindependent stimuli, the concept of the “sign” of STRF components is problematic. In a nonlinear RF, the excitatory or inhibitory action of stimulus power at any time–frequency point can depend on the total power in the stimulus; a linear approximation to the RF (i.e., the STRF) will therefore be stimulus dependent even when estimated using independent stimuli, and interpretation of inhibition and excitation will be limited to the specific stimulus used for the STRF estimation. Thus, for independent as well as nonindependent stimuli, stimulus dependence of STRFs may arise through interactions between constant RF nonlinearities and the properties of the stimulus ensemble. This fact is implicit in previous theoretical work on STRFs (Eggermont, 1993); here we have presented explicit examples of such interactions using biologically plausible nonlinearities. We have also made the previously unacknowledged point that, for STRFs estimated with nonindependent stimuli, these interactions can create features in the STRF that lie outside the true receptive field of a nonlinear neuron.
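The two claims of the preceding paragraph can be checked in a similar toy setting. Independent stimuli keep the STRF inside the RF support, but they leave the sign and size of its components stimulus dependent. In the hypothetical sketch below (Python/NumPy), three stimulus dimensions are drawn independently and the neuron again computes `x1*x2`. The shared `mean_level` is an assumption standing in for overall stimulus level. For independent stimuli, the least-squares weight on `x1` equals the mean of `x2` (and vice versa), because cov(x1 x2, x1) = E[x2] var(x1).

```python
import numpy as np

def strf(X, r):
    """Least-squares linear fit of r on the columns of X (offset dropped)."""
    design = np.column_stack([np.ones(len(r)), X])
    return np.linalg.lstsq(design, r, rcond=None)[0][1:]

rng = np.random.default_rng(2)
n = 200_000
weights = {}
for mean_level in (-1.0, 0.0, 1.0):
    # Three mutually independent dimensions with unit variance and a shared mean.
    X = mean_level + rng.normal(size=(n, 3))
    r = X[:, 0] * X[:, 1]  # multiplicative toy neuron; dimension 2 lies outside the RF
    weights[mean_level] = strf(X, r)
    print(mean_level, np.round(weights[mean_level], 2))
```

The weight on the third dimension stays near zero in every case, so nothing appears outside the RF support. Within the support, however, the weights come out near -1, 0 and +1 as `mean_level` goes from -1 to +1. The same fixed nonlinearity therefore looks "inhibitory", absent or "excitatory" depending on the stimulus, and at a mean of zero the STRF misses the RF entirely.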
Many recent studies have demonstrated stimulus dependence in STRFs estimated using different stimuli (Theunissen et al., 2000; Blake and Merzenich, 2002; Escabí and Schreiner, 2002; Valentine and Eggermont, 2004; Woolley et al., 2006) or stimulus dependence in other measures of neuronal responses to different complex sounds (Bar-Yosef et al., 2002; Nagel and Doupe, 2006). There is therefore little doubt that central auditory neurons are in fact nonlinear (but see below), but the nature of that nonlinearity remains a topic of some debate. One possibility is that stimulus-dependent changes in STRFs may reflect some underlying process of adaptation; that is, a change in the stimulus causes an otherwise predominantly linear mapping to adapt, nonlinearly, to the altered context (e.g., as suggested by Woolley et al., 2006). Another (or additional) possibility is that stimulus-dependent changes in STRFs may arise not from changes in RF parameters but instead from a stationary nonlinearity in the response function (e.g., as suggested by Theunissen et al., 2000; Escabí and Schreiner, 2002; Valentine and Eggermont, 2004). [A similar point has been raised recently in studies of adaptation of motion detection in the fly visual system (Borst et al., 2005).] Adaptation is ubiquitous in the brain and surely plays an important role in neural processing, but the effects of stationary response function nonlinearities described here complicate quantification of the extent and form of adaptation. This does not mean that STRF-like analysis is inappropriate for use in adaptation studies, however. To demonstrate adaptation, it is sufficient to show that the parameters of the response function change as a function of time, more gradually than the stimulus itself (Shechter and Depireux, 2006).
Alternatively, it is possible to look at changes in STRFs estimated using the same stimuli embedded within an adapting context of other stimuli or task demands (Fritz et al., 2005), thus avoiding the issue of stimulus-dependent estimation entirely. A critical point not addressed by previous work, however, is that the differences between STRFs resulting from static nonlinearities may only indirectly reflect the nature of the underlying response function. Thus, if changing the stimulus results in new areas of the STRF displaying power, it does not necessarily mean that additional portions of the receptive field have been “uncovered.” Such variation might instead reflect only the interaction between a constant nonlinear response and the statistics of the stimuli.
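The adaptation test described above looks for response-function parameters that drift over time while the stimulus ensemble is held fixed. It can be sketched with two hypothetical neurons in Python/NumPy. The rectifying nonlinearity, RF shape, gain schedule and window sizes are illustrative assumptions, not the procedure of Shechter and Depireux (2006). Both neurons receive Gaussian white noise with the same statistics in every analysis window. One has a static rectifying nonlinearity; the other has the same nonlinearity but an output gain that declines across windows. Each window's STRF is summarized by its projection onto the RF shape.

```python
import numpy as np

def strf(X, r):
    """Least-squares linear fit of r on the columns of X (offset dropped)."""
    design = np.column_stack([np.ones(len(r)), X])
    return np.linalg.lstsq(design, r, rcond=None)[0][1:]

rng = np.random.default_rng(3)
n_windows, window_len = 8, 20_000
k_shape = np.array([1.0, -0.5, 0.0, 0.0])  # hypothetical RF shape
gains = np.linspace(1.0, 0.5, n_windows)   # hypothetical slow adaptation

static_gain, adapting_gain = [], []
for g in gains:
    X = rng.normal(size=(window_len, k_shape.size))  # identical statistics in every window
    drive = np.maximum(X @ k_shape, 0.0)             # static rectifying nonlinearity
    for estimates, gain in ((static_gain, 1.0), (adapting_gain, g)):
        r = gain * drive + 0.1 * rng.normal(size=window_len)
        w = strf(X, r)
        estimates.append(w @ k_shape / (k_shape @ k_shape))  # STRF projected onto RF shape
print("static:  ", np.round(static_gain, 2))    # roughly constant near 0.5
print("adapting:", np.round(adapting_gain, 2))  # declines from about 0.5 to 0.25
```

With stimulus statistics fixed, the static neuron's STRF is the same in every window. A systematic drift like the adapting neuron's therefore cannot be attributed to stimulus dependence of a stationary nonlinearity. The factor of one half in the static case is P(k·x > 0) for this zero-mean Gaussian ensemble. That factor would itself change with the stimulus level, which is the kind of stimulus dependence discussed above.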
We should note here that there is still some debate in the literature about whether STRFs tend to be stimulus dependent and therefore whether central auditory neurons have nonlinear response functions. Previously, for example, Klein et al. (2006) examined the stability of STRFs estimated for auditory cortical neurons using a variety of stimulus classes and concluded that stimulus dependence of the STRFs was minimal (and therefore that the neuronal response functions could reasonably be described as linear). However, the stimuli they used were all derived from ripple stimuli and may possess similar higher-order statistics. More importantly, their analysis focused on gross-scale structure of the STRFs, whereas our results suggest that nonlinearities may express themselves in a more subtle manner. The DRC and ripple-estimated STRFs of Figure 2, e and f, for instance, have a correlation coefficient of 0.83; this contrasts with the mean correlation coefficient of 0.64 for STRFs in the study by Klein et al. (2006). Although the STRFs we estimate with DRC and ripple stimuli are well correlated, the differences between them would greatly complicate analysis of receptive field structure.
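The gap between a high correlation coefficient and a meaningful difference in RF structure is easy to reproduce with synthetic STRFs. In the sketch below (Python/NumPy), the Gaussian shapes, grid size and 10% support threshold are all hypothetical choices. A second STRF is formed by adding a weak component well outside the support of the first. The two arrays remain highly correlated, yet the second has many additional suprathreshold time–frequency bins.

```python
import numpy as np

def strf_correlation(a, b):
    """Pearson correlation between two STRFs, each treated as a flat vector."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def support(strf, threshold=0.1):
    """Bins whose magnitude exceeds a fraction of the STRF's peak magnitude."""
    return np.abs(strf) > threshold * np.abs(strf).max()

f = np.arange(24)[:, None]  # hypothetical frequency channels
t = np.arange(20)[None, :]  # hypothetical time lags

# Hypothetical STRF: excitation followed by weaker, delayed inhibition.
excitation = np.exp(-((f - 12) ** 2) / 8 - ((t - 4) ** 2) / 4)
inhibition = np.exp(-((f - 12) ** 2) / 8 - ((t - 9) ** 2) / 4)
base = excitation - 0.5 * inhibition

# The same STRF plus a weak, temporally elongated component far from the original support.
variant = base + 0.3 * np.exp(-((f - 4) ** 2) / 2 - ((t - 4) ** 2) / 16)

rho = strf_correlation(base, variant)
new_bins = np.sum(support(variant) & ~support(base))
print(f"correlation = {rho:.2f}; bins added to the support = {new_bins}")
```

A correlation coefficient this high says little about whether two STRFs agree on the extent of the RF support.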
The issues we described here are not exclusive to STRF analysis. They are in fact general consequences of the fundamental problem of model mismatch. Single- and two-tone frequency response areas (FRAs) illustrate the same concept. The single- and two-tone FRAs of a neuron are generally different, and the fact that the single-tone FRA cannot predict two-tone interactions does not mean that the response function of the neuron changes with the number of tones presented, but rather that the response function cannot be completely characterized in terms of linear sums of the responses to single tones alone. STRF analysis makes the explicit assumption that the true RF is linear. This is, however, a very difficult assumption to verify in practice, and thus great caution must be taken in interpreting STRFs. Fundamentally, the most useful interpretation of the STRF is predictive: that is, it is the linear model that best predicts the response to the chosen stimuli. Although it may be useful to compare STRFs estimated with the same sound stimulus, it is difficult to derive meaningful conclusions from the comparison of STRFs derived using different stimulus sets. Similarly, extracting the true RF from comparison of any collection of STRFs is a difficult problem, in which even the analysis of such basic details as the spectrotemporal extent of the RF support must be approached with caution. Just as the single-tone frequency response function is not the whole RF, the (perhaps misleadingly named) STRF is only an accurate description of the receptive field in the case of a neuron with a linear RF.
In light of the issues presented here, it is reasonable to ask: why use STRF analysis at all? Just as single-tone FRAs have not been invalidated as a tool for auditory neuroscience by the biological implausibility of the assumption that auditory neurons can be characterized by their responses to single tones alone, STRF analysis is not invalidated by the overly constraining assumption that auditory neuronal response functions are spectrotemporally linear. Indeed, in many circumstances, the simplicity of the STRF, like the simplicity of the single-tone FRA, can be an advantage for analysis. A generalized nonlinear model, such as the Wiener–Volterra series, may seem preferable to STRFs, but fitting increasing orders of complexity in such models requires an exponentially increasing amount of data. More sophisticated approaches to modeling linear–nonlinear–Poisson (LNP) neural responses (that is, response functions in which one or more linear components are combined through a static nonlinearity) have been proposed. These include “conditional whitening” in the context of the spike-triggered covariance method (Rust et al., 2005) and information maximization (Sharpee et al., 2004) (for review, see Simoncelli et al., 2004). These methods might go some way toward alleviating the problems discussed here, although they cannot resolve the issues entirely (for a discussion, see Schwartz et al., 2006). More significantly, the issue of model mismatch remains for these methods, and it is not clear whether the LNP assumptions are any easier to validate than those of STRF models. In our view, the STRF is a good starting point for modeling auditory neuronal responses, because the analysis makes a simple assumption and requires a minimum amount of data.
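Counting kernel coefficients makes concrete the data cost of moving from an STRF to a Wiener–Volterra model. Assume symmetric kernels and a hypothetical STRF grid of 32 frequency channels by 20 time lags (640 inputs). The number of distinct coefficients in the order-k kernel is then the binomial coefficient C(640 + k - 1, k), which grows roughly as 640^k / k!. The Python sketch below computes the first three orders.

```python
from math import comb

def volterra_kernel_size(n_inputs: int, order: int) -> int:
    """Distinct coefficients in a symmetric Volterra kernel of the given order."""
    return comb(n_inputs + order - 1, order)

N_INPUTS = 32 * 20  # hypothetical: 32 frequency channels x 20 time lags

for order in (1, 2, 3):
    print(f"order {order}: {volterra_kernel_size(N_INPUTS, order):,} coefficients")
```

The first-order kernel corresponds to the STRF (640 coefficients). The second order adds 205,120 and the third 43,895,680. Each added order thus multiplies the number of free parameters, and with it the amount of data needed to constrain them, by a factor of a few hundred.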
By examining how the linear approximation to the RF changes with modifications in the stimulus, it may be possible to explore the nature of any underlying RF nonlinearities (Kvale and Schreiner, 1997; Blake and Merzenich, 2002). Moreover, by incorporating specific nonlinearities into models that extend STRF analysis and comparing the predictive power of those models with that of STRFs, it is possible to improve our understanding of how auditory neurons process complex stimuli (Ahrens et al., 2008). Thus, STRF analysis is an important tool for characterizing auditory neuronal responses, when used with full awareness of the possible consequences of linear approximation to likely nonlinear neuronal response functions.
Footnotes

This work was supported by Gatsby Charitable Foundation Grants GAT2579/GAT2623 (J.F.L.) and GAT2868 (G.B.C. and M.S., via the Gatsby Computational Neuroscience Unit). We thank M. Ahrens and L. A. Anderson for their useful discussions regarding this work, R. Egnor for helpful comments on this manuscript, and M. Brainard, A. J. Doupe, C. Hampton, R. Egnor, and M. Hauser for providing natural stimuli.
 Correspondence should be addressed to Jennifer F. Linden, UCL Ear Institute, University College London, London WC1X 8EE, UK. j.linden{at}ucl.ac.uk