Abstract
Numerous studies have investigated the spatial sensitivity of cat auditory cortical neurons, but possible dynamic properties of the spatial receptive fields have been largely ignored. Given the considerable evidence implicating the primary auditory field in the neural pathways responsible for the perception of sound source location, a logical extension to earlier observations of spectrotemporal receptive fields, which characterize the dynamics of frequency tuning, is a description that uses sound source direction, rather than sound frequency, to examine the evolution of spatial tuning over time. The object of this study was to describe auditory space-time receptive field dynamics using a new method based on cross-correlational techniques and white-noise analysis in spherical auditory space. This resulted in a characterization of auditory receptive fields in two spherical dimensions of space (azimuth and elevation) plus a third dimension of time. Further analysis revealed that spatial receptive fields of neurons in auditory cortex, like those in the visual system, are not static but can exhibit marked temporal dynamics. This might result, for example, in a neuron becoming selective for the direction and speed of moving sound sources. Our results show that ∼14% of AI neurons exhibit significant space-time interaction (inseparability).
 auditory cortex
 receptive field
 white-noise analysis
 reverse correlation
 sound localization
 motion sensitivity
The acoustic environment contains both static and dynamic sound sources that must be localized for communication and survival. It is well known that auditory cortex, including the primary auditory (AI) field, plays a significant role in sound localization (Jenkins and Merzenich, 1984; Masterton and Imig, 1984). Indeed, directional sensitivity of cat auditory cortical neurons has been recognized for many years (Eisenman, 1974; Middlebrooks and Pettigrew, 1981; Imig et al., 1990; Rajan et al., 1990; Middlebrooks et al., 1994; Brugge et al., 1996). Typically, this sensitivity is assessed by relating an average response metric (e.g., discharge rate) to the direction of the sound source. Space receptive fields mapped in this static domain reveal systematic spatial patterns. In studies of the visual system, techniques have been applied to derive dynamic receptive fields that have been viewed as sensitivity for a stimulus that evolves in space and time (Jones and Palmer, 1987; McLean et al., 1994; DeAngelis et al., 1995). A consequence of this dynamic structure is that response patterns evoked by stimuli moving through a spatial receptive field will depend on the stimulus trajectory in a manner that cannot be predicted by a “static” description of the receptive field alone. Here we show that a proportion of AI neurons, as in primary visual cortex, require descriptions in both space and time.
Auditory space-time receptive field dynamics are described with a new investigative tool that is grounded in the theory of white-noise analysis and reverse-correlation techniques. White-noise analysis is a general approach for linear, as well as nonlinear, system analysis in physiology (Marmarelis and Marmarelis, 1978; Aertsen and Johannesma, 1981; Eggermont, 1993). Reverse correlation, which was originally used to estimate filter characteristics of auditory peripheral afferents (de Boer and Kuyper, 1968), has been particularly fruitful in the investigation of spatiotemporal selectivity in the visual (Jones and Palmer, 1987; McLean et al., 1994; DeAngelis et al., 1995) and somatosensory (DiCarlo et al., 1998) systems and in analyzing frequency–time receptive fields in the auditory pathways (Epping and Eggermont, 1986; Melssen and Epping, 1992; deCharms et al., 1998; Depireux et al., 1998).
It is important to recognize that in the visual and somatosensory systems a spatiotemporal receptive field maps the stimulus in two dimensions with respect to both the receptor surface (i.e., retina or skin) and the external visual field or body surface. In the auditory system, however, the frequency–time receptive field maps the stimulus only on the receptor surface and provides no explicit information about spatial sound location. Thus, a common view, expressed by deCharms and Zador (2000), is that a two-dimensional spatial receptive field is undefined for auditory neurons, because there is only a single dimension, i.e., a linear array of inner hair cells, along the basilar membrane. However, this view neglects the fact that the central auditory system must compute sound source location. Our new spherical white-noise method, in which multiple sound events from random directions encapsulate all possible angular velocities, specifically estimates auditory space-time receptive fields in two spherical spatial dimensions.
MATERIALS AND METHODS
Physiology. Adult cats with no sign of external or middle ear infection were premedicated with acepromazine (0.2 mg/kg, i.m.), ketamine (20 mg/kg, i.m.), atropine sulfate (0.1 mg/kg, s.c.), dexamethasone sodium (0.2 mg/kg, i.v.), and procaine penicillin (300,000 U, i.m.). Anesthesia was maintained with halothane (0.8–1.8%) in a carrier gas mixture of oxygen (33%) and nitrous oxide (66%). Pulse rate, O_{2}, CO_{2}, N_{2}O, and halothane levels in the inspired and expired air were monitored continuously (Ohmeda 5250). A muscle relaxant was administered (pancuronium bromide, 0.15 mg/kg, i.v.) if spontaneous respiration was irregular or otherwise compromised. Paralysis could be maintained throughout the experiment by supplemental doses of pancuronium. Experimental protocols were approved by the University of Wisconsin Institutional Animal Care and Use Committee.
Under surgical anesthesia, the pinnae were removed, and hollow earpieces were sealed into the truncated ear canals and connected to specially designed earphones. A probe-tube microphone was used to calibrate the sound delivery system in situ near the tympanic membrane. The left auditory cortex was exposed, and a sealed recording chamber with a Davies-type microdrive was cemented to the skull. Action potentials were recorded extracellularly with tungsten-in-glass microelectrodes, digitized at 25 kHz, and sorted online and offline.
Normally, sound produced by a free-field source is transformed in a direction-dependent manner by the pinna, head, and upper body structures en route to the tympanic membrane (Musicant et al., 1990; Rice et al., 1992). To implement a virtual acoustic space (VAS), these transformations are replicated digitally. Interpolation between measured directions (Chen et al., 1995; Wu et al., 1997) was used to allow the generation of arbitrary virtual sound source directions. Directional stimuli were 10 msec Gaussian-noise bursts that were positioned in VAS using a spherical coordinate system (−180 to +180° azimuth, −36 to +90° elevation) centered on the midline of the cat's interaural axis. All sound stimuli were compensated for the transmission characteristics of the sound delivery system. Tone burst stimuli delivered monaurally or binaurally were used to estimate the characteristic frequency of a neuron and some response area features related to binaural interactions as described previously (Brugge et al., 1996). The tonotopic organization observed over numerous electrode penetrations during the course of an experiment further confirmed that the recordings were obtained from neurons in AI. Stimulus presentation and data acquisition were accomplished with a TDT System II (TDT, Gainesville, FL), and BrainWare software (TDT) was used to sort action potentials (spikes) among single units.
Mathematical foundation. The formal mathematical foundation for our methodology is based on work by Krausz (1975), who extended the system identification techniques developed by Lee and Schetzen (1965) and Wiener (1958) to the use of a random Poisson process as the input set. Poisson acoustic click trains have been used to characterize frequency–time kernels (Epping and Eggermont, 1986) and interaural time difference (Melssen and Epping, 1992) in the midbrain of the grassfrog. Wiener originally described mutually orthogonal kernels with respect to a Gaussian white-noise signal. See Klein et al. (2000) and Eggermont (1993) for details on the theoretical background.
To meet the requirement of “spatial” whiteness, sound source directions must be sampled uniformly. One solution for uniform spherical sampling constructs a connected spiral of points on the surface of a sphere (Rakhmanov et al., 1994). Accordingly, in our experiments, the acoustic input is derived from a set of “virtual” sound sources (Brugge et al., 1994, 1996; Reale et al., 1996) that are uniformly positioned along a spiral of 208 points (Fig. 1). Formally, these different directions are members of the point set:
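The formal definition of the point set is not reproduced in this version of the text. As an illustration only, the generalized-spiral construction of Rakhmanov et al. (1994) can be sketched as below; the 208-point count is taken from the text, but the restriction of elevations to the −36 to +90° experimental range is omitted, so this sketch samples the full sphere.

```python
import numpy as np

def spiral_points(n=208):
    """Approximately uniform points on the unit sphere along a connected
    generalized spiral (after Rakhmanov et al., 1994).
    Returns (azimuth, elevation) in degrees."""
    k = np.arange(1, n + 1)
    h = -1.0 + 2.0 * (k - 1) / (n - 1)       # heights from south to north pole
    elev = np.arcsin(h)                      # elevation relative to the equator
    phi = np.zeros(n)                        # azimuth angle; poles keep phi = 0
    for i in range(1, n - 1):
        phi[i] = (phi[i - 1]
                  + 3.6 / np.sqrt(n) / np.sqrt(1.0 - h[i] ** 2)) % (2.0 * np.pi)
    azim = np.degrees(phi) - 180.0           # map to [-180, 180)
    return azim, np.degrees(elev)

az, el = spiral_points(208)
```

Successive points along the spiral are a roughly constant arc length apart, which is what makes the set a reasonable stand-in for uniform ("white") spatial sampling.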
A schematic for computing the first-order kernel by reverse correlating the space-time signal with an evoked spike is shown in Figure 1. Formally, the Poisson impulse train approaches a Gaussian distribution when the pulse train is smoothed for a large number of pulses, thus meeting the assumptions of Wiener's original theory (Krausz, 1975). The output spike train can also be defined in terms of time intervals of width ΔT, such that at most one spike falls within the interval ΔT. The output response can therefore be defined as y(t), where the amplitude is either 1 or 0 on the t^{th} time interval.
A time-invariant system can be characterized or identified with successively higher-order orthogonal kernels, beginning with the zero^{th} order, which is simply the average over the output spike train. The first-order kernel models the “memory” of the neuron for the stimulus history (the “transfer function” of the neuron) in terms of a linear filter and provides an optimal linear least-squares approximation to the true transfer function. Higher-order kernels can identify nonlinear characteristics of the transfer function; however, they cannot generally be equated specifically to the quadratic and cubic terms of the Volterra series (Marmarelis and Marmarelis, 1978). In principle, arbitrary nonlinearities can be captured by introducing kernels of sufficiently high order into the system description of the neuron. However, estimating the parameters of the higher-order kernels also requires orders of magnitude more data, so in practice most studies, like the present one, only identify the linear part of the input–output function of the neuron. Formally, the kernels are given by:
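The kernel formulas themselves are not reproduced in this version of the text. For orientation only, in the standard formulation for a Poisson impulse-train input (Krausz, 1975), the zero^{th}- and first-order kernels take a form along the following lines (this is a reconstruction of the textbook form, not necessarily the paper's exact notation):

```latex
h_0 = \langle y(t) \rangle_t, \qquad
h_1(\mathbf{s}, \tau) \approx \frac{1}{\lambda}
\Bigl( \bigl\langle y(t) \mid \text{impulse from direction } \mathbf{s}
\text{ at time } t - \tau \bigr\rangle - h_0 \Bigr),
```

where y(t) is the binned spike output, \mathbf{s} indexes sound-source direction on the spiral, \tau is the time lag before a spike, and \lambda is the mean impulse rate per direction.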
In any case, the first-order kernel can subsequently be used to generate “predictions” of the neural response to a given stimulus set using a discrete convolution of the space-time kernel and stimulus:
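The convolution formula is likewise not reproduced here. The following toy simulation illustrates both steps (reverse-correlation estimation of a first-order kernel, then a linear prediction by discrete convolution); every parameter value and the simulated neuron (direction 42, 18 msec latency) are invented for illustration and are not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy spherical white-noise experiment (invented sizes): n_dir candidate
# directions, 1 msec bins; x[s, t] = 1 if a burst from direction s begins at t.
n_dir, n_bins, n_lags = 64, 60_000, 50
x = (rng.random((n_dir, n_bins)) < 0.001).astype(float)

# Hypothetical neuron: excited by direction 42 with an 18 msec latency.
drive = np.roll(x[42], 18)
y = (rng.random(n_bins) < 0.02 + 0.5 * drive).astype(float)

lam = x.mean(axis=1, keepdims=True)      # impulse rate for each direction
h0 = y.mean()                            # zeroth-order kernel: mean spike rate

# First-order kernel by reverse correlation: correlate the spike train with
# the stimulus at each lag, then normalize by the per-direction impulse rate.
h1 = np.empty((n_dir, n_lags))
for tau in range(n_lags):
    h1[:, tau] = (x[:, : n_bins - tau] * y[tau:]).mean(axis=1)
h1 = h1 / lam - h0

# Linear prediction: discrete convolution of the kernel with the stimulus.
y_hat = h0 + sum(np.convolve(x[s], h1[s], mode="full")[:n_bins]
                 for s in range(n_dir))
```

The kernel recovers the simulated sensitivity as a peak at (direction 42, lag 18), and the convolution yields a rate prediction that correlates positively with the simulated spike train; in the paper, a leaky integrator is then applied to the convolution output to produce the instantaneous rate function.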
Using the spatial noise stimuli described above, we derived first-order system kernels for 144 single units in field AI of five cats. The time it takes to characterize a unit in this way depends on the mean discharge rate of the unit, but for one stimulus level we typically required recordings of the responses to at least 25 min of spatial noise. The intensity level was typically set to maximize the response rate of the neuron; occasionally several intensity levels were tested. Future studies will examine in detail the effects of sound source intensity on the structure of space-time kernels. Electrode penetrations were restricted to regions of AI in which the best frequencies were in the range of 14–22 kHz. All data reported here were recorded at electrode depths ranging from 440 to 1800 μm with respect to the surface of the cortex.
One interpretation of the system kernels is that they are estimates of the posterior probability distributions for a sound having occurred at particular space-time coordinates given that a spike was observed. To obtain a continuous space-time probability density, which can be graphically rendered and ultimately used more analytically (Jenison, 2000), some form of approximation (modeling) of the space-time kernels was necessary. We recently demonstrated a methodology for modeling auditory space receptive fields (Jenison et al., 1998) that employs spherical (von Mises) basis functions. The von Mises basis function appears as a localized “bump” on the sphere. A set of von Mises basis functions, centered on each point on the spiral set of virtual sound sources (Fig. 1), affords global interpolation between response measurements at different spherical coordinates. This approach was extended to interpolate the receptive field dynamics in the time domain using Gaussian basis functions (localized bumps on a line). Smoothing on joint spherical and Euclidean coordinates is nontrivial, and this advancement may have applications in other fields, such as meteorology, that also study spherical dynamics. This new approach incorporates basis functions in both space (von Mises) and time (Gaussian) to least-squares fit the observed system kernel, and the fit is regularized using a smoothness constraint (Poggio and Girosi, 1990). This optimization procedure generates a continuous approximation of the system kernel of the neuron in space-time. When visualized as a volume spanning the dimensions of azimuth, elevation, and time, the kernel provides a description of the space-time receptive field.
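A minimal sketch of the two basis-function families is given below; the centers, widths, and evaluation grid are invented for illustration, and the subsequent step of fitting basis coefficients to the measured kernel by smoothness-regularized least squares is not shown.

```python
import numpy as np

def von_mises_basis(az, el, az0, el0, kappa):
    """Localized "bump" on the sphere: exp(kappa * (cos d - 1)), where d is
    the great-circle distance to the center (az0, el0). Angles in radians."""
    cos_d = (np.sin(el) * np.sin(el0)
             + np.cos(el) * np.cos(el0) * np.cos(az - az0))
    return np.exp(kappa * (cos_d - 1.0))

def gaussian_basis(t, t0, sigma):
    """Localized bump on the time axis."""
    return np.exp(-0.5 * ((t - t0) / sigma) ** 2)

# One separable space-time basis function: product of a spatial bump
# centered at (30 deg azimuth, 10 deg elevation) and a temporal bump at 20 msec.
az = np.linspace(-np.pi, np.pi, 73)
el = np.linspace(np.radians(-36.0), np.pi / 2, 33)
t = np.linspace(0.0, 50.0, 51)            # msec before the spike
AZ, EL, T = np.meshgrid(az, el, t, indexing="ij")
b = (von_mises_basis(AZ, EL, np.radians(30.0), np.radians(10.0), kappa=5.0)
     * gaussian_basis(T, t0=20.0, sigma=5.0))
```

A full model sums many such products, one per spiral point and time center, with coefficients chosen by least squares under a smoothness penalty.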
RESULTS
Space-time receptive fields
Figure 2 illustrates the modeled auditory space-time receptive field derived from responses of the single AI unit shown in Figure 1. An isoprobability surface from the space-time receptive field takes the form of a “skin” in Figure 2 A, which represents a surface contour of the underlying three-dimensional probability distribution. This three-dimensional rendering shows a striking evolution of the receptive field from a narrow region of spatial selectivity at tens of milliseconds before spike generation to a broadened region resembling a torus nearer in time to spike discharge. The 10 msec noise bursts that constitute our stimulus events set an artificial lower bound on the temporal resolution of the kernels. This artifact is responsible for the fact that the kernel shown in Figure 2 A extends slightly into the noncausal region beyond time 0.
To assess the reliability of the estimated space-time receptive fields, we used a well known technique (Efron and Tibshirani, 1993) to bootstrap measured spike trains and obtain a sampling distribution with an average and SE. Replicates from the originally measured spike trains were used to estimate 100 space-time receptive fields. In this analysis, cross-sectional renderings (Fig. 2 B) are useful graphical displays, which can be generated for any two-dimensional plane because the modeled receptive field is continuous in both space and time. The bootstrapped average and SE for each of these cross-sections are shown in Figure 2 C. The observed SE generally increases as the mean increases at each position in space-time, as expected from a Poisson process. This example illustrates that the shape of space-time receptive fields can be estimated with good confidence. A minority of neurons (23) were dropped from the database because they failed to yield statistically reliable estimates of space receptive fields, typically because of insufficient spike counts.
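The bootstrap step can be sketched as follows; the per-trial spike counts are invented stand-ins for the measured spike trains (in the study, each of the 100 resampled spike-train sets yielded a full space-time receptive field estimate, not just a single mean):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented per-trial spike counts for one space-time cell of the kernel.
trial_counts = rng.poisson(lam=3.0, size=200)

# Resample trials with replacement and re-estimate the statistic each time.
n_boot = 100
boot_means = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(trial_counts, size=trial_counts.size, replace=True)
    boot_means[i] = resample.mean()

estimate = boot_means.mean()     # bootstrap average
se = boot_means.std(ddof=1)      # bootstrap standard error
```

For Poisson-like spiking, the bootstrap SE grows with the mean, matching the pattern reported for Figure 2 C.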
Space-time inseparability
Space-time kernels can be classified as space-time separable or inseparable depending on whether the kernel (considered as a joint probability distribution) is equal to the outer product of its marginal distributions. In the separable case, the spatial marginal is equivalent to the static receptive field of the neuron, whereas the temporal marginal is proportional to the poststimulus time histogram. These distributions could be determined independently, and together they would nevertheless provide a complete description of the separable receptive field. In contrast, for an inseparable kernel, the outer product of the temporal and spatial marginals fails to reconstitute the kernel (Fig. 3). Features that run obliquely through a kernel are always lost when the kernel is decomposed into marginals. Oblique (or diagonal) elements are therefore a characteristic of inseparable kernels. Such oblique patterns can be observed in the kernel shown in the cross-sectional displays of Figures 4 A and 5 A.
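The outer-product comparison can be made concrete with two invented toy kernels: one constructed as a product of a spatial profile and a temporal profile (separable by construction), and one in which the spatial bump drifts obliquely across time:

```python
import numpy as np

def separability_gap(kernel):
    """Treat a space-time kernel as a joint probability table and measure
    how far it is from the outer product of its marginals (0 = separable)."""
    p = kernel / kernel.sum()
    space_marginal = p.sum(axis=1)    # analogue of the static receptive field
    time_marginal = p.sum(axis=0)     # analogue of the poststimulus histogram
    return np.abs(p - np.outer(space_marginal, time_marginal)).sum()

space = np.exp(-0.5 * ((np.arange(20) - 10) / 3.0) ** 2)   # spatial profile
time = np.exp(-0.5 * ((np.arange(30) - 15) / 4.0) ** 2)    # temporal profile

separable = np.outer(space, time)                # product structure
inseparable = np.stack([np.roll(space, s) for s in range(30)], axis=1)  # drift
```

The drifting bump produces exactly the kind of oblique feature described above, and that feature vanishes when the kernel is rebuilt from its marginals.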
Issues pertaining to separability of receptive fields have been recognized by auditory (Gerstein et al., 1968; Eggermont et al., 1981) and visual neurophysiologists (DeAngelis et al., 1995). However, formal inferential statistics were not used in these studies to test the significance of the observed inseparability of dimensions. We used the power-divergence (PD) statistic (Read and Cressie, 1988) to test the difference between observed and expected (product-of-marginals) distributions. The PD statistic is asymptotically χ^{2} distributed, with mean equal to the degrees of freedom (df) and variance equal to twice the degrees of freedom. The degrees of freedom for the PD statistic are roughly equivalent to the number of space-time cells in the first-order kernel matrix. When the sampling density in space and time is large, the PD statistic also asymptotically follows a normal distribution (Osius and Rojek, 1992). For example, the dimensions of space and time appear to be visually inseparable based on inspection of Figure 3. The statistical test supports this observation (PD = 23,788; df = 20,493; p < 0.01), which allows rejection of the null hypothesis that the space and time dimensions are separable. Approximately 14% (17 of 121) of AI units recorded in this study exhibited significant space-time inseparability. The degree of separability of space-time receptive fields does not appear to depend on either the best frequency of the neuron or its depth within cortex, based on an examination of PD as a function of best frequency (r = −0.0579; p > 0.05) and electrode depth (r = −0.0187; p > 0.05). Additionally, in 12% of 41 positions in which two single units were recorded simultaneously on the same electrode (26 electrode penetrations), we found one unit of the pair to be separable and the other inseparable.
Figure 4 shows an example of such a discordant receptive field pairing; the PD for the unit shown in Figure 4 A differs significantly from that of the unit in Figure 4 B (z = 10.76; p < 0.01). Of the 36 concordant pairs, 6% were both inseparable, and 94% were both separable.
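The Cressie-Read power-divergence computation can be sketched as follows; the count table is invented, and the degrees-of-freedom convention follows the text (roughly the number of space-time cells) rather than the usual contingency-table value:

```python
import numpy as np

def power_divergence(observed, expected, lam=2.0 / 3.0):
    """Cressie-Read power-divergence statistic; lam = 2/3 is the member of
    the family recommended by Read and Cressie (1988)."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    return (2.0 / (lam * (lam + 1.0))) * np.sum(o * ((o / e) ** lam - 1.0))

rng = np.random.default_rng(2)

# Invented space-time count table; under the separability null hypothesis,
# the expected table is the product of the marginals.
counts = rng.poisson(8.0, size=(40, 50)).astype(float)
expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()

pd_stat = power_divergence(counts, expected)

# Normal approximation from the text: mean = df, variance = 2 * df.
df = counts.size
z = (pd_stat - df) / np.sqrt(2.0 * df)
```

A large positive z rejects separability; for this table, which was generated under the null hypothesis, z should not be extreme.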
One interesting possibility is that the features of space-time receptive fields may be indicative of a tuning to the angular velocities of a moving stimulus. Figure 5 (same unit as shown in spiral coordinates in Fig. 3) illustrates this by showing complementary simulated stimulus trajectories through a measured inseparable (A) and separable (B) space-time receptive field. To the right of each receptive field cross-section is shown the predicted instantaneous spike rate plotted as a function of time for each trajectory. These predicted responses are shown for several repeated passes through the receptive field with two different angular speeds (dotted vs solid) for two opposing path directions (black vs red). The steeper slope corresponds to the greater speed. In the case of the inseparable space-time receptive field, the predicted pattern of the response differs depending on the trajectory and the speed of the sound source, specifically in the peak of the response and the depth of discharge rate modulation. However, for the case of the separable space-time receptive field, there is little difference between responses for opposing path directions. Interestingly, the predicted spike rate shows less modulation for the higher speed relative to the lower speed, suggesting that although separable space-time receptive fields lack trajectory selectivity, they may nevertheless signal changes in speed. Although these simulations illustrate a possible connection between the observed pattern of the space-time receptive field and motion selectivity, it remains to be observed empirically whether these predictions hold for real sound-source movements. To test such predictions in vivo requires a quasi-real-time computation of the space-time receptive field of a neuron, a procedure that we are actively perfecting.
Evaluation of predicted responses
To demonstrate that the first-order kernel adequately estimates the space-time receptive field, the measured response trials are divided into two interleaved sets; one set is used to calculate the kernel, and the other set is used to “test” the predictions of the kernel. Figure 6 A shows an example of such a comparison between predicted and measured instantaneous spike rates. The predictions are generated by convolution with the first-order kernel, as described in Materials and Methods, followed by a leaky integrator that yields the expected instantaneous rate function. Given the stochastic nature of neural responses, one cannot expect a perfect match but should nevertheless see a good degree of correlation between the measured rate function and its corresponding prediction. Indeed, the measured response is well approximated by the predicted response (r = 0.72 for the 7.5 sec response segment shown in Fig. 6 A). In contrast, Figure 6 B shows a control comparison between the same prediction and a different response segment that had been drawn randomly from the measured set. As expected, in this mismatched or shuffled pair, the prediction failed to capture the peaks and troughs of the measured rate function and yields a correlation coefficient near zero. Cross-correlation coefficients as a function of lag shifts are shown in Figure 6 C to further quantify the similarity of the predicted to measured rate functions compared with the shuffled rate function. The cross-correlation function ensures that small phase shifts introduced by convolution/integration do not bias the correlation coefficient. The correlation coefficient, of course, only summarizes a linear relationship, and perhaps the addition of higher-order kernels might generate better predictions.
Furthermore, the correlation coefficient as a measure of similarity is also quite sensitive to the stochastic noise present in the estimate of the first-order kernel, which will tend to deflate the correlation coefficient of the measured with the predicted response. Figure 6 D provides summary box-and-whisker plots of correlation coefficient distributions for a subset of 12 units. Six inseparable units and six separable units were selected on the basis of having comparable kernel signal-to-noise ratios. The distribution of correlation coefficients for each unit summarizes the range of response predictability using the first-order kernel. Each correlation coefficient is based on a 7.5 sec segment over, on average, 130 segments per unit. The rate functions shown in Figure 6, A and B, were selected from unit 4.
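The lag-shifted comparison of Figure 6 C can be sketched as follows; the "measured" rate and its "prediction" are invented (the prediction is a noisy copy shifted by 2 bins, and a shuffled segment serves as the control):

```python
import numpy as np

def lagged_corr(pred, meas, max_lag):
    """Pearson correlation between two rate functions at integer lag shifts,
    as in a cross-correlogram of predicted vs measured rates."""
    n = len(pred)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = pred[lag:], meas[: n - lag]
        else:
            a, b = pred[: n + lag], meas[-lag:]
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out

rng = np.random.default_rng(3)
rate = rng.gamma(2.0, 1.0, size=750)                       # "measured" rate
pred = np.roll(rate, 2) + 0.3 * rng.standard_normal(750)   # shifted prediction
shuffled = rng.permutation(rate)                           # control pairing

cc = lagged_corr(pred, rate, max_lag=10)
best_lag = max(cc, key=cc.get)
r_shuffled = float(np.corrcoef(shuffled, rate)[0, 1])
```

Scanning over lags recovers the 2-bin shift instead of penalizing it, which is the point of using the cross-correlation function rather than the zero-lag coefficient alone; the shuffled control stays near zero at every lag.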
DISCUSSION
White-noise methods have provided valuable tools for revealing response properties that go beyond the level of description afforded by static receptive fields (McLean et al., 1994; DeAngelis et al., 1995; Ringach et al., 1997; DiCarlo et al., 1998; Reich et al., 2000). They can also provide a method to investigate nonlinear interactions, so long as sufficient data can be collected while maintaining reliable responses from a neuron (Marmarelis and Marmarelis, 1978; Sakai, 1992; Eggermont, 1993). Of particular interest to the processing of auditory information are studies that examine spectral dynamics by constructing frequency–time (spectrotemporal) receptive fields. These used a variety of auditory stimuli, including broadband complex sounds with sinusoidal spectral profiles referred to as moving “spectral ripples” (Schreiner and Calhoun, 1994; Shamma and Versnel, 1995; Kowalski et al., 1996a,b; Klein et al., 2000), two tones (Brosch and Schreiner, 1997), and natural sounds (Theunissen et al., 2000). However, given the evidence that implicates AI in the neural pathways responsible for the perception of sound source location, an interesting alternative to these frequency–time kernels is one that uses sound source direction, rather than sound frequency, as the independent variable in the stimulus parameter space. This was the object of the present study, and we developed a technique to estimate the shape of auditory receptive fields in two spatial (azimuth and elevation) dimensions and one temporal dimension from first-order linear kernels derived by white-noise analysis. Bootstrapping demonstrated the reliability of the receptive field estimates (Fig. 2 C). Furthermore, the predictive power of the first-order kernels (Fig. 6) supports the legitimacy of these kernels in revealing the nature of the receptive field of the neuron and its space-time dynamics. The first-order kernels from some units are better than others at predicting the neural response to spherical white noise.
This could be caused by different levels of noise inherent in the kernel, although an attempt was made to select kernels with comparable signal-to-noise ratios. Alternatively, the differences in predictability may indicate differing demands for inclusion of the higher-order kernels. Given the overlap of the correlation distributions between separability conditions, there does not appear to be a clear distinction between the conditions in terms of the strength of first-order kernel prediction: whereas one separable kernel (unit 7) predicts the neural response remarkably well, another (unit 12) performs rather poorly.
Using conservative quantitative inferential criteria, we produced strong evidence for the existence of a distinct subpopulation of neurons in AI showing space-time inseparability in the first-order kernel. This subpopulation represented a fairly modest proportion of the neurons that we recorded in AI (∼14%). In primary visual cortex, almost 50% of neurons were reported to show space-time inseparability when the sample population was restricted to simple cells (McLean et al., 1994). Various neural mechanisms may account for the observed inseparability of dimensions, including direction-dependent adaptation and post-excitatory or inhibitory rebounds (McAlpine et al., 2000). Our method is sensitive to all of these effects, but it does not allow us to distinguish between them. In a series of simulations, we have illustrated that space-time inseparability may be indicative of the sensitivity of a neuron to the direction of sound motion (Fig. 5). Motion of a sound source is a ubiquitous feature of the acoustic environment and has stimulated both psychophysical (Middlebrooks and Green, 1991; Grantham, 1997; Perrott and Strybel, 1997; Saberi and Hafter, 1997) and neurophysiological (Altman, 1988) investigations into the neural mechanisms involved in motion processing. However, a precise definition of motion selectivity has been difficult to pin down in the auditory literature. Nevertheless, these and subsequent studies have shown that several simple aspects of sound motion are reflected in the auditory neural code (Toronchuk et al., 1992; Wagner and Takahashi, 1992; Spitzer and Semple, 1998).
The present findings do not support a strong clustering or segregation of space-time inseparable units within AI. Our sample represented only high best frequencies (14–22 kHz) distributed unevenly among the cortical layers, and within this sample we found no evidence that space-time separability might be distributed systematically as a function of best frequency or recording depth. Furthermore, in cases in which pairs of single units were simultaneously recorded at a single electrode site, both separable and inseparable receptive fields were observed. Nevertheless, it is well known that under experimental conditions of general anesthesia, responsive neurons are found predominantly in the middle cortical layers. This was also the case in the present study, in which 65% of the neurons were recorded at depths between 600 and 1200 μm. Thalamocortical projections from the ventral division of the medial geniculate body appear to produce their heaviest terminations at these depths (Huang and Winer, 2000). The strong input from this division to layers III and IV may account for the proportion of separability reported; however, the complexity of convergence to these layers may still provide the opportunity for the emergence of space-time inseparability. It is possible that inseparability is dependent on converging projections and that intrinsic projections among the other layers might yield proportions different from those observed in the middle layers. These are important considerations for further investigation, which would certainly include nonprimary auditory fields.
How successful our space-time receptive fields really are at characterizing the sensitivity of a neuron to true motion stimuli remains to be seen. If motion selectivity, as commonly defined in the vision literature (DeAngelis et al., 1995), is indeed manifest in the observed inseparability of the space and time dimensions, then our data would point to the existence of a relatively small subset of motion-sensitive neurons in AI. In any case, it is well to remember that the demonstration of a preferred sound source trajectory, speed, or both does not necessarily imply that the underlying neural circuitry was constructed, or is used, solely for the purpose of auditory motion analysis. Our method does not allow us to pinpoint the mechanisms underlying the observed spatial receptive field dynamics, but it would seem most likely that these are a manifestation of the sensitivity of a neuron to dynamic changes in the binaural spectra that accompany the movement of a sound source in space.
Footnotes

This work was supported by National Institutes of Health (NIH) Grant DC03554 (R.L.J.), Defeating Deafness, Dunhill Medical Research Trust Fellowship (J.W.H.S.), and NIH Grants DC00116 and HD03352 (J.F.B.).
Correspondence should be addressed to Dr. Rick L. Jenison, Department of Psychology, 1202 W. Johnson Street, University of Wisconsin, Madison, WI 53706. Email: rjenison@facstaff.wisc.edu.