Abstract
Birds and mammals exploit interaural time differences (ITDs) for sound localization. Subsequent to ITD detection by brainstem neurons, ITD processing continues in parallel midbrain and forebrain pathways. In the barn owl, both ITD detection and processing in the midbrain are specialized to extract ITDs independent of frequency, which amounts to a pure time delay representation. Recent results have elucidated different mechanisms of ITD detection in mammals, which lead to a representation of small ITDs in high-frequency channels and large ITDs in low-frequency channels, resembling a phase delay representation. However, the detection mechanism does not prevent a change in ITD representation at higher processing stages. Here we analyze ITD tuning across frequency channels with pure tone and noise stimuli in neurons of the barn owl's auditory arcopallium, a nucleus at the endpoint of the forebrain pathway. To extend the analysis of ITD representation across frequency bands to a large neural population, we employed Fourier analysis for the spectral decomposition of ITD curves recorded with noise stimuli. This method was validated using physiological as well as model data. We found that low frequencies convey sensitivity to large ITDs, whereas high frequencies convey sensitivity to small ITDs. Moreover, different linear phase frequency regimes in the high-frequency and low-frequency ranges suggested an independent convergence of inputs from these frequency channels. Our results are consistent with ITD being remodeled toward a phase delay representation along the forebrain pathway. This indicates that sensory representations may undergo substantial reorganization, presumably in relation to specific behavioral output.
Introduction
The brain exploits interaural time differences (ITDs) of sound waves for auditory orientation (Colburn et al., 2006; Grothe et al., 2010). ITDs are first detected by brainstem coincidence detector neurons (Carr and Soares, 2002), after which ITD processing continues along parallel midbrain and forebrain pathways.
Tuning to ITD emerges when binaural signals are delayed such that the internal delay compensates for the external delay (Fig. 1A). This was first suggested by Jeffress (1948) and is best supported in the barn owl, where axonal delay lines compensate for external delays, creating an ITD map (Carr and Konishi, 1990; Köppl and Carr, 2008). Because timing is compared across narrow frequency channels, noise delay tuning is cyclic and ambiguously signals ITD (Wagner et al., 1987, 2007; Peña and Konishi, 2000). This ambiguity is eliminated in the external nucleus of the inferior colliculus (ICX), where frequency channels converge, but neurons conserve the tuning to a single frequency-independent delay (Takahashi and Konishi, 1986; Mazer, 1998; Peña and Konishi, 2000).
If a neuron responds maximally at a single ITD independent of the stimulus frequency, it is said to exhibit a characteristic delay (CD) (Fig. 1A,B).
In mammals, growing evidence suggests that ITD detection is inconsistent with the Jeffress model (McAlpine et al., 2001; Brand et al., 2002; Pecka et al., 2008). For instance, the Jeffress model does not predict the emergence of frequency-dependent ITD tuning, which is nevertheless typical of mammalian brainstem and midbrain neurons.
If ITD tuning changes linearly across frequency, this could be parametrized as a constant phase [characteristic phase (CP)] and a constant time delay [characteristic delay (CD)] (Yin and Kuwada, 1983) (Fig. 1C). The characteristic phase determines a neuron's relative firing level between minimal and maximal response at its characteristic delay.
In the owl's ICX, characteristic phases were close to zero, suggesting a pure time delay representation (Fig. 1B). In contrast, frequency-dependent ITD tuning was typically observed in mammals both in single neurons as nonzero characteristic phases (Yin and Kuwada, 1983; Batra et al., 1993; Spitzer and Semple, 1995; McAlpine et al., 1998) and across populations (McAlpine et al., 2001; Joris et al., 2006).
While the anatomical and frequency organization of the inputs determines the frequency dependence of ITD tuning at the detection stage, it is unclear whether this affects ITD representations at later processing stages.
Interestingly, recent findings in the owl indicated deviations from pure delay representation in the thalamus (Pérez and Peña, 2006) and the auditory arcopallium (AAR; Fig. 1C) (Vonderschen and Wagner, 2009).
The AAR represents the endpoint of the auditory forebrain pathway projecting directly to the midbrain ICX and to motor nuclei (Cohen et al., 1998). Besides having direct control of sound localization behavior, the forebrain pathway is involved in top-down control and auditory attention tasks (Cohen and Knudsen, 1996; Knudsen and Knudsen, 1996; Winkowski and Knudsen, 2006; Reches and Gutfreund, 2008). AAR responses to varying ITD are frequency-dependent and biased toward contralateral-leading ITDs (Vonderschen and Wagner, 2009).
Thus, ITD detection must be separately considered from the later downstream representation of ITD, and the representation of ITD may differ in the different pathways leading to motor output, the midbrain and the forebrain pathways. The mechanisms underlying the creation of characteristic phases in the AAR are not well understood. The data presented in the following are a first attempt to unravel the underlying neural computations.
Materials and Methods
Owl handling.
Data from eight barn owls (Tyto alba) taken from the institute's breeding colony were included in this study. The animals were of either sex. Surgical procedures have been described in detail previously (Vonderschen and Wagner, 2009). Briefly, owls were implanted before experiments with a head piece for stereotactic control. Owls were kept under anesthesia during all surgical interventions (15 mg/kg of ketamine, 1 mg/kg of diazepam, 0.065 mg/kg of atropine sulfate) and received analgesics (0.06 mg/kg of buprenorphine). During recordings, anesthesia was kept light. In each owl, stereotactic coordinates for the auditory arcopallium were established with respect to the optic tectum (Cohen and Knudsen, 1995). Electrolytic lesions were made in two owls to verify the recording area (cf. Vonderschen and Wagner, 2009). After the experiments, owls were kept in a monitoring box for 12 h and then returned to their home aviaries. They were allowed to recover for 10–14 d between experiments. All procedures were approved by the Landespräsidium für Natur, Umwelt und Verbraucherschutz Nordrhein-Westfalen, Recklinghausen, Germany, and complied with the National Institutes of Health guidelines for animal experimentation.
Electrophysiology.
Extracellular recordings in the AAR were obtained with epoxylite-insulated tungsten microelectrodes (9–12 MΩ, FHC). Electrodes were advanced into the brain with a custom-built microdrive. Electrophysiological signals were preamplified (custom-built device), amplified, and bandpass-filtered (300–5000 Hz, M Walsh Electronics), digitized (25 kHz, AD1, Tucker-Davis Technologies), and stored on a PC. Semiautomatic spike sorting based on cluster analysis (BrainWare, Jan Schnupp, Tucker-Davis Technologies) was performed on-line and refined for analysis off-line. In 87% of cases, clusters could be attributed to single units. In the remainder, we could not exclude contributions from multiunits. Multiunits were pooled with single units, since we did not observe physiological differences. It has been suggested that neurons in the AAR are organized in clusters of neurons with similar physiological properties (Cohen and Knudsen, 1999). Experiments were conducted in an anechoic chamber (IAC 403A, Industrial Acoustics). We used white noise bursts (0.1–20 kHz) and tone pips of 100 ms length with 5 ms cosine start and end ramps for dichotic stimulation. Signals were sampled at 50 kHz, digital-to-analog converted, attenuated, antialias-filtered (DA3-4, PA4, FT6, System II, Tucker-Davis Technologies), power-amplified (AX-590, Yamaha), and presented through calibrated earphones (MDR-E831LP, Sony).
To detect neural activity, we played noise bursts of varying ITD while slowly advancing the electrode. For each neuron, we first sampled the ITD range typically between ±270 μs in steps of 30 μs using an interaural level difference (ILD) of 0 dB. Via on-line analysis, we obtained a first estimate of the noise-delay curve. We call the ITD at which the neuron responded maximally the best ITD. The neuron's interaural level difference tuning was assessed between −20 and 20 dB in steps of 1 dB at the best ITD. If the neuron's best ILD was different from zero, we recorded the noise-delay curve again, holding the ILD at the neuron's preferred value. The neuron's frequency tuning was probed by playing tones of frequencies between 500 and 9500 Hz in steps of 500 Hz while keeping the ITD and ILD constant at the neuron's preferred values. We refer to this curve as the isolevel frequency tuning or, briefly, the frequency-tuning curve. Stimuli were presented between 5 and 10 times in blockwise random order. Because AAR neurons were more responsive to noise than to pure tones, five trials were normally used with noise stimuli and 7–10 with pure tones. Intertrial interval was 1 s. In addition to the noise-delay curves, tone-delay curves were obtained from some neurons in an analogous way. Tone pips were interaurally delayed using ITDs that regularly sampled one stimulus period. Typical stimulation frequencies had periods that were integer multiples of 30 μs: 2381, 2564, 3030, 3333, 3704, 4167, 4762, 5556, 6667, or 8333 Hz. For very high (>6667 Hz) or very low (<2381 Hz) frequencies, a more adequate ITD sampling step was chosen. The sampled ITD range for tones included a minimum of one period of the stimulation frequency. All stimuli were tested at ∼30 dB above the neuron's response threshold as assessed from binaural rate level functions obtained with noise stimuli.
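The choice of tone frequencies can be checked with a little arithmetic: each listed frequency has a period that spans a whole number of 30 μs ITD steps. The following Python sketch illustrates this; the list `steps_per_period` is an assumption inferred from the listed frequencies, not a value stated in the text, and the function name is hypothetical.

```python
ITD_STEP_S = 30e-6  # ITD sampling step of the delay curves (30 microseconds)

# Assumed number of 30-us steps per stimulus period, inferred from the
# frequencies listed in the text (not given explicitly by the authors).
steps_per_period = [14, 13, 11, 10, 9, 8, 7, 6, 5, 4]

def tone_frequency_hz(n_steps, step_s=ITD_STEP_S):
    """Frequency whose period equals n_steps ITD sampling steps."""
    return 1.0 / (n_steps * step_s)

# Frequencies rounded to the nearest hertz.
frequencies = [round(tone_frequency_hz(n)) for n in steps_per_period]
print(frequencies)
```

With such frequencies, one stimulus period is sampled by exactly `n_steps` equally spaced ITDs, so no partial cycle is left at the end of the sampled range.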
Analysis.
All off-line analysis was done using self-written Matlab routines (MathWorks). Baseline firing rates were assessed by averaging the spike rates recorded in a 400 ms window before stimulus onset. Noise-delay and tone-delay curves as well as frequency-tuning curves were obtained by averaging the spike counts across trials in a 100 ms response window after stimulus onset. The window was time-shifted by the response latency defined as the time lapse between stimulus onset and half maximal response level in a peristimulus time histogram with optimized binwidth. The optimal binwidth was defined as the one that caused minimal variation in latency estimates computed from surrogate peristimulus histograms obtained through bootstrap resampling of the original spike arrival times (Friedman and Priebe, 1999).
Characteristic phase and characteristic delay.
Characteristic delay and characteristic phase are parameters widely used in auditory research to describe the across-frequency ITD sensitivity of neurons (Yin and Kuwada, 1983).
Classically, CD and CP estimates have been derived from neural responses to varying ITDs obtained with pure tone stimuli (i.e., tone-delay curves). Tone-delay curves tend to be cyclic due to the periodic nature of the sinusoidal signal and the binaural coincidence detection mechanism that performs a cross-correlation of the binaural signals. The resulting periodic curves can be represented in a simple model as cosine functions of ITD and frequency according to the following: r(ITD, f) = cos[2πf (ITD − delay)], where r is the neural response measured as average spike count across trials, ITD defines the interaural time difference at which the stimulus is presented, and f refers to the frequency of the pure tone stimulus.
An illustration of a family of ITD tone-delay curves generated from this equation for different frequencies is shown in Figure 1Aii for a fixed delay. The delay describes a shift of the curves along the ITD axis, which can be further subdivided.
Delays can be composed of a pure time delay (Fig. 1B), the CD, and a delay that can be derived from the CP (Fig. 1C) as follows: r(ITD, f) = cos[2πf (ITD − CD) − 2π CP], with CP expressed in cycles.
The ITDs causing maximal responses are a linear function of the period: best ITD = CD + CP/f, where CP is in units of cycles and best ITD denotes the ITD at which the response is maximal. Thus, in a unit in which ITD tuning to different frequencies can be described by a CP ≠ 0 and a CD, the best ITDs are a hyperbolic function of f as can be seen in Figures 2A, 6A, and 9A. If CP = 0, best ITDs are equal to the CD across all frequencies (Fig. 1B).
By multiplying both sides by the stimulus frequency, we obtain an equivalent expression for the interaural phase difference (IPD). It follows that the best IPD is a linear function of frequency with the characteristic delay corresponding to the slope and the characteristic phase representing the offset, as represented in the following equation: best IPD = CD · f + CP (Eq. 2). Linear phase-frequency relations are illustrated for the cases CP = 0 in Figure 1B (bottom) and CP ≠ 0 in Figure 1C (bottom).
In cases where the best interaural phase differences are not a linear function of frequency, the characteristic delay and phase are not defined.
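As a worked illustration of these relations, the short Python sketch below evaluates the best ITD (CD plus CP divided by frequency) and the best IPD (CD times frequency plus CP) for a few tone frequencies. The CD and CP values and the function names are illustrative assumptions and are not taken from a recorded neuron.

```python
def predicted_best_itd(freq_hz, cd_s, cp_cycles):
    """Best ITD in seconds: CD + CP / f (hyperbolic in f when CP != 0)."""
    return cd_s + cp_cycles / freq_hz

def predicted_best_ipd(freq_hz, cd_s, cp_cycles):
    """Best IPD in cycles: CD * f + CP (linear in f; not wrapped)."""
    return cd_s * freq_hz + cp_cycles

cd, cp = 20e-6, 0.25  # illustrative values only
for f in (2000.0, 4000.0, 8000.0):
    itd_us = predicted_best_itd(f, cd, cp) * 1e6
    ipd = predicted_best_ipd(f, cd, cp)
    print(f"{f:6.0f} Hz: best ITD = {itd_us:6.1f} us, best IPD = {ipd:.3f} cycles")
```

With a nonzero CP, the best ITD shrinks toward the CD as frequency increases, so low frequencies carry tuning to larger ITDs than high frequencies; with CP = 0, the best ITD equals the CD at every frequency.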
Characteristic delay and characteristic phase from tone-delay curves.
To estimate the neuron's characteristic phase and delay, we used circular statistics to calculate the best interaural phase difference (best IPD) from each tone-delay curve (Batschelet, 1981) as follows: best IPD = (1/2π) · arg[Σ r(IPD) · exp(i · 2π · IPD)], where the sum runs over all sampled IPDs, IPD is the stimulus ITD multiplied by the stimulus frequency (in cycles), and r(IPD) represents the response at each IPD. IPD data points were only incorporated in the unit's phase-frequency curve if phase tuning was significant by the Rayleigh test (p < 0.001) (Mardia, 1972; Batschelet, 1981). Based on Equation 2, we performed a linear regression through the IPD data points, the slope and offset of which yielded the CD and CP estimates, respectively. The linear model was judged suitable if the root mean squared error (RMSE) was unlikely to occur by random data based on a bootstrap method (p < 0.005) (cf. Yin and Kuwada, 1983). We note that, while this test criterion has been used in many follow-up works (Takahashi and Konishi, 1986; Yin and Chan, 1990; Joris, 1996; Batra et al., 1997; Pecka et al., 2008) and will be used here, it does not preclude relations other than linear ones, a point we consider in more detail later.
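The estimation steps described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab routines: the best IPD is taken as the response-weighted circular mean, the Rayleigh p value uses the common large-sample approximation exp(−n · VS²), and the regression assumes that best IPDs have already been unwrapped across frequency. All function names are hypothetical.

```python
import cmath
import math

def circular_best_ipd(ipds_cycles, rates):
    """Response-weighted circular mean IPD (cycles) and vector strength."""
    total = sum(rates)
    resultant = sum(r * cmath.exp(2j * math.pi * p)
                    for p, r in zip(ipds_cycles, rates)) / total
    return cmath.phase(resultant) / (2 * math.pi), abs(resultant)

def rayleigh_p(vector_strength, n_spikes):
    """Large-sample approximation of the Rayleigh test (an assumption here)."""
    return math.exp(-n_spikes * vector_strength ** 2)

def fit_cd_cp(freqs_hz, best_ipds_cycles):
    """Least-squares line through (frequency, best IPD): slope = CD, offset = CP."""
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_p = sum(best_ipds_cycles) / n
    sxx = sum((f - mean_f) ** 2 for f in freqs_hz)
    sxy = sum((f - mean_f) * (p - mean_p)
              for f, p in zip(freqs_hz, best_ipds_cycles))
    cd = sxy / sxx
    return cd, mean_p - cd * mean_f

# Synthetic tone-delay curve peaking at an IPD of 0.3 cycles.
ipds = [k / 10 for k in range(10)]
rates = [1 + math.cos(2 * math.pi * (p - 0.3)) for p in ipds]
best, vs = circular_best_ipd(ipds, rates)

# Synthetic phase-frequency data generated with CD = 20 us and CP = 0.25 cycles.
freqs = [2000.0, 3000.0, 4000.0, 5000.0]
cd, cp = fit_cd_cp(freqs, [20e-6 * f + 0.25 for f in freqs])
```

Only best IPDs from curves that pass the significance criterion would enter `fit_cd_cp`; the bootstrap test of the RMSE is not shown here.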
Composite curves.
We used composite curves to test whether inputs from different frequency channels were linearly combined. Composite curves were computed as the average across frequencies of all significantly tuned tone-delay curves recorded in a neuron (Yin et al., 1986; Yin and Chan, 1990). Similarity of the composite curve with the noise-delay curve was assessed by the Pearson correlation coefficient r, and r² was used to express how much of the variation could be explained by a linear model of across-frequency integration.
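A minimal Python sketch of the composite-curve comparison is given below. It assumes that all tone-delay curves and the noise-delay curve are sampled at the same ITDs, which may require interpolation in practice; the function names and the numbers in the example are purely illustrative.

```python
import math

def composite_curve(tone_delay_curves):
    """Average the tone-delay curves point by point across frequencies."""
    n = len(tone_delay_curves)
    return [sum(values) / n for values in zip(*tone_delay_curves)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Two made-up tone-delay curves and a made-up noise-delay curve.
composite = composite_curve([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
r_squared = pearson_r(composite, [4.0, 6.0, 8.0]) ** 2
```

Because the correlation coefficient is insensitive to scale and offset, the normalization of the two curves does not affect r².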
Characteristic delay and characteristic phase from noise-delay curves.
Assuming a linear system, the noise-delay function can be explained entirely as a linear combination of tone-delay curves. In this case, the phase–frequency curve measured by tone-delay curves would be equivalent to the phase spectrum of the noise-delay curve. Hence, we used discrete Fourier transforms of noise-delay curves as an alternative method to estimate the phase–frequency relation. Fast Fourier transforms (FFT) of noise-delay curves were calculated using the Matlab algorithm (fft) after offsetting the curve to zero mean. Noise-delay curves were zero-padded to a 64 sample signal, which allowed approximating the frequency resolution of 500 Hz steps used to assess the amplitude spectrum in frequency-tuning curve recordings. Amplitude spectra were offset by each unit's mean spontaneous rate and scaled to the maximum response in the unit's frequency-tuning curve. Phase spectra were corrected by a constant to account for the part of the signal that extended into the negative range of ITDs. As the Rayleigh test was inadequate here as a test for the significance of the phase tuning, we simply excluded phases of frequencies that contributed <30% of the maximum amplitude. Other thresholds between 20 and 40% yielded similar results (data not shown). Based on the regression over the remaining phase data, we inferred the characteristic delay and phase as described above. The mismatch between the two estimates was quantified by subtracting the amplitude and phase estimates obtained from tone-delay curves from their FFT counterpart (i.e., the amplitude and phase spectra of noise-delay curves, respectively). Because the frequency range was sampled at slightly different points in the two methods, we linearly interpolated the FFT spectra at the respective frequency sample points.
In addition, characteristic delay and phase distributions were obtained from ICX neurons in the midbrain pathway using FFTs of noise-delay curves as described above, thereby allowing for comparison with CD–CP data obtained with tone-delay curves in earlier studies (Takahashi and Konishi, 1986; Wagner et al., 1987). These noise-delay curves had been recorded in our laboratory for earlier studies (Wagner et al., 2007) using procedures identical to those described above.
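The Fourier-based estimate can be sketched in plain Python as below. This is an illustration under assumptions rather than the authors' Matlab code: the transform is evaluated directly at the frequencies of a 64-point FFT, using the true ITD of each sample so that the negative start of the ITD axis is accounted for in the phase, and the best IPD is taken as the negative of the Fourier phase. With 30 μs steps, the frequency spacing is 1/(64 × 30 μs) ≈ 521 Hz. The function names are hypothetical.

```python
import cmath
import math

def noise_delay_spectrum(itds_s, rates, n_fft=64):
    """Return (freqs_hz, amplitudes, best_ipds_cycles) for bins 0..n_fft/2."""
    step = itds_s[1] - itds_s[0]
    mean_rate = sum(rates) / len(rates)
    centered = [r - mean_rate for r in rates]  # offset the curve to zero mean
    freqs, amps, ipds = [], [], []
    for n in range(n_fft // 2 + 1):
        f = n / (n_fft * step)  # frequency of bin n after padding to n_fft samples
        # Padded zeros add nothing to the sum, so only recorded samples appear.
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t)
                    for x, t in zip(centered, itds_s))
        freqs.append(f)
        amps.append(abs(coeff))
        ipds.append(-cmath.phase(coeff) / (2 * math.pi))
    return freqs, amps, ipds

def bins_above_threshold(amps, fraction=0.3):
    """Indices of bins with at least `fraction` of the maximum amplitude."""
    peak = max(amps)
    return [i for i, a in enumerate(amps) if a >= fraction * peak]

# Synthetic curve: a 5 kHz component peaking at 0 us, sampled like the recordings.
itds = [(-270 + 30 * k) * 1e-6 for k in range(19)]
rates = [10 + 5 * math.cos(2 * math.pi * 5000 * t) for t in itds]
freqs, amps, ipds = noise_delay_spectrum(itds, rates)
peak_bin = max(range(len(amps)), key=lambda i: amps[i])
```

A regression through the best IPDs of the bins returned by `bins_above_threshold` would then give the CD (slope) and CP (offset). Because the curve covers only 540 μs, neighboring bins are not independent, which is one reason why the method was validated against tone-delay data and model units.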
Linearity of the phase–frequency relation.
The phase–frequency relation was considered significantly linear when the RMSE was smaller than expected from random data (cf. Yin and Kuwada, 1983). While this criterion was met in most cases, it represented a liberal test for linearity. By visual inspection, phase–frequency relations in many AAR neurons tended to indicate two linear regimes, one in the low-frequency range and one in the high-frequency range. In neurons that displayed a local minimum in the amplitude spectrum, we used the corresponding frequency to divide the range into a low-frequency and a high-frequency range. Predictably, the RMSE obtained with two linear regimes tends to be smaller than the one obtained using a single linear regime. However, the reduction in RMSE should be larger if the data deviate systematically from a single regression model than if they deviate randomly from that model. To check for a significant reduction in RMSE, we computed probability density functions for the reduction in RMSE using 1000 bootstrap runs for each set of data. Surrogate datasets were created by randomly shuffling the order of the residuals from the single regression model and recalculating the RMSE obtained with two regressions, one through the low-frequency range and one through the high-frequency range, thereby obtaining a probability distribution of the decrease in RMSE (ΔRMSE) under the assumption that no systematic deviations from the linear model were present. The model of two regression regimes was considered superior if the probability of observing a given ΔRMSE was <0.01.
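One possible reading of this shuffling test is sketched below in Python. Several details are assumptions made for illustration: the surrogate data are built by adding the shuffled residuals back onto the single-regression line, both the single and the two-regime fits are recomputed on each surrogate, and the p value is the fraction of surrogates whose ΔRMSE reaches the observed one. Function names are hypothetical.

```python
import math
import random

def linear_fit(x, y):
    """Least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x

def residuals(x, y):
    slope, intercept = linear_fit(x, y)
    return [b - (slope * a + intercept) for a, b in zip(x, y)]

def rmse(res):
    return math.sqrt(sum(r * r for r in res) / len(res))

def delta_rmse(freqs, phases, split_hz):
    """RMSE of one regression minus RMSE of two band-limited regressions."""
    low = [(f, p) for f, p in zip(freqs, phases) if f < split_hz]
    high = [(f, p) for f, p in zip(freqs, phases) if f >= split_hz]
    two = residuals(*zip(*low)) + residuals(*zip(*high))
    return rmse(residuals(freqs, phases)) - rmse(two)

def two_regime_p(freqs, phases, split_hz, n_runs=1000, seed=0):
    """Fraction of residual-shuffled surrogates with ΔRMSE >= the observed one."""
    rng = random.Random(seed)
    slope, intercept = linear_fit(freqs, phases)
    res = residuals(freqs, phases)
    observed = delta_rmse(freqs, phases, split_hz)
    count = 0
    for _ in range(n_runs):
        shuffled = res[:]
        rng.shuffle(shuffled)
        surrogate = [slope * f + intercept + r for f, r in zip(freqs, shuffled)]
        if delta_rmse(freqs, surrogate, split_hz) >= observed:
            count += 1
    return count / n_runs

# Synthetic phase data with two regimes separated at 4 kHz.
freqs = [1000.0 + 500.0 * i for i in range(15)]
phases = [60e-6 * f + 0.2 if f < 4000 else -1e-6 * f + 0.3 for f in freqs]
p_value = two_regime_p(freqs, phases, split_hz=4000.0)
```

Each frequency band needs at least three data points for the band-limited regressions to be meaningful; the sketch does not check this.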
Model neurons.
As proof of concept for using Fourier transforms of short noise-delay functions to estimate phase and amplitude spectra, we created model units that behaved as linear integrators across frequency. Each model unit was assigned a linear phase–frequency relation based on a chosen CD–CP pair as well as a flat frequency-tuning curve. Frequencies were sampled in steps of 500 Hz between 0 and 31.5 kHz (64 sample points) with amplitude values equal to 1 between 500 and 8500 Hz and 0 everywhere else. We experimented with different shapes of frequency-tuning curves but chose to present the data from flat frequency-tuning curves for two reasons: (1) they have conceptual advantages over the more physiologically realistic ones; and (2) our error assessment of the method will be maximal for the rectangular frequency-tuning curve due to edge effects in Fourier transforms of the signals and therefore represents the most conservative estimate. The CD and CP pair defined a unit's best IPD at each frequency (see Eq. 2). Tone-delay functions were modeled as cosine functions with amplitudes and phase offsets defined by the frequency and phase tuning, respectively. Noise-delay curves were simulated simply by taking the average of the tone-delay functions. This process was equivalent to taking a real valued inverse Fourier transform of the phase and amplitude spectrum, but allowed us to evaluate the resulting noise-delay function at arbitrary ITDs. According to the Fourier theorem for discrete functions, the time steps and signal duration are related to the frequency sampling steps as follows: t_k = k · Δt = k/(N · Δf) and f_n = n · Δf, with N being the number of samples, k being the running index of the time samples (t_k), and n being the running index of the frequency samples (f_n).
With a frequency sampling step Δf = 500 Hz and N = 64 samples, the resulting noise-delay function would be sampled over a time window of 2 ms in steps of 31.25 μs. Instead, we evaluated the noise-delay curves between −270 and 270 μs in steps of 30 μs corresponding to the experimental procedures. We calculated Fourier transforms of the simulated noise-delay curves as described for the experimental data to explore the systematic errors in the Fourier estimates arising from the analysis of short noise-delay curves. We further used the simulated noise-delay curves to test the effect of zero padding the ITD signals. Omitting zero padding or using zero padding to various signal lengths affected exclusively the frequency resolution of the FFT spectra. All other results remained qualitatively unchanged (data not shown).
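The model units can be illustrated with a short Python sketch: a noise-delay curve is built as the average of cosine tone-delay functions over a flat tuning curve (500 to 8500 Hz in 500 Hz steps) and evaluated at the experimental ITDs. CP is assumed to be in cycles, so it enters the cosine as 2π · CP; the function names and the chosen CD–CP pairs are illustrative assumptions.

```python
import math

def tone_delay(itd_s, freq_hz, cd_s, cp_cycles):
    """Cosine tone-delay function with best IPD = CD * f + CP (in cycles)."""
    return math.cos(2 * math.pi * (freq_hz * (itd_s - cd_s) - cp_cycles))

def model_noise_delay(itds_s, cd_s, cp_cycles, freqs_hz):
    """Average of tone-delay functions over a flat frequency-tuning curve."""
    return [sum(tone_delay(t, f, cd_s, cp_cycles) for f in freqs_hz) / len(freqs_hz)
            for t in itds_s]

freqs = [500.0 * n for n in range(1, 18)]           # 500 ... 8500 Hz
itds = [(-270 + 30 * k) * 1e-6 for k in range(19)]  # -270 ... 270 us, 30 us steps

pure_delay = model_noise_delay(itds, 0.0, 0.0, freqs)     # CP = 0
phase_delay = model_noise_delay(itds, 0.0, 0.25, freqs)   # CP = 0.25 cycles

# Time step implied by N = 64 frequency samples spaced 500 Hz apart.
time_step_s = 1.0 / (64 * 500.0)
```

With CP = 0, the simulated curve is symmetric and peaks at the CD; with CP = 0.25 cycles, the curve is antisymmetric around the CD and passes through zero there, giving a steep slope in the center of the ITD range.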
Results
This study is based on extracellular recordings of 290 forebrain neurons in the AAR of the barn owl. Data were obtained from eight owls. In two animals, the recording sites were verified through histological lesions as reported by Vonderschen and Wagner (2009).
Characteristic phases in auditory forebrain neurons
Sensitivity to ITD first emerges in brainstem coincidence detector neurons that cross-correlate narrow-band inputs from the two ears (Carr and Konishi, 1990; Fischer et al., 2008). In a simple approach, the input may be regarded as sinusoidal, resulting in an output resembling a cosine function. Upon stimulation with tones, auditory forebrain neurons that inherit ITD sensitivity from lower processing stages should display ITD tuning properties similar to those observed in brainstem units. Consistent with this prediction, we found that neurons in the AAR displayed cyclic tone-delay functions to different tone frequencies (Fig. 2A,E). Neurons were tuned to ITD across a broad range of frequencies, including low frequencies (<3 kHz), as illustrated in the two example neurons. The best ITDs changed as a function of frequency in a regular manner in the first neuron and in a more irregular way in the second neuron. The progression of best ITD across frequencies can be parametrized into a frequency-independent component, the characteristic delay, which represents a shift of all tone-delay curves away from 0 (Fig. 2A,E, black lines), and a frequency-dependent delay component that can be derived from the CP (see Eq. 2), which describes which response phase, ranging from the neuron's maximal response (phase 1) to the minimal response (phase 0.5), is displayed at the characteristic delay. To estimate the characteristic delay and phase, we computed the IPD that the neuron was tuned to from each periodic tone-delay curve, thereby obtaining each neuron's phase-frequency relation (Fig. 2D,H). Characteristic delay and phase correspond to the slope and offset, respectively, of the linear regression of the phase–frequency data (see Materials and Methods). Significance of the linear fit was assumed if the root mean squared error was unlikely to occur by random phase data (bootstrap test; p < 0.005).
Both example neurons exhibited a relatively small frequency-independent ITD tuning component (CD of 20 and −1 μs, respectively) compared with the physiological ITD range of ±250 μs and a large frequency-dependent component (CP of 0.29 and 0.38 cycles, respectively), where the range of possible CPs is ±0.5 cycles.
If inputs from different frequency channels were combined in a linear way in a neuron, the noise-delay function should be predictable from the sum of the tone-delay functions and, furthermore, the general shape of the noise-delay function should be predictable from the characteristic phase. For instance, a frequency-independent ITD tuning (CP = 0) predicts that tone-delay curves exhibit a peak at a common ITD across frequencies, which should give rise to a symmetric large peak in the noise-delay function. In contrast, a frequency-dependent component in the ITD tuning (CP ≠ 0) indicates a shift of best ITDs as a function of frequency and should result in an asymmetric peak in the noise-delay function (compare Fig. 1B,C). To test this, we computed the composite curves (i.e., the normalized sum of the tone-delay curves) (Yin and Chan, 1990), and compared them to the normalized noise-delay function (Fig. 2B,F). In both cases, the composite curve was a good match of the noise-delay function (r² = 0.8 and 0.53, respectively). As expected from the CPs in both neurons, the corresponding noise-delay functions displayed an asymmetric shape featuring one steep slope in the center of the ITD range and stronger responsiveness to positive (contralateral ear leading) ITDs.
As the linear across-frequency integration seemed a fair model, we next tested whether linear decomposition by fast Fourier analysis would enable us to assess a faster and better-sampled estimate of a neuron's phase–frequency relation. Prior to fast Fourier transformation, noise-delay curves were zero padded to a 64 sample signal to yield adequate frequency resolution (∼512 Hz) in the amplitude and phase spectrum. The phase spectrum in turn would allow us to estimate the frequency dependence as described before. We found that the amplitude spectrum captured the range of frequency responses seen in the frequency-tuning curve obtained with tones. Similarly the phase spectrum obtained by FFT of noise-delay curves was a good match to the phase–frequency relation of the first example neuron (Fig. 2C,D), even though its linear phase–frequency relation differed slightly in the low-frequency and high-frequency ranges. Consequently, the derived CD and CP were similar to the estimates based on tone-delay functions (34 μs and 0.21 cycles vs 20 μs and 0.29 cycles). In the second example neuron, Fourier decomposition of the noise-delay curve revealed two important aspects. First, the neuron's responsiveness to low frequencies and high frequencies seen in the tone-delay curves was captured in the FFT amplitude spectrum, whereas the measured frequency-tuning curve only reflected the responsiveness to high frequencies (Fig. 2G). In this case the FFT amplitude spectrum seemed a better estimate of the neuron's frequency tuning. Second, while the IPD data points that estimated the phase–frequency relation were all close to the FFT phase data points, the enhanced frequency sampling of the latter indicated two linear regimes in the phase–frequency spectrum separated by a phase jump (Fig. 2H). 
Although the phase–frequency relation was significantly linear and similar to the one estimated from tone-delay curves (CD = 2 μs, CP = 0.29 cycles vs CD = −1 μs, CP = 0.38 cycles), it could be more accurately described by two regressions, one in the low-frequency range (CD = 58 μs and CP = 0.2 cycles) and a second in the high-frequency range (CD = −1 μs and CP = 0.3 cycles) (Fig. 2E,H, violet and yellow lines), a point we will further explore later on.
The observed deviations between FFT spectra and the recorded frequency-tuning curve were quantified by subtracting the frequency-tuning curve from the FFT amplitude spectrum after normalizing them to the same dynamic range (Fig. 3A). A mismatch was observed in the low-frequency range where the FFT amplitude spectra yielded systematically higher power as reflected by the positive mean error (Fig. 3A, bottom) and the negative correlation of that error with frequency (r = −0.35; p < 0.01). Presumably, this was due to measuring the frequency-tuning curves at a fixed ITD. If ITD tuning was frequency-dependent in a neuron, its frequency response range was not captured accurately by that method. Instead, the amplitude spectrum of the noise-delay curve turned out to be a better match to the maximal responses obtained in tone-delay curves (Fig. 3B). The average error pooled over frequencies between these two curves was significantly smaller than the average error between FFT amplitude spectra and frequency-tuning curve (−0.06 in B compared with 0.13 in A; U test; p < 0.01) and was only weakly correlated with frequency (r = −0.14; p < 0.05). Moreover, the error variation around the mean was smaller in B than in A (averaged SD after subtracting mean at each frequency band: ±0.25 in B and ±0.28 in A; Ansari–Bradley test; p < 0.05). We mention in passing that a change of best ITD with sound intensity could also contribute to the observed mismatch in frequency tuning assessed with noise and tones. But in the owl, this seems unlikely, as ITD tuning was robust against intensity changes both in the ITD detector neurons in the brainstem and in the AAR (Cohen and Knudsen, 1995; Peña et al., 1996). Importantly, the FFT phase spectra obtained from noise-delay curves on which CD and CP estimates were based were a good match of best IPD values computed from tone-delay curves (Fig. 3C).
Phase errors were frequency-independent (r = −0.06, p = 0.31) and small on average (−0.009 cycles) with an average standard deviation of ±0.1 cycles.
Overall, the similarities in recorded-tuning curves and FFT spectra lend support to our basic assumption that across-frequency integration in AAR neurons can be adequately described as a linear process such that the neuron's frequency response and its noise-delay function approximate Fourier pairs.
The characteristics of ITD tuning across frequencies seen in the example neurons were representative of the population of AAR neurons we recorded from. One or several tone-delay functions were recorded in 92 AAR neurons. In 73 neurons, tuning to ITD in response to tone stimuli was significant (Rayleigh test; p < 0.001). Best IPDs were a linear function of frequency (bootstrap test; p < 0.005) in 32 out of 37 neurons in which tone-delay functions had been assessed at a minimum of four different frequencies. In these neurons, the composite curves computed from the tone-delay curves explained 0.6 ± 0.18 (r², mean ± SD) of the variance in the respective noise-delay functions.
Most of the 32 AAR neurons (Fig. 4, light gray symbols) were tuned to both a time and a phase delay with 50% of the CP values falling between 0.03 and 0.25 cycles (percentiles at 25 and 75% here and onwards). The CD distribution was narrowly centered within the owl's physiological range with 50% of the values ranging between −20 and 12 μs.
We confirmed these results by analyzing phase–frequency relations based on FFTs of noise-delay functions in a large population of AAR neurons. Noise-delay functions were recorded in 290 AAR neurons. In 279 neurons, the phase spectra were significantly linear such that CD and CP could be estimated (Fig. 4, black symbols). CDs were distributed narrowly around zero with the median at −2 μs and 50% of the values between −21 and 26 μs (Fig. 4A). In contrast, the CP distribution was centered away from zero around a median of 0.15 cycles with 50% of the data ranging between 0.04 and 0.21 cycles (Fig. 4B). CD and CP estimates obtained in the same neuron by both methods, from tone-delay functions and from FFTs of noise-delay curves, correlated significantly (r = 0.55, p < 0.01 for CDs; r = 0.37, p = 0.04 for CPs; n = 30; Fig. 4, insets). Moreover, the CD and CP distributions assessed by the two methods were statistically similar (p > 0.05, Kolmogorov–Smirnov test and two-sample Kuiper's test, respectively).
The finding of frequency-dependent delay tuning (CP ≠ 0) in AAR neurons was unexpected in the barn owl considering that midbrain neurons in the external nucleus of the inferior colliculus (ICX) feature linear phase–frequency relations with intercepts at zero and hence represent pure time delays (Takahashi and Konishi, 1986).
To exclude any bias of our FFT-based method, we computed CD and CP distributions from FFTs of 77 noise-delay functions recorded in ICX (Fig. 4, bottom). CDs were distributed across positive (contralateral ear leading) ITDs around a median of 26 μs, with 50% of the data between −4 and 54 μs. CPs were near zero (median at 0.02 cycles), with 50% of the values ranging between −0.05 and 0.07 cycles. The prevalence of frequency-independent ITD tuning in ICX neurons confirmed previous findings (Takahashi and Konishi, 1986). Both CD and CP distributions for ICX differed significantly from the distributions obtained for AAR neurons (p < 0.01; Kolmogorov–Smirnov test and two-sample Kuiper's test, respectively).
Contributions of low-frequency and high-frequency ranges to ITD sensitivity
The finding of frequency-dependent ITD tuning in the owl forebrain was surprising, not only because this has not been observed in the midbrain, but also because its origin in the auditory system raises an intriguing question. One obvious difference between the forebrain and the midbrain pathway is the integration of inputs that convey ITD sensitivity in the low-frequency range (<3 kHz) in the forebrain (cf. Fig. 2, also Pérez et al., 2009; Vonderschen and Wagner, 2009). ICX neurons typically respond to frequencies >3 kHz. In other words, they exhibit high-pass characteristics (Wagner et al., 2007). Therefore, a major processing step in the forebrain pathway involves the convergence of ITD sensitivity in the high-frequency and the low-frequency ranges. Interestingly, a majority of neurons (230 out of 279) exhibited two maxima in the amplitude spectrum of the noise-delay curve (Fig. 5A,B), the low-frequency maximum at ∼2 kHz and the high-frequency maximum at ∼6 kHz. The local minimum occurred at ∼3.5 kHz (Fig. 5C). Moreover, many phase spectra were discontinuous around the frequency at which the local minimum occurred (Fig. 5B). These neurons appeared to display distinct CDs and CPs in the low-frequency and the high-frequency ranges. To quantify these band-limited CDs and CPs, we computed band-limited regressions over two regimes of the phase data, one in the low-frequency range and one in the high-frequency range. In each neuron, the low-frequency range was separated from the high-frequency range by the frequency at which the local minimum in the amplitude spectrum occurred. In some neurons the phase–frequency relation was consistent with a single CD and CP (Fig. 5A). Consequently, the root mean squared error was not reduced substantially by a fit using two band-limited regressions instead of a single regression over the entire frequency range. 
In the majority of neurons, however, the discontinuity in the phase spectrum was more pronounced and the band-limited regressions reduced the RMSE significantly (bootstrap test, p < 0.01) (Fig. 5B,D).
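The comparison between a single regression and two band-limited regressions can be illustrated with a short Python sketch. It is a simplified illustration rather than the analysis code used here: the function names, the example CD–CP values, and the choice to assign the split frequency to the high-frequency band are assumptions, and the bootstrap test of the RMSE reduction is omitted.

```python
import numpy as np

def rmse_of_line_fit(freqs_hz, phase_cycles):
    """RMSE (cycles) of a least-squares line through phase vs. frequency."""
    slope, intercept = np.polyfit(freqs_hz, phase_cycles, 1)
    residuals = phase_cycles - (slope * freqs_hz + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))

def compare_fits(freqs_hz, phase_cycles, split_hz):
    """Return (RMSE of one broadband fit, RMSE of two band-limited fits)."""
    low = freqs_hz < split_hz
    single = rmse_of_line_fit(freqs_hz, phase_cycles)
    # Pool the squared residuals of both bands before taking the root.
    sq_low = rmse_of_line_fit(freqs_hz[low], phase_cycles[low]) ** 2 * low.sum()
    sq_high = rmse_of_line_fit(freqs_hz[~low], phase_cycles[~low]) ** 2 * (~low).sum()
    two_band = float(np.sqrt((sq_low + sq_high) / freqs_hz.size))
    return single, two_band

# Hypothetical phase spectrum with a discontinuity at 3.5 kHz.
freqs = np.arange(500.0, 9000.0, 500.0)            # 0.5-8.5 kHz
phase = np.where(freqs < 3500.0,
                 50e-6 * freqs + 0.20,             # low band: CD 50 us, CP 0.20
                 20e-6 * freqs - 0.10)             # high band: CD 20 us, CP -0.10
single_rmse, two_band_rmse = compare_fits(freqs, phase, split_hz=3500.0)
print(f"single fit RMSE: {single_rmse:.3f} cycles")
print(f"two-band RMSE:   {two_band_rmse:.3f} cycles")
```

For a phase spectrum without a discontinuity the two RMSE values are nearly identical, which corresponds to the neurons described by a single CD and CP.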
Linear model for across-frequency integration
The local minima in the amplitude spectra and discontinuities in the linearity of the phase–frequency spectra at corresponding frequencies might indicate converging inputs conveying band-limited ITD tuning. Could they alternatively represent a systematic effect of the Fourier decomposition of noise-delay curves? According to the interdependencies of sampling in the time and frequency domain in discrete Fourier analysis, noise-delay curves would ideally have been assessed over a range of 2 ms in steps of 31.25 μs to yield frequency sampling steps of 500 Hz as used when assessing frequency tuning with pure tones. Yet we evaluated ITD tuning over the narrower physiological range of ±270 μs in steps of 30 μs. Thus, the short ITD curves can be considered the product of the ideal signal multiplied with a rectangular window of 540 μs width. To systematically test the spectral estimation errors introduced by using relatively short ITD curves, we created simulated noise-delay curves from linear integrator model units provided with arbitrary phase–frequency functions (Fig. 6). If noise-delay functions were sampled over their entire length (64 samples between ±1000 μs), the FFT yielded the exact amplitude and phase spectrum that was used to create the curve (Fig. 6, blue symbols) regardless of whether the model neurons were provided with one broadband or two band-limited CD–CP pairs. This was expected, since the synthesis of noise-delay functions from cosine curves corresponded to a real valued inverse Fourier transform. We note that observed differences in the phase spectrum at frequencies of amplitude 0 can be neglected (Fig. 6E,J).
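The full-length case can be reproduced with a few lines of Python. The sketch below is an assumption-laden illustration, not the model code used for Figure 6: it synthesizes a curve from unit-amplitude cosines at 0.5–8.5 kHz whose best IPD follows CD · f + CP (the form assumed for Eq. 1), samples it at 64 ITDs in steps of 31.25 μs, and recovers amplitude and phase with an FFT. The CD and CP values and all names are hypothetical.

```python
import numpy as np

N = 64
STEP_S = 31.25e-6                         # ITD step; N * STEP_S = 2 ms
itd = (np.arange(N) - N // 2) * STEP_S    # -1000 us ... +968.75 us
freqs = np.arange(1, 18) * 500.0          # 0.5 ... 8.5 kHz, flat amplitude of 1

def synthesize(itd_s, freqs_hz, cd_s, cp_cycles):
    """Sum of unit-amplitude cosines, each peaking at ITD = CD + CP / f."""
    phase = cd_s * freqs_hz + cp_cycles   # best IPD per frequency (cycles)
    return np.cos(2 * np.pi * (np.outer(itd_s, freqs_hz) - phase)).sum(axis=1)

curve = synthesize(itd, freqs, cd_s=30e-6, cp_cycles=0.15)

# Move the ITD = 0 sample to index 0 so that FFT phases refer to zero ITD.
spectrum = np.fft.rfft(np.fft.ifftshift(curve))
fft_freqs = np.fft.rfftfreq(N, d=STEP_S)          # 0, 500, ..., 16000 Hz
amplitude = 2 * np.abs(spectrum) / N
phase_est = -np.angle(spectrum) / (2 * np.pi)     # cycles; sign matches synthesize()

for k in (1, 9, 17):
    print(f"{fft_freqs[k]:6.0f} Hz: amplitude {amplitude[k]:.2f}, phase {phase_est[k]:.3f} cycles")
```

Because the 2 ms sampling range holds an integer number of periods of every component, each component falls exactly on one FFT bin and no leakage occurs.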
However, if noise-delay curves were evaluated over a short range of ITDs corresponding to the experimental conditions (±270 μs in steps of 30 μs), deviations in amplitude and phase spectrum could be observed (Fig. 7A). In this case, the FFT amplitude spectrum could not reproduce the sharp edges of the original flat spectrum, but instead appeared as a low-pass filtered version of the original spectrum and leaked into the range of frequencies above 8.5 kHz, which had not contributed to the original signal. This spectral leakage was primarily caused by the rectangular window multiplied onto the ideal ITD signal (i.e., the short range of ITDs over which the curve was evaluated) (Harris, 1978; Smith, 1999). Notably, deviations in the phase spectrum were minor. Only phases at very high frequencies (>8.5 kHz) corresponding to frequencies at which spectral leakage was obvious did not reflect the actual phase–frequency relation of the original signal. CD and CP estimates remained mostly unaffected as the deviating phase data corresponded to frequencies that contributed <30% of the maximal amplitude and were therefore excluded from the phase–frequency regression.
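The leakage caused by the short ITD range can be illustrated in Python as follows. This is a sketch under stated assumptions: the spectrum is obtained by evaluating the discrete-time Fourier transform directly at multiples of 500 Hz, which may differ from the FFT procedure actually applied to the recorded curves, and the model parameters and names are hypothetical.

```python
import numpy as np

def synthesize(itd_s, freqs_hz, cd_s, cp_cycles):
    """Sum of unit-amplitude cosines, each peaking at ITD = CD + CP / f."""
    phase = cd_s * freqs_hz + cp_cycles
    return np.cos(2 * np.pi * (np.outer(itd_s, freqs_hz) - phase)).sum(axis=1)

def spectrum_at(curve, itd_s, eval_freqs_hz):
    """Discrete-time Fourier transform of the curve at the chosen frequencies."""
    kernel = np.exp(-2j * np.pi * np.outer(eval_freqs_hz, itd_s))
    return kernel @ curve

signal_freqs = np.arange(1, 18) * 500.0          # components at 0.5-8.5 kHz
eval_freqs = np.arange(1, 25) * 500.0            # evaluate at 0.5-12 kHz

full_itd = (np.arange(64) - 32) * 31.25e-6       # +/-1000 us, 64 samples
short_itd = np.arange(-270, 271, 30) * 1e-6      # +/-270 us, 19 samples

def normalized_amplitude(itd_s, cd_s=30e-6, cp_cycles=0.15):
    """Amplitude spectrum of the model curve, normalized to its maximum."""
    curve = synthesize(itd_s, signal_freqs, cd_s, cp_cycles)
    amp = np.abs(spectrum_at(curve, itd_s, eval_freqs))
    return amp / amp.max()

full_amp = normalized_amplitude(full_itd)
short_amp = normalized_amplitude(short_itd)
leak_band = eval_freqs > 8500.0
print("maximal normalized amplitude above 8.5 kHz")
print(f"  full curve:  {full_amp[leak_band].max():.3f}")
print(f"  short curve: {short_amp[leak_band].max():.3f}")
```

The full-length curve shows no energy above 8.5 kHz, whereas the truncated curve does, because the implicit rectangular window smears each component over neighboring frequencies.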
We note in passing that the use of more sophisticated window functions than simple rectangles is a standard method in digital signal processing to improve estimates of amplitude spectra. However, we refrained from this method as it turned out to cause a deterioration of the phase spectra and consequently caused a further deterioration of the CD–CP estimates.
Since we had observed prominent minima in the amplitude spectra as well as corresponding changes in linearity of the phase–frequency relation in many AAR neurons (compare Fig. 5), we next used our model neurons to test whether a change in the linearity of the phase–frequency relation would further affect the FFT amplitude spectrum. To address this question, we systematically introduced discontinuities in the phase–frequency relation at 3500 Hz corresponding to a change in CD and CP between low-frequency and high-frequency ranges. Interestingly, we found systematic effects in the amplitude spectrum when introducing phase jumps in the phase–frequency relation such that the CP in the high-frequency range changed while the CD was kept constant (Fig. 7B). Phase jumps caused a local minimum in the FFT amplitude spectrum that deepened with increasing phase difference. A phase jump of 0.5 cycles resulted in zero power at that frequency in the amplitude spectrum. This phenomenon can be regarded as a combined effect of spectral leakage and interference. The amplitude spectrum is estimated at a frequency between two frequencies that contributed to the original signal. Hence it will be an average of the two contributions. Due to the phase difference of 0.5 cycles, these frequencies interfere negatively, causing an absolute minimum (Fig. 7B). The estimated phase at that frequency fell in between the neighboring phases, thereby smoothing out the phase jump. Additional variation of the CD had no effect on the minimum in the amplitude spectrum (Fig. 7C).
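The interference argument can be made concrete with a toy calculation in Python. It assumes, as a deliberate simplification of the model above, that the spectral estimate at the intermediate frequency is the plain average of two unit-amplitude contributions that differ only by the phase jump. In that case the amplitude equals |cos(π Δφ)| for a jump of Δφ cycles. The function name is hypothetical.

```python
import numpy as np

def averaged_amplitude(phase_jump_cycles):
    """Amplitude of the mean of two unit phasors separated by the phase jump."""
    below = np.exp(-2j * np.pi * 0.0)
    above = np.exp(-2j * np.pi * np.asarray(phase_jump_cycles, dtype=float))
    return np.abs((below + above) / 2)

jumps = np.linspace(0.0, 0.5, 6)                 # 0, 0.1, ..., 0.5 cycles
for jump, amp in zip(jumps, averaged_amplitude(jumps)):
    print(f"phase jump {jump:.1f} cycles -> relative amplitude {amp:.2f}")
```

The amplitude falls monotonically from 1 at no jump to 0 at a jump of 0.5 cycles, in line with the deepening minimum described above.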
These model data demonstrate that phase jumps in the phase–frequency spectrum will appear smoothed out in the FFT estimate of the phase spectrum but cause local minima in the amplitude spectrum. The model predicts a negative correlation between size of the phase jump (between 0 and 0.5 cycles) and the relative amplitude at that frequency. Indeed, in Fourier transforms of AAR noise-delay curves, the normalized amplitude at the local minimum was negatively correlated with the absolute phase jumps (Fig. 7D; r = −0.35; p < 0.001; n = 230). The relation predicted by the model was an upper bound to 93% of the data. The model relation can only be approached as an upper bound, as the model was provided with flat amplitude spectra of 1 across all frequencies, whereas flat amplitude spectra were not present in the real data. The negative correlation of phase jumps and amplitude minima at frequencies of ∼3.5 kHz confirms that low-frequency and high-frequency inputs converge in the forebrain pathway, contributing each a band-limited CD–CP pair.
ITD sensitivity in different frequency bands across the population of neurons
We next assessed CDs and CPs in the low-frequency range and high-frequency range across the population of 230 AAR neurons that featured a local minimum in the amplitude spectrum (Fig. 8). The CD distributions in the two frequency ranges were similar (Kolmogorov–Smirnov test; p > 0.05), and CDs were uncorrelated between the ranges (r = 0.14; p > 0.05). Ninety percent of all values fell within a range that was much smaller than the physiological range of the barn owl (i.e., between −92 and 76 μs for low frequencies and between −137 and 114 μs for high frequencies, compared with a physiological range of ±250 μs; Fig. 8A). However, the distributions of CPs in the high-frequency and low-frequency ranges were significantly different (Kuiper's test; p < 0.01), and CPs were uncorrelated between the ranges (r = −0.04; p > 0.05). CPs at high frequencies spanned the entire range with an almost uniform distribution, while the distribution of CPs in the low-frequency range was skewed and narrow around a median of 0.21 cycles (Fig. 8B). As explained above, some systematic errors were associated with FFTs of relatively short noise-delay functions. To assess the accuracy of our CD–CP estimates, we computed the differences between true and estimated CD–CP pairs for model noise-delay functions with flat frequency spectra in the broadband range (0.5–9 kHz; Fig. 8C), the low-frequency range (0.5–3.5 kHz; Fig. 8D), and the high-frequency range (4–9 kHz; Fig. 8E). Each arrow in Figure 8C–E starts at the true CD–CP pair of a model unit and points to the estimated CD–CP pair and hence indicates size and direction of the error in the CD–CP plane. Errors between true and estimated CD–CP pairs were negligible for the broadband as well as for the high-frequency CD–CP estimates in the parameter space comprising 90% of the data (Fig. 8C,E, green squares).
However, for the low-frequency range, a substantial number of CD–CP combinations in and outside the area comprising 90% of the AAR data resulted in CP and CD estimates that differed clearly from the true CP and CD (Fig. 8D, black arrows).
From Equation 1, it is clear that different combinations of CDs and CPs can lead to the same best ITD value at a given frequency. We indicated the iso-ITD lines connecting all CD–CP combinations that lead to a peak ITD at 0 and ±250 μs for each range's center frequency (Fig. 8C–E, black lines). This illustrates that true and estimated CD–CP pairs in our model data were systematically shifted in parallel to the iso-ITD lines. Thus the ITD conveyed by a CD–CP pair was accurately represented, although the accuracy of CD–CP estimates in the low-frequency range was relatively coarse. The CD–CP pairs across all frequency bands conveyed mainly sensitivity to contralateral leading ITDs (data points up and to the right of the zero iso-ITD lines). Finally, we calculated the best ITDs conveyed according to the band-limited CD–CP pair and each neuron's best frequencies in the low-frequency and high-frequency ranges (compare Eq. 1). As expected from the CD–CP distributions, we found that high frequencies conveyed sensitivity to small values of ITD, whereas the low frequencies conveyed sensitivity to large values of ITD across the population of AAR neurons (Fig. 8F).
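The relation between a CD–CP pair and the best ITD it conveys can be written out in a few lines of Python. The sketch assumes that Eq. 1 implies best ITD = CD + CP / f (CD in seconds, CP in cycles, f in Hz). The function names are hypothetical, and the example pair (CD of zero, CP of 0.21 cycles) is chosen for illustration only.

```python
def best_itd_s(cd_s, cp_cycles, freq_hz):
    """Best ITD (s) implied by a CD-CP pair at one frequency: CD + CP / f."""
    return cd_s + cp_cycles / freq_hz

def iso_itd_cp(cd_s, itd_s, freq_hz):
    """CP (cycles) that yields the given best ITD for a given CD at one frequency."""
    return (itd_s - cd_s) * freq_hz

# Hypothetical example: the same CD-CP pair read out at 2 kHz and at 6 kHz.
cd, cp = 0.0, 0.21
for f in (2000.0, 6000.0):
    print(f"{f / 1000:.0f} kHz: best ITD = {best_itd_s(cd, cp, f) * 1e6:.0f} us")

# One point on the +250 us iso-ITD line at 2 kHz for a CD of zero.
print(f"CP on the 250 us iso-ITD line: {iso_itd_cp(0.0, 250e-6, 2000.0):.2f} cycles")
```

Under this assumption a given CP translates into a threefold larger best ITD at 2 kHz than at 6 kHz, which illustrates why similar CD ranges combined with nonzero CPs yield large ITDs at low frequencies and small ITDs at high frequencies.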
Discussion
This study shows that ITD information is substantially remodeled along the barn owl's forebrain pathway such that neurons at its endpoint exhibit frequency-dependent ITD tuning, although they receive inputs from a frequency-independent representation in the central nucleus of the inferior colliculus (ICC). Tone-delay curves and Fourier decomposition of noise-delay curves revealed that high-frequency channels conveyed sensitivity to small contralateral leading ITDs, whereas low-frequency channels conveyed tuning to large contralateral leading ITDs. In the following, we shall discuss the neural mechanisms allowing the emergence of frequency-dependent ITD tuning in the owl compared with the mechanisms discussed in mammalian systems.
ITD detection and ITD representation
How the brain detects ITDs and represents them in a meaningful way remains a controversial subject (Joris and Yin, 2007; Köppl and Carr, 2008; Grothe et al., 2010). In mammals, ITD detection and representation typically depend on frequency, whereas the detection of ITD in the avian nucleus laminaris and its representation in the midbrain pathway are frequency-independent.
ITD detection in the sense used here describes the process by which ITD is extracted in the nervous system. ITD representation, in contrast, refers to the subsequent information-processing steps. Thus, although ITD representation relies on the signals created during ITD detection, it is important to realize that the remodeling in the ascending auditory pathway is not limited by the detection mechanism. Specifically, a frequency-independent ITD detection as occurring in the barn owl's nucleus laminaris may be remodeled into a frequency-dependent representation, as we demonstrate here, for the forebrain pathway. Likewise, remodeling of the ITD representation may occur in the owl's midbrain. While most neurons in the ICC seem restricted to represent best delays within one half of their characteristic period (Fontaine and Brette, 2011), convergence of inputs creates ITDs outside the so-called π-limit in the ICX (Wagner et al., 2007). Whether similar remodeling occurs in mammals is unknown.
Frequency-dependent delays in single neurons
In single neurons, the representation of ITDs may be described by CD and CP (Eq. 1). The Jeffress model proposes axonal delays that generate a variety of CDs at 0 CP for excitatory–excitatory (EE) inputs and 0.5 CP for excitatory–inhibitory (EI) inputs (Goldberg and Brown, 1969). Experimental evidence supports the realization of this model in birds (Sullivan and Konishi, 1986; Carr and Konishi, 1990). Coincidence detectors receive EE inputs such that CPs scatter near zero (Köppl and Carr, 2008). In contrast, in the mammalian superior olive, coincidence detection on EE, EI, or excitatory–inhibitory–excitatory inputs generates frequency-dependent ITD tuning with widely distributed CPs (Yin and Chan, 1990; Spitzer and Semple, 1995; Joris, 1996; Batra et al., 1997; Fitzpatrick et al., 2002).
CDs and CPs at higher processing stages of the mammalian auditory system (dorsal nucleus of the lateral lemniscus, Pecka et al., 2008; inferior colliculus, Yin and Kuwada, 1983; Yin et al., 1986; Kuwada et al., 1987; Batra et al., 1993; and auditory cortex, Fitzpatrick et al., 2000) result from converging inputs from the detection stage (McAlpine et al., 1998; Fitzpatrick et al., 2000; Shackleton et al., 2000; Agapiou and McAlpine, 2008). In barn owls, nonzero CPs in the ICC have been reported in one publication (Moiseff and Haresign, 1992). Regardless of this, the barn owl midbrain pathway appears dedicated to preserving pure time delays. In ICC, tonotopic layers (>3 kHz) organized in arrays of neurons represent array-specific CDs (Wagner et al., 1987; Takahashi et al., 1989). Across-array convergence onto ICX neurons results in a pure time delay representation based on CDs with CPs ranging near zero (Takahashi and Konishi, 1986).
Recent findings indicated that low-frequency information substantially impacts ITD processing in the forebrain (Pérez and Peña, 2006; Vonderschen and Wagner, 2009). Thus, a parsimonious explanation for the emergence of nonzero CPs at the endpoint of the forebrain pathway lies in the remodeling of information through convergence of inputs from a broad frequency range (Fig. 9A).
Integration of ITD-tuned inputs in AAR neurons is frequency-dependent such that ITD and frequency convergence involves low-frequency input (<3 kHz) with large contralateral ITDs and high-frequency input with small contralateral ITDs (Fig. 9B,C).
Frequency-dependent ITD representation in neural populations
Could the frequency dependence of ITD tuning in the AAR be inherited from such lower processing stages as the ICC, the gateway nucleus giving rise to the midbrain and forebrain pathways? The distribution of ITDs in ICC narrows with increasing frequency, with most, but not all, best ITDs lying within the π-limit (Wagner et al., 2007). This is consistent with the scarceness of large ITDs in high-frequency channels, but is not in line with the absence of small ITDs in low-frequency channels in AAR. Thus we surmise that ITD information is partially discarded along the forebrain pathway.
In the medial superior olive and ICC of at least some mammalian species, best ITDs are inversely related to best frequency such that neurons tend to represent a constant interaural phase (McAlpine et al., 2001; Brand et al., 2002; Hancock and Delgutte, 2004; Joris et al., 2006; Pecka et al., 2008; but see Bremen and Joris, 2011). Several mechanisms may underlie this way of detecting ITD. Apart from axonal delays, cochlear delays were proposed to underlie frequency-dependent ITD tuning (“stereausis”; Shamma et al., 1989; Joris et al., 2006; Day and Semple, 2011). Precisely timed glycinergic inhibition shaped ITD detection in small mammals (Brand et al., 2002; Pecka et al., 2008; Leibold, 2010), constituting a third possible mechanism. In owls, the necessary frequency mismatches assumed by the stereausis model are absent (Peña et al., 2001; Köppl and Carr, 2008; Fischer and Peña, 2009; Singheiser et al., 2010b), and there is no evidence for shifts in delay tuning by inhibition (Fujita and Konishi, 1991; Mori, 1997; Funabiki et al., 1998).
Thus, our data suggest that the frequency-dependent ITD tuning observed in AAR and reminiscent of what is observed in the mammalian brainstem emerges in higher processing stages. In other words, while the Jeffress model yields a time delay code on early levels of processing, this does not restrain the representation at later levels, where it may be reshaped.
Optimality and decoding of ITD representations
Optimality considerations predicted two regimes of ITD representation based on head size and an animal's physiological frequency range (Harper and McAlpine, 2004). Above a critical frequency (3 kHz in the owl), a sparse representation in a sensory map was proposed. This map of auditory space is well established in the ICX and optic tectum of the barn owl (Knudsen and Konishi, 1978a,b; Knudsen, 1982). Below the critical frequency, the representation should change into a population code, in which the relative activity in two populations represents ITD. This prediction did not hold for the owl's midbrain ICC, where many ITD channels are represented at both low and high frequencies (Wagner et al., 2007). However, it was consistent with our observations in AAR, where low-frequency channels contribute large ITDs. On the other hand, high-frequency channels in AAR were restricted to small ITDs, which is at variance with the predicted optimal sparse code. Harper and McAlpine (2004) quantified optimality based on the Fisher information extracted from ITD tuning curves in a population of model neurons. Yet, throughout the evolution of neural codes, optimality is likely based on the decodability of sensory representations by downstream motor areas and subsequently on the performance in relevant behavioral tasks (see Wagner et al., 2007). Thus, if the behaviors controlled by midbrain and forebrain pathways differ, the restraints on optimal coding differ, too. In fact, optimal tuning functions may reflect the variability of the motor output they support, according to a theoretical study (Salinas, 2006). 
While the midbrain pathway uses high-frequency auditory cues to govern head saccades (Knudsen et al., 1993; Wagner, 1993), the forebrain pathway has been implicated in low-frequency sound localization (Cohen and Knudsen, 1996; Pérez and Peña, 2006; Vonderschen and Wagner, 2009; Singheiser et al., 2010a), memory-based orienting (Knudsen and Knudsen, 1996), novelty detection (Reches and Gutfreund, 2008), and top-down control (Winkowski and Knudsen, 2006). These functional differences supposedly affect the optimality criteria for the sensory representations leading to a frequency-dependent representation of ITD in the forebrain pathway and a frequency-independent representation of ITD in the midbrain pathway.
Meanwhile, the actual impact of a change in ITD tuning functions on the neural code remains unclear. Some neurons in the owl's thalamus bear similarity to AAR in that they represent ITDs in the low-frequency and high-frequency ranges. Notably, these neurons did not display strong changes in their auditory spatial receptive fields compared with those in the midbrain ICX (Pérez et al., 2009). While we cannot exclude the possibility that the transformation to frequency-dependent ITD tuning observed in AAR is functionally irrelevant, our data offer a strong incentive to test functional decoding of ITDs in behavioral experiments.
A substantial rearrangement of ITD information in the forebrain pathway may also be present in other species: a study in humans reported a switch in the lateralization percept that was unpredicted based on the population activity in the inferior colliculus (Thompson et al., 2006). These results suggest the possibility of ITD representation reorganization in the mammalian auditory pathway as well.
Footnotes
This study was supported by the Deutsche Forschungsgemeinschaft. We thank Dr. Laurent Calmes for helpful comments throughout all phases of the work, as well as Dr. Christian Leibold and two anonymous reviewers for constructive criticism on the manuscript.
- Correspondence should be addressed to Katrin Vonderschen, Institut für Biologie II, RWTH Aachen, Kopernikusstraße 16, D-52056, Germany. katrin{at}bio2.rwth-aachen.de