Abstract
The relative arrival times of sounds at both ears constitute an important cue for localization of low-frequency sounds in the horizontal plane. The binaural neurons of the medial superior olive (MSO) act as coincidence detectors that fire when inputs from both ears arrive nearly simultaneously. Each principal neuron in the MSO is tuned to its own best interaural time difference (ITD), indicating the presence of an internal delay, a difference in the travel times from either ear to the MSO. According to the stereausis hypothesis, differences in wave propagation along the cochlea could provide the delays necessary for coincidence detection if the ipsilateral and contralateral inputs originated from different cochlear positions, with different frequency tuning. We therefore investigated the relation between interaural mismatches in frequency tuning and ITD tuning during in vivo loose-patch (juxtacellular) recordings from principal neurons of the MSO of anesthetized female gerbils. Cochlear delays can be bypassed by directly stimulating the auditory nerve; in agreement with the stereausis hypothesis, tuning for timing differences during bilateral electrical stimulation of the round windows differed markedly from ITD tuning in the same cells. Moreover, some neurons showed a frequency tuning mismatch that was sufficiently large to have a potential impact on ITD tuning. However, we did not find a correlation between frequency tuning mismatches and best ITDs. Our data thus suggest that axonal delays dominate ITD tuning.
SIGNIFICANCE STATEMENT Neurons in the medial superior olive (MSO) play a unique role in sound localization because of their ability to compare the relative arrival time of low-frequency sounds at both ears. They fire maximally when the difference in sound arrival time exactly compensates for the internal delay: the difference in travel time from either ear to the MSO neuron. We tested whether differences in cochlear delay systematically contribute to the total travel time by comparing for individual MSO neurons the best difference in arrival times, as predicted from the frequency tuning for either ear, and the actual best difference. No systematic relation was observed, emphasizing the dominant contribution of axonal delays to the internal delay.
- auditory
- cochlear disparity
- interaural time difference
- internal delay
- medial superior olive
- sound localization
Introduction
Sound from sources that are not straight ahead or behind travels different distances to the two ears. As a result, interaural time differences (ITDs) are created. These ITDs are an important cue to calculate the location of a sound in azimuth, and in humans ITD cues are more important than differences in intensity (Wightman and Kistler, 1992). These computations are performed within the CNS by a specialized system in the brainstem. A central role in these computations is played by the principal neurons of the medial superior olive (MSO), which are excited by inputs from spherical bushy cells (SBCs) of the cochlear nucleus from both the ipsilateral and the contralateral side (Thompson and Schofield, 2000). MSO neurons have a bipolar dendritic orientation, and the ipsilateral and contralateral inputs are segregated to the two main branches of these neurons (Stotler, 1953; Smith et al., 1993). They function as coincidence detectors; their firing probability depends on the relative arrival time of the inputs from both ears; when the inputs arrive coincidentally, the firing probability of the MSO neurons reaches its maximum (Goldberg and Brown, 1969; Moushegian et al., 1975; Yin and Chan, 1990; Spitzer and Semple, 1995; van der Heijden et al., 2013). Importantly, this so-called best ITD (BITD) varies between cells, and is typically different from 0 μs, indicating that the travel time for the signals from both ears to the MSO neurons must be different, with for most neurons a shorter travel time for ipsilateral than for contralateral signals (Goldberg and Brown, 1969; Moushegian et al., 1975; Crow et al., 1978; Spitzer and Semple, 1995; Pecka et al., 2008).
In a classical model by Jeffress (1948), the internal delay was postulated to be entirely due to differences in axonal travel time. Efforts to test this hypothesis have met with mixed success. In birds, there is good evidence that a difference in axonal delay can largely explain the distribution of BITDs (Seidl et al., 2010, 2014; Carr et al., 2015). In mammals, evidence is more equivocal (Smith et al., 1993; Beckius et al., 1999; Karino et al., 2011). An alternative to Jeffress' hypothesis is the so-called stereausis hypothesis, which proposes that differences in wave propagation time along the basilar membrane due to asymmetric innervation can provide the necessary delays (Schroeder, 1977; Shamma et al., 1989; Bonham and Lewis, 1999). This alternative is based on the finite speed of the traveling wave in the cochlea, which travels from the base to the apex. For low-frequency waves in the apex, travel time differences between neighboring locations are large compared with typically measured BITDs. Hence, small interaural differences in characteristic frequency (CF), the frequency for which auditory thresholds are lowest, suffice to create substantial differences in internal delay (Bonham and Lewis, 1999; Joris et al., 2006). Reports of frequency tuning for individual MSO neurons have shown anecdotal evidence for disparities in best frequency (van der Heijden et al., 2013). There is some evidence for frequency-dependent internal delays (Day and Semple, 2011). However, because of the difficulties in recording from MSO neurons, in combination with the often sparse firing induced by monaural sound stimulation, a direct, comprehensive test of the stereausis hypothesis has not yet been performed in mammals.
We recently showed that loose-patch (juxtacellular) recordings can be used to record synaptic inputs to the gerbil MSO neurons in vivo (van der Heijden et al., 2013). Gerbils have unusually good low-frequency hearing, and, in contrast to rats and mice, a well developed MSO (Irving and Harrison, 1967; Rautenberg et al., 2009). This opens up the possibility for a direct test of the stereausis theory. Here, we therefore compared mismatches in frequency tuning with binaural tuning in the same MSO neuron; in a subset of experiments we also compared auditory and electrical binaural tuning, thus testing key predictions of the stereausis theory.
Materials and Methods
Animal procedures.
All experiments were conducted in accordance with the European Communities Council Directive (86/609/EEC), and were approved by the institutional animal ethics committee. Young-adult female Mongolian gerbils with an average body mass of 60 g were anesthetized intraperitoneally with a ketamine-xylazine solution (114 and 17 mg/kg body weight, respectively). Anesthesia was maintained by administering one-third of the induction volume at regular time intervals; the animals remained in an areflexic state throughout the experiment. Rectal temperature was maintained at 37°C using an electrical heating pad.
The surgical approach to the MSO was described in detail previously (Kuwada et al., 1984; Plauška et al., 2016). Briefly, the head was fixed with a metal head-plate and the animal was placed in a supine position. Both pinnae were removed to expose the middle ear cavities and speakers were attached to both ears using a thin tube. The skin, connective tissue, salivary glands, lymph nodes, and muscles covering both bullae were surgically removed. The animal was intubated and kept breathing independently. The right bulla was opened as wide as possible. A hole was also made in the left bulla to maintain the same pressure conditions in both middle ears. A ∼1 mm diameter craniotomy was made to expose the brainstem on the right side. The meninges were left intact. The electrode insertion angle with respect to the craniotomy could be changed using a fixed-pivotal-point, custom-built, positioning device on which the animal was lying during the experiment.
For electrical round window stimulation experiments, both facial nerves were cut to prevent facial muscle activation by the electrical stimulation, and a silver-ball electrode was placed in contact with each round window through a lateral opening in the bulla. Electrodes were fixed to the skull with Histoacryl (Braun) and wax (Sticky Wax, Kerr). Local ground electrodes were placed close to the bony part of the external acoustic meatus.
In vivo loose-patch recordings.
Thick-walled borosilicate glass microelectrodes (4–7 MΩ resistance; 1–1.5 μm tip diameter) were used for loose-patch (juxtacellular) recordings. Most of the recordings were done using pipettes filled with intracellular solution containing the following (in mM): 138 K-gluconate, 8 KCl, 10 Na2-phosphocreatine, 4 MgATP, 0.3 Na2GTP, 0.5 EGTA, 10 HEPES, pH adjusted to 7.2 with NaOH. Less than 12% of the cells reported here were recorded using normal rat Ringer's solution containing the following (in mM): 135 NaCl, 5.4 KCl, 1 MgCl2, 1.8 CaCl2, 5 HEPES, pH adjusted to 7.2 with NaOH. No clear differences could be found between the responses of cells recorded with either solution so the recordings were pooled.
High positive pressure (70–100 mbar) was applied to the pipette during brain surface penetration. After successful penetration the pressure was lowered to 20–30 mbar, and we waited for a few minutes before further advancing the electrode to minimize the impact of brain tissue movements relative to the electrode during recordings.
The somatic layer of the MSO was identified by monitoring local field potentials (Galambos and Schwartzkopff, 1959; Biedenbach and Freeman, 1964; Clark and Dunlop, 1968; Mc Laughlin et al., 2010; “neurophonics”), which were evoked by 2 ms alternating clicks to both ears (Kuwada et al., 1984). Upon reaching the somatic layer, the electrode was advanced in small steps and its resistance was closely monitored. A gradual increase in resistance together with the appearance of electrophysiological activity indicated contact with a neuron. Subsequently the positive pressure on the electrode was released and the electrode was advanced up to 10 μm to establish the loose-patch recording configuration. Recordings were done in current-clamp mode. In case of changes in cell response, the recordings were stopped. Data were acquired using a Multiclamp 700B (Molecular Devices) amplifier using custom software written in MATLAB 7.6.0 (MathWorks).
Auditory stimulation.
Auditory stimuli were generated using custom software written in MATLAB and realized through a 24-bit D/A-channel [RX6, Tucker Davis Technologies (TDT); 111.6 kHz], programmable attenuator (PA5; TDT) and an amplifier (SA1; TDT). Stimuli were delivered to the ear canals in a close-field fashion through Shure speakers (frequency range 22 Hz to 17.5 kHz) and a pair of small (∼11 cm length) tubes. The lowest SPL at which click stimuli evoked neurophonics was considered to be the animal's hearing threshold. Three types of auditory stimuli were used in this study: monaurally and binaurally presented irregular tone complexes and binaurally presented pure tones.
The type of multitone stimulus used in this study (“zwuis”) was described in detail previously (van der Heijden and Joris, 2003, 2006). In short, zwuis stimuli are produced by summating multiple irregularly spaced frequency components with the same amplitude and a random phase, choosing the frequencies in such a way that neither second-order nor third-order distortion products in the response match any of the components themselves. The stimuli in this study contain 30 components spanning a frequency range of 50–3000 Hz.
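The frequency constraint described above can be illustrated with a brute-force check. The following is a minimal sketch, not the authors' stimulus-generation code: it assumes integer component frequencies in Hz, and the function name `is_distortion_free` is hypothetical. Terms in which a component cancels itself (such as f1 + f2 − f2) are skipped, because they coincide with a component for any choice of frequencies.

```python
from itertools import combinations_with_replacement, product


def is_distortion_free(freqs_hz):
    """Return True if no 2nd- or 3rd-order distortion product of the
    components coincides with one of the components themselves.

    Assumes integer frequencies (Hz) so that equality tests are exact.
    """
    components = set(freqs_hz)
    for order in (2, 3):
        for combo in combinations_with_replacement(sorted(components), order):
            for signs in product((1, -1), repeat=order):
                plus = {f for f, s in zip(combo, signs) if s > 0}
                minus = {f for f, s in zip(combo, signs) if s < 0}
                # Skip trivial terms in which a frequency cancels itself.
                if plus & minus:
                    continue
                dp = abs(sum(s * f for f, s in zip(combo, signs)))
                if dp in components:
                    return False
    return True


print(is_distortion_free([100, 200, 300]))  # harmonically related set
print(is_distortion_free([101, 207, 331]))  # irregularly spaced set
```

A harmonically related set fails the check (for example, 100 + 200 = 300), whereas an irregularly spaced set can pass it. For 30 components the exhaustive search remains small enough to run directly.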
For monaural stimulation, the zwuis stimulus was presented in 300 ms bursts; its intensity ranged from −10 to 50 dB SPL per component in 10 dB steps; 20 repetitions were presented for each condition; each zwuis stimulus was followed by 100 ms silence; total duration was 58 s. Binaural zwuis, which was presented at the same burst intervals as the monaural stimulus, had systematic time delays between the ears (“noise delay stimulus”). Each condition was presented 20 times. Pure tone stimuli were presented binaurally with different ITDs and frequencies. The frequency range for each pure tone stimulus was centered on the estimated best frequency (BF) of that neuron, and was presented in 100 Hz steps. Each presentation started with a 20 ms silent period, followed by a 70 ms burst and another 50 ms silent period. All conditions were presented 10 times. For both binaural zwuis and pure tone stimuli, ITD ranges depended on the frequency sensitivity of the neuron; stimulus intensity levels were between 20 and 70 dB SPL. Sound intensity levels for binaural stimuli were chosen to be 20–30 dB above hearing threshold. Stimulus waveforms were recomputed for a different instance of their presentation, but were kept the same within one instance for all different stimulus conditions and repetitions.
To observe secondary peaks in rate ITD (rITD) functions of low-frequency (<700 Hz) neurons, we increased the ITD range of the stimuli. Consequently, to limit the recording duration, we increased the stimulus step size; 8 of 68 neurons were stimulated binaurally with ITD steps >0.2 ms up to a maximum of 0.4 ms.
Electrical stimulation.
Electrical pulses for round window stimulation were generated using a homemade bipolar current stimulator. Pulses were 100 μs in duration. Current intensities at both ears were varied between 0.2 and 0.8 mA with the aim of finding levels at which stimulating either ear alone evoked a subthreshold response, but stimulating both ears evoked suprathreshold responses. Binaural electrical stimulation was presented with time delays ranging from −2 to 2 ms in 0.2 ms steps. For all electrical stimulation recordings, the stimulation window was 100 ms, preceded and followed by a 50 ms period without stimulation. Stimulus frequencies ranged from 20 to 120 pulses/s. No obvious differences in binaural stimulation were obtained at the different frequencies, and responses were therefore pooled within a cell.
MSO cell admission.
Our previous study showed that loose-patch (juxtacellular) recordings are suitable to resolve synaptic events in MSO neurons (van der Heijden et al., 2013). We observed that the best quality loose-patch recordings were made when the seal resistance was between 20 and 70 MΩ; at lower resistances the contribution of field potentials became too high, whereas higher resistances caused strong waveform filtering. In this study only recordings with seal resistances between 20 and 70 MΩ were accepted as valid loose-patch recordings.
Each new stimulus block was preceded by a silent period of 1 or 5 s. These baseline periods were used to judge recording stability. Details of the method were presented in Plauška et al. (2016). Briefly, a power spectrum of prestimulus baselines was estimated for all recordings from a cell. Its value at 1 kHz, which predominantly reflected the spontaneous activity of the neuron, had to remain within a 5 dB window for the stimulus block to be included in the analysis.
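The stability criterion can be sketched as follows. This is an illustration under stated assumptions, not the published method (see Plauška et al., 2016, for that): the power at 1 kHz is estimated with a single-frequency discrete Fourier sum, and "within a 5 dB window" is interpreted as a maximum-minus-minimum spread of at most 5 dB across baselines. The function names and the sampling rate in the usage lines are hypothetical.

```python
import cmath
import math


def power_db_at(signal, fs_hz, freq_hz=1000.0):
    """Power (dB, arbitrary reference) of `signal` at a single frequency."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
              for k, x in enumerate(signal))
    return 10.0 * math.log10(abs(acc) ** 2 / n ** 2)


def baselines_are_stable(baselines, fs_hz, window_db=5.0):
    """Assumed rule: the 1 kHz baseline level may vary by at most window_db."""
    levels = [power_db_at(b, fs_hz) for b in baselines]
    return max(levels) - min(levels) <= window_db


# Synthetic baselines: 1 kHz sinusoids of different amplitude, 0.1 s at 10 kHz.
fs = 10_000
tone = lambda amp: [amp * math.sin(2 * math.pi * 1000 * k / fs) for k in range(1000)]
print(baselines_are_stable([tone(1.0), tone(1.5)], fs))             # spread about 3.5 dB
print(baselines_are_stable([tone(1.0), tone(1.5), tone(2.0)], fs))  # spread about 6 dB
```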
Action potentials were detected offline based on a threshold criterion for the maximum repolarization rate of individual events (van der Heijden et al., 2013). Only cells for which histograms of negative peak sizes showed clear bimodality were accepted for further analysis. For rITD functions, the first 10 ms of the response to each stimulus presentation were excluded from analysis to avoid onset effects.
Responses evoked by electrical round window stimulation were more difficult to analyze because of the presence of larger contamination by field potentials, presumably due to the hypersynchronous excitation. The analysis window was restricted to latencies of 2.5–8 ms from the contralateral electrode. At shorter latencies unambiguous AP identification was typically not possible.
Experimental design and statistical analysis.
To test the stereausis hypothesis we investigated whether BITD and CF mismatches were systematically correlated. Apart from the criteria given in the previous section, to be included in this comparison cells had to exhibit significant differences in the means of responses to different time delays in their rITD functions, which was assessed with a one-way ANOVA (function anova1 in MATLAB) with the threshold for binaural sensitivity at p < 0.01. For composite rITD functions, a dynamic range criterion was defined as half of the difference between the sums of the two highest and the two lowest spike counts in the response. Based on visual inspection of all composite rITD functions, only composite rITD functions with a minimum dynamic range of 10 action potentials were accepted. To obtain BITDs, rITD functions were fit with a modified Gabor function:
[Equation and definition of g(p, x) missing in the source text.]
Here, τ is time (ms); A, offset (spikes/s); B, amplitude (spikes/s); τ0, peak time of the envelope (ms); w, width of the envelope (ms); f, frequency (kHz); and ϕ0, start phase of the cosine (cycle). The power parameter p modifies the envelope of the Gabor function to account for the typically asymmetric peak and trough relationship. BITD was defined as the ITD of the dominating peak in the case of wideband rITD functions, and the ITD of the central peak for composite rITD functions (Yin and Chan, 1990). If fits of the wideband rITD functions with the modified Gabor function could account for <80% of the variance in the data, typically because of low numbers of spikes, the wideband rITD functions were not included in further ITD-related analysis.
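The two acceptance criteria for rITD functions can be written compactly. The sketch below is illustrative only: the function names are hypothetical, and "variance accounted for" is assumed to mean one minus the ratio of residual to total sum of squares, which the text does not specify.

```python
def dynamic_range(spike_counts):
    """Half the difference between the sums of the two highest and the
    two lowest spike counts of a composite rITD function."""
    ordered = sorted(spike_counts)
    return 0.5 * (sum(ordered[-2:]) - sum(ordered[:2]))


def variance_explained(measured, fitted):
    """Fraction of variance accounted for by a fit (assumed definition)."""
    mean = sum(measured) / len(measured)
    ss_total = sum((m - mean) ** 2 for m in measured)
    ss_residual = sum((m - f) ** 2 for m, f in zip(measured, fitted))
    return 1.0 - ss_residual / ss_total


counts = [2, 5, 30, 41, 12, 3]           # made-up composite spike counts
print(dynamic_range(counts) >= 10)        # composite criterion from the text
print(variance_explained([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]) >= 0.80)  # wideband criterion
```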
To obtain CF, responses to monaural zwuis stimuli were processed using Fourier spectral analysis. From the power spectrum, the baseline spectrum, obtained from the 1 s period preceding and following the stimulus, and an additional 5 dB were subtracted. The subtraction of the baseline spectrum serves to compensate for the spectral coloring caused by the finite bandwidth of individual elementary events responsible for the neural activity. If the events themselves were randomly timed, i.e., originated from a generating process having a flat spectrum, the spectrum of the resulting waveform equals that of the elementary event. More generally, the resultant waveform is a convolution of the waveforms of the point process and that of the elementary event, and their long term power spectra multiply (Campbell, 1909; Fesce, 1990; Ashida et al., 2013). The information about CF is contained in the former (the timing of the events, not their shape), which is retrieved by dividing out the baseline spectrum, i.e., subtracting it in the log domain. Monaural receptive fields were built by interpolating intermediate values (MATLAB function contourf). Responses to monaural stimulation were fit with a weighted sixth order polynomial. Only responses with >10 (one-third of total) signal components above the noise floor (i.e., signal-to-noise ratio >1), were used. The noise floor was obtained from the spectral analysis of the recordings in the absence of sound stimuli. BFs were defined as the fit peak; CF as the BF at the lowest sound intensity. The contour plots, polynomial fits and thresholding are illustrated in Figure 5.
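Because the spectra of the point process and of the elementary event multiply, dividing out the baseline corresponds to a subtraction in decibels. The following sketch illustrates that step and the component-count criterion. It assumes, as a simplification not confirmed by the text, that a component counts as above the noise floor when its level exceeds the baseline by more than the extra 5 dB margin; all names are hypothetical.

```python
def corrected_levels_db(response_db, baseline_db, margin_db=5.0):
    """Subtract the baseline spectrum and an extra margin in the log domain."""
    return [r - b - margin_db for r, b in zip(response_db, baseline_db)]


def has_enough_components(corrected_db, min_components=10):
    """True if more than `min_components` components remain above 0 dB."""
    return sum(1 for level in corrected_db if level > 0) > min_components


# Made-up levels for a 30-component stimulus: 12 components are 10 dB above
# the baseline, the remaining 18 only 2 dB above it.
baseline = [-40.0] * 30
response = [-30.0] * 12 + [-38.0] * 18
corrected = corrected_levels_db(response, baseline)
print(has_enough_components(corrected))
```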
Estimates for the SD of the CF mismatch or of the BITD were obtained using bootstrap methods. The 20 responses to identical stimulus presentations were randomly divided in two groups of 10; BITDs for both groups were computed as described above, and the difference D between these two BITD estimates was computed. This procedure was repeated N = 25 times using independent random divisions, resulting in N values for D. The reported SD (STD) for the BITD estimates based on the complete set of 20 responses is equal to:
which is based on applying the same procedure to sets of 20 independent numbers drawn from a normal distribution having unity variance. SDs for CF mismatches were obtained by the same method, but now simultaneously subdividing the responses to repeated contralateral and ipsilateral stimulation.
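The logic of the split-half procedure can be checked numerically for the simplest possible statistic, the sample mean. The sketch below is not the authors' code and does not reproduce their equation. It assumes a scaling factor of 0.5 applied to the root-mean-square of D, which is the value that follows analytically when the statistic is a mean of 20 values (the variance of D is then four times the variance of the full-sample mean). For a nonlinear estimate such as a fitted BITD the appropriate factor may differ.

```python
import math
import random
import statistics


def split_half_sd(responses, statistic=statistics.fmean, n_splits=25, seed=0):
    """Estimate the SD of `statistic` over the full set from random half-splits.

    D is the difference between the statistic computed on two random halves;
    the returned value is 0.5 * RMS(D) (factor assumed, see lead-in).
    """
    rng = random.Random(seed)
    values = list(responses)
    half = len(values) // 2
    squared_d = []
    for _ in range(n_splits):
        rng.shuffle(values)
        d = statistic(values[:half]) - statistic(values[half:])
        squared_d.append(d * d)
    return 0.5 * math.sqrt(statistics.fmean(squared_d))


# Sanity check on 20 numbers drawn from a unit-variance normal distribution.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(20)]
print(split_half_sd(data, n_splits=20000))
print(statistics.stdev(data) / math.sqrt(20))  # standard error of the mean
```

With many splits the two printed values agree closely; with N = 25 splits, as in the text, the estimate is noisier.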
We used a t statistic to test whether Pearson's r differed significantly from zero (function corrcoef in MATLAB).
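For reference, the standard t statistic for a Pearson correlation is t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below computes r and t in plain Python; the function names are hypothetical, and the sample size in the usage line is an assumption (83 cells) rather than a value stated alongside the correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero; undefined for |r| = 1."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Example: r = -0.07 with an assumed n = 83 gives t of about -0.63 (81 df).
print(round(t_statistic(-0.07, 83), 3))
```

A value of |t| this small lies well below the two-sided 5% critical value for 81 degrees of freedom (about 1.99), in line with a nonsignificant correlation.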
CF mismatches were converted to predicted BITDs as described in the next section, and the correlation between BITDs and predicted BITDs was assessed using a bootstrap analysis as described in the Results and Figure 9.
Conversion of CF mismatches to predicted BITDs.
Ipsilateral and contralateral CF estimates were first converted to cochlear location (distance from the base) using the tonotopic map for the gerbil cochlea (Müller, 1996). A given interaural mismatch ΔX (in m) in cochlear location corresponds to a travel time difference ΔT equal to:
$$\Delta T = \frac{\Delta X}{c(\mathrm{CF})} \tag{3}$$
where c(CF) is the phase velocity of the traveling wave of frequency CF at its best site, which is related to wavelength λ at CF by:
$$c(\mathrm{CF}) = \lambda(\mathrm{CF}) \cdot \mathrm{CF} \tag{4}$$
Values of either c(CF) or λ(CF) in the apex were obtained from studies reporting large populations of auditory nerve responses to identical stimuli in chinchilla (Temchin et al., 2012, their Fig. 3), cat (van der Heijden and Joris, 2006, their Fig. 9B), and guinea pig (Palmer and Shackleton, 2009, their Fig. 5B at 50 dB SPL). The wavelength dependence on CF was found to be similar across species (Fig. 1A), supporting the application to the gerbil, for which no data are available, and showed an approximately linear relation between λ and log(CF). We therefore pooled the data from the three studies and used linear regression to characterize this dependence. Application of Equations 3 and 4 produced the predicted BITD from the measured CF mismatches. Note that the steepness of this dependency is greater for lower CFs (Fig. 1B).
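The conversion can be sketched numerically. A travel time equals distance divided by phase velocity, and phase velocity equals wavelength times frequency. The code below is a minimal illustration: the regression coefficients and the tonotopic map are not given in the text, so the wavelength model takes caller-supplied (hypothetical) slope and intercept values, and the numbers in the usage lines are invented for the arithmetic only.

```python
import math


def wavelength_m(cf_hz, slope_m_per_decade, intercept_m):
    """Linear model of wavelength versus log10(CF); coefficients are
    placeholders to be supplied by the caller, not published values."""
    return intercept_m + slope_m_per_decade * math.log10(cf_hz)


def phase_velocity_m_per_s(cf_hz, lambda_m):
    """Phase velocity = wavelength x frequency."""
    return lambda_m * cf_hz


def predicted_delay_s(delta_x_m, cf_hz, lambda_m):
    """Travel time difference for a cochlear location mismatch delta_x_m."""
    return delta_x_m / phase_velocity_m_per_s(cf_hz, lambda_m)


# Invented numbers: 0.1 mm mismatch, 1 mm wavelength, CF of 500 Hz.
delay = predicted_delay_s(delta_x_m=1e-4, cf_hz=500.0, lambda_m=1e-3)
print(f"{delay * 1e3:.2f} ms")
```

In this made-up example the phase velocity is 0.5 m/s, so a 0.1 mm location mismatch corresponds to a delay of 0.2 ms.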
Cochlear time delays. A, Traveling wave wavelength dependence on the characteristic frequency in the cochlea for three different species: chinchilla (Temchin et al., 2012), cat (van der Heijden and Joris, 2006), and guinea pig (Palmer and Shackleton, 2009; circles, squares, and triangles, respectively). Solid line shows the linear fit for all three species. B, Theoretical cochlear time delay dependence on frequency mismatch between the ears for six different characteristic frequencies. Relationships were derived from linear fit in A.
Results
The stereausis theory makes several clear predictions. First, there should be distinct mismatches in frequency tuning for ipsilateral and contralateral inputs, where the expectation would be a general bias for contralateral tuning to be to lower frequencies than ipsilateral tuning, thus creating an extra cochlear delay for contralateral sounds. A second prediction is that the central internal delay obtained by direct electrical stimulation of both cochleae at varying latencies can be different from the total internal delay for acoustic stimuli as reflected by the BITD for the same neuron. A third prediction is a systematic correlation between ITD tuning and frequency tuning mismatches for individual cells. We used a tailored, multitone stimulus to investigate the distribution of mismatches in frequency tuning in MSO neurons using loose-patch recordings, and to compare in individual neurons these mismatches with binaural tuning; in a subset of experiments we also compared auditory and electrical binaural tuning.
ITD tuning
We compared frequency and ITD tuning of principal neurons in the MSO of anesthetized gerbils using loose-patch recordings with the aim of testing whether there exists a systematic relation between the two, as predicted by the stereausis model. We will first present an overview of the ITD tuning, which was studied using binaural stimulation with irregularly spaced multitone (zwuis) stimuli or with simple tone stimuli. Figure 2A shows an example of the response of an MSO neuron to binaural zwuis stimulation, illustrating prominent subthreshold activity and action potentials (asterisks). Figure 2B shows action potential rates at different ITDs for a low-frequency MSO neuron. This rITD function (rITDf) was obtained by systematically varying the time delay between the zwuis stimuli to both ears. Spike rates during these measurements were 25 ± 27 sp/s (mean ± SD; range 0.3–129 sp/s; N = 68). The rITDf was fit with a modified Gabor function (see Materials and Methods) from which we extracted the BITD as the ITD at which the fit function had its maximum. The vertical bar in Figure 2B indicates the BITD of the neuron, which was 0.17 ms. On average, BITDs were 0.12 ± 0.12 ms (Fig. 2C; N = 68), indicating a bias for contralateral ear leading, as observed previously for both gerbils (Spitzer and Semple, 1995; Brand et al., 2002; Pecka et al., 2008) and other species (Goldberg and Brown, 1969; Moushegian et al., 1975; Crow et al., 1978). The gray area in Figure 2C demarcates the ecological ITD range of gerbils, which is ∼±0.13 ms (Maki and Furukawa, 2005). More than half (57%) of the BITDs fell within the ecological ITD range.
Determining BITDs of MSO neurons. A, Loose-patch (juxtacellular) recording of an MSO neuron during binaural stimulation with 300 ms zwuis stimulus (gray bar) at 30 dB SPL. The gray portion of the waveform is shown at higher time resolution below, revealing subthreshold events and action potentials (*) during stimulation. B, Example of a rITDf, showing firing rate as a function of ITD (positive ITD values: contralateral leading). Circles indicate measured spike rates; gray line is a cubic spline through the data points. BITD was 0.17 ms (vertical line). Same cell as A. C, Cumulative histogram plots of BITDs. The gray area indicates the physiological ITD range for a gerbil (±0.13 ms). In 60 of 68 cells, BITDs were biased toward contralateral ear leading.
To obtain an alternative estimate for the BITD, we also presented binaural pure tone stimuli at frequencies around the BF. In the example shown in Figure 3A, the different frequencies ranged from 0.4 to 0.9 kHz, each of which was presented in 0.2 ms steps between −2 and 2 ms at a stimulus intensity of 40 dB SPL. Different traces in Figure 3A correspond to spike count ITD functions in response to different frequencies. This neuron was most sensitive to 0.7 and 0.8 kHz tone stimuli, as expected from its BF, which was 0.78 kHz. We summed all the spike count ITD functions to produce a composite ITD function (Fig. 3B; Yin and Chan, 1990). The BITD was obtained as the most central peak (vertical bar, 0.11 ms) of the Gabor fit (black line). A comparison of BITDs obtained with zwuis and pure tone stimuli showed good agreement for most cells (N = 26; Fig. 3D; r = 0.85; p < 0.0001).
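Building a composite ITD function amounts to summing the per-frequency spike-count curves point by point. The sketch below illustrates this and a simple way to pick the most central peak. It is a simplification: it searches the raw composite for interior local maxima, whereas the analysis described here takes the peak from a fitted Gabor function. Function names and the example counts are hypothetical.

```python
def composite_itd_function(curves):
    """Sum equal-length spike-count ITD curves (one per tone frequency)."""
    return [sum(values) for values in zip(*curves)]


def most_central_peak(itds_ms, counts):
    """ITD of the interior local maximum closest to 0 ms, or None."""
    peaks = [i for i in range(1, len(counts) - 1)
             if counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]]
    if not peaks:
        return None
    return min((itds_ms[i] for i in peaks), key=abs)


itds = [-0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8]           # ms
curves = [[3, 1, 2, 6, 1, 0, 4], [2, 0, 3, 5, 2, 1, 3]]  # made-up counts
composite = composite_itd_function(curves)
print(composite, most_central_peak(itds, composite))
```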
Composite ITD curves from tonal data. A, Superimposed tonal ITD curves at six different frequencies. Average BF of the neuron at 40 dB SPL was 0.78 kHz. Line thickness indicates tone frequency, varying in 100 Hz steps from 400 (thickest line) to 900 Hz. B, Composite ITD curve (circles) was obtained by adding the six tone responses shown in A. Solid line is fit with a Gabor function. Vertical line indicates the BITD (0.11 ms). C, Wideband ITD curve (circles) and fitted Gabor function (solid line) for the same cell as shown in B. Vertical line indicates BITD (0.04 ms). D, Comparison of BITDs from wideband rITDfs and from tonal stimulation composite ITD curves (N = 26; r = 0.85; p < 0.0001). Gray line indicates identity; only BITDs with estimated SD <0.25 ms were used. Highlighted symbol corresponds to the cell shown in B and C.
Binaural MSO sensitivity to electrical round window stimulation
One prediction of the stereausis hypothesis is that ITD tuning may be different for auditory stimuli and when the cochlea is bypassed by electrical stimuli. To test this prediction, we recorded MSO responses to round window electrical stimulation, and compared the responses to those obtained by auditory stimulation within the same cell (Fig. 4A). Electrical stimuli were presented both monaurally and binaurally. We kept increasing current intensity for each cochlea individually until we reliably started seeing subthreshold responses upon monaural stimulation and action potentials during binaural electrical stimulation. In some cells we also varied stimulation rate; however, this did not obviously affect BITD estimates and responses obtained at the same set of current intensities were pooled (see Materials and Methods). Current stimulation typically evoked a complex response consisting of a series of peaks and troughs with latencies ranging from <3 to >8 ms. The short-latency peaks varied little between trials and generally did not seem to evoke action potentials, suggesting that they were field potentials originating from more proximally located areas. Figure 4B shows an example of the MSO response to electrical round window stimulation. Blue and red traces correspond to ipsilateral and contralateral ear responses, respectively. Gray traces show responses to binaural stimulation at the electrical BITD for this recording. Black traces show averages of 10 repetitions under the same condition. The comparison of monaural and binaural stimulation illustrates that summation of electrically evoked synaptic potentials at a latency of ∼5.6 ms evoked the most action potentials for this cell during binaural stimulation. Because of the large size of the field potentials and electrically induced movements at high stimulus intensities it was possible in only 3 of 18 MSO cells to compare an electrical BITD with the auditory BITD (Fig. 4C). 
For these three cells, CFs were not available. The minimum sound-evoked latencies (7–9 ms) were clearly longer than the electrical latencies (3.5–6 ms) in the same experiments. For all three cells there was a mismatch between auditory and current-evoked BITD: cell 1, −0.12 versus −0.32 ms; cell 2, −0.01 versus 0.21 ms; cell 3, 0.38 versus 0.12 ms. These data therefore suggest that a difference in ipsilateral and contralateral cochlear delay can make a substantial contribution to ITD tuning, as predicted by the stereausis hypothesis.
Binaural MSO sensitivity to electrical round window stimulation. A, Schematic representation of the two stimulation methods. Blue and red trapezoids symbolize ipsilateral and contralateral cochleae, respectively. Speaker icons indicate sound stimulation at the base of cochlea; bolt icons indicate electrical stimulation which bypasses the traveling wave. Two arrows pointing toward MSO cell represent neural pathways converging onto MSO. B, Monaural and binaural MSO responses to electrical round window stimulation. Red, blue, and gray traces show responses to contralateral-only, ipsilateral-only, and both ear stimulation, respectively. Each set of traces shows five instances of individual responses and the black trace is an average of a total of 10 repetitions. The three groups of traces were displaced with respect to each other in the vertical direction for visual clarity. The asterisk on the bottom trace indicates the location of two evoked action potentials. Arrows indicate the beginning of stimulus; stimulus artifacts were cut out for demonstrational purposes. C, Comparison of rITDfs from wideband auditory and electrical round window stimulation data. Circles indicate data points, lines are interpolated values. The left ordinate shows how many spikes on average were evoked by a single electrical stimulus, thus providing the current stimulation rITDf. The right ordinate shows the spike rates evoked by the auditory stimulus (auditory rITDf). BITDs for the three cells from left to right (auditory vs electrical): −0.12 versus −0.32 ms, −0.01 versus 0.21 ms, 0.38 versus 0.12 ms.
Frequency tuning
Stereausis critically depends on a difference in frequency tuning for ipsilateral and contralateral sound stimulation. We therefore compared frequency tuning for both ears using zwuis stimuli, presented at intensities ranging from −10 to 50 dB SPL per component. Figure 5A shows spectral amplitude components at the different frequency components of the monaural zwuis stimulus for an MSO neuron; solid lines show the fits with a polynomial function. From these responses, the ipsilateral and contralateral receptive fields of this cell were constructed (Fig. 5B). CF was defined as the peak of the fit to the response function at the lowest stimulus intensity at which both ipsilateral and contralateral ears responded with at least 10 significant components above a signal-to-noise ratio of 1. For this cell, this intensity was 10 dB SPL, yielding CFs for ipsilateral and contralateral stimuli of 0.74 and 0.78 kHz, respectively (Fig. 5A, vertical lines). Frequency tuning generally resembled the frequency tuning of SBCs (Caspary et al., 1994; Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011). Mean CF was typically lower for deeper cells (results not shown), in agreement with the tonotopic organization of the MSO (Goldberg and Brown, 1968; Guinan et al., 1972; Day and Semple, 2011; Karino et al., 2011; Franken et al., 2015). The observed distribution of CFs was similar to frequency tuning based on binaural stimuli in some earlier studies (Brand et al., 2002; Day and Semple, 2011), whereas other studies in the gerbil reported MSO cells tuned to much higher frequencies (Pecka et al., 2008; Franken et al., 2015).
Recordings in the present study were typically made at a penetration depth of at least 300 μm; we did not record from very superficial neurons, which constitute the neurons tuned to frequencies >2 kHz in the gerbil (Franken et al., 2015), because in superficial layers it was more difficult to ascertain that we recorded from the somatic layer using field potential recordings.
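The intensity criterion used to define CF can be made explicit with a short sketch. It assumes one reading of the criterion (a component counts as significant when its signal-to-noise ratio exceeds 1) and uses hypothetical names (`lowest_common_intensity`, the intensity-to-SNR dictionaries) and made-up SNR values; it is an illustration only, not the analysis code used in this study, and it leaves out the polynomial fit whose peak gives the CF.

```python
def count_significant(snrs, snr_threshold=1.0):
    """Number of stimulus components whose response exceeds the SNR threshold."""
    return sum(1 for s in snrs if s > snr_threshold)


def lowest_common_intensity(ipsi_snr_by_level, contra_snr_by_level,
                            min_components=10, snr_threshold=1.0):
    """Lowest intensity (dB SPL) at which BOTH ears have enough significant components.

    Each argument maps intensity -> list of per-component SNRs (hypothetical layout).
    Returns None if no tested intensity satisfies the criterion.
    """
    common_levels = sorted(set(ipsi_snr_by_level) & set(contra_snr_by_level))
    for level in common_levels:
        n_ipsi = count_significant(ipsi_snr_by_level[level], snr_threshold)
        n_contra = count_significant(contra_snr_by_level[level], snr_threshold)
        if n_ipsi >= min_components and n_contra >= min_components:
            return level
    return None


# Made-up example: 15 components per ear at three intensities.
ipsi = {0: [0.5] * 15, 10: [2.0] * 12 + [0.5] * 3, 20: [3.0] * 15}
contra = {0: [2.0] * 11 + [0.5] * 4, 10: [2.0] * 10 + [0.5] * 5, 20: [3.0] * 15}
cf_level = lowest_common_intensity(ipsi, contra)
print(cf_level)  # 10: at 0 dB SPL only the contralateral ear meets the criterion
```

Requiring both ears to pass at the same intensity means that the ipsilateral and contralateral CFs are compared at a common sound level.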
Determining the characteristic frequency of an MSO neuron. A, Individual monaural responses to zwuis stimuli presented at different sound intensity levels. Responses to different SPLs are represented by different colors; the numbers indicate SPL per tone component. Symbols show the measured data points; solid lines, the fits. Left and right plots show responses for the contralateral and ipsilateral ear, respectively. Gray area demarcates the region in which the response cannot be distinguished from the noise floor (see Materials and Methods). Red vertical lines indicate the characteristic frequencies, determined at 10 dB SPL stimulus intensity; CFs for contralateral and ipsilateral ears were 0.78 and 0.74 kHz, respectively. B, Monaural receptive fields of the MSO neuron determined using zwuis stimuli presented at different sound intensities (in 10 dB steps). Left and right plots show receptive fields for the contralateral and ipsilateral ear, respectively.
To estimate the difference in frequency tuning for responses to sound stimuli presented to either ear, we cross-correlated the magnitude spectra of ipsilateral and contralateral responses at the same intensity at which CF was determined (Fig. 6A,B). The vertical bar in Figure 6B indicates the peak of the cross-correlation function; it was at 40 Hz, indicating that the contralateral input was tuned to slightly higher frequencies than the ipsilateral input. This cross-correlation peak generally corresponded well with the difference in CFs between both ears, but we consider the former a more robust estimate of frequency tuning mismatch than the difference in CFs, because it takes the entire frequency curve into account. We thus estimated CF mismatches for 83 cells by cross-correlation (Fig. 6C). Mismatches up to 400 Hz were observed, but most mismatches were much smaller. Even though the frequency mismatches were not very large, they were larger than the frequency tuning mismatches that were observed in the nucleus laminaris of the barn owl (Peña et al., 2001; Fischer and Peña, 2009) or alligator (Carr et al., 2009). Average mismatch was −8 ± 115 Hz, suggesting that there was no overall preferred mismatch direction. Figure 6D shows the absence of a significant correlation between CF mismatch (in octaves) and average CF for the same neurons (r = −0.07; p = 0.53). Similar results were obtained when frequency mismatches were translated into differences in basilar membrane location (Fig. 6E,F; r = −0.10; p = 0.35; Müller, 1996).
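The cross-correlation estimate of frequency mismatch can be sketched as follows. The sketch rests on assumptions that are not taken from the Methods: both magnitude spectra are sampled on the same uniformly spaced frequency grid, the normalization is the product of the two vector norms, and the tuning curves are synthetic Gaussians whose peaks (740 and 780 Hz) merely echo the CFs of the example cell. All function names are hypothetical.

```python
import math


def normalized_xcorr(contra, ipsi, max_lag):
    """Normalized cross-correlation of two magnitude spectra on the same grid.

    Returns {lag: r}. A positive lag means the contralateral curve lies
    `lag` grid steps higher in frequency than the ipsilateral curve.
    """
    norm = math.sqrt(sum(c * c for c in contra) * sum(i * i for i in ipsi))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for k, ipsi_value in enumerate(ipsi):
            j = k + lag
            if 0 <= j < len(contra):
                total += contra[j] * ipsi_value
        result[lag] = total / norm
    return result


def mismatch_hz(contra, ipsi, step_hz, max_lag):
    """Frequency shift (Hz) at the peak of the cross-correlation function."""
    xcorr = normalized_xcorr(contra, ipsi, max_lag)
    best_lag = max(xcorr, key=xcorr.get)
    return best_lag * step_hz


def mismatch_octaves(cf_contra, cf_ipsi):
    """CF mismatch expressed in octaves (positive: contralateral CF is higher)."""
    return math.log2(cf_contra / cf_ipsi)


# Synthetic tuning curves on a 10 Hz grid from 200 to 1500 Hz (assumed values).
step_hz = 10.0
freqs = [200.0 + step_hz * n for n in range(131)]


def gaussian_tuning(cf_hz, width_hz=100.0):
    return [math.exp(-0.5 * ((f - cf_hz) / width_hz) ** 2) for f in freqs]


ipsi_curve = gaussian_tuning(740.0)
contra_curve = gaussian_tuning(780.0)
shift_hz = mismatch_hz(contra_curve, ipsi_curve, step_hz, max_lag=40)
shift_oct = mismatch_octaves(0.78, 0.74)
print(shift_hz, round(shift_oct, 3))
```

Because the whole curve enters the sum at each lag, the peak lag is less sensitive to noise in a single component than the difference between two fitted peaks, which is the rationale given above for preferring it.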
Frequency mismatch estimation. A, MSO neuron's frequency responses to contralateral (filled circles) and ipsilateral (open squares) stimulation at 10 dB SPL, estimated from the Fourier spectrum of the response waveform (compare Fig. 5A). Gray area indicates the noise floor. B, Normalized cross-correlation function of the two frequency responses shown in A. The gray vertical bar indicates the peak of the curve, revealing that the contralateral ear has an estimated 56 Hz higher CF. The data for A and B were taken from the same cell as Figure 5. C, Histogram of characteristic frequency mismatches between ipsilateral and contralateral stimulation (N = 78). D, Relation between interaural CF mismatches and mean CF of both ears for all MSO cells (N = 78; r = −0.17; p = 0.13). E, Distribution of cochlear time delays calculated from measured frequency mismatches (N = 78). F, Relation between basilar membrane mismatches, calculated using Müller (1996) and CF (N = 78; r = −0.10; p = 0.35).
BFs from ipsilateral and contralateral inputs were well correlated (r = 0.95; Fig. 7A). On average, a high correlation was also observed when the entire ipsilateral and contralateral receptive fields were correlated for each neuron, yielding an average r of 0.81 ± 0.13 (“native CI”; Fig. 7B). For comparison we also correlated random pairs of ipsilateral and contralateral ears (“random CI”) and random pairs of contralateral ears (“random CC”), which yielded on average a much lower correlation of r = 0.48 ± 0.30 and r = 0.51 ± 0.29, respectively (Fig. 7B). We thus conclude that ipsilateral and contralateral receptive fields are generally similar within MSO neurons, which is in general agreement with earlier reports measuring CF or BF (Moushegian et al., 1964; Goldberg and Brown, 1969; Guinan et al., 1972) and with our earlier work in which we used mostly intense tones in a small number of cells (van der Heijden et al., 2013).
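The comparison between same-cell and randomly paired receptive fields can be illustrated with a minimal sketch. Everything in it is assumed for illustration: the receptive fields are one-dimensional synthetic Gaussian tuning curves on a log-frequency axis with added noise, the cell count and noise level are arbitrary, and the function names are hypothetical. It shows only the logic of the native versus random pairing, not the actual analysis.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pair_correlation(fields_a, fields_b, pairs):
    """Average correlation over (index in fields_a, index in fields_b) pairs."""
    values = [pearson(fields_a[i], fields_b[j]) for i, j in pairs]
    return sum(values) / len(values)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
lo, hi = math.log2(0.2), math.log2(2.0)  # 0.2 to 2 kHz, in log2(kHz)
log_freqs = [lo + n * (hi - lo) / 39 for n in range(40)]


def synthetic_field(cf_khz, noise_sd=0.05, width_oct=0.3):
    center = math.log2(cf_khz)
    return [math.exp(-0.5 * ((lf - center) / width_oct) ** 2) + rng.gauss(0.0, noise_sd)
            for lf in log_freqs]


cfs = [rng.uniform(0.3, 1.5) for _ in range(50)]  # made-up CFs in kHz
contra_fields = [synthetic_field(cf) for cf in cfs]
ipsi_fields = [synthetic_field(cf) for cf in cfs]

n_cells = len(cfs)
native_pairs = [(i, i) for i in range(n_cells)]
shuffled = list(range(n_cells))
rng.shuffle(shuffled)
random_pairs = [(i, j) for i, j in zip(range(n_cells), shuffled) if i != j]

native_ci = mean_pair_correlation(contra_fields, ipsi_fields, native_pairs)
random_ci = mean_pair_correlation(contra_fields, ipsi_fields, random_pairs)
random_cc = mean_pair_correlation(contra_fields, contra_fields, random_pairs)
print(round(native_ci, 2), round(random_ci, 2), round(random_cc, 2))
```

The random pairings provide the reference level of similarity expected from the overall CF distribution alone, against which the same-cell correlation is judged.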
Correlation between monaural frequency tuning within and across MSO cells. A, Relation between contralateral and ipsilateral BFs of MSO neurons at a stimulus intensity of 30 dB SPL (N = 69; r = 0.94; p < 0.0001). Seventeen neurons were not included as they were not sensitive to the 30 dB SPL stimulus. Solid line indicates identity. B, Cumulative histogram plots of the normalized correlation coefficients between monaural receptive fields. Thick black line: contralateral and ipsilateral receptive fields from the same MSO neuron (C/I, same cell; N = 84). Gray line: contralateral and ipsilateral fields from all pairs of MSO cells (C/I, across cells). Broken line: contralateral receptive fields between all pairs of MSO cells (C/C, across cells).
Relation between frequency and ITD tuning
We next compared ITD tuning and frequency tuning in the same cells. BITD and mean CF were inversely correlated (Fig. 8A), in agreement with earlier work (Fig. 8B). The estimates for the mismatch in monaural frequency tuning allowed us to test a key prediction of the stereausis theory, namely that the frequency mismatch predicts ITD tuning. For binaural tuning we used BITDs obtained from wideband rITDfs, so that both estimates were based on zwuis stimuli. CF mismatches were converted into octave differences. No obvious correlation was observed (N = 40; Fig. 9A). We next converted the CF mismatches to predicted BITDs based on the frequency-place map of the gerbil cochlea (Müller, 1996) and estimates of propagation speed and wavelength of the traveling wave in the apex of the cochlea for several species (Fig. 1; see Materials and Methods). Note that the predicted BITDs have no bias to either ipsilateral or contralateral leading values (Fig. 6E), in agreement with the lack of asymmetry in the distribution of CF mismatches (Fig. 6C). No obvious correlation was observed between the observed and predicted BITDs (r = −0.012; slope = −0.22 ms/ms; N = 40); to avoid the nonlinearities associated with fitting bivariate regression when both variables have errors (Buonaccorsi, 2010), the error estimates for both BITD and predicted BITD were combined into a single value. To estimate the reliability of this conclusion in a way that takes into account the number of cells and the precision of the individual measurements, we performed a bootstrap analysis.
To obtain an error for the estimate of the slope of the line fit, the individual BITD values were drawn from a distribution with the same mean and combined error as the original measurement; the resulting slope was −0.22 ± 0.17 ms/ms. If the association between the BITD and the predicted BITD was randomized, the average slope of the line fit became 0 ± 0.28 ms/ms. If individual values were drawn from a distribution with a mean identical to the predicted BITD, in conformity with the stereausis prediction, and with the combined error obtained from the measurement, the slope became 1.0 ± 0.17 ms/ms (Fig. 9C). From this we conclude that the measured slope was well inside the range of slopes that could be expected if the association between BITD and frequency mismatch were random, and well outside the range of slopes that could be expected if the stereausis prediction were accurate. Similar results were obtained when BITDs were first detrended for the inverse relation between BITD and mean CF illustrated in Figure 8A (results not shown).
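The three bootstrap scenarios can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used here: the 40 predicted and measured BITDs are synthetic and mutually independent, every cell is given the same combined error of 0.1 ms, the sampling distribution is taken to be Gaussian, and the line fit is ordinary least squares. The names `ols_slope` and `bootstrap_slopes` are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def bootstrap_slopes(predicted, measured, errors, scenario, n_boot, rng):
    """Mean and SD of fitted slopes under one of three resampling scenarios.

    "data":       redraw each BITD around its measured value
    "random":     shuffle the pairing of measured and predicted BITDs
    "stereausis": draw each BITD around its predicted value
    """
    slopes = []
    for _ in range(n_boot):
        if scenario == "data":
            y = [rng.gauss(m, e) for m, e in zip(measured, errors)]
        elif scenario == "random":
            y = list(measured)
            rng.shuffle(y)
        elif scenario == "stereausis":
            y = [rng.gauss(p, e) for p, e in zip(predicted, errors)]
        else:
            raise ValueError(f"unknown scenario: {scenario}")
        slopes.append(ols_slope(predicted, y))
    return statistics.mean(slopes), statistics.stdev(slopes)


rng = random.Random(0)  # fixed seed for reproducibility
n_cells = 40
predicted = [rng.gauss(0.0, 0.10) for _ in range(n_cells)]  # ms, synthetic
measured = [rng.gauss(0.1, 0.15) for _ in range(n_cells)]   # ms, synthetic, unrelated to predicted
errors = [0.10] * n_cells                                   # ms, assumed combined error

results = {name: bootstrap_slopes(predicted, measured, errors, name, 1000, rng)
           for name in ("data", "random", "stereausis")}
for name, (mean_slope, sd_slope) in results.items():
    print(f"{name}: {mean_slope:.2f} +/- {sd_slope:.2f} ms/ms")
```

Comparing the "data" distribution with the "random" and "stereausis" distributions indicates whether the sample size and measurement precision are sufficient to tell a slope of 0 from a slope of 1.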
Inverse relation between BITD and mean CF. A, Cells tuned to low CFs tend to have more positive BITDs; black line shows linear regression (N = 40; r = −0.57; p = 0.0004). B, Comparison of BITD and mean CF correlation in Mongolian gerbils for four studies (and their stimulus): Day and Semple (2011) (binaural beat tone stimuli at BF); Brand et al. (2002) (FM tones at BF); Pecka et al. (2008) (tones at BF); and this study (the same data as in A). Dashed lines indicate the gerbil's physiological ITD range.
Relation between BITDs and frequency tuning mismatches of MSO neurons. CF mismatches were determined using cross-correlation of responses to zwuis stimuli (Fig. 6). A, Relation between BITD and CF mismatch (N = 40). Bars indicate SD. B, Relation between BITD and predicted BITD. Predicted BITD was obtained from the relation between the traveling wavelength and CF (Fig. 1A). Bars indicate combined SD in BITD and predicted BITD. Line shows linear regression (slope −0.22 ms/ms; r = −0.012; N = 40). C, Results of bootstrap analysis of the slope of the regression line of the relation between BITD and predicted BITD. Gray line (random) shows distribution of slopes when BITD values were scrambled; black line (data fit) shows fit slopes when data points were drawn from a distribution with the same mean as the measured BITD and its combined error; dashed line (stereausis) shows distribution of fit slopes when data points were drawn from a distribution with the same mean as the predicted BITD and the combined error.
Discussion
We tested three predictions of the stereausis theory. First, as predicted by the stereausis theory, we found evidence that tuning for bilateral electrical stimulation, in which the cochlea is presumably bypassed, was different from ITD tuning for auditory stimuli. Second, we found that even though frequency tuning for ipsilateral and contralateral sounds was generally similar, the interaural differences were large enough to create substantial cochlear disparities in many cells. Most importantly, however, we did not find evidence for a direct correlation between BITDs and CF mismatches, even though the accuracy of our methods was sufficient to detect such a correlation in a scenario in which the mismatches were the dominant source of internal delays. We therefore failed to obtain critical support for the stereausis theory, suggesting that axonal delays are more important for determining internal delay than cochlear disparities.
Spatial tuning
A majority of neurons had a positive bias (contralateral leading) in BITDs, in agreement with previous results (Fig. 8B). We observed that more than half of the neurons had a BITD within the ecological range, similar to earlier work in gerbil and in cat (Yin and Chan, 1990; Day and Semple, 2011). In contrast, the observed distribution of BITDs was quite different from that in two other earlier studies in gerbil, where only ∼20% of BITDs fell within the ecological range (Brand et al., 2002; Pecka et al., 2008; Fig. 8B). One possible cause for the difference between our results and some of the earlier results is that we used wideband stimuli at relatively low intensity. ITD tuning for tones can be quite complex in the gerbil MSO (Day and Semple, 2011; van der Heijden et al., 2013); especially at frequencies away from CF, or at very high intensities, preferred ITDs to tone stimuli can be quite variable. Most physiological sounds, however, are wideband, and sound intensities >60 dB SPL are uncommon in nature.
The wideband methods we used did not allow us to take onset responses into account. However, for humans, ongoing ITDs of low frequencies are the main source of information to detect the sound source location in the horizontal plane (Wightman and Kistler, 1992).
We did not target a specific area within the MSO, but because BITDs do not vary systematically along the rostrocaudal axis in the gerbil (Franken et al., 2015), it seems unlikely that our conclusions would depend on the exact rostrocaudal location within the MSO.
Electrical stimulation
To separate the delay caused by the traveling wave in the cochlea from the retrocochlear delay, we successfully obtained in three cells an estimate for BITD based on electrical stimulation at the round window in addition to the sound-evoked BITD. In each case we observed a difference of ∼0.2 ms between the two estimates. This illustrates that cochlear delays can contribute to the overall internal delay, as predicted by the stereausis hypothesis, but that electrical BITD does not predict sound-evoked BITD well, in agreement with the lack of a correlation of the sound-evoked BITD with the predicted BITD based on frequency tuning mismatch.
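The ∼0.2 ms figure can be checked against the BITDs listed in the caption of the electrical stimulation figure (auditory vs electrical: −0.12 versus −0.32 ms, −0.01 versus 0.21 ms, 0.38 versus 0.12 ms). The short calculation below only restates that arithmetic; the variable names are illustrative.

```python
# BITDs in ms for the three cells, copied from the figure caption: (auditory, electrical)
bitd_pairs = [(-0.12, -0.32), (-0.01, 0.21), (0.38, 0.12)]

# Signed difference (auditory minus electrical) and its magnitude, rounded to 0.01 ms
signed_diffs = [round(auditory - electrical, 2) for auditory, electrical in bitd_pairs]
abs_diffs = [abs(d) for d in signed_diffs]
mean_abs_diff = round(sum(abs_diffs) / len(abs_diffs), 2)

print(signed_diffs)   # [0.2, -0.22, 0.26]
print(mean_abs_diff)  # 0.23
```

The magnitudes are all close to 0.2 ms, whereas the sign of the difference is not the same in the three cells.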
Our approach of electrically stimulating the round window had the advantage that it preserved hearing, thus allowing a comparison between sound-evoked ITD tuning and electrical ITD tuning. In larger animals, such as guinea pigs or cats, it has been shown that sound-evoked responses can be largely preserved following cochlear implant insertion into the basal cochlea (van den Honert and Stypulkowski, 1984; McAnally et al., 1997; Miller et al., 2006; Sato et al., 2016), but this is a delicate procedure that would still not allow us to directly stimulate the apical cochlea, which supplies the low-frequency afferents involved in ITD tuning. In contrast, monopolar stimulation near the round window can excite the whole cochlea and thresholds are essentially independent of CF (Moxon, 1971; Hartmann et al., 1984; van den Honert and Stypulkowski, 1984, 1987). Our experiments thus complement earlier studies in slices showing that it is possible to measure an electrically evoked BITD in MSO neurons (Jercog et al., 2010; Roberts et al., 2013).
In all experiments, peaks at different latencies were observed at the higher stimulation intensities. The underlying mechanism remains uncertain in our experiments, but is unlikely to involve a traveling wave, because this would be expected to induce ringing, and periodic responses with an interval determined by CF, similar to the response to a click stimulus (Goblick and Pfeiffer, 1969; Moxon, 1971; van den Honert and Stypulkowski, 1984, 1987; Recio and Rhode, 2000). In addition, the minimum sound-evoked latencies were clearly longer than the electrical latencies in the same experiments. We therefore consider it likely that the traveling wave was bypassed by the electrical stimulation. Our observation that in each of the three cells there was a clear difference in the BITD for electrical and auditory stimulation is thus compatible with a contribution of cochlear delays to the overall internal delay, as predicted by the stereausis hypothesis.
Stereausis hypothesis
The ability to measure both ITD tuning and monaural frequency tuning in a large number of cells allowed us to test some of the predictions of the stereausis hypothesis. We used differences in CF to estimate differences in cochlear delay. Because we used wideband stimuli at moderate sound levels, a substantial change in cochlear disparities compared with the threshold measurements seems unlikely to occur during our BITD measurements. For the positive bias (contralateral leading) in BITDs to originate from a difference in tuning, contralateral input would, on average, have to be tuned to lower frequencies than ipsilateral input, thus creating a larger cochlear delay for contralateral sounds. This is not what was observed. Neither did we observe evidence for the opposite, i.e., lower BFs in response to ipsilateral than contralateral stimulation, as was observed in gerbil inferior colliculus (Semple and Kitzes, 1985).
To convert CF mismatches to predicted delays (Fig. 1) we used data from the literature. Probably the biggest source of error in the conversion is the smaller cochlear length of the gerbil compared with cat, chinchilla, and guinea pig (Liberman, 1982; Greenwood, 1990; Müller, 1996). Note, however, that if we had scaled down the apical wavelength for the gerbil accordingly, the predicted BITDs would be 50–100% larger, making the expected size of the stereausis effects correspondingly larger. Previous assessments of stereausis (Shamma et al., 1989; Day and Semple, 2011) were based on cochlear models (Holmes and Cole, 1984; Tan and Carney, 2003) rather than data, predicting an even larger contribution of CF mismatches to ITD tuning. Another possible source of error in the BITD predictions is individual variation in cochlear length (SD/mean ∼5%; Plassmann et al., 1987; Müller, 1996), whereas length differences between left and right cochlea are small (Bohne and Carr, 1979).
Predicted and measured delays were not significantly correlated. Our data thus agree with earlier tests in low-frequency units of the nucleus laminaris of the barn owl (Peña et al., 2001; Fischer and Peña, 2009) and alligator (Carr et al., 2009), and in the inferior colliculus of the barn owl (Singheiser et al., 2010). We conclude that for individual MSO cells, frequency mismatches are expected to make a substantial contribution to their internal delay, but that they are not the dominant determinant of BITD on a population level.
The origin of internal delays (Jeffress revisited)
Because our data do not support a systematic contribution of frequency mismatches, the most likely remaining mechanism to create an internal delay is the presence of differences in the time it takes signals to travel from the cochlear nucleus to the MSO neurons. As SBCs typically innervate both ipsilateral and contralateral MSO neurons (Thompson and Schofield, 2000), it is hard to see how a systematic internal delay could be created in the time it takes to travel from cochlea to cochlear nucleus in the absence of a systematic shift in frequency tuning. In recent years several alternative theories to the classical Jeffress model (Jeffress, 1948) have been put forward that focus on delays created within the MSO itself, including a role for well-timed inhibition (Brand et al., 2002; Pecka et al., 2008; Myoga et al., 2014) or for asymmetric EPSPs (Jercog et al., 2010), but in recent studies these alternatives have not received much support (Zhou et al., 2005; Day and Semple, 2011; Roberts et al., 2013; van der Heijden et al., 2013; Franken et al., 2015). We cannot exclude a role for intrinsic conductances in creating delays within the MSO, even though the reported effects might be smaller for wideband, low-intensity stimuli as used here, than for high-intensity, low-frequency tones that were used by Franken et al. (2015).
This leaves as the most likely possibility, by exclusion, the option that axonal delay lines are responsible for the internal delay, as originally proposed by Jeffress (1948). A direct test of the branching patterns of the axons of SBCs indicated that they cannot account for the frequency-dependent distribution of best delays in the cat (Karino et al., 2011). However, in addition to axonal length, differences in axonal conduction velocities can also contribute to the internal delay, as shown for birds (Seidl et al., 2010, 2014). Some evidence for differences in conduction velocities for mammalian auditory brainstem axons has indeed been found recently (Ford et al., 2015; Seidl and Rubel, 2016). A combination of anatomical and physiological studies would thus allow further tests of the Jeffress hypothesis.
Footnotes
This work was supported by the Dutch Fund for Economic Structure Reinforcement (FES, 0908 NeuroBasic PharmaPhenomics project).
The authors declare no competing financial interests.
Correspondence should be addressed to J. Gerard G. Borst, Department of Neuroscience, Ee 1202b, Erasmus MC, P.O. Box 2040, 3000 CA, Rotterdam, The Netherlands. g.borst@erasmusmc.nl