Abstract
Detection of interaural time differences (ITDs) is crucial for sound localization in most vertebrates. The current view is that optimal computational strategies of ITD detection depend mainly on head size and available frequencies, although evolutionary history should also be taken into consideration. In archosaurs, which include birds and crocodiles, the brainstem nucleus laminaris (NL) developed into the critical structure for ITD detection. In birds, ITDs are mapped in an orderly array or place code, whereas in the mammalian medial superior olive, the analog of NL, maps are not found. As yet, in crocodilians, topographical representations have not been identified. However, nontopographic representations of ITD cannot be excluded due to different anatomical and ethological features of birds and crocodiles. Therefore, we measured ITD-dependent responses in the NL of anesthetized American alligators of either sex and identified the location of the recording sites by lesions made after recording. The measured extracellular field potentials, or neurophonics, were strongly ITD tuned, and their preferred ITDs correlated with the position in NL. As in birds, delay lines, which compensate for external time differences, formed maps of ITD. The broad distributions of best ITDs within narrow frequency bands were not consistent with an optimal coding model. We conclude that the available acoustic cues and the architecture of the acoustic system in early archosaurs led to a stable and similar organization in today's birds and crocodiles, although physical features, such as internally coupled ears, head size, or shape, and audible frequency range, vary among the two groups.
SIGNIFICANCE STATEMENT Interaural time difference (ITD) is an important cue for sound localization, and the optimal strategies for encoding ITD in neuronal populations are the subject of ongoing debate. We show that alligators form maps of ITD very similar to birds, suggesting that their common archosaur ancestor reached a stable coding solution different from mammals. Mammals and diapsids evolved tympanic hearing independently, and local optima can be reached in evolution that are not considered by global optimal coding models. Thus, the presence of ITD maps in the brainstem may reflect a local optimum in evolutionary development. Our results underline the importance of comparative animal studies and show that optimal models must be viewed in the light of evolutionary processes.
Introduction
The arrival time difference of sounds at both ears (interaural time difference [ITD]) is a key feature for sound source localization (Blauert, 1996). In archosaurs, which include birds and crocodilians, the detection of ITD is assumed to be consistent with the Jeffress model, which is made up of coincidence detectors and delay lines (Konishi, 2003; Grothe et al., 2010). Internal delay lines from both ears compensate for the range of external delays and innervate arrays of coincidence detectors forming a map or place code of ITD. A structure closely resembling the Jeffress model was found in the nucleus laminaris (NL) of birds (Carr and Konishi, 1990; Overholt et al., 1992). NL neurons fire maximally when the phase-locked excitatory inputs from both ears arrive simultaneously, whereas the axons from the nucleus magnocellularis function as delay lines (Carr and Konishi, 1988; Köppl and Carr, 2008). The response of NL neurons can be described by a cross-correlation of narrow-band inputs from the ipsilateral and contralateral ears (Fischer et al., 2008). Therefore, the firing rate varies as a cyclic function of ITD.
The accuracy of a place code drops at low frequencies because of broadening of ITD tuning (Fischer and Seidl, 2014). This problem increases with small heads and decreasing naturally occurring ITDs. Maximum information may shift from the peak to the slope of the ITD tuning (Fischer and Seidl, 2014). In the medial superior olivary nucleus of small rodents, the mammalian analog of NL, response maxima often lie outside the physiological range of ITDs (Crow et al., 1978; Brand et al., 2002). A place code generated by ITD maps cannot exist under this condition. An optimized solution for low-frequency ITD coding would be to compare the firing rates of two populations of neurons, one in each hemisphere, with dynamic changes of firing rate within the physiological ITD range (Harper and McAlpine, 2004; Stecker et al., 2005). Indeed, ITDs are not topographically arranged in medial superior olive (MSO) (Joris and Yin, 2007; for reviews, see Grothe et al., 2010). Such a distribution in MSO may in part emerge through fast glycinergic inhibition of neurons (Brand et al., 2002; Pecka et al., 2008) or by intrinsic properties of MSO neurons (Franken et al., 2015; Winters et al., 2017). For high frequencies (and larger heads), a place code consistent with the Jeffress model is the optimal code (Harper and McAlpine, 2004; Harper et al., 2014). However, data from chickens, which have a similar head size to gerbils, indicate a place code with topographical arrangement of best ITD (Köppl and Carr, 2008; Palanca-Castan and Köppl, 2015a).
The auditory brainstem structures and the response properties of brainstem neurons in birds and crocodilians are very similar (Köppl and Carr, 2008; Carr et al., 2009). Preliminary data indicated a place code, but clear evidence for a topographic map has so far been lacking (Carr et al., 2009). Best ITDs, however, often lie outside the apparent physiological range (Bierman et al., 2014). Eardrums that are internally coupled through cranial cavities (Wada, 1924; Witmer, 1990; Larsen et al., 2006; Witmer and Ridgely, 2008; Bierman et al., 2014; Kettler et al., 2016) can alter the ITD range. Depending on the degree of coupling, interaural sound transmission may increase the physiological ITD range by a factor of 3, as shown experimentally in birds (Calford and Piddington, 1988; Hyson et al., 1994) and with simulations (Vossen et al., 2010; Vedurmudi et al., 2016). Mammalian ears are, with a few exceptions, not coupled, which could be a contributing factor for different coding strategies.
Our data connect best ITDs and frequencies with their location in NL and show unequivocally that delay lines form ITD maps in alligators. We also compare the three most studied archosaurs (barn owl, chicken, and alligator) and find that ITD coding in these species is not consistent with the optimal coding model of Harper and McAlpine (2004).
Materials and Methods
Animals and surgery.
We report extracellular electrophysiological recordings from NL of 40 American alligators (Alligator mississippiensis) aged between 6 and 36 months, weighing between 360 and 1450 g. Sexes of the individual alligators were not determined. All procedures and protocols were approved by the University of Maryland Institutional Animal Care and Use Committees and complied with the National Institutes of Health Guide for the use and care of laboratory animals. Anesthesia was induced by an initial intramuscular injection of ketamine (Ketavet, Phoenix, 10 mg/kg body weight) and dexmedetomidine (Dexdomitor, Pfizer, 0.22 mg/kg) and by further injections of ketamine and dexmedetomidine with half the dose of the initial injection every 20 min until no toe-pinch reflex occurred. Anesthesia was maintained by regular (every ∼2 h) intramuscular injections of ketamine and dexmedetomidine with half the initial dose. Head skin temperature was measured with an infrared thermometer and maintained >24°C by wrapping the alligator in a homeothermic electrical heating blanket (Harvard Apparatus). Room temperature was also held at 24°C. Thresholds of auditory nerve fibers in caimans are lowest between 24°C and 30°C (Smolders and Klinke, 1984), and latencies of auditory brainstem responses in alligators did not vary for temperatures >24°C (Strain et al., 1987). The head was firmly held by a metal plate connected to the stereotactic device and cemented to the flat top of the snout. In bigger animals (>700 g), the craniotomy was performed by drilling a hole through the external layer of skull and softening the bone forming the brain capsule with a Dremel (Dremel). Bone was then carefully removed with rongeurs and forceps to expose the dura mater covering the cerebellum. Craniotomy was performed without a drill in smaller animals (<700 g).
Stimulus generation and calibration.
Experiments were performed in a sound-attenuating chamber (IAC Acoustics). Sounds were delivered by custom-made loudspeaker systems that were pressed on the earlids and sealed with Gold Velvet ear impression material (All American). The loudspeakers were commercial earbud loudspeakers (Yuin PK2). Microphones close to the eardrum were used to calibrate the sound systems individually for each ear before every recording session. The microphones were calibrated before the experiments using a reference microphone (Brüel & Kjær). Pure tone sound stimuli were generated using custom-written MATLAB software (tytology) that fed the signals to a processing device (RX8, Tucker Davis Technologies), attenuators (PA5, Tucker Davis Technologies), and headphone buffers (HB7, Tucker Davis Technologies). Tonal stimuli for neurophonic recordings had a duration of 100 ms, including 5 ms cosine onset and offset ramps. The interstimulus interval was 150 ms between offset and onset of stimuli. Condensation clicks had a rectangular form and a duration of two samples (41.6 μs). Clicks were presented at 30 dB above threshold but never >95 dB SPL. Clicks were generated and presented with the same procedure as tonal stimuli.
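The tonal stimulus and click timing described above can be illustrated with a minimal sketch. It assumes an output sampling rate of 48 kHz (the rate at which a two-sample click lasts about 41.6 μs) and a raised-cosine ramp shape; the function names are hypothetical and are not taken from the tytology software.

```python
import math

def tone_with_cosine_ramps(freq_hz, fs_hz=48000, dur_s=0.100, ramp_s=0.005):
    """Pure tone of dur_s seconds with raised-cosine onset and offset ramps.

    fs_hz = 48000 is an assumption inferred from the two-sample click duration.
    """
    n = round(dur_s * fs_hz)
    n_ramp = round(ramp_s * fs_hz)
    samples = []
    for i in range(n):
        if i < n_ramp:
            # Onset ramp: envelope rises smoothly from 0 to 1.
            env = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:
            # Offset ramp: mirror image of the onset ramp.
            env = 0.5 * (1.0 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        else:
            env = 1.0
        samples.append(env * math.sin(2.0 * math.pi * freq_hz * i / fs_hz))
    return samples

def click_duration_us(n_samples=2, fs_hz=48000):
    """Duration of a rectangular click of n_samples samples, in microseconds."""
    return 1e6 * n_samples / fs_hz

tone = tone_with_cosine_ramps(1300.0)
print(len(tone), click_duration_us())
```

At the assumed rate, a 100 ms stimulus spans 4800 samples and each ramp spans 240 samples.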
Electrophysiological recordings.
Recordings were obtained with tungsten microelectrodes with impedances of ∼20 MΩ (F.C. Haer). Anatomical landmarks helped in guiding the electrode toward the cerebellar area covering NL, which was typically easy to find. The electrode was advanced into the brainstem by a micro-drive system (md-800 MicroDrive Controller, Walsh). Recorded signals were amplified (mA-801D Amplifier, Walsh) and bandpass-filtered (10–6000 Hz), passed through a Hum Bug (Quest Scientific Instruments) to eliminate 60 Hz noise, and converted from analog to digital (48 kHz sampling frequency, RX8 Multi/O Processor, Tucker Davis Technologies). Signals were visualized using a custom-written MATLAB script (tytology, MathWorks). Recording sites were denoted by the alligator identification number used in the C.C. laboratory and the consecutive number of the recording site during the experiment. For example, 56.4 would be the fourth recording site in alligator 56.
Experimental design and statistical analysis.
Data from 83 recording sites in NL of 40 American alligators of either sex were used for the heterogeneous analyses of the neurophonic responses. The individual analysis steps, including their statistics, will be described in detail in the following Materials and Methods sections, as well as in the respective Results sections. The analyses include the decomposition of the neurophonic signal for further investigation of frequency and ITD tuning (Frequency tuning of neurophonic responses, Best ITD and IPD, Best ITD and internal delays), mapping of the tuning characteristics (Maps in NL), and comparison with an optimal coding model (Optimal coding). Data from 92 recording sites in barn owls (reported by Palanca-Castan and Köppl, 2015b) and from 96 sites in chicken (reported by Palanca-Castan and Köppl, 2015a) were used for analysis with the optimal coding model. Additional data from 75 recording sites in alligators (Carr et al., 2009), with our data totaling 155 recording sites, were also used for optimal coding analysis.
Neurophonic analysis.
The responses in the auditory brainstem were characterized by strong local field potentials, making it difficult to identify unambiguous single neurons. The local field potential generated by responses of auditory neurons is called the neurophonic. The sources of the neurophonic in alligators are unknown but have been previously described in barn owls. The neurophonic in the NL of owls consists of two components (Kuokkanen et al., 2010): (1) an oscillatory signal component that is generated by strong phase-locking of the inputs to NL (Kuokkanen et al., 2013); and (2) a noise component that represents the non–phase-locked contributions to the neurophonic. We focused our analysis on the signal component as we were interested in whether the delay lines (i.e., the inputs to NL) form maps of ITDs and because the signal component in the alligators was strongly tuned to ITD and frequency. The signal component can often be identified by a visibly oscillatory response waveform (see Fig. 1a) and a peak in the amplitude spectrum at the stimulus frequency (Fstim; see Fig. 1b). To exclude effects of the noise floor and increase the strength of ITD tuning (vector strength [VS], see below), we applied a narrow bandpass filter to the raw waveform (see Fig. 1c,d). The filter was an eighth-order Butterworth filter centered at the stimulus frequency with cutoff frequencies 50 Hz below and above the stimulus frequency. The noise floor contains power in frequencies of the waveform of outgoing spikes (Kuokkanen et al., 2018). Hence, it can provide information about the response characteristics of the NL neurons, although it was difficult to isolate spikes in the recorded waveforms. It is important to know whether or not the neurons tap the tuning of the signal component (i.e., their inputs) and maintain their topography. The noise component waveform was calculated by subtracting the bandpass filtered waveform (see above) from the raw recording (see Fig. 4a) (for detail, see Kuokkanen et al., 2010). The variance in mV², that is, the power of the signal (or noise) AC component, of an 80 ms segment of the filtered response waveform was calculated to identify the response amplitudes as functions of ITD and frequency. The first 15 ms and last 5 ms of the stimulation time were omitted to exclude effects of a strong onset and offset response.
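The response measure described above (variance of the 80 ms segment that remains after discarding the first 15 ms and last 5 ms of the 100 ms stimulus) can be sketched as follows. The sketch assumes that sample 0 of the waveform coincides with stimulus onset and that the waveform has already been bandpass filtered; the function name and default arguments are hypothetical.

```python
import math
from statistics import pvariance

def response_variance(waveform, fs_hz=48000, stim_s=0.100,
                      skip_on_s=0.015, skip_off_s=0.005):
    """Variance (AC power) of the response inside the analysis window.

    Assumes waveform[0] is the sample at stimulus onset.
    """
    start = round(skip_on_s * fs_hz)            # drop the onset response
    stop = round((stim_s - skip_off_s) * fs_hz)  # drop the offset response
    return pvariance(waveform[start:stop])

# Example: a 1 kHz sine of amplitude 2 has a variance of about 2**2 / 2.
wave = [2.0 * math.sin(2.0 * math.pi * 1000.0 * i / 48000) for i in range(4800)]
print(response_variance(wave))
```

Because the variance removes the mean, any DC offset in the recording does not contribute to the measure.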
Data collection protocols and analysis.
First, a recording site was probed for its response threshold. To determine response thresholds, measurements of response variance at visually inspected preferred frequencies and different stimulus levels were taken at different penetration depths. A site was selected for recording of frequency and ITD tuning if thresholds (i.e., the smallest sound pressure level that generated a measurable neurophonic) to ipsilateral and contralateral sounds differed by <10 dB. Fibers from ipsilateral nucleus magnocellularis enter NL from the dorsal side. Consequently, above NL, the thresholds for ipsilateral sounds were lower than for contralateral sounds. Because closer sources contribute more to the neurophonic (Kuokkanen et al., 2010), contralateral sounds needed to be louder to create a neurophonic local field potential detectable by the electrode tip in dorsal NL. Monaural thresholds typically equalized when the electrode tip was close to the NL cell layer, as later confirmed by lesions. Frequency tuning curves were obtained by presenting monaural and binaural pure tone pips of different frequencies ∼15 dB above response threshold. Stimuli were repeated at least 5 times.
Frequency tuning curves were characterized by their full width at half-height (tuning width, TW) and their best frequency (BF) (see Fig. 1e). The tuning width was derived from the width of the frequency tuning curve at 50% of the maximum response. BF was then given by the midpoint between the two extreme values where the response crossed 50% of the maximum response. Binaural frequency tuning curves were recorded with 0 μs ITD. Responses to monaural stimuli were used to compute the phase delay (Δphase) at BF. The complex argument of the Fourier transform describes the phase of the monaural response. The mean phase was determined by averaging the monaural phase across stimulus repetitions. Δphase between responses to ipsilateral and contralateral stimulation was computed by subtracting the ipsilateral from the contralateral mean phase. Bandpass filtering added a constant phase shift to both ipsilateral and contralateral response waveforms and, thus, did not influence Δphase.
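The definitions of tuning width and BF given above can be made concrete with a short sketch. It assumes linear interpolation between sampled frequencies to locate the half-height crossings, which is an illustrative choice rather than a documented detail of the analysis; the function name is hypothetical.

```python
def tuning_width_and_bf(freqs_hz, responses):
    """Return (tuning width, best frequency) of a frequency tuning curve.

    Tuning width is the distance between the outermost crossings of 50% of
    the maximum response; BF is the midpoint between those two crossings.
    """
    half = 0.5 * max(responses)
    crossings = []
    for i in range(len(freqs_hz) - 1):
        r0, r1 = responses[i], responses[i + 1]
        if (r0 < half) != (r1 < half):
            # Linearly interpolate the frequency at which the curve crosses half-height.
            frac = (half - r0) / (r1 - r0)
            crossings.append(freqs_hz[i] + frac * (freqs_hz[i + 1] - freqs_hz[i]))
    if len(crossings) < 2:
        raise ValueError("curve does not fall below half-height on both sides")
    low, high = crossings[0], crossings[-1]
    return high - low, 0.5 * (low + high)

freqs = [1000, 1100, 1200, 1300, 1400, 1500, 1600]
resp = [0.0, 2.0, 8.0, 10.0, 8.0, 2.0, 0.0]
print(tuning_width_and_bf(freqs, resp))  # (300.0, 1300.0)
```

Defining BF as the midpoint of the half-height crossings, rather than the frequency of the maximum, makes it less sensitive to noise at the peak of a broad tuning curve.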
Click delays were used to determine the conduction delay between ipsilateral and contralateral inputs to NL. The general latency between stimulus onset and response onset was typically >5 and <10 ms. Hence, click delays were determined by a cross-correlation of the time segment between the first 5 and 10 ms of the responses to ipsilateral and contralateral clicks. The click delay corresponded to the cross-correlation lag that yielded the largest correlation.
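The click delay estimate can be sketched as a search for the cross-correlation lag with the largest value. The sketch assumes a 48 kHz sampling rate, a maximum lag of 2 ms, and the sign convention that a positive delay means the contralateral response occurs later than the ipsilateral one; all three are assumptions made for illustration.

```python
def click_delay_us(ipsi, contra, fs_hz=48000, t0_s=0.005, t1_s=0.010,
                   max_lag_s=0.002):
    """Lag (in microseconds) that maximizes the ipsi/contra cross-correlation.

    Only the 5-10 ms segment after stimulus onset is compared. A positive
    result means the contralateral response lags the ipsilateral response
    (sign convention chosen for this sketch).
    """
    a, b = round(t0_s * fs_hz), round(t1_s * fs_hz)
    x, y = ipsi[a:b], contra[a:b]
    n = len(x)
    max_lag = round(max_lag_s * fs_hz)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum x[i] * y[i + lag] over the indices where both samples exist.
        val = sum(x[i] * y[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        if val > best_val:
            best_lag, best_val = lag, val
    return 1e6 * best_lag / fs_hz

# Example: identical pulses, with the contralateral one 10 samples later.
pulse = [1.0, 2.0, 3.0, 2.0, 1.0]
ipsi = [0.0] * 600
contra = [0.0] * 600
ipsi[298:303] = pulse
contra[308:313] = pulse
print(click_delay_us(ipsi, contra))
```

With a 48 kHz sampling rate, the resolution of this estimate is one sample, or about 21 μs.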
ITD tuning curves (i.e., response variance as function of stimulus ITD) were recorded at BF and ∼15 dB above threshold. Presented ITDs ranged between at least ± 1 stimulus period in 100 or 200 μs increments. Best ITDs were calculated as described by Kuokkanen et al. (2013). Briefly, the best ITD of a recording site is the ITD with the largest expected response and corresponds to the circular mean of the response variance as a function of ITD. To calculate the VS of the ITD tuning, we took advantage of the fact that response variance varied as a cyclical function of ITD with the same period as the stimulus (see example in Fig. 1f; stimulus period = 770 μs). Therefore, the stimulus ITD was converted into interaural phase differences (IPD in cycles) with IPD = ITD × Fstim. IPDs were binned to correct for nonuniform phase sampling. Response amplitudes for bins with multiple values were averaged, and values for empty bins were interpolated linearly. To prevent the bin size from affecting the outcome for best ITD, the VSs resulting from different bin sizes were averaged. Bin sizes were integer numbers from 1° to 10° (1/360 to 10/360 cycles). The VS is equal to the length of the mean phase vector. Significance is given as the probability of error of a systematic relationship between response variance and stimulus IPD. Only recording sites with significant ITD tuning (VS > 0.02 and p < 0.05) were considered. Click delays were used to resolve the ambiguous laterality of best ITD in cases where the peaks of the ITD tuning fell close to the π limit (ITD = ± 0.5 × stimulus period).
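The conversion from ITD to IPD and the circular-mean estimate of best IPD and VS can be sketched as follows. The sketch weights each sampled IPD by its response variance and assumes uniform phase sampling, so it omits the binning and bin-size averaging steps described above; the function name is hypothetical.

```python
import cmath
import math

def best_ipd_and_vs(itds_us, variances, fstim_hz):
    """Return (best IPD in cycles, vector strength, best ITD in microseconds).

    Each ITD is converted to IPD = ITD x Fstim, and the response variance at
    that ITD weights a unit vector at the corresponding phase. The angle of
    the mean vector gives the best IPD (within -0.5 to 0.5 cycles) and its
    length gives the VS.
    """
    total = sum(variances)
    mean_vec = sum(
        r * cmath.exp(2j * math.pi * (itd * 1e-6 * fstim_hz))
        for itd, r in zip(itds_us, variances)
    ) / total
    best_ipd = cmath.phase(mean_vec) / (2.0 * math.pi)
    return best_ipd, abs(mean_vec), 1e6 * best_ipd / fstim_hz

# Example: cosine-shaped tuning at 1 kHz that peaks at an IPD of 0.2 cycles.
itds = list(range(-1000, 1000, 100))
resp = [1.0 + 0.5 * math.cos(2.0 * math.pi * (itd * 1e-3 - 0.2)) for itd in itds]
print(best_ipd_and_vs(itds, resp, 1000.0))
```

Because the best IPD is defined only within one cycle, a best ITD near ± 0.5 × stimulus period is ambiguous in laterality, which is why an independent measure such as the click delay is needed in those cases.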
Paired statistical comparisons (ipsilateral vs contralateral response variance; ipsilateral vs contralateral BF; VS before and after filtering the response waveform) were performed with Wilcoxon signed-rank tests using the MATLAB statistical toolbox.
Histology and maps in NL.
After obtaining all measurements from recording sites that yielded significant ITD tuning, a lesion was made using the recording electrode by injecting a 10 μA positive direct current for 9–25 ms. Shorter injections caused the lesions to be too small and difficult to find. After completing the experiment, the still anesthetized animal was killed by intracardiac injection of 2 ml pentobarbital (400 mg/ml pentobarbital sodium, Euthasol, Virbac), then perfused transcardially with 0.9% saline solution followed by 4% PFA in 0.01 M PBS. The cryoprotected (with 30% sucrose solution) brain was cut into 60 μm slices with a cryotome, mounted, and Nissl-stained. The sections containing NL were identified, and the lesion was located in the mediolateral and rostrocaudal extent of NL. Further recording sites in the same animal were located by their stereotactic coordinates relative to the site tagged by a lesion. The thin soma lamina of NL (<5 cells) was virtually flattened, and its shape was averaged across animals to create a normalized map to account for variation in animal size. Maps of BF and best ITD were generated by 2D nearest neighbor interpolation (MATLAB function scatteredInterpolant) of the relative location of the lesions on the averaged surface. The resolution of the map was 100 × 100 points from 0% to 100% in the mediolateral and rostrocaudal axes. Smoothed surface plots were generated after 51-point moving average filtering.
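The map construction can be illustrated with a minimal nearest-neighbor sketch on a 100 × 100 grid spanning 0% to 100% of both axes. This is a plain-Python stand-in for the nearest-neighbor method of scatteredInterpolant, not the original analysis code; the site coordinates and values in the example are made up, and the moving-average smoothing step is omitted.

```python
def nearest_neighbor_map(sites, n=100):
    """Fill an n x n grid with the value of the nearest recording site.

    sites: list of (mediolateral_pct, rostrocaudal_pct, value) tuples, with
    positions given as percentages of the normalized NL surface.
    Returns grid[row][col], where rows follow the rostrocaudal axis and
    columns follow the mediolateral axis.
    """
    grid = []
    for j in range(n):
        y = 100.0 * j / (n - 1)
        row = []
        for i in range(n):
            x = 100.0 * i / (n - 1)
            # Pick the site with the smallest squared distance to this grid point.
            nearest = min(sites, key=lambda s: (s[0] - x) ** 2 + (s[1] - y) ** 2)
            row.append(nearest[2])
        grid.append(row)
    return grid

# Hypothetical sites: (mediolateral %, rostrocaudal %, best ITD in microseconds).
example_sites = [(10.0, 20.0, -150.0), (50.0, 50.0, 0.0), (90.0, 80.0, 250.0)]
itd_map = nearest_neighbor_map(example_sites)
print(itd_map[0][0], itd_map[99][99])
```

Nearest-neighbor interpolation never produces values outside the measured range, at the cost of a piecewise-constant map that needs smoothing before display.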
Modeling physiological ITD range.
Interaction of sounds arriving at the external and internal eardrum surfaces in internally coupled ears may lead to a phase shift of eardrum vibration with respect to the external sound. Thus, the ITD range may be increased with internally coupled ears or pressure difference receivers (Vossen et al., 2010; Vedurmudi et al., 2016). The physiological ITD ranges for alligators, chickens, and barn owls were calculated using a model of internally coupled ears described by Calford and Piddington (1988). Parameters for the model were tympanal separation through the canal (TS), head diameter (HD), meatus length (ML), and attenuation of sound through the interaural canal (transmission gain). Alligator parameters were derived from size measurements of the experimental animals for this study (TS = 3 cm, HD = 4.6 cm, ML = 0.3 cm; for more detail on middle ear sinus development and growth in alligators, see Dufeau and Witmer, 2015). Bierman et al. (2014) provided data on the attenuation in the interaural channel. Chicken transmission gain was chosen to fit ITD ranges for the dimensions of hatchlings (Hyson et al., 1994). ITD ranges for adult chickens were then extrapolated by increasing the model head dimensions (TS = 2.9 cm, HD = 3 cm, ML = 0.15 cm). Owl transmission gain and head dimensions were taken from Kettler et al. (2016) (<−12 dB <3 kHz and TS = 3.5 cm, HD = 5.5 cm, ML = 1.7 cm).
Optimal coding model.
The optimal coding model has been described previously and was inspired by optimization during evolutionary processes (Harper and McAlpine, 2004; Harper et al., 2014). Briefly, the model predicts the distribution of best IPDs, ϕ, in a population of neurons that minimizes the error measure V(ϕ) for a tonal stimulus, ideally at the BF of the neuron. V(ϕ) depends on the IPD tuning of the neurons and the physiological IPD range at BF. The error measure is inversely correlated to the population Fisher information F(θ|ϕ) as follows:
$$V(\phi) = \int p(\theta)\,\frac{1}{F(\theta \mid \phi)}\,d\theta$$
The physiological IPD range determines the minimum and maximum of the (uniform) distribution of available IPDs (p(θ)). The IPD range depends on the head size and may be altered by internally coupled ears (see above). IPD tuning curves were modeled as in Harper and McAlpine (2004) by a modified cosine function. IPD tuning curves yield the Fisher information f(θ|ϕi) of neuron i with best IPD ϕi. The population Fisher information is given by the following:
$$F(\theta \mid \phi) = \sum_{i=1}^{N} f(\theta \mid \phi_i)$$
with N being the total number of neurons at a given frequency. Model results are presented as 2D histograms with bin sizes of 0.025 cycles for IPD and 40 Hz for frequency. We used the MATLAB function fminsearch for error minimization with 60 neurons per frequency bin. The model data were compared with experimental data from alligator (current study; Carr et al., 2009), chicken (Palanca-Castan and Köppl, 2015a), and owl (Palanca-Castan and Köppl, 2015b). Data from these studies were extracted from the respective figures with the MATLAB function grabit. The optimal coding model predicted three narrow populations of best IPDs at intermediate frequencies that depended on size and species. We therefore compared the collapsed distributions of model and experimental data within this frequency range, using a two-sided Kolmogorov–Smirnov test. The test was limited to the positive side of the distributions to prevent duplication of samples. The acceptance of the null hypothesis, that the two populations originate from the same distribution, would indicate that IPDs were encoded optimally within this frequency range. The tested frequency range was 600–1850 Hz for small alligators (3 cm tympanal separation), 150–500 Hz for large alligators (10 cm tympanal separation), 900–2900 Hz for the chicken, and 350–1250 Hz for the barn owl.
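The comparison between model and experimental best IPD distributions rests on the two-sample Kolmogorov–Smirnov statistic, which is the largest vertical distance between the two empirical cumulative distributions. The sketch below computes only this statistic and omits the p value; it is an illustration in plain Python rather than the MATLAB routine used for the analysis, and the sample values are made up.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical cumulative
    distribution functions of the two samples.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the ECDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Hypothetical best IPDs (cycles), restricted to the positive side.
model_ipds = [0.10, 0.12, 0.25, 0.26, 0.40]
measured_ipds = [0.05, 0.15, 0.20, 0.30, 0.35, 0.45]
print(ks_statistic(model_ipds, measured_ipds))
```

A small D is compatible with the null hypothesis that both samples come from the same distribution, whereas a large D argues against it.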
Results
To test the hypothesis that alligators possess spatially ordered representations of ITD, we combined neurophonic recordings with localization of small lesions in NL. Neurophonic responses are a characteristic feature of systems with strong phase-locking and provide a useful measure of local ITD and frequency tuning (Sullivan and Konishi, 1986; Köppl, 1997; Köppl and Carr, 2008; McLaughlin et al., 2010; Day and Semple, 2011; Kuokkanen et al., 2013; Carr et al., 2015; Palanca-Castan and Köppl, 2015a, b). We will first describe our analyses of frequency and ITD tuning of the neurophonic in the alligators, then describe the maps of ITD and frequency, and conclude with the comparison of experimental data from alligators, chickens, and barn owls with an optimal coding model (Harper and McAlpine, 2004).
Frequency tuning of neurophonic responses
We report data from 83 recording sites in NL of 40 anesthetized American alligators. Recorded waveforms typically consisted of a strong oscillatory signal component and a prominent peak in the amplitude spectrum at the stimulus frequency (Fstim; Fig. 1a,b). Spikes could sometimes be identified “riding” on the neurophonic waveform, where they contributed to the noise floor (Kuokkanen et al., 2018), but single-unit isolation was difficult. We therefore focused on the neurophonic, and mainly on the signal component of the extracellular field potential. The sources of the neurophonic in alligators are currently unknown. However, in owls, the signal component, which is generated by the phase-locked fraction of the neuronal response, mainly derives from the presynaptic inputs to NL (Kuokkanen et al., 2010, 2013, 2018). In these birds, the signal component is attributed to the highly phase-locked NM inputs into NL neurons (Kuokkanen et al., 2013), whereas the noise component contains contributions of local action potentials generated by NL neurons and NM axons (Kuokkanen et al., 2018). Similarly, synaptic inputs are assumed to be the major contributor to the neurophonic in chicken (Köppl and Carr, 2008) and cat (McLaughlin et al., 2010). To exclude effects of noise on the tuning curves, especially the contributions of passing outgoing NL axons from different frequency regions, we bandpass filtered the signal around the stimulus frequency (see Materials and Methods).
Basic characterization of neurophonic responses. a, A 10 ms segment of a recorded oscillating local field potential (neurophonic) with 1300 Hz stimulus frequency from recording site 56.4. b, Amplitude spectrum after Fourier transformation of the waveform in a. c, Average bandpass filtered responses in a. d, Amplitude spectrum of c. Filter was an eighth-order Butterworth filter with 1250 and 1350 Hz cutoff frequencies. a–d, Shaded areas represent ±1 SD over 5 stimulus repetitions. e, Binaural frequency tuning of recording site 56.4. Average variance of the filtered response waveforms plotted against stimulus frequency. Double-headed arrow indicates width at half-height (370 Hz). Single-headed arrow points at BF (1370 Hz). f, ITD tuning of site 56.4 with 1300 Hz stimulus frequency. Response variance varies cyclically with stimulus ITD. Arrow indicates best ITD (270 μs). Positive ITD indicates that contralateral sound was leading. Dashed line indicates variance of spontaneous activity. e, f, Error bars indicate ±1 SD. g, IPD selectivity of site 56.4 plotted in polar coordinates. A full cycle corresponds to the stimulus period. The direction of the arrow indicates best IPD (0.35 cycles), and the length corresponds to the VS (VS = 0.51) of IPD (and ITD) tuning.
The neurophonic provided information about frequency specificity of locations within NL. The variance of the signal component changed with the stimulus frequency (Fig. 2a,b). Frequency tuning was quantified by a unit's BF. BF was defined as the mid-frequency at half-maximum of the tuning curve. Figure 2a, b shows the frequency tuning curves from two recording sites (Fig. 2a: 67.9, BF = 970 Hz; Fig. 2b: 59.8 BF = 770 Hz). In the recorded population, measures of binaural BF ranged from 380 to 1950 Hz (Fig. 2c), which covered almost the entire hearing range of American alligators, which goes as low as 100 Hz, with a peak sensitivity at 800 Hz, and an upper frequency between 2 and 3 kHz (Wever, 1978; Higgs et al., 2002). Monaural stimulation also elicited frequency tuning (Fig. 2a,b; red represents ipsilateral; blue represents contralateral). Ipsilateral and contralateral response variances at BF did not significantly differ (Wilcoxon signed-rank test, p = 0.9677; z value = 0.040489, signed rank = 920.5). BF of ipsilateral and contralateral responses also did not differ significantly (Fig. 2d; Wilcoxon signed-rank test, p = 0.47; z value = −0.7228, signed rank = 817) with 20 Hz median difference and 105 Hz interquartile range.
Binaural and monaural frequency tuning. a, Frequency selectivity of recording site 67.9. Black represents binaural tuning. Red represents frequency tuning with ipsilateral (right) stimulation. Blue represents tuning with contralateral stimulation. Error bars indicate ±1 SD. Arrows indicate BF (binaural: 970 Hz; ipsilateral: 970 Hz; contralateral: 960 Hz). b, Frequency tuning of recording site 59.8 (BF binaural = 770 Hz; ipsilateral (left) = 770 Hz; contralateral = 720 Hz). c, Histograms of BF in the recorded population (n = 83). d, Contralateral BF versus ipsilateral BF.
Best ITD and IPD
ITD sensitivity in NL of American alligators was reported previously for BFs between 200 and 1200 Hz, with one unit at 1500 Hz (Carr et al., 2009). Our dataset covers a wider range of frequencies (up to 1950 Hz) and analyzes ITD tuning of neurophonic potentials as a proxy for the ITD tuning of NL neurons and their inputs. We shall also show in a later section that delay lines generate a wide distribution of ITDs and create a topographic map of time differences in the NL. In the following, interaural delays were normalized with respect to the brain hemisphere of the recording site. Positive delays indicate that the contralateral side is leading, whereas negative values denote ipsilateral leading delays.
Figure 3 shows the neurophonic (signal component) ITD tuning curves of the examples in Figure 2 with response variance plotted against stimulus ITD. ITD tuning curves were recorded at or close to BF of the respective recording site. The response variance varied cyclically with ITD (Fig. 3a,b). ITD tuning was characterized by the best ITD. To calculate best ITD and best IPD, stimulus ITDs were converted into IPDs by multiplying ITD with Fstim (Fig. 3c,d). Best IPD is equal to the circular mean phase of the responses across IPDs (see Materials and Methods) and, thus, can only lie within one cycle (−0.5 to 0.5, referred to as the π limit). The length of the mean vector corresponds to the ITD tuning VS. Best IPDs were broadly distributed with −0.03 ± 0.20 cycles (circular mean ± SD) (Fig. 3e) and did not vary with BF (Fig. 3f). The amplitude of the neurophonic noise floor varied across recording sites and may depend on the distance of the electrode to the closest NL neuron or nearby outgoing NL axon since it represents the non–phase-locked output of these neurons (Kuokkanen et al., 2013). Bandpass filtering of the recorded waveform to obtain the signal component (compare Fig. 1b,d) increased the average VS of ITD tuning in all recording sites significantly from 0.385 ± 0.096 to 0.447 ± 0.091 (mean ± SD, Wilcoxon signed-rank test, p = 5.5 × 10⁻⁸, z value = 5.432, signed rank = 2751). Bandpass filtering also eliminated the influence of the noise floor on the neurophonic signal. However, the noise floor may contain frequency information about the non–phase-locked or poorly phase-locked action potential waveforms of underlying NL neurons (Kuokkanen et al., 2013, 2018). NL neurons may linearly relay the inputs from ipsilateral and contralateral NM, which would result in the same tuning of neurophonic signal and NL neuron.
On the other hand, ITD tuning may shift due to inhibitory inputs (Brand et al., 2002; Pecka et al., 2008) or intrinsic neuronal properties (Franken et al., 2015). To test this, we subtracted the signal component from the recorded response waveform to obtain the noise component waveform (Fig. 4a) (compare Kuokkanen et al., 2013). The variance of the noise component was indeed significantly tuned to ITD in 66 of 83 recording sites, although ITD tuning was weaker (VS = 0.295 ± 0.077) than that of the signal component (VS = 0.447 ± 0.091). Figure 4b–d shows the ITD tuning of the three examples from the previous figures (sites 56.4, 67.9, 59.8) with noise ITD tuning in blue, signal ITD tuning with dashed lines and inverted triangles, and the raw response ITD tuning with dotted lines and upward triangles. Best ITD of noise and signal was similar (see arrows on the x axes). Noise and signal best ITD correlated strongly across all recording sites (Fig. 4e; Pearson's correlation coefficient r = 0.96). The same observation was made for best IPDs (Fig. 4f), where the average difference between noise and signal best IPD for all recordings was close to 0 (mean ± SD: −0.05 ± 0.22 cycles). For those recording sites that were clearly identified by lesions, the difference and scatter were even smaller (mean ± SD: −0.01 ± 0.14 cycles). Thus, the neurophonic in alligators is a viable method for investigation of the distribution of ITD tuning.
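The conversion from an ITD tuning curve to best IPD and VS described above can be illustrated with a minimal sketch. It assumes that the response variance at each IPD serves as the weight of a unit vector on the phase circle; the function name `best_ipd_and_vs` and the synthetic tuning curve are hypothetical and are not taken from the study's analysis code.

```python
import cmath
import math

def best_ipd_and_vs(itds_us, responses, f_stim_hz):
    """Return (best IPD in cycles, vector strength) for one ITD tuning curve.

    Assumption: responses are non-negative (e.g., response variance) and are
    used as weights of unit vectors at the corresponding IPDs.
    """
    # IPD (cycles) = ITD (s) * stimulus frequency (Hz)
    ipds = [itd * 1e-6 * f_stim_hz for itd in itds_us]
    total = sum(responses)
    # Response-weighted mean vector on the unit circle
    mean_vec = sum(r * cmath.exp(2j * math.pi * p)
                   for r, p in zip(responses, ipds)) / total
    best_ipd = cmath.phase(mean_vec) / (2 * math.pi)  # lies within one cycle
    return best_ipd, abs(mean_vec)

# Synthetic (made-up) tuning curve: cosine tuning at 1000 Hz peaking at 150 us
f_stim = 1000.0
itds = list(range(-1000, 1000, 100))  # two full stimulus periods
resp = [1.0 + 0.8 * math.cos(2 * math.pi * f_stim * (itd - 150) * 1e-6)
        for itd in itds]

best_ipd, vs = best_ipd_and_vs(itds, resp, f_stim)
best_itd_us = best_ipd / f_stim * 1e6  # back-conversion to microseconds
print(best_ipd, vs, best_itd_us)
```

Because the circular mean is confined to one cycle, the best ITD recovered this way always falls within the π limit; resolving best ITDs beyond that limit requires additional information such as the click delays discussed in the next section.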
Sensitivity to ITD and IPD. a, b, ITD tuning curves of recording sites 67.9 (a) and 59.8 (b). Response variance as a function of ITD. Negative ITDs denote ipsilateral leading sources; positive ITDs denote contralateral leading sounds. Dashed lines indicate variance of spontaneous activity. Arrows indicate best ITD (a: 142 μs; b: −201 μs). c, d, IPD tuning of a and b plotted in polar coordinates as normalized variance versus IPD in fractions of stimulus period. Direction of the arrows indicates best IPD (c: 0.14 cycles; d: −0.16 cycles), and the length of the arrow indicates VS of IPD tuning (c: VS = 0.5; d: VS = 0.47). e, Best IPD as a function of BF. f, Histogram of best IPDs (n = 83, bin width: 0.25 cycles). Average best IPD = −0.03 ± 0.20 cycles (circular mean ± circular SD).
ITD tuning in noise floor responses as proxy for NL output. a, The neurophonic response in NL can be separated into a signal component generated by phase-locked activity and a noise component. Voltage traces of one stimulus repetition at recording site 59.8 are shown (stimulus ITD = 200 μs, stimulus frequency = 800 Hz). b–d, Variance of response (dotted lines), signal (dashed lines), and noise (solid blue line) as a function of ITD. Black arrows indicate signal best ITD. Blue arrows indicate noise best ITD (b: recording site 56.4; c: 67.9; d: 59.8). e, Correlation between signal and noise best ITD. f, Histogram of best IPD offset (signal best IPD − noise best IPD). IPDs are wrapped into a single cycle (−0.5 to 0.5). Black bars represent all data. Gray bars represent only data from recording sites whose location was confirmed by lesion.
Best ITD and internal delays
Monaural click responses provided a measure of the internal conduction delays generated by the delay lines. The oscillatory (neurophonic) response to broadband clicks was typically dominated by frequencies near the BF of the recording site (see example in Fig. 5a; BF = 970 Hz). The time delay between the two waveform responses to ipsilateral and contralateral stimulation was determined by cross-correlation (see Materials and Methods). The cross-correlation lag that produced the maximum correlation corresponds to the click delay (Fig. 5b). Click delays were expected to have the opposite sign to best ITD because the internal delay compensates for the external delay (ITD). This is illustrated by the cross-correlation function in Figure 5b of the waveforms in Figure 5a with −205 μs click delay (black lines) and the normalized ITD tuning curve with 142 μs best ITD. Click delays were indeed inversely correlated with best ITD and predicted it well (Pearson's correlation coefficient r = −0.95; Fig. 5c). Click delay was also used to disambiguate the laterality of best ITD. In 7 of 52 recording sites where click delay was recorded, click delay and best ITD were more than one stimulus cycle apart. In these cases, best ITD was outside the π limit (Fig. 5d, solid black line). The average tympanal separation in this study (mean ± SD: 3 ± 0.36 cm) was used to calculate the physiological ITD range with a model of internally coupled ears (see Materials and Methods). The range of best ITDs decreased with increasing BF; 36% (n = 30; Fig. 5d) of the recording sites had best ITDs outside the physiological range. Overall, best ITD clustered around 0 μs with both ipsilateral and contralateral preferred ITDs (Fig. 5e; mean ± SD: 4 ± 358 μs). Dufeau and Witmer (2015) estimated the resonant frequency of the middle ear sinuses from Helmholtz' resonator equation to be between 720 and 1360 Hz across all dimensions of their study specimens. 
This frequency range correlates with the best hearing range and the dominant frequency in juvenile calls. We, however, did not see an influence of frequency on ITD tuning (shape and best ITD) in our data.
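The click-delay estimate described above (the cross-correlation lag with maximum correlation between the two monaural click responses) can be sketched as follows. This is a minimal illustration on synthetic damped oscillations, not the study's analysis code. The helper names, the 20 μs sampling interval, and the sign convention (a positive lag means the contralateral response arrives later) are assumptions and may differ from the convention defined in Materials and Methods.

```python
import math

def xcorr_at(a, b, lag):
    """Unnormalized cross-correlation sum_t a[t] * b[t + lag]."""
    start = max(0, -lag)
    stop = min(len(a), len(b) - lag)
    return sum(a[t] * b[t + lag] for t in range(start, stop))

def click_delay(ipsi, contra, dt_us, max_lag):
    """Lag (in microseconds) that maximizes the cross-correlation.

    Assumed convention: positive values mean the contralateral response
    lags the ipsilateral response.
    """
    lags = range(-max_lag, max_lag + 1)
    best_lag = max(lags, key=lambda k: xcorr_at(ipsi, contra, k))
    return best_lag * dt_us

def damped_osc(n, onset, freq_hz, dt_us, tau_us=2000.0):
    """Synthetic click response: exponentially damped sine starting at onset."""
    out = []
    for i in range(n):
        t = (i - onset) * dt_us
        if t < 0:
            out.append(0.0)
        else:
            out.append(math.exp(-t / tau_us)
                       * math.sin(2 * math.pi * freq_hz * t * 1e-6))
    return out

dt = 20.0  # hypothetical sampling interval in microseconds
ipsi = damped_osc(600, 100, 970.0, dt)
contra = damped_osc(600, 110, 970.0, dt)  # same waveform, 10 samples later

# Restrict the search to less than half a period (about 515 us at 970 Hz)
delay = click_delay(ipsi, contra, dt, max_lag=25)
print(delay)
```

Limiting the lag search to less than half a period of the dominant frequency avoids picking a neighboring peak of the oscillatory cross-correlation function, which would shift the estimate by a full cycle.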
Click delays. a, Average response at recording site 67.9 to presentation of 128 monaural clicks. Red represents right/ipsilateral response. Blue represents left/contralateral response. b, The click delay was determined by cross-correlation of the monaural click responses (black line). For recording site 67.9, the click delay (cross-correlation lag with maximum correlation) was −205 μs (black arrow). Dashed purple line indicates corresponding ITD tuning curve of site 67.9. Dashed purple arrow indicates best ITD (142 μs). c, Correlation of click delays and best ITD. Black line indicates best fit with −0.933 × x − 29.642, r = −0.95, and n = 52. d, Best ITD as a function of BF. Laterality was disambiguated by responses to monaural clicks if click delays were available. Circles represent unambiguous best ITDs. Triangles represent ambiguous best ITDs. Shaded areas represent sample mean ± 1 SD of physiological ITD range. Dashed lines indicate the ITD range of the smallest (tympanal separation 2.3 cm) and largest (3.6 cm) alligator, derived from a model of internally coupled ears (Calford and Piddington, 1988). e, Distribution of best ITDs. Negative ITDs denote ipsilateral leading sounds. f, Differences in the ipsilateral and contralateral BF of 64 recording sites versus best ITD. The Pearson's correlation was not significant with the best fit −0.037 × x − 32.88 and r = 0.
Effects of cochlear delays (compare stereausis model) (Shamma et al., 1989) on the data can be excluded. The stereausis model postulates that internal delays are generated by differences of traveling times of sound information along left and right basilar membranes. Thus, best ITD should vary systematically with the mismatch between ipsilateral and contralateral BF. A larger difference between ipsilateral and contralateral BF would lead to larger best ITDs. However, no effect of the frequency mismatch on best ITD was found (Fig. 5f; Pearson's correlation coefficient r = 0, p = 0.80). These results confirm data from Carr et al. (2009).
Best IPDs were well predicted by monaural phase delays (n = 64) similar to the prediction by click delays. Figure 6a shows filtered response waveforms (signal component) of recording site 56.4 (compare its binaural waveform in Fig. 1c) in response to monaural stimulation with 1300 Hz (shaded areas represent ± 1 SD with 5 stimulus repetitions). The response variance with contralateral stimulation (0.37 mV², blue) was slightly lower than with ipsilateral stimulation (0.5 mV², red). More importantly, the waveforms were phase shifted. It was expected from a Jeffress-model-like array with purely excitatory inputs that the internal delay compensates for the external ITD. The internal phase delay (Δphase = −0.39) was calculated from the monaural mean phase (Fig. 6b; shaded areas represent ± 1 SD) for the same frequency at or close to BF at which ITD/IPD tuning curves were recorded (Fstim = 1300 Hz for site 56.4, best IPD = 0.35; compare Fig. 1g). Indeed, Δphase was inversely correlated with best IPD (Fig. 6c; Pearson's correlation coefficient r = −0.93 for all 64 sites where Δphase was recorded). The phase cycles could extend beyond −0.5 and 0.5 if the best ITD was disambiguated by click delays (compare Fig. 4). The correlation also applied to individual frequency ranges (Fig. 6c; BF < 800 Hz, r = −0.91; 800 Hz ≤ BF ≤ 1300 Hz, r = −0.97; BF > 1300 Hz, r = −0.95). The same tendency was observed for very low frequencies (BF < 500 Hz; Fig. 6c, white inverted triangles). In summary, maximum responses were generated by ITDs that compensated for the internal delays, a principle of the Jeffress model.
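As a worked example of the phase-delay arithmetic, the sketch below reproduces the values reported for site 56.4 (contralateral phase −0.43 cycles, ipsilateral phase −0.04 cycles, best IPD 0.35 cycles at 1300 Hz). The helper functions are hypothetical, and wrapping the difference into a single cycle is an assumption made here for illustration.

```python
def wrap_cycles(x):
    """Wrap a phase given in cycles into the interval [-0.5, 0.5)."""
    return (x + 0.5) % 1.0 - 0.5

def monaural_phase_delay(contra_phase, ipsi_phase):
    """Internal phase delay: contralateral minus ipsilateral mean phase."""
    return wrap_cycles(contra_phase - ipsi_phase)

# Values for site 56.4 as reported in the text and in Figure 6b
delta_phase = monaural_phase_delay(-0.43, -0.04)   # about -0.39 cycles
predicted_best_ipd = wrap_cycles(-delta_phase)     # compensation: opposite sign
mismatch_cycles = wrap_cycles(0.35 - predicted_best_ipd)
mismatch_us = mismatch_cycles / 1300.0 * 1e6       # cycles -> microseconds

print(delta_phase, predicted_best_ipd, mismatch_cycles, mismatch_us)
```

For this site, the measured best IPD differs from the prediction by about 0.04 cycles, which corresponds to roughly 31 μs at 1300 Hz.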
Monaural phase delays. a, Filtered recordings of site 56.4 in response to monaurally presented 1300 Hz tones. Red represents right/ipsilateral. Blue represents left/contralateral. Shaded areas represent 1 SD with 5 repetitions. b, Phase spectra of the signals in a. Solid lines indicate circular mean. Shaded areas represent 1 circular SD. The monaural phase delay (Δphase = contralateral phase (−0.43 cycles) − ipsilateral phase (−0.04 cycles) = −0.39 cycles; best IPD = 0.35 cycles) was calculated from the mean monaural phases at 1300 Hz (double-headed arrow). c, Monaural phase delay versus best IPD of the cyclic component. A few IPDs extend beyond [−0.5, 0.5] because of disambiguation by click delays (see Fig. 4). Magenta triangles represent data for sites with BF < 800 Hz with best fit as dotted magenta line (r = −0.86). Empty magenta triangles represent data points with BF < 500 Hz and are included in fit for BF < 800 Hz. Black circles represent sites in mid frequency range with BF ≥ 800 Hz and ≤ 1300 Hz. Solid black line indicates best fit for mid-frequencies with r = −0.97. Blue triangles represent BFs > 1300 Hz with best fit as dashed blue line (r = −0.95).
Maps in NL
Tonotopic organization characterizes the peripheral and early central auditory pathway in numerous archosaur species (for review, see Dooling et al., 2000). In the crocodilians, the basilar papilla and cochlear nuclei show a tonotopic arrangement (Manley, 1970; Leake, 1974; Wilson et al., 1985). It was suspected, but not shown so far, that the same holds for crocodilian NL. Furthermore, the computational basis for ITD detection is similar across archosaurs. Therefore, it is likely that ITDs are also arranged in a map. To test these hypotheses, 27 sites were successfully marked with lesions or their position was determined using stereotactic coordinates (Fig. 7a,b). Figure 7c,d shows the frequency and ITD tuning curves of the tagged recording site (site 59.2) from Figure 7a. The location within a normalized NL is indicated by a cross in Figure 7b. The recording site was in the caudomedial NL, with BF = 420 Hz and best ITD = −213 μs. BFs changed systematically in the caudorostral and mediolateral axes (Fig. 7e,f; r = 0.73 and r = −0.57, respectively). Projection onto a normalized NL (see Materials and Methods) clearly shows a gradient of BFs (Fig. 7g) with high frequencies in the rostromedial region and low frequencies in the caudolateral region. Best ITDs were also correlated with NL location in both axes. However, the relationship was more pronounced in the mediolateral axis than in the caudorostral axis (Fig. 7h,i; r = 0.29 and r = 0.77, respectively). Figure 7j shows the projection of best ITDs onto the NL. Ipsilateral leading ITDs were located in the medial part of NL throughout the rostrocaudal axis, whereas extremely contralateral leading best ITDs (>500 μs) were located laterally at the widest point of the NL. Frequency and ITD maps were thus perpendicular to each other, with a smaller range of ITDs being represented in the higher frequency range than in the lower frequency range. This was expected from the general relationship between best ITD and BF (Fig. 5d). 
These data show, for the first time, in crocodilians that delay lines generate ITD maps within iso-frequency axes in NL.
Maps in NL. a, Coronal section of auditory brainstem of alligator 59. An electrolytic lesion in NL after recording from recording site 59.2 is indicated with an arrow. The image is left-right inverted; hence, the lesion was localized in the right hemisphere. b, Identified positions of 27 units. Coordinates are normalized in mediolateral and caudorostral dimension relative to the maximal extent of NL. 0 on the mediolateral axis corresponds to the brain midline and 1 to the maximum width of NL. Black lines indicate the boundaries of NL. c, d, Frequency and ITD tuning curves of the cyclic component at recording site 59.2. BF (420 Hz) and best ITD (−213 μs) are indicated by arrows. e, f, BF versus relative location of the recording site on the caudorostral and mediolateral axis, respectively. g, Frequency map in NL. 2D nearest neighbor interpolation of the data in e, f. Interpolated map is smoothed with a running average filter and squeezed into the boundaries of NL. Color code represents BF. h, i, Same as e, f, but for best ITD. j, ITD map in NL. Same method for interpolation used as for g. Color code represents best ITD. b, d–i, Black cross indicates recording site 59.2.
Optimal coding
Next, we tested whether maps are the most efficient (i.e., the optimal) way to encode ITD. Figure 8 illustrates the relationship between best IPD and frequency in NL of archosaurs (alligator: current study; Carr et al., 2009; chicken: Palanca-Castan and Köppl, 2015a; and owl: Palanca-Castan and Köppl, 2015b) and shows the predictions of an optimal coding model (Harper and McAlpine, 2004). The optimal coding model assumes that best IPDs are distributed to maximize the information carried by a neuronal population at a specific frequency within the constraints of the natural IPD range. The natural IPD range depends on frequency and head size. Therefore, the optimal coding model predicts a differing best IPD distribution in different species. With small heads and at low frequencies as, for example, in gerbils and kangaroo rats, the optimal coding model predicts two populations of neurons narrowly tuned to IPD outside the physiological range (Harper et al., 2014, their Fig. 2). Model predictions for archosaurs with similar head dimensions differ from those for mammals because archosaur ears function as pressure difference receivers and thus have an increased IPD range at low frequencies (Fig. 8a–d). The model predicts only a narrow frequency band ∼<400 Hz where two channels outside the physiological range are optimal for the three archosaurs. Above that frequency, two or three central narrow populations within the physiological range were predicted to be optimal with an intermediate frequency range that only showed two narrow peaks left and right of 0 (range indicated in Fig. 8a–d, white lines). The model predicts a uniform distribution consistent with the Jeffress model for high frequencies where the physiological IPD range is greater than the π limit (Harper and McAlpine, 2004), as shown in barn owls (Fig. 8c) and large alligators (TS = 10 cm; Fig. 8d). 
In summary, at very low frequencies the optimal coding model predicts a two-channel distribution with best IPDs outside the physiological range and consistent with a two-hemisphere model (McAlpine et al., 2001; Grothe et al., 2010). At high frequencies, the distribution of best IPDs is consistent with a Jeffress-like place code (for review, see Konishi, 2003), and for intermediate physiological IPD ranges and frequencies optimal coding predicts two or three central populations of best IPDs within the physiological range. Figure 8 (middle row) shows experimental data for alligators, chickens, and barn owls. The best IPDs are folded into a single cycle from −0.5 to 0.5 and mirrored around 0 because the same but mirrored distribution is expected in the two brain hemispheres. In all three archosaurs, best ITDs (and thus best IPDs) can be outside the physiological range at low frequencies but are broadly distributed in each frequency band (compare Fig. 4d with Fig. 4e). This distribution is predicted for a place code of ITD but may not be optimal. To test this, we compared the collapsed experimental data and model data within the frequency bands in which two central populations were predicted. We focused on those bands where prominent populations to the left and right of 0 did not shift with increasing frequencies (Fig. 8a–d, white lines). The frequency bands are 600–1850 Hz for small alligators, 150–500 Hz in large alligators, 900–2900 Hz for chicken, and 350–1250 Hz in the barn owl. Figure 8h–k shows the cumulative distributions for positive IPDs of both collapsed experimental and model data across the intermediate frequency band. These distributions were compared using a two-sided Kolmogorov–Smirnov test with the null hypothesis that model and experimental data come from the same distribution; in other words, that the IPD is encoded optimally by the NL neurons. 
These tests revealed that in all four conditions the distribution of best IPDs recorded in the archosaurs differed significantly from the model prediction (Kolmogorov–Smirnov test, small alligators: p = 2.98 × 10⁻⁴⁴, n_experimental = 107, n_model = 625; large alligators: p = 1.74 × 10⁻⁹, n_experimental = 35, n_model = 175; chicken: p = 5.29 × 10⁻²¹, n_experimental = 63, n_model = 1000; barn owl: p = 1.46 × 10⁻¹⁷, n_experimental = 38, n_model = 450). There are few published data available on the tympanal distance in extremely large alligators. Witmer and Ridgely (2008) reported on a large alligator (total skull length, 37 cm, corresponding to ∼260 cm total length, for skull and total length relationship in alligators; compare Woodward et al., 1995). We used their Figure 3 to estimate a tympanal separation of 7.8 cm from this specimen. We also interpolated data from Crocodylus porosus (saltwater crocodile) (Webb and Messel, 1978) to gain realistic tympanal separations for larger alligators. Although the heads grow large over an alligator's lifetime, the eye and ear distances stay relatively small (TS up to 15 cm). Alligators, however, seldom get larger than 350 cm (Woodward et al., 1995). Therefore, maximum relevant tympanal separation should be ∼13 cm (Webb and Messel, 1978). The model predicted two or three narrow IPD channels for <500 Hz in this case. Increasing TS to 15 cm only reduces the threshold between narrow channels and uniform IPD distribution to 450 Hz. The combined physiological data from this study and Carr et al. (2009) show a broad distribution of best IPD for frequencies between 350 and 500 Hz (Fig. 8e), thus indicating that best IPDs are not distributed optimally for sound localization, even with larger heads (Kolmogorov–Smirnov test, TS = 13 cm, p = 5.87 × 10⁻⁴, n_experimental = 35, n_model = 300). In summary, we show that archosaur IPD and ITD are not encoded optimally at the detection stage. 
Increasing head size in alligators increases the frequency range over which a map of ITD is optimal for precise sound localization, but encoding remains nonoptimal overall, even in extremely large individuals. Thus, archosaurs may have evolved a stable local optimum that is not the global optimum and that is also different from a two-hemisphere code. It is important to note that the formation of a map at the detection level in the early auditory pathway does not necessarily exclude a hemispheric rate code in later stages. Transformation of the representation of ITDs may take place after the detection level (for review, see Vonderschen and Wagner, 2014). For example, there is evidence that in the owl forebrain a hemispheric rate code is realized (Beckert et al., 2017) in contrast to an ITD map in its midbrain (Wagner et al., 1987).
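The comparison of collapsed experimental and model best-IPD distributions described above can be illustrated with a minimal sketch of the two-sample Kolmogorov–Smirnov statistic. The IPD values below are synthetic placeholders (a narrow "model-like" population and a broad "map-like" population) and are not the study's data; the function names are hypothetical. Only the statistic D is computed here. A p value would normally be obtained from a statistics library, for example `scipy.stats.ks_2samp`.

```python
import bisect

def fold_ipd(ipd):
    """Wrap an IPD (cycles) into [-0.5, 0.5) and mirror it around 0."""
    return abs((ipd + 0.5) % 1.0 - 0.5)

def ks_statistic(x, y):
    """Two-sample KS statistic: largest distance between the two ECDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    return max(abs(bisect.bisect_right(xs, v) / n
                   - bisect.bisect_right(ys, v) / m)
               for v in xs + ys)

# Synthetic placeholder data (NOT from the study):
# narrow populations left and right of 0, as an optimal-coding-like prediction
model_ipds = [s * (0.10 + 0.002 * i) for s in (-1, 1) for i in range(-5, 6)]
# broad, roughly uniform distribution, as expected for a place code
experimental_ipds = [-0.45 + 0.05 * i for i in range(19)]

folded_model = [fold_ipd(v) for v in model_ipds]
folded_exp = [fold_ipd(v) for v in experimental_ipds]

d_stat = ks_statistic(folded_model, folded_exp)
print(d_stat)
```

A large D, as in this toy case, indicates that the two cumulative distributions diverge strongly, which is the pattern the tests above report for all archosaur datasets.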
Optimal coding of IPD. Top row represents the predictions of an optimal coding model (Harper and McAlpine, 2004) of best IPD distributions in a neuronal population at different frequencies for small alligators (a), chicken (b), barn owl (c), and large alligators (d). The predictions depend on the physiological range of IPDs (red lines). Sizes of the 2D histogram bins were 0.25 cycles (horizontal) and 50 Hz (vertical). e, Experimental data for alligators from this study and Carr et al. (2009). Solid red lines indicate IPD range for small alligators with 3 cm tympanal separation. Dashed red lines indicate the IPD range for large alligators with 10 cm tympanal separation. Bin sizes are 0.05 cycles and 150 Hz. f, Experimental data for chicken (source: Palanca-Castan and Köppl, 2015a). Bin sizes are 0.05 cycles and 450 Hz. g, Experimental data from barn owls (source: Palanca-Castan and Köppl, 2015b). Bin sizes are 0.05 cycles and 500 Hz. h–k, Cumulative distributions of collapsed model data (dashed blue line) and experimental data (red line) within the frequency range indicated by white lines in a–d. Results of a two-sided Kolmogorov–Smirnov test are indicated in the respective figure panel.
Discussion
We have shown that one of the closest living relatives of birds, the crocodilian Alligator mississippiensis, has well-organized maps of ITD in its NL, extending from ipsilateral best ITDs of ∼500 μs to contralateral ITDs of ∼1500 μs. Alligators are most sensitive to low-frequency sound, with BFs recorded between 400 and 2000 Hz (Smolders and Klinke, 1986; Bierman et al., 2014). Thus, these crocodilians have a very similar distribution of best ITDs to that found in the chicken (Köppl and Carr, 2008; Aralla et al., 2018). Our analysis of physiological data from the alligator, the chicken, and the barn owl revealed that ITD coding across archosaurs is not consistent with optimal coding and may result from a local optimum in evolution arising from common ancestry and independent appearance of tympanic ears in tetrapod evolution.
ITD coding
ITD discrimination is assumed to be based on responses from populations of neurons whose activity provides ITD resolution beyond that provided by individual cells (Fitzpatrick et al., 1997; Takahashi et al., 2003). Populations of ITD-sensitive neurons may be arranged in a topographically organized map, or place code, or grouped into two broad, hemispheric channels. In the map scheme, afferent inputs form delay lines that compensate for the range of external delays and innervate arrays of coincidence detectors, leading to the formation of a place map of ITD. In the two-channel scheme, the single neurons on one side of the brain are narrowly tuned to ITD, with opposite tuning in the other hemisphere. Sound source location is encoded by the relative difference in the average activities of the two populations, with this coding strategy therefore referred to as the two-channel hemispheric (or hemispheric difference) model (McAlpine and Grothe, 2003; Grothe et al., 2010; Grothe and Pecka, 2014; Lingner et al., 2018).
In alligators, ITD-sensitive neurons in NL are organized as a map, with ipsilateral best ITDs located medial in the nucleus, and contralateral ITDs lateral. Similar maps, or place codes, have been found in the two bird species studied, barn owls and chickens (Köppl and Carr, 2008; Carr et al., 2015), with additional support for a place code in emus (MacLeod et al., 2006). By contrast, there is a preponderance of support for a two-channel scheme in gerbils (Pecka et al., 2008) and cats (Karino et al., 2011), with additional support from work in cat inferior colliculus (Hancock and Delgutte, 2004). In mammals, ITDs are not arranged in a topographically organized map at either the level of ITD processing in the auditory brainstem (Karino et al., 2011; Grothe and Pecka, 2014) or at higher processing stages, including the superior colliculus (Campbell et al., 2006). These differences are the likely consequence of the parallel evolutionary origins of spatial hearing in mammals and birds (for reviews, see Christensen-Dalsgaard and Carr, 2008; Grothe and Pecka, 2014; Walton et al., 2017; Lingner et al., 2018). Overall, archosaurs, including birds, and mammals can both localize sounds in space, but use different neuronal strategies to encode sound location. This interesting finding suggests that both solutions may be “good enough” (Marder and Goaillard, 2006; Schnupp and Carr, 2009) and not necessarily consistent with optimal coding (Harper and McAlpine, 2004).
Evolution of sound localization circuits in archosaurs and mammals
It is not unreasonable that natural selection led to similar but different solutions in archosaurs and mammals because their common ancestors did not have tympanic hearing (Clack, 2002), and their tympana (when they evolved) developed from different tissues. The early tetrapod ear appears not to have been adapted for hearing in air (Lombard and Bolt, 1979; Clack, 2002). Tympanic hearing is hypothesized to have developed independently in at least five major tetrapod groups, the anurans, lepidosaurs, archosaurs (and turtles), and mammals, and is a true evolutionary novelty (Christensen-Dalsgaard and Carr, 2008; Christensen-Dalsgaard and Manley, 2013; Kitazawa et al., 2015). Tympana would have increased the frequency range and sensitivity of hearing and led to changes in the central auditory processing of both high-frequency sound and directional hearing.
Current theories support formation of the tympanum followed by later closure of the middle ear cavity (Christensen-Dalsgaard and Manley, 2013). The closed middle ear in mammals and archosaurs is thus a derived condition that would have changed the operation of the ear by decoupling the tympana and leading to a requirement for the computation of directionality in the brain. This is because acoustically coupled tympana act as pressure difference receivers and are inherently directional (Michelsen and Larsen, 2008). Loss of coupling would have changed the role of the central auditory system from enhancing a preexisting directional signal to computing a directional signal. Coding of sound source location therefore differs between archosaurs and lizards, with lizards having a directional input from the auditory nerve (Christensen-Dalsgaard et al., 2011) and a relatively small first-order nucleus magnocellularis and NL (Tang et al., 2012), whereas birds (and now alligators) have a less directional signal from the periphery and a larger nucleus magnocellularis and NL (Walton et al., 2017).
With respect to the well-studied mammals, the different evolutionary origins of tympanic ears (Rich et al., 2005; Kitazawa et al., 2015) and different available binaural cues in early mammals and archosaurs may have imposed distinct constraints on the respective binaural processing mechanisms. Grothe et al. have even proposed that the hearing of early mammals may have been dominated by high-frequency sensitivity (Grothe and Pecka, 2014). It seems likely that the early mammals were both small and nocturnal (Rosowski and Graybeal, 1991; Wu et al., 2017), but less is known about hearing in the ancestors of mammals, the cynodont reptiles. Hopson (1966) described a cynodont specimen as having a tympanum and a middle ear cavity at the end of a wide eustachian tube, such as is seen in lizards. A more recent analysis of several nonmammaliaform cynodonts reveals a much more complex pattern of stapedial anatomy but supports the conclusion that cynodonts possessed an air-filled middle ear and tympanum. Whatever events characterized evolutionary development of the mammalian auditory system, it seems unlikely that hearing in early mammals was dominated by low-frequency directional responses. Instead, data support the view put forward by Grothe et al. (Grothe, 2000; Grothe et al., 2010; Grothe and Pecka, 2014; Lingner et al., 2018) that directional hearing in mammals may have evolved from a brainstem EI system like that in the mammalian lateral superior olive, frog superior olive (Feng and Capranica, 1978), and teleost descending nucleus (Walton et al., 2017).
Neurophonic
In both birds and alligators, NL neurons receive binaural phase-locked inputs from the axons of ipsilateral and contralateral NM (Carr and Konishi, 1990; Carr et al., 2009). In owls, phase-locked spikes generate the neurophonic, a large (millivolt range) sound-evoked, frequency-following extracellular potential (Kuokkanen et al., 2010). Putative generators of the neurophonic are the activity of afferent axons, synaptic activation of laminaris neurons, or action potentials in laminaris neurons. Theoretical and experimental analyses provide strong support for the neurophonic originating as a summed coherent signal from the densely packed afferent axons (Kuokkanen et al., 2010, 2013, 2018; McColgan et al., 2017, 2018). Similar neurophonics have been recorded in chickens (Schwarz, 1992), in mammals (Weinberger et al., 1970; Henry, 1997; McLaughlin et al., 2010; Day and Semple, 2011; Goldwyn et al., 2014), and now in alligators, and are hypothesized to originate from synaptic potentials (Schwarz, 1992; McLaughlin et al., 2010; Goldwyn et al., 2014).
Intracellular recordings from the barn owl's NL in vivo showed that presynaptic phase-locked inputs induce oscillations in the postsynaptic membrane potential (Funabiki et al., 2011). To what extent are these reflected/represented in the signal component of the extracellular neurophonic? A modeling study derived analytical relationships between presynaptic, synaptic, and postsynaptic parameters, and the signal and noise components of the oscillation in barn owl NL (Ashida et al., 2013a, b). They found that, provided the total synaptic input is kept constant, changes in the number and spike rate of NM fibers altered the ITD-independent noise, whereas the degree of phase-locking was linearly converted to the ITD-dependent signal component of the intracellular potential (Ashida et al., 2013a). In barn owls, very large numbers of NM afferents converge in NL, producing a coherent, high signal-to-noise neurophonic potential (Kuokkanen et al., 2010). The simulations of Ashida et al. (2013a) suggest that a smaller number of presynaptic NM fibers, with lower phase locking and lower mean firing rates, would increase the “noisiness” of the extracellular signal (their Fig. 3) (see also Fig. 1, current study). It is likely that the alligator NL neurons receive fewer presynaptic inputs than the barn owl, and they show weaker phase locking and mean firing rates than those observed in the owl (Smolders and Klinke, 1986; Carr et al., 2009), which would mean that fewer sources contribute to the oscillatory neurophonic component than in owls or in cats (McLaughlin et al., 2010).
Footnotes
This work was supported by National Institutes of Health Grant DCD000436. We thank Paula Kuokkanen, Janie Ondracek, Uwe Firzlaff, and Harald Luksch for advice and comments on the manuscript; Sharad Shanbhag and Go Ashida for programming and updating tytology; Ruth Elsey (Rockefeller Wildlife Refuge) for help with collecting American alligators; and Hilary Bierman for helping with head size measurements.
The authors declare no competing financial interests.
Correspondence should be addressed to Lutz Kettler at lutz.kettler{at}tum.de