The precedence effect describes the phenomenon whereby echoes are spatially fused to the location of an initial sound by selectively suppressing the directional information of lagging sounds (echo suppression). Echo suppression is a prerequisite for faithful sound localization in natural environments but can break down depending on the behavioral context. To date, the neural mechanisms that suppress echo directional information without suppressing the perception of echoes themselves are not understood. We performed in vivo recordings in Mongolian gerbils of neurons of the dorsal nucleus of the lateral lemniscus (DNLL), a GABAergic brainstem nucleus that targets the auditory midbrain, and show that these DNLL neurons exhibit inhibition that persists tens of milliseconds beyond the stimulus offset, so-called persistent inhibition (PI). Using in vitro recordings, we demonstrate that PI stems from GABAergic projections from the opposite DNLL. Furthermore, these recordings show that PI is attributable to intrinsic features of this GABAergic innervation. Implementation of these physiological findings into a neuronal model of the auditory brainstem demonstrates that, on a circuit level, PI creates an enhancement of responsiveness to lagging sounds in auditory midbrain cells. Moreover, the model revealed that such response enhancement is a sufficient cue for an ideal observer to identify echoes and to exhibit echo suppression, which agrees closely with the percepts of human subjects.
- echo suppression
- precedence effect
- Clifton effect
- dorsal nucleus of the lateral lemniscus
- inferior colliculus
- binaural processing
- persistent inhibition
Faithful localization of an initial sound source in the presence of a multitude of its echoes is a fundamental challenge to our auditory system. The system copes with this challenge by suppressing the directional information of echoes (echo suppression) without eliminating their overall perception. Thus, we are aware of the presence of echoes but do not localize them. This phenomenon is referred to as “precedence effect” and applies when lagging sounds trail leading sounds in the range of 2 to 10–20 ms (depending on the nature of the sound). Two sounds delayed by <2 ms are spatially fused and are heard as a single sound located midway between the two. Sounds temporally segregated by >10–20 ms exceed the so-called “echo threshold” and are perceived as independent entities with their own spatial location (Zurek, 1987; Blauert, 1997; Litovsky et al., 1999). Moreover, depending on the context, echo suppression can break down [“Clifton effect” (Clifton, 1987)], indicating facultative processing of echoes in higher brain centers.
The neural mechanisms underlying the precedence effect have to match three criteria: first, the circuitry has to be part of the binaural system, because interaural disparities are the main cues for sound localization (Rayleigh, 1907; Erulkar, 1972). Second, it has to operate on a time scale in the range of ∼2–20 ms, hence on a different time scale than the binaural disparity detectors, which operate on a microsecond time scale. Third, the circuitry must be able to either specifically suppress the directional information of the trailing sound or identify and tag it as an echo for additional context-dependent processing.
A candidate circuitry that matches at least two of these criteria has been described for echo-locating bats (Pollak, 1997). The key structure in this circuit is the dorsal nucleus of the lateral lemniscus (DNLL) (Fig. 1). First, many DNLL neurons respond to sounds from the contralateral ear but are inhibited by sounds from the ipsilateral ear and thus are sensitive to interaural intensity differences (IIDs), a feature they inherit from the lateral superior olive (LSO) (Glendenning et al., 1981; Shneiderman et al., 1988). Second, in bats, DNLL neurons display an additional characteristic not known from the LSO: depending on the spatiotemporal succession of stimuli, their response can be suppressed for tens of milliseconds as a result of GABAergic inhibition (Yang and Pollak, 1994a,b, 1998; Burger and Pollak, 2001). It has been hypothesized that such “persistent inhibition” (PI) is derived from the opposite DNLL via the commissure of Probst (Yang and Pollak, 1998) (Fig. 1A).
Here, we show the existence of PI in the DNLL of the Mongolian gerbil, a choice mammalian animal model for high- and low-frequency hearing, and prove that its cellular basis derives from GABAergic innervation provided by the opposite DNLL. Based on these findings, we developed a model that explains how the DNLL output can generate context-dependent suppression of directional information of lagging sounds in the auditory midbrain. Finally, we report results from a psychophysical echo-suppression test with human subjects that closely matches the predictions of the model.
Materials and Methods
In vivo recordings.
All experiments were approved according to the German Tierschutzgesetz (AZ 211-2531-40/01). Mongolian gerbils (Meriones unguiculatus; 2–3 months of age) were anesthetized by an initial intraperitoneal injection (0.5 ml/100 g of body weight) of a physiological NaCl solution containing ketamine (20%) and xylazine (2%), with supplementary doses of 0.05 ml of the same mixture given subcutaneously every 30 min. The animal was then transferred to a sound-attenuated chamber and mounted in a custom-made stereotaxic instrument (Schuller et al., 1986). A small hole was cut into the skull (∼1 mm²), and the dura was removed. Ringer's solution was frequently applied to the opening to prevent dehydration of the brain. Constant body temperature (37–39°C) was maintained using a thermostatically controlled heating blanket. After recordings (10–12 h), the animals were killed (injection of 0.2 ml of T61). For some recording sessions, current-induced lesions (5 mA for 5 s) using metal electrodes (5 MΩ) were made to mark the recording site after successful experiments. The brains of these animals were fixed, sliced, and Nissl stained by standard methods.
Acoustic stimuli were digitally generated at a sampling rate of 50 kHz by TDT System III (Tucker-Davis Technologies, Alachua, FL), converted to analog signals (DA3-2/RP2-1; Tucker-Davis Technologies), attenuated (PA5; Tucker-Davis Technologies), and delivered to the earphones [Stereo Dynamic Earphones (MDR-EX70LP; Sony, Tokyo, Japan) or EC1 electrostatic speaker (Tucker-Davis Technologies); for details and calibration procedures, see Siveke et al. (2006)]. All signals had rise–fall times (RFTs) of 5.0 ms and were presented at a repetition rate of 4 Hz unless stated differently. Action potentials from single cells were recorded extracellularly using tungsten electrodes (5 MΩ; World Precision Instruments, Berlin, Germany) or glass electrodes filled with 1 M NaCl (∼10 MΩ). The recording electrode was advanced under remote control, using a motorized micromanipulator (Digimatic; Mitutoyo, Neuss, Germany) and a piezodrive (Inchworm controller 8200; EXFO Burleigh Products Group, Fishers Victor, NY). Spikes were amplified, filtered, and fed to an analog-to-digital converter (RP2-1; Tucker-Davis Technologies), and the digitized signals were fed to the computer. To search for acoustic responses, 200 ms uncorrelated noise bursts were delivered binaurally with equal intensities at the two ears (IID = 0 dB). When a neuron was encountered, its best frequency (BF) (the frequency that elicited a response at the lowest intensity) and absolute threshold were determined audiovisually (IID = 0). BFs ranged from 650 Hz to 18 kHz. Sixty-four percent of the neurons (32 of 50) were tuned to “high” frequencies (BF > 2 kHz). Binaural and monaural pure tones were presented to determine the binaural properties of the neuron. IID sensitivity was assessed by holding the intensity on the excitatory ear constant at 20 dB above the threshold of the cells while varying the intensity on the inhibitory ear in 10 dB steps between 10 dB below and 50 dB above threshold.
Neurons were defined as IID sensitive if ipsilateral stimulation at BF reduced the maximal response elicited by contralateral stimulation by >50%.
To test an IID-sensitive neuron for PI, we evoked a steady response by presenting a tone burst (200 ms at BF) at 20 dB above threshold on the excitatory ear. Additionally, we presented shorter tone bursts (10 or 20 ms at BF) with several different intensities on the inhibitory ear midway through the excitatory stimulus. Stimuli were cos²-function gated with RFTs of 5 ms for the contralateral and 2 ms for the ipsilateral side (if not stated otherwise). A DNLL neuron was defined as persistently inhibited if the duration of total suppression of responses to contralateral stimulation exceeded ipsilateral stimulus duration by at least 5 ms. The duration of PI was evaluated from peristimulus-time histograms of 1 ms bin width.
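The PI criterion above can be sketched in a few lines. This is only an illustration, not the analysis code used in the study: it assumes that "total suppression" corresponds to the longest run of empty 1 ms bins in the histogram, and the function names are hypothetical.

```python
def longest_silent_run(psth):
    """Length (in bins) of the longest run of consecutive empty bins."""
    best = run = 0
    for count in psth:
        run = run + 1 if count == 0 else 0
        best = max(best, run)
    return best


def pi_duration_ms(psth, ipsi_duration_ms, bin_ms=1):
    """Time by which total suppression outlasts the ipsilateral tone.

    psth: spike counts per bin during the excitatory tone.
    Assumption: the longest empty run is the suppression caused by the
    ipsilateral tone burst.
    """
    return longest_silent_run(psth) * bin_ms - ipsi_duration_ms


def is_persistently_inhibited(psth, ipsi_duration_ms, criterion_ms=5):
    """Apply the criterion: suppression exceeds tone duration by >= 5 ms."""
    return pi_duration_ms(psth, ipsi_duration_ms) >= criterion_ms
```

With a synthetic histogram containing a 41 ms gap and a 20 ms inhibitory tone, `pi_duration_ms` returns 21 and the neuron would be classified as persistently inhibited.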
In vitro recordings.
Coronal slices of DNLL were prepared from 14- to 19-d-old Mongolian gerbils (M. unguiculatus). Animals were anesthetized by isoflurane inhalation (Isofluran Curamed; Curamed Pharma, Karlsruhe, Germany) and decapitated. The brainstem was dissected out in ice-cold dissection Ringer's solution [(in mM) 125 NaCl, 2.5 KCl, 1 MgCl2, 0.1 CaCl2, 25 glucose, 1.25 NaH2PO4, 25 NaHCO3, 0.4 ascorbic acid, 3 myo-inositol, and 2 pyruvic acid; all chemicals from Sigma (Deisenhofen, Germany)]. Sections of 200 μm were cut with a vibratome (VT1000S; Leica, Nussloch, Germany). Slices were transferred to an incubation chamber containing extracellular solution (ECS) [(in mM) 125 NaCl, 2.5 KCl, 1 MgCl2, 2 CaCl2, 25 glucose, 1.25 NaH2PO4, 25 NaHCO3, 0.4 ascorbic acid, 3 myo-inositol, and 2 pyruvic acid; all chemicals from Sigma], bubbled with 5% CO2–95% O2, and incubated for 1 h at 37°C.
All recordings were performed at ∼36°C. After incubation, slices were transferred to a recording chamber and continuously superfused with ECS at 3–4 ml/min through a gravity-fed perfusion system. DNLL neurons were viewed at 40× through a Zeiss (Oberkochen, Germany) Axioskop 2 FS microscope equipped with differential interference contrast optics. Whole-cell recordings were made with an EPC 10 double amplifier (HEKA Instruments, Lambrecht/Pfalz, Germany). Signals were filtered at 5–10 kHz and subsequently digitized at 20–100 kHz using Patchmaster version 2.02 software (HEKA Instruments). During recordings, series resistance was compensated electronically up to 60% to achieve a remaining series-resistance error <4 MΩ. All voltages are corrected for the junction potentials of the two different intracellular solutions.
Whole-cell recordings were performed with the following intracellular solutions (in mM): for current clamp, 125 K-gluconate, 5 KCl, 10 HEPES, 1 EGTA, 10 sodium-phosphocreatine, 2 Na2-ATP, 2 Mg-ATP, 0.3 Na2-GTP, pH adjusted to 7.25 (all chemicals from Sigma); and for voltage clamp, 140 CsCl, 10 HEPES, 10 EGTA, 2 NaCl, 1 CaCl2, 2 Mg-ATP, 0.3 Na2-GTP, pH adjusted to 7.3 (all chemicals from Sigma). The Cl− reversal potential was estimated to be at +2 mV. Because of a holding potential of −60 mV, GABAergic chloride currents are reported as inward currents. During all recordings, 500 nM strychnine hydrochloride (Sigma) and 2.5 mM kynurenic acid were added to the bath to block glycinergic and glutamatergic transmission, respectively. During voltage-clamp recordings, 5 mM QX-314 (lidocaine N-ethyl bromide; Alomone Labs, Jerusalem, Israel) was added to the intracellular solution to block sodium channels, preventing action potential generation.
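The estimate of the Cl− reversal potential can be checked with the Nernst equation. The sketch below is a plausibility check only, not the authors' calculation. It assumes complete dissociation of the listed chloride salts, uses concentrations in place of activities, takes the recording temperature as 36°C, and ignores minor chloride sources such as strychnine hydrochloride.

```python
import math

R = 8.314    # gas constant, J/(mol*K)
F = 96485.0  # Faraday constant, C/mol


def nernst_mV(conc_out_mM, conc_in_mM, valence, temp_C):
    """Nernst equilibrium potential in millivolts."""
    temp_K = temp_C + 273.15
    return 1000.0 * R * temp_K / (valence * F) * math.log(conc_out_mM / conc_in_mM)


# Chloride in the ECS (millimolar): NaCl + KCl + 2 x MgCl2 + 2 x CaCl2
cl_out = 125 + 2.5 + 2 * 1 + 2 * 2
# Chloride in the voltage-clamp pipette solution: CsCl + NaCl + 2 x CaCl2
cl_in = 140 + 2 + 2 * 1

e_cl = nernst_mV(cl_out, cl_in, valence=-1, temp_C=36)
```

Under these assumptions `e_cl` comes out close to +2 mV. At a holding potential of −60 mV the driving force on chloride is therefore about −62 mV, which is why the GABAergic currents appear as inward currents.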
Synaptic currents were elicited by stimulation of the commissure of Probst with a 5 MΩ bipolar stimulation electrode (matrix electrodes with 270 μm distance; FHC, Bowdoinham, ME). Stimuli were 100-μs-long square pulses adjusted to elicit maximal responses (35–90 V) and were delivered with an STG 2004 computer-controlled four-channel stimulator (Multichannel Systems, Reutlingen, Germany) and a stimulation isolation unit (Iso-Flex; AMPI, Jerusalem, Israel).
The model of the auditory brainstem is based on computational models of dynamic spiking neurons that were developed by T. P. Zahn. The cochlea base element contains a filter cascade of second-order all-pole gammatone filters (Lyon, 1997) corresponding to 16 different locations along the basilar membrane. The outputs of each of the 16 filter channels were fed to 16 hair-cell models that generated oscillatory potentials for each frequency channel (2–8 kHz). Their amplitude, phase, and frequency are coded by spike trains in the auditory nerve fibers generated by three sets of ganglion cells for each frequency channel. Thus, each of the modeled nuclei in the auditory brainstem contains 16 cell models, each tuned to a different frequency.
The neural modeling approach that we used, termed Spike Interaction Model (SIM), uses the precise spatiotemporal interaction of single spike events for coding and processing of neural information. SIM can identify and code phase, frequency, and amplitude dynamics on the time resolution of 10 μs. The model exclusively uses single spike events for information coding, transfer, and interpretation. The basic elements are integrate-and-fire neurons extended by specially designed dynamic transfer functions. SIM extends the kernel transfer function of synapses and neurons by dynamic properties, leading to nonuniform dynamic responses depending on the firing history of the elements and their surroundings. The main synaptic features simulated for each synapse were depolarization time and slope, repolarization time and slope, transmitter availability, and overall synaptic efficiency. The dynamic features of the cells included dendritic delay and decay, dendritic spatial and temporal summation, somatic summation, dynamic firing threshold of the axon hillock, afterhyperpolarization, and axonal delay. The SIM model that we used consists exclusively of Neural Base Elements of a specifically designed Neural Base Library that extends the commercially available environment of MATLAB/Simulink (The MathWorks, Natick, MA) by a set of neural models intended to simulate the intrinsic dynamic properties of neurons, synapses, dendrites, and axons. To create PI in the model DNLL neurons, hyperpolarization of the cell membrane potential caused by the inhibitory inputs of the contralateral DNLL and ipsilateral LSO were modeled with different time constants of 12 and 5 ms, respectively. Detailed information about model parameters can be found in supplemental Tables 1 and 2 (available at www.jneurosci.org as supplemental material) [for additional details, see Zahn et al. (1997) and Zahn (2003)].
To quantify the responses of the left and right model inferior colliculi (ICs) for the first and the second signal independently, we used discrete time bins in which spikes were counted. Time bins were of the same duration as the signals. For delays in which the leading and lagging signal were partially overlapping, only nonoverlapping periods of the respective time bins were quantified. Responses to 12 distinct recordings were averaged for each lead–lag delay. We determined the ratios of average responses of the left and right model IC in the respective time bins for all delays. The leading signal was always presented from the left, and the lagging signal was always presented from the right. To create an ideal observer, we introduced a directional sensor that assigned “right” to the lagging sound if the ratio of left-to-right IC response was ≥2, thereby modeling the ability of a listener to localize the lagging sound. For smaller ratios, the sensor assigned “left” to the signal to model a listener's perception of the lagging sound at the location of the leading sound as a result of spatial fusion. Use of this particular threshold provided the best match to the perceptual data.
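The decision rule of the ideal observer can be written compactly. The following is a minimal sketch under stated assumptions, not the MATLAB/Simulink implementation: the function name is hypothetical, and the handling of a silent right IC (treated as an infinitely large ratio) is an assumption, because the text does not specify that case.

```python
import math


def ideal_observer(left_ic_spikes, right_ic_spikes, threshold=2.0):
    """Assign a perceived side to the lagging sound.

    left_ic_spikes, right_ic_spikes: average spike counts of the model ICs
    in the time bin of the lagging signal.
    Returns "right" if the lag is localized at its own position, and
    "left" if it is fused to the position of the leading sound.
    """
    if right_ic_spikes == 0:
        # Assumption: a silent right IC with an active left IC counts as an
        # unbounded ratio; no activity at all is treated as fusion.
        ratio = math.inf if left_ic_spikes > 0 else 0.0
    else:
        ratio = left_ic_spikes / right_ic_spikes
    return "right" if ratio >= threshold else "left"
```

For example, average counts of 40 and 10 spikes give a ratio of 4 and the lag is assigned "right", whereas 30 and 20 spikes give a ratio of 1.5 and the lag is fused to the "left".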
The psychophysical experiments were designed to determine both the detection and localization ability of a lagging tone burst. Binaural free-field tone bursts with a frequency of 4 kHz (10 ms in duration; 2 ms RFT) were computer generated, digital–analog converted (Fireface 800; RME-Audio, Haimhausen, Germany), amplified (TA-FE 330R; Sony), and broadcast by two speakers [Canton (Weilrod, Germany) Plus XS], located 45° to the left and right of the head of the subject at a distance of 1.5 m in a sound-attenuated, anechoic chamber (2 × 3 × 2.2 m). The tone bursts from the speakers were identical, except that the leading signal was always broadcast from the left speaker. The delays between the left and right speaker were varied from 0.5 to 32 ms in 13 steps with exponential increments. Stimuli were presented at an average level of 80 dB SPL, randomly roving within ± 10 dB.
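The delay series and the level roving can be sketched as follows. Two assumptions go beyond the text: "exponential increments" is read as a constant ratio between neighboring delays (which, for 13 values spanning 0.5 to 32 ms, gives half-octave steps), and the roving is drawn from a uniform distribution. The function names are illustrative.

```python
import random


def exponential_delays(first_ms=0.5, last_ms=32.0, n_steps=13):
    """Delays spaced by a constant ratio between first_ms and last_ms."""
    ratio = (last_ms / first_ms) ** (1.0 / (n_steps - 1))
    return [first_ms * ratio ** i for i in range(n_steps)]


def roved_level_db(mean_db=80.0, rove_db=10.0, rng=random):
    """Presentation level drawn uniformly within mean_db +/- rove_db."""
    return rng.uniform(mean_db - rove_db, mean_db + rove_db)


delays_ms = exponential_delays()
# One trial: pick a delay at random and rove the level.
trial_delay_ms = random.choice(delays_ms)
trial_level_db = roved_level_db()
```

With these defaults the ratio between neighboring delays is the square root of 2, so the series runs 0.5, 0.71, 1, 1.41, and so on up to 32 ms.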
The experiment was executed in two versions with identical stimuli but with different instructions given to the listeners: in the first version, listeners were instructed to indicate whether a second tone burst with a distinct location was perceived. In the second version, listeners were instructed to indicate whether one or two tone bursts were perceived, independent of location. Whereas the first version leads to an estimate of echo threshold as defined by Blauert (1997) and Litovsky et al. (1999), the second instruction leads to an estimate of lag detectability. The lead–lag delay for each trial was selected at random from the 13 delays. For each delay and experimental version, 33 decisions were obtained.
Ten normal-hearing listeners (two female, eight male; 25–46 years of age) completed both versions of the experiment. Performance was averaged across listeners, and a sigmoid function was fitted to the psychometric function. The 50% values of these fits were taken as threshold.
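The threshold estimate can be illustrated with a simple least-squares fit. The text does not state which sigmoid or which fitting procedure was used, so this sketch makes its own choices: a logistic function on a log2 delay axis, fitted by a coarse grid search. All names and grid ranges are assumptions.

```python
import math


def logistic(x, midpoint, slope):
    """Logistic sigmoid rising from 0 to 1 with its 50% point at midpoint."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def fit_threshold(delays_ms, p_yes):
    """Fit a logistic to the proportion of "yes" responses per delay.

    Returns (threshold_ms, slope), where threshold_ms is the delay at which
    the fitted function reaches 50%.
    """
    xs = [math.log2(d) for d in delays_ms]
    best = None
    for m_i in range(-100, 501):        # midpoints -1.00 .. 5.00 in log2(ms)
        midpoint = m_i / 100
        for s_i in range(1, 101):       # slopes 0.1 .. 10.0
            slope = s_i / 10
            err = sum((logistic(x, midpoint, slope) - p) ** 2
                      for x, p in zip(xs, p_yes))
            if best is None or err < best[0]:
                best = (err, midpoint, slope)
    return 2 ** best[1], best[2]
```

Called with the 13 delays and the listener-averaged proportion of "two distinct locations" responses, the first return value corresponds to the echo threshold; called with the detection responses, it corresponds to the lag-detection threshold.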
The signals as they arrived at the ears of the subjects were recorded by two Sennheiser (Old Lyme, CT) K6 capacitor microphones placed directly between the tragus and the antitragus of the subjects' ears. The shadowing of the head and ears created an interaural level disparity of ∼13 dB at 4 kHz. The signals recorded from the microphone in each ear were amplified (Eurorack MX 802A; Behringer, Willich, Germany), digitized (Digi 96/8 PST Sound Card; RME-Audio), and stored as stereo .wav files. Subsequently, the stored files were converted into a MATLAB-readable format, digitally amplified, and fed to the left and right inputs of the model.
In vivo physiology reveals persistent inhibition
We tested 50 neurons in the left DNLL of Mongolian gerbils, of which 60% exhibited PI. All of these neurons responded to monaural tones at the right ear with sustained discharge trains that had durations that match the duration of the tone bursts evoking them. When neurons were stimulated binaurally, the discharges evoked by stimulation of the right ear were progressively suppressed by increasing stimulus intensities at the left ear and thus were sensitive to IIDs. On average, neurons exhibited 50% reduction in spike rate at an IID of 3.3 ± 1.3 (SEM) dB. The average difference between the IID eliciting maximal and the IID eliciting minimal spike counts was 29.8 ± 2 (SEM) dB (n = 41). Because these DNLL neurons were excited by sound from the right ear and inhibited by the left ear, they are termed excitatory/inhibitory (EI).
We evaluated PI in 30 DNLL neurons by driving the cells with a 200 ms tone burst at the BF of the neuron presented to the excitatory ear, while simultaneously presenting a 20 ms BF tone burst to the inhibitory ear, temporally embedded in the long stimulus (Fig. 2A). The intensity of the long, excitatory tone burst was held constant, whereas the intensity of the shorter, inhibitory tone was varied from 20 dB below to 20 dB above the intensity at the excitatory ear (Fig. 2A). With increasing levels of the inhibitory tone burst, a progressive suppression was evident as gaps in the spike trains that elongated with more negative IIDs. In the example neuron (Fig. 2), the maximal duration of spike suppression exceeded the inhibitory stimulus duration by 21 ms (IID = −20 dB) (Fig. 2A, bottom) (i.e., created PI of 21 ms). On average, the maximum duration of PI, derived from the PI duration at the most negative IID tested, was 17.4 ± 1.5 (SEM) ms, ranging between 6 and 38 ms (n = 30).
The above findings demonstrate that binaural sounds that favor the inhibitory ear (negative IIDs) create PI, which suppresses contralateral excitation several milliseconds longer than the duration of the sound. Thus, trailing excitatory sounds (simulating echoes) should be subject to similar suppression if they arrive within the time of PI. We presented a binaural sound (at BF; 10 ms) that favored the inhibitory ear and created PI in the cell followed by two monaural sounds (at BF; 10 ms) only presented to the excitatory ear at different interpulse intervals (IPIs). In Figure 2B1, the initial binaural sound with an IID of −30 dB was followed by two trailing, monaural sounds with a 2 ms IPI (top). Importantly, the PI created by the initial sound completely suppressed the responses to the first trailing sound (the periods for which responses to the first monaural sound were expected are illustrated by the gray shaded areas) and also affected the response to the second trailing sound. As the IPI was lengthened to 10 ms (middle), the second trailing sound evoked robust discharges but not the first. The response to the first trailing sound only recovered when the IPI was 20 ms (bottom). Within our sample of neurons (n = 20), full recovery of the response to the first trailing sound ranged from 5 to 30 ms.
The suppression of responses to the trailing sounds was dependent on the IID of the initial sound (Fig. 2B2). When the IPI was held constant at 10 ms and the IID of the initial sound was positive, the discharges evoked by the trailing sounds were not suppressed. However, if the initial sound had a more negative IID than −10 dB, it generated a PI that suppressed the excitation evoked by the trailing sounds.
In vitro physiology identifies the source of persistent inhibition
The in vivo results show that initial binaural signals with IIDs favoring the inhibitory ear generate PI at the DNLL in gerbils, similarly to bats (Pollak, 1997). The PI in the bat DNLL, moreover, has been shown to be mediated by GABAergic inhibition (Yang and Pollak, 1994b). Because the majority of inhibitory inputs to the DNLL arise from its contralateral counterpart (Glendenning et al., 1981; Shneiderman et al., 1988), the possibility is raised that PI is mediated by the opposite DNLL. However, no prolonged or strongly delayed firing, which could explain the persistent nature of the inhibition, has been observed (Covey, 1993; Bajo et al., 1998; Siveke et al., 2006). We therefore used in vitro whole-cell patch-clamp recordings to test the hypothesis that PI results from properties of the GABAergic transmission in the DNLL.
To imitate excitatory input to DNLL cells in the brain slices, action potentials were elicited via continuous current injections (Fig. 3A). Midway through the current injection, fibers in the commissure of Probst were stimulated with a short train of three pulses at 500 Hz. This procedure elicited apparent suppression of spikes (Fig. 3A, left, asterisk) for 55.6 ± 9.4 ms (n = 11) after the end of the fiber stimulation. However, because of the underlying average firing frequency of the cells of 43.2 ± 10.3 Hz (resulting in interspike intervals of ∼23 ms), the effective spike suppression was ∼32 ms. Application of the GABAA receptor blocker 2-(3-carboxyl)-3-amino-6-(4-methoxyphenyl)-pyridazinium bromide (SR95531) (Hamann et al., 1988) eliminated this PI (n = 5), indicating that it was caused by the GABAergic projections of the commissure of Probst (Fig. 3A, right). We next tested effects of PI on simulated sound-evoked phase-locked excitation. To do so, 0.5-ms-long current injections presented every 10 ms (100 Hz) were applied to the soma of principal DNLL neurons. Each of these simulated excitatory input trains was adjusted to elicit one action potential per pulse (Fig. 3B). Fiber stimulation of the commissure (three pulses at 500 Hz; 6 ms duration, as in Fig. 3A) elicited suppression of spikes for the following two current injections. Hence, PI lasted for at least 14 ms after cessation of fiber stimulation. On average, PI was found to last for 19.4 ± 3.2 ms at 90% recovery (n = 6) (Fig. 3C).
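The correction from apparent to effective spike suppression in the continuous-current experiment is simple arithmetic. The sketch below assumes that the correction amounts to subtracting one mean interspike interval from the apparent gap, which is our reading of the text rather than a documented procedure.

```python
def interspike_interval_ms(rate_hz):
    """Mean interspike interval for a given mean firing rate."""
    return 1000.0 / rate_hz


def effective_suppression_ms(apparent_ms, rate_hz):
    """Apparent gap minus the interval expected without any inhibition."""
    return apparent_ms - interspike_interval_ms(rate_hz)


isi_ms = interspike_interval_ms(43.2)               # about 23 ms
effective_ms = effective_suppression_ms(55.6, 43.2)  # about 32 ms
```

A cell firing at 43.2 Hz would stay silent for roughly 23 ms between spikes even without inhibition, so only the remainder of the 55.6 ms gap is attributed to the inhibition.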
If PI is produced synaptically by GABAergic inhibition, the kinetics of postsynaptic GABA currents should resemble the time course of PI on a cellular level. We assessed the kinetics of GABAergic IPSCs elicited by the same three-pulse fiber-stimulation paradigm as above. The IPSC decay of the example depicted in Figure 3D was best described by a double-exponential fit with fast and slow time constants of 12.3 and 43.7 ms, respectively. The average time constants were 12.6 ± 0.5 and 39.5 ± 5.1 ms (n = 5). In the same five cells, blocking GABAA receptors with SR95531 reduced the IPSCs by 96.3 ± 1.7%. Using one instead of three pulses yielded results (11.8 ± 1.1 ms for the fast and 26.6 ± 5.6 ms for the slow time constant; n = 8) that were not significantly different (pfast = 0.599; pslow = 0.143; two-tailed paired t test). Comparing the time course of spike suppression (Fig. 3A–C) and IPSC kinetics (Fig. 3D) suggests that the fast component of the IPSC is the main source of PI. In summary, our in vitro data suggest that GABAergic inhibition provided by the contralateral DNLL through the commissure of Probst is sufficient to explain PI.
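The double-exponential description of the IPSC decay can be sketched as follows. The relative amplitudes of the two components are not given in the text, so `amp_fast` and `amp_slow` are placeholder values chosen only for illustration; the time constants are the averages from the three-pulse paradigm.

```python
import math


def remaining_fraction(t_ms, tau_ms):
    """Fraction of a single exponential component left after t_ms."""
    return math.exp(-t_ms / tau_ms)


def ipsc_decay(t_ms, tau_fast_ms=12.6, tau_slow_ms=39.5,
               amp_fast=0.5, amp_slow=0.5):
    """Normalized double-exponential IPSC decay.

    amp_fast and amp_slow are hypothetical weights (not reported values).
    """
    return (amp_fast * remaining_fraction(t_ms, tau_fast_ms)
            + amp_slow * remaining_fraction(t_ms, tau_slow_ms))


# Fractions of each component left at the mean PI duration of 19.4 ms
fast_left = remaining_fraction(19.4, 12.6)
slow_left = remaining_fraction(19.4, 39.5)
```

At the mean PI duration the fast component has decayed to roughly a fifth of its peak, while the slow component still retains more than half of its amplitude.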
Modeling the processing of binaural signals and “echoes”
The DNLL sends inhibitory projections mainly to the contralateral IC, the major integration site in the auditory midbrain. Thus, we next ask what effect PI has on the response of IC cells innervated by the DNLL. To predict responses of such IC cells to leading and trailing sounds, we used an SIM based on dynamic integrate-and-fire neurons, which interacted exclusively through single spike events (for details, see Materials and Methods). The model simulated the frequency decomposition and neural coding of sound stimuli into 16 tonotopically organized auditory nerve fibers for the left and right cochlea–ganglion complex separately. A tonotopic organization into 16 frequency channels (2–8 kHz) was maintained for all nuclei of the model, illustrated as horizontal block lines in Figure 4A. Sound-evoked spikes at both anteroventral cochlear nuclei (AVCNs) were transferred to LSO, DNLL, and IC via the known excitatory and inhibitory connections (Figs. 1, 4A). The model was restricted to a specific population of IC neurons, as described by pharmacological studies (Li and Kelly, 1992; Burger and Pollak, 2001): these IC cells receive an excitatory projection from the contralateral AVCN in combination with an inhibitory projection from the contralateral DNLL. This interaction creates EI properties, and hence IID sensitivity, in the IC de novo. PI was incorporated as an intrinsic feature of the DNLL–DNLL interaction via a slowly decaying hyperpolarization with a time constant of 12 ms, as determined in vitro. Detailed information about all model parameters can be found in supplemental Tables 1 and 2 (available at www.jneurosci.org as supplemental material). It is important to note that the model output was extremely robust to parameter variations. For instance, synaptic strength of the DNLL–DNLL inhibition may range between 0.1 and 0.5 to yield similar IC responses. 
Figure 4A shows the spike trains generated in each frequency channel of each nucleus up to the IC for a single digitally created binaural tone burst of 4 kHz favoring the right ear. Note that because of the inhibition from the left DNLL, the right IC does not respond to the sound (red shaded area), despite the monaural excitatory input from the left AVCN.
In Figure 4B, the model responses of DNLL and IC cells to two 10 ms binaural tone bursts separated by a delay of 16 ms (resulting in an IPI of 6 ms) are shown. The tone bursts were acoustic stimuli (10 ms; 4 kHz) recorded with probe microphones inserted into the ear canals of a human subject while the subject was performing a typical echo-detection task (see below). The first binaural tone burst had an IID of ∼13 dB favoring the left ear, whereas the IID of the trailing sound favored the right ear (∼13 dB). Recall that IIDs that favor the right ear suppress responses in the right IC (Fig. 4A). In the modeled responses shown in Figure 4B, the first binaural sound favoring the left ear generated PI in the left DNLL. The trailing binaural sound had an IID that favored the right ear, and hence, should evoke a response in the left DNLL. However, because of the PI generated by the first sound, the left DNLL failed to respond to the trailing sound (blue shaded area). Therefore, it failed to inhibit the right IC, as it would without the preceding sound. As a consequence, the right IC responded to both the first and the trailing sound (red shaded area), although the second sound evoked no activity in the right IC when presented alone (compare Fig. 4A).
Thus, whether or not the IC responded to the trailing sound depended on its recent history, which was determined by the IID of the initial sound and its temporal separation from the trailing sound. These features are consistent with a previous study that determined responses to initial and trailing sounds in EI cells that are innervated by the DNLL in the IC of bats (Burger and Pollak, 2001). We used the model to investigate how the IC response patterns changed when PI was not incorporated as an intrinsic feature of the DNLL–DNLL interaction. The model predicts that the activation of the right IC in response to the second tone burst was dependent on PI in the left DNLL (Fig. 4C). Omitting PI from the model system enabled the left DNLL to respond to the trailing tone burst (blue shaded area). This activity in the left DNLL suppressed all responses to the trailing tone burst in the right IC (red shaded area).
We next tested different lead–lag combinations of delays between 0.5 and 32 ms (the left speaker always led) and quantified the IC responses in 10 ms bins. Response bins in the left IC to the leading and trailing tone bursts were termed L1 and L2, and responses in the right IC were termed R1 and R2. This analysis was performed with (+) and without (−) PI implemented in the model circuit. Figure 4D displays the averaged number of spikes in the left and right IC in response to the first tone burst for both conditions. A total alignment of the values for L1+ and L1− as well as for R1+ and R1− shows that the responses of both ICs to the first stimulus are unaffected by the exclusion of PI from the circuit. However, analysis of the response to the second tone burst revealed the significance of PI for the response formation in the IC (Fig. 4E). Although the left IC response to the trailing sound was similar for both model conditions (L2+ and L2−), there were substantial differences in the responses of the right IC to this signal with and without PI in the DNLL (R2+ and R2−). Because of PI in the left DNLL, the right IC was deprived of its inhibition by the left DNLL, enabling the trailing tone burst to elicit a substantial number of spikes in the right IC (R2+). If PI was removed from the model circuitry (R2−), no responses were elicited in the right IC. Importantly, the responses of neurons in the right IC to a sound favoring the right ear were not observed when presented alone (compare Fig. 4A), but only when a sound from the left preceded the sound from the right. Together, the model predicts that during PI in the DNLL, the contralateral IC displays an enhancement of responsiveness to trailing sounds that were not excitatory if presented alone.
We tested whether such response enhancement, which depends on the spatiotemporal stimulus configuration, would be a sufficient cue for higher-order centers to achieve suppression of directional information carried by trailing sounds based on IC rate code evaluation. Because we were particularly interested in the ability to localize a trailing signal (or the lack thereof), we calculated the ratio of responses to the trailing signal in the left IC to the responses to the trailing signal in the right IC (L2/R2) (Fig. 4F). Without PI (black line), the ratios changed even for very small delays from values >4 to ∼1. When PI was present (red line), the ratio was <2 for all delays smaller than 10 ms. Thus, because of PI, the right IC response was >50% of the response observed in the left IC at small delays, although no response would be seen to the trailing sound in the right IC if presented alone.
We used the ratio data to introduce an ideal observer to the model. To do so, we established a localization threshold based on the ratios of responses in the two ICs to define localization capabilities. We chose a threshold of 2, meaning that for ratios of >2 (response in left IC at least twice as high as in right IC), the trailing sound was localized on the right side by the observer. For ratios <2, the trailing sound was not localized but fused to the location of the leading sound on the left side (Fig. 4G). Clearly, Figure 4G shows that when PI was present in the model (red line), the trailing sound was not localized independently for delays ≤8 ms but fused to the location of the leading sound. At delays >8 ms, however, the ratio was >2, enabling the ideal observer to localize the second sound. This continuity and unambiguity in the analysis was only seen when PI was included. Without PI, the ideal observer failed to identify the trailing sound as an echo and to suppress the directional information, even at very small delays. Thus, the ideal observer was able to exhibit suppression-like behavior of directional information of lagging sounds based on the response ratio between the ICs, which in turn was dependent on PI in the DNLL. It is noteworthy that regardless of the ratio threshold, such behavior of the observer was not achievable without PI.
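The decision rule of the ideal observer can be sketched in a few lines of code. The following Python fragment is only an illustration of the rule described above, not the implementation used for the model: the function name, the handling of a ratio of exactly 2, the handling of zero spike counts, and the example spike counts are all assumptions made for this sketch and are not taken from the model output.

```python
def classify_trailing_sound(l2_spikes, r2_spikes, ratio_threshold=2.0):
    """Apply the ideal-observer rule to one lead-lag delay.

    l2_spikes, r2_spikes: spike counts in the left and right IC in
    response to the trailing sound (L2 and R2 in the text).
    Returns "localized" if L2/R2 exceeds the threshold, else "fused".
    """
    if r2_spikes == 0:
        # Assumption: with no right-IC response, any left-IC response
        # counts as an unbounded ratio; no response at all counts as 0.
        ratio = float("inf") if l2_spikes > 0 else 0.0
    else:
        ratio = l2_spikes / r2_spikes
    # Assumption: a ratio of exactly 2 is treated as "fused"; the text
    # only specifies the cases ratio > 2 and ratio < 2.
    return "localized" if ratio > ratio_threshold else "fused"


# Hypothetical spike counts, chosen only to exercise both branches.
print(classify_trailing_sound(10, 2))  # ratio 5.0
print(classify_trailing_sound(6, 4))   # ratio 1.5
```

With the threshold of 2 used here, the first call falls in the range in which the observer localizes the trailing sound on the right, and the second falls in the range in which the trailing sound is fused to the location of the leading sound.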
Determination of echo threshold and lag-detection threshold of human subjects
Finally, we tested whether the time course of the suppression of directional information in lagging sounds of the ideal observer corresponds to human perception. Human listeners (n = 10) performed a perceptual free-field echo-threshold test, hearing the same sounds that were presented to the model. A leading sound was presented from a speaker 45° to the left from midline followed by a sound with varying delays between 0.5 and 32 ms from a speaker 45° to the right. The subjects had to indicate whether a second tone burst with a distinct location was perceived. The results of this perceptual test are depicted in Figure 4H (blue line). In most of the trials, subjects were not able to independently localize the trailing sound when it was delayed by 8 ms or less. In contrast, for delays of >16 ms, two sounds with distinct locations were perceived in most of the trials. The echo threshold of the 10 subjects for perceiving two tones with distinct locations (50% criterion) was a lead–lag delay of 12.3 ms. Evidently, the time course of suppression of directional information of lagging sounds was similar in ideal observer and human subjects [Fig. 4, compare G (red line), H (blue dashed line)].
The model also predicts that at lead–lag delays shorter than echo threshold, information about the presence of the lag is still existent at the level of the IC. Hence, the model predicts that human listeners should be able to detect lagging tones at delays shorter than the echo threshold. We tested this prediction by performing the same experiment as we did for echo-threshold determination but with different instructions given to the subjects: now, subjects had to indicate whether one or two tones were perceived, regardless of the ability to resolve their location. The average results of this lag-detection test are depicted in Figure 4H (green dashed line and symbols). The lag-detection threshold of the 10 subjects (50% criterion) was a lead–lag delay of 7.5 ms. Thus, on average, listeners were able to detect the presence of a lagging tone at delays almost 5 ms shorter than required for localizing the lagging tone (Fig. 4H, compare green and blue dashed lines).
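A threshold at a 50% criterion can be read from a psychometric function in several ways. The sketch below shows one simple option, linear interpolation between the two tested delays that bracket the criterion. It is an assumption for illustration only: the text does not state how the 50% points were derived, and the function name and the response proportions in the example are hypothetical, not the measured data.

```python
def threshold_at_criterion(delays_ms, proportions, criterion=0.5):
    """Estimate the delay at which the response proportion first
    rises through `criterion`, by linear interpolation.

    delays_ms: tested lead-lag delays in ascending order.
    proportions: fraction of trials with a positive report per delay.
    Returns None if the proportions never cross the criterion from below.
    """
    points = list(zip(delays_ms, proportions))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            # p1 > p0 is guaranteed here, so the division is safe.
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    return None


# Hypothetical proportions for the delays used in the experiment.
delays = [0.5, 1, 2, 4, 8, 16, 32]
reports = [0.0, 0.0, 0.05, 0.1, 0.25, 0.75, 1.0]
print(threshold_at_criterion(delays, reports))  # 12.0
```

Because the tested delays are spaced in octave steps, interpolating on a logarithmic delay axis or fitting a sigmoid would be equally defensible choices and would yield somewhat different threshold values.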
The DNLL fulfills the three criteria laid down in the introduction that characterize a circuit sufficiently to explain the context-dependent phenomenon of the precedence effect. First, as in other mammals, it is part of the binaural pathway and contains many EI cells (Brugge et al., 1970; Covey, 1993; Markovitz and Pollak, 1993; Kelly et al., 1998). Second, many EI cells in the gerbil DNLL show persistent inhibition evoked by binaural signals that favor the ear ipsilateral to the DNLL. This causes the suppression of lagging sounds that would normally evoke discharges. We suggest that PI is a feature of the auditory system in all mammals, because it now has been described in rodents (this study) as well as in bats (Yang and Pollak, 1994a; Burger and Pollak, 2001). Our in vitro recordings indicate that PI is a feature of the GABAergic transmission of the commissure of Probst and, for the first time, point to a cellular basis for PI. Fiber stimulation persistently inhibited action potential generation by pulsed current injections for ∼20 ms, similar to our observations under in vivo conditions (17 ms). Additional work is needed to identify the presynaptic or postsynaptic cellular properties that underlie PI. Third, implementing the DNLL circuitry (including its target cells in the IC) and the intrinsic features of the GABAergic inhibition into a model revealed IC response properties to lagging sounds that correspond to features of the precedence effect derived from human psychophysical studies.
The dynamic integrate-and-fire model accurately simulated the discharge patterns of neurons in each nucleus found in in vivo recordings, including the responses in the IC to trailing signals, which were virtually the same as those found in the IC of bats (Burger and Pollak, 2001). Moreover, the IC population response was sufficient to produce precedence-like percepts in an ideal observer similar to the percepts of human subjects when presented with the same stimuli. The model of the IC population response is oversimplified in that it only includes the inputs of one functional circuitry, and certainly the information that the IC presents to higher nuclei is far more complex than simulated (Aitkin, 1986). Nevertheless, the results provide strong evidence for the role of the DNLL in echo suppression and show the high potential of the model for predicting responses of auditory brainstem structures to all kinds of complex stimuli.
The psychophysical results demonstrate that the stimuli used in the in vivo experiments do evoke precedence in human listeners. A direct comparison of lag-detection threshold and echo threshold with the exact same stimulation and listeners further shows that listeners can detect the presence of a lagging sound at considerably shorter delays than they can localize the lagging sound separately from the lead. These psychophysical findings are well reflected by the response behavior of our model, because IC responses to the lagging sound are present also at delays smaller than the ideal observer's echo threshold (compare Fig. 4E). Hence, the distinctive feature of our model is that information about a lagging sound is not suppressed at the level of the IC but rather echoes are identified as such by additional activity of a subpopulation of IC neurons.
Previous studies concerned with physiological mechanisms of the precedence effect have been mainly conducted in the IC of rabbits and cats (Yin, 1994; Fitzpatrick et al., 1995, 1999; Litovsky and Delgutte, 2002; Tollin et al., 2004). In those studies, the response to the lagging sound was suppressed by the leading sound, and the studies focused on the interstimulus delay at which the lag responses recovered. The assumption in those studies is that the suppression of the response to the trailing sound corresponds to a change in the coding of the location of that sound source. The suppression of responsiveness to trailing sounds as a result of the presentation of a leading sound is exactly the opposite effect from that found in the bat IC and from the modeled IC responses we obtained in this study. Here, we showed an enhancement of the responsiveness to trailing sounds in IC neurons as a result of DNLL PI evoked by a leading sound. We suggest that these differences in IC responses evoked by similar stimulus configurations were a consequence of recordings from different types of IC cells. For instance, we focused on high-frequency EI cells, whereas the majority of cells reported in previous studies were tuned to low frequencies and were most likely EE cells. This is significant because the circuits that create the binaural properties of EE cells are different from those that create EI cells. One recent study from behaving cats showed suppression of responses to trailing sounds in high-frequency IC cells (Tollin et al., 2004). However, these cells responded briskly to sounds presented from the ipsilateral and contralateral side and therefore were not EI and not processed by the IID circuitry. This difference in the processing of high- and low-frequency stimuli is also noteworthy in regard to psychophysical experiments. We conducted our tests using 4 kHz tones, which forced the subjects to use IID cues processed by EI neurons of the LSO. In contrast, many previous studies used broadband signals (cf. Blauert, 1997), in which substantial energy is present in low-frequency bands, and therefore spatial processing is dominated by the EE circuitry via the medial superior olive.
Our interpretation of these results is that correlates of precedence at the IC are a consequence of multiple processes. The periods of suppression of responses to trailing sounds in some types of IC cells are one feature that contributes to precedence (Yin, 1994; Fitzpatrick et al., 1995, 1999; Litovsky and Delgutte, 2002; Tollin et al., 2004). However, such a mechanism alone cannot account for the fact that the precedence effect is facultative. For example, in humans, precedence breaks down when the lead–lag arrangement is switched [Clifton effect (Clifton, 1987)]. In bats, echo suppression occurs while the animal is passively listening to communication calls (Schuchmann and Wiegrebe, 2005), but not while bats are actively echolocating (Schuchmann et al., 2006).
Our model suggests an alternative or additional mechanism that is compatible with the context-dependent nature of the precedence effect. The model shows how PI in the DNLL could change the coding of directional information conveyed by lagging sounds in IC cells whose EI properties are formed or shaped by DNLL projections. The circuitry is reconfigured by the initial sound such that the neurons respond to lagging sounds from spatial positions that would not elicit responses to single sounds. Hence, compared with leading sounds, the IC population response to a trailing sound contains an additional subgroup of firing neurons. Higher centers should be able to interpret this additional firing as a tag indicating an echo. Because other parallel pathways still convey the binaural information, the system thereby can weight spatial information in a context-dependent manner.
This work was supported by The Max Planck Society (B.G.), the German Research Foundation (Grants Gr1205/12-1 and GRK 1091), and the Bernstein Center for Computational Neuroscience. We thank R. Michael Burger and H. von Gersdorff for helpful discussions and comments on this manuscript. We also thank Hamish Meffin for his assistance on the quantification of the model readout as well as for helpful comments.
- Correspondence should be addressed to Prof. Dr. Benedikt Grothe, Department of Biology II, Biocenter, Ludwig-Maximilians-University of Munich, Grosshaderner Strasse 2, D-82152 Martinsried, Germany.