To localize sounds in space, humans heavily depend on minute interaural time differences (ITDs) generated by path-length differences to the two ears. Physiological studies of ITD sensitivity have mostly used deterministic, periodic sounds, in which either the waveform fine structure or a sinusoidal envelope is delayed interaurally. For natural broadband stimuli, however, auditory frequency selectivity causes individual channels to have their own envelopes; the temporal code in these channels is thus a mixture of fine structure and envelope. This study introduces a method to disentangle the contributions of fine structure and envelope in both binaural and monaural responses to broadband noise. In the inferior colliculus (IC) of the cat, a population of neurons was found in which envelope fluctuations dominate ITD sensitivity. This population extends over a surprisingly wide range of frequencies, including low frequencies for which fine-structure information is also available. A comparison with the auditory nerve suggests that an elaboration of envelope coding occurs between the nerve and the IC. These results suggest that internally generated envelopes play a more important role in binaural hearing than is commonly thought.
- sound localization
- coincidence detection
- inferior colliculus
- auditory nerve
- temporal coding
Humans have an exquisite ability to compare temporal information in the waveforms of sounds to the two ears. Two basic forms of interaural temporal sensitivity have been identified psychophysically: to the detailed time waveform or “fine structure” of low-frequency sounds and to the amplitude fluctuations or “envelope” of high-frequency sounds (Strutt, 1907; Zwislocki and Feldman, 1956; Henning, 1974; Nuetzel and Hafter, 1976; Bernstein and Trahiotis, 1994). Ample evidence documents two corresponding physiological forms of interaural time difference (ITD) sensitivity to fine structure (Rose et al., 1966; Goldberg and Brown, 1969; Moiseff and Konishi, 1981; Yin and Chan, 1990) and to envelopes (Yin et al., 1984; Batra et al., 1989; Joris and Yin, 1995).
Natural sounds such as speech generally span a wide range of frequencies and provide both fine-structure and envelope cues. However, it is unknown how the two forms of ITD sensitivity (to fine structure and to envelopes) physiologically interact in response to wideband sounds. Previous studies of ITD sensitivity to noise focused on low frequencies and did not specifically examine the influence of envelopes on the responses. On the other hand, studies of ITD sensitivity to envelopes have only used high-frequency amplitude-modulated tones. I systematically studied the responses of cells in the inferior colliculus (IC) to ITDs of broadband noise, using a paradigm that allowed disambiguation of sensitivity to fine-structure and envelope cues. The motivation was the psychophysical observation that (1) interaction between fine-structure and envelope cues occurs at surprisingly low frequencies (Bernstein and Trahiotis, 1996) and (2) wideband high-frequency stimuli can mediate stronger effects on laterality than the envelopes of modulated tones (Trahiotis and Bernstein, 1986).
Materials and Methods
Single-unit recordings from the IC were pooled from 18 pentobarbital-anesthetized cats, of which 17 were histologically processed to confirm the site of recording to the central nucleus. All procedures were approved by the University of Wisconsin Animal Care Committee and the K. U. Leuven Ethics Committee for Animal Experiments and were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals. Anesthesia was induced with a 1:3 mixture of acepromazine and ketamine and maintained for surgical preparation and recording with pentobarbital. The animals were placed on a heating pad in a double-walled sound-attenuated chamber (Industrial Acoustics Company, Niederkrüchten, Germany). The bullae were vented with a polyethylene tube. The IC was exposed anterior to the tentorium. Single units were isolated with glass-insulated tungsten electrodes. Sound stimuli were delivered dichotically with dynamic speakers (Supertweeter; Radio Shack, Fort Worth, TX) coupled to ear bars that were tightly inserted into the cut ear canals. The stimuli were generated digitally with custom-built (Rhode, 1976) or commercial hardware (Tucker-Davis Technologies, Alachua, FL) and were compensated for the acoustic transfer function measured with a probe tube near the eardrum and a 12.7 mm condenser microphone (Brüel & Kjær, Nærum, Denmark). The neural signal was amplified, filtered, timed (1 μsec resolution), and displayed using standard techniques.
Characteristic frequency (CF) (frequency of lowest threshold) was determined with a threshold-tracking algorithm during contralateral and/or binaural stimulation. Pseudorandom noise bursts (lower cutoff, 100 Hz; upper cutoff between 4 and 32 kHz, chosen to be well above CF) were presented (duration/repetition interval × number of presentations: 1/1.5 sec × 10 or 20, or 5/6 sec × 3) at an average suprathreshold level of 30 dB. Independently generated noise tokens (e.g., A and B) were presented in several pairwise combinations of the original and inverted waveforms (e.g., A/B, A/-A, B/-B, etc.).
In three cats, these same noise stimuli (5/6 sec × 10 or 20) were delivered monaurally while recording from the auditory nerve. Micropipettes (3 M KCl) were inserted under visual control into the nerve trunk, exposed through a posterior fossa craniotomy. Correlograms were constructed with bin widths of 50 μsec and normalized to the number of permutations.
Polarity-tolerant noise delay functions
When a pair of perfectly correlated, i.e., identical, broadband noise stimuli (shorthand, A/A) is played to the two ears and ITD is systematically varied, the firing rate of many neurons in the midbrain shows sensitivity to ITD (Geisler et al., 1969; Yin et al., 1986; McAlpine et al., 1996). The noise-delay curve in Figure 1A illustrates the classical description of such sensitivity. The firing rate shows an oscillatory dependence on ITD. This pattern has been interpreted as the output of a coincidence detector operating on afferent signals that have undergone bandpass filtering in the cochlea (Yin et al., 1986). In support of that interpretation, presentation of anticorrelated noise pairs (A/-A) to the two ears, obtained by inversion of the noise waveform in one ear, results in a noise-delay curve that is still oscillatory but is inverted compared with the response to correlated noise (Yin et al., 1987). Uncorrelated noise pairs (A/B) evoke a response that is independent of ITD.
I systematically obtained noise-delay curves to correlated and anticorrelated noise pairs and found cells in the IC that were ITD sensitive but with a pattern that differed strongly from the classical pattern (Fig. 1C). In these cells, the noise-delay function to a correlated noise pair often showed a single peak rather than an oscillation as a function of ITD, and this pattern did not invert in response to the anticorrelated noise pair. Because the shape of such noise-delay functions shows little dependence on stimulus polarity, I call them “polarity tolerant.”
Many neurons showed a mixed pattern (Fig. 1B) in which an oscillatory component was present, which inverted with inversion of the stimulus to one ear, as well as a polarity-tolerant component. To accentuate these differences, Figure 1 shows the difference (row 2, DIFF) and sum (row 3, SUM) of the noise-delay functions to correlated and anticorrelated noise. A perfect inversion with changing stimulus polarity would result in a sum that is constant with ITD, whereas independence of polarity would result in a constant difference: these conditions are approached by the responses illustrated in columns A and C, respectively. The responses in column B show both an antiphasic oscillatory component as well as a common mound of activity.
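The SUM and DIFF decomposition described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `sum_and_diff` and all firing rates are made up, and the synthetic curves are idealized (a 500 Hz oscillation and a Gaussian mound) purely to show how the two components separate.

```python
import math

def sum_and_diff(rate_corr, rate_anti):
    """Return (SUM, DIFF) of two noise-delay functions sampled at the same ITDs.

    rate_corr: firing rates to correlated noise (A/A), one value per ITD.
    rate_anti: firing rates to anticorrelated noise (A/-A), same ITDs.
    """
    if len(rate_corr) != len(rate_anti):
        raise ValueError("both curves must be sampled at the same ITDs")
    total = [c + a for c, a in zip(rate_corr, rate_anti)]
    diff = [c - a for c, a in zip(rate_corr, rate_anti)]
    return total, diff

# Synthetic, idealized curves (hypothetical values, not data from the study).
itds_us = list(range(-1000, 1001, 100))
# Oscillatory component that inverts with stimulus polarity (500 Hz assumed).
osc = [30.0 * math.cos(2 * math.pi * 500.0 * itd * 1e-6) for itd in itds_us]
# Polarity-tolerant mound centered on zero ITD.
mound = [60.0 * math.exp(-(itd / 300.0) ** 2) for itd in itds_us]

# Classical cell: perfect inversion, so SUM is constant with ITD.
classical_sum, classical_diff = sum_and_diff(
    [50.0 + o for o in osc], [50.0 - o for o in osc])

# Polarity-tolerant cell: identical curves, so DIFF is zero at every ITD.
tolerant_sum, tolerant_diff = sum_and_diff(
    [20.0 + m for m in mound], [20.0 + m for m in mound])

# Mixed cell: DIFF isolates the oscillation, SUM isolates the mound.
mixed_sum, mixed_diff = sum_and_diff(
    [20.0 + m + o for m, o in zip(mound, osc)],
    [20.0 + m - o for m, o in zip(mound, osc)])
```

In this idealized case the DIFF curve of the mixed cell is twice the oscillatory component and the SUM curve is twice the mound plus baseline, which is why the sum is a convenient place to measure envelope-related tuning.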
Systematic dependence on CF
A simple explanation for polarity-tolerant behavior is envelope ITD sensitivity because the envelope of sound waveforms is independent of their polarity (phase shift rule; Hartmann, 1997). Neurons in the central and peripheral auditory system are specialized to transmit temporal features of acoustic signals in the form of phase locking, i.e., the timing of their action potentials is synchronized to the acoustic waveform. Besides temporal information on fine structure, present at frequencies up to 4–5 kHz in the cat (Rose et al., 1967; Johnson, 1980; Joris et al., 1994), auditory neurons also carry temporal information related to fluctuations in the envelope of the acoustic waveform, as modified by cochlear filtering and various nonlinear processes. Envelope phase locking is present at all carrier frequencies but is transmitted with higher gain and wider bandwidth in cells tuned to high frequencies (Palmer, 1982; Joris and Yin, 1992, 1998). If polarity tolerance reflects a cross-correlation-type operation on envelope signals, this response pattern should predominate in neurons tuned to frequencies at which phase locking to fine structure declines.
To obtain a simple metric for the tendency of cells to have similar or inverted responses to correlated and anticorrelated noise pairs, the Pearson product-moment correlation coefficient between these responses was calculated for all ITDs within a range of ±1000 μsec, for 171 cells. In 85 cells, responses to A/A and A/-A showed an inverse relationship, indicating a classical pattern as in Figure 1A (i.e., the response to the A/A pair is high when that to the A/-A pair is low and vice versa), and the correlation coefficient was significant and negative. In 49 polarity-tolerant cells, the responses to the two conditions tended to covary, and the correlation coefficient was significant and positive. Most of these responses showed a single peak as in Figure 1C, but some showed a trough (n = 5) or a more complex pattern (n = 6). Finally, in 37 cells with a mixed pattern (Fig. 1B), there was no systematic relationship because of the opposing tendencies, and the correlation coefficient was not significant. Figure 2A shows the correlation coefficient as a function of the CF of the cells. Cells with the classical form of ITD sensitivity tended to have a low CF, whereas those with the polarity-tolerant pattern tended to have a high CF. However, both types of ITD sensitivity were observed over a common range of CFs. Moreover, in this common range, many cells showed the mixed pattern.
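The classification just described can be sketched in a few lines. The text does not state which significance test or criterion was used, so the threshold `r_crit` below is an assumption standing in for "significant": a value of about 0.43 roughly corresponds to a two-tailed p < 0.05 for 21 ITD samples. The names `pearson_r` and `classify` and the example curves are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat curve has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)

def classify(itds_us, rate_corr, rate_anti, r_crit=0.43, max_itd_us=1000):
    """Label a cell from its responses to A/A and A/-A noise.

    r_crit is an ASSUMED stand-in for the (unstated) significance criterion.
    Only ITDs within +/- max_itd_us enter the correlation, as in the text.
    """
    keep = [i for i, itd in enumerate(itds_us) if abs(itd) <= max_itd_us]
    r = pearson_r([rate_corr[i] for i in keep], [rate_anti[i] for i in keep])
    if r <= -r_crit:
        return r, "classical"          # responses invert with polarity
    if r >= r_crit:
        return r, "polarity tolerant"  # responses covary
    return r, "mixed"                  # no systematic relationship

# Synthetic, idealized examples (hypothetical values, not data from the study).
itds_us = list(range(-1000, 1001, 100))
osc = [30.0 * math.cos(2 * math.pi * 500.0 * itd * 1e-6) for itd in itds_us]
mound = [60.0 * math.exp(-(itd / 300.0) ** 2) for itd in itds_us]

r_classical, label_classical = classify(
    itds_us, [50.0 + o for o in osc], [50.0 - o for o in osc])
r_tolerant, label_tolerant = classify(
    itds_us, [20.0 + m for m in mound], [20.0 + m for m in mound])
```

A fixed threshold on r is the simplest possible choice; an actual analysis would derive the criterion from the number of ITD samples per cell.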
Polarity-tolerant noise-delay functions have not been described before, probably because previous studies of ITD sensitivity to noise focused on cells tuned to low frequencies (Yin et al., 1986; McAlpine et al., 2001) and because the response to anticorrelated noise was not systematically collected so that the effect of fine structure versus envelope could not be disambiguated.
A possible source of polarity-tolerant tuning is onset time difference: the gating window is an envelope feature that is always present in the stimuli. However, such onset differences are also present in uncorrelated noise pairs, which did not result in ITD tuning (Fig. 1). To exclude the possibility that the ITD sensitivity in high-CF neurons was somehow attributable to the low-frequency acoustic energy in the stimulus, I studied 10 cells with the noise energy below CF removed. All cells remained ITD tuned with the same polarity-tolerant pattern. Thus, high-frequency energy is sufficient for polarity-tolerant ITD sensitivity, which argues strongly in favor of the hypothesis that temporal envelope patterns underlie this type of ITD sensitivity. However, an envelope was not imposed on the broadband noise stimulus used here, so a remaining question is the origin of the temporal patterns.
Cochlear origin of envelopes
It is well known in signal analysis that bandpass filtering of wideband noise imposes envelope fluctuations (Fig. 3A) at a rate that reflects the bandwidth of the bandpass filter (Rice, 1954). Likewise, broadband noise is bandpass filtered peripherally in the cochlea, so that the effective stimulus transmitted to the CNS and to the binaural coincidence detectors in the brainstem contains a temporal envelope. This envelope has not been characterized physiologically; therefore, a method was sought to quantify the “effective stimulus” in a way that affords straightforward comparison with noise-delay curves.
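Two points from the text, that band-limited noise carries an envelope and that this envelope does not depend on waveform polarity, can be checked numerically. The sketch below is an idealization, not a cochlear model: the band-limited noise is built directly as a sum of random-phase sinusoids inside an assumed band (4 kHz center, 400 Hz wide), and the envelope is taken as the magnitude of the analytic signal. All names and parameter values are assumptions chosen for illustration.

```python
import math
import random

def narrowband_noise(center_hz, bandwidth_hz, n_components, rng):
    """Random frequencies and phases for an idealized band-limited noise."""
    lo, hi = center_hz - bandwidth_hz / 2, center_hz + bandwidth_hz / 2
    freqs = [rng.uniform(lo, hi) for _ in range(n_components)]
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_components)]
    return freqs, phases

def waveform_and_envelope(freqs, phases, times_s, inverted=False):
    """Return (waveform, envelope); inversion adds pi to every phase."""
    shift = math.pi if inverted else 0.0
    wave, env = [], []
    for t in times_s:
        # In-phase and quadrature sums; the quadrature sum is the Hilbert
        # transform of the waveform because all components are at f > 0.
        i_sum = sum(math.cos(2 * math.pi * f * t + p + shift)
                    for f, p in zip(freqs, phases))
        q_sum = sum(math.sin(2 * math.pi * f * t + p + shift)
                    for f, p in zip(freqs, phases))
        wave.append(i_sum)
        env.append(math.hypot(i_sum, q_sum))  # analytic-signal magnitude
    return wave, env

rng = random.Random(0)  # fixed seed so the example is reproducible
freqs, phases = narrowband_noise(4000.0, 400.0, 40, rng)
times_s = [n / 50000.0 for n in range(1000)]  # 20 msec at 50 kHz

wave, env = waveform_and_envelope(freqs, phases, times_s)
wave_inv, env_inv = waveform_and_envelope(freqs, phases, times_s, inverted=True)
```

Here `wave_inv` is the sign-inverted copy of `wave`, whereas `env_inv` matches `env` to within rounding error. Widening `bandwidth_hz` makes the envelope fluctuate faster, in line with the dependence on filter bandwidth noted in the text.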
Autocorrelation of spike patterns of low-CF auditory nerve fibers in response to broadband noise reveals periodicities imposed by cochlear filtering (Ruggero, 1973). In that analysis, all-order interspike intervals are compiled for each stimulus presentation and averaged for all spike trains. The same analysis applied to fibers tuned to high frequencies fails to reveal any temporal structure, except a trough near zero that is attributable to the refractory period. To avoid this trough, I calculated autocorrelation functions by tallying intervals across spike trains (Fig. 3B): this technique of shuffling is classically used to reveal stimulus-locked time structure in cross-correlation studies (Perkel et al., 1967). Thus, in a shuffled autocorrelation function, all intervals are tallied across spike trains to a different presentation of an identical stimulus (e.g., noise token A). Similarly, cross-stimulus correlation functions are constructed by tallying all intervals across spike trains evoked by different stimuli (e.g., to anticorrelated noise tokens A and -A or to uncorrelated tokens A and B). In the remainder of the text, the term “correlogram” is used as a shorthand for “correlation function.”
It is important to note that the auditory nerve correlation analysis predicts the output of the simplest conceivable coincidence detector. Each pair of spike trains being compared can be thought of as providing left and right input to a binaural coincidence detector, which counts spikes coincident within a rectangular integration window (equal to the bin size used in the correlation computation). Moreover, the inputs are exactly equal in all properties (because they are in fact derived from the same cell). The process of tallying interspike intervals is completely equivalent to counting coincidences in spike trains at varying delays. Thus, correlograms provide a natural way to compare monaural temporal properties with noise-delay functions of real binaural cells.
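The shuffled correlogram and its reading as a coincidence count can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code of the study: the function name, the choice to center the middle bin on zero delay, and the normalization of cross-stimulus correlograms by the number of train pairs are all assumptions. The 50 μsec bin width and the normalization to the number of permutations follow the Methods.

```python
import math

def shuffled_correlogram(trains_x, trains_y=None, bin_us=50, max_lag_us=1000):
    """Tally spike-time differences across spike trains.

    trains_x, trains_y: lists of spike trains; each train is a list of
    spike times in microseconds relative to stimulus onset.
    If trains_y is None, a shuffled AUTOcorrelogram is computed: every
    ordered pair of DIFFERENT trains to the same stimulus is compared, so
    within-train intervals (and the refractory trough) are excluded.
    Otherwise a cross-stimulus correlogram is computed over all pairs
    (e.g., trains to token A versus trains to token -A or B).
    Returns (lags_us, counts) with counts normalized per train pair.
    """
    same = trains_y is None
    if same:
        trains_y = trains_x
    half = max_lag_us // bin_us
    counts = [0] * (2 * half + 1)
    n_pairs = 0
    for i, tx in enumerate(trains_x):
        for j, ty in enumerate(trains_y):
            if same and i == j:
                continue  # the "shuffling": never compare a train with itself
            n_pairs += 1
            for a in tx:
                for b in ty:
                    # Bin index, with bin 0 centered on zero delay.
                    k = math.floor((b - a) / bin_us + 0.5)
                    if -half <= k <= half:
                        counts[k + half] += 1  # one coincidence at this delay
    lags_us = [k * bin_us for k in range(-half, half + 1)]
    if n_pairs == 0:
        return lags_us, [0.0] * len(counts)
    return lags_us, [c / n_pairs for c in counts]

# Three hypothetical responses to the same token, with small spike jitter.
trains_a = [[1000, 5000, 9000], [1010, 5010, 9000], [990, 4990, 9010]]
lags_us, sac = shuffled_correlogram(trains_a)
```

Each entry of `sac` is the average number of coincidences per pair of trains at that delay, which is exactly what a coincidence detector with a rectangular 50 μsec window would count if one train were delayed relative to the other. In this toy example all stimulus-locked spikes fall in the central bin, giving three coincidences per pair at zero delay and none elsewhere.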
Figure 1 (bottom) shows superimposed correlograms for correlated, anticorrelated, and uncorrelated noise tokens for three nerve fibers. The similarity of these patterns to the classical, mixed, and polarity-tolerant patterns obtained in the IC (Fig. 1, top) is obvious. The distribution of these patterns as a function of CF was studied in 76 nerve fibers using the same quantification as used on IC responses and yielded a similar sigmoidal scatter diagram (Fig. 2B), with inverting correlograms at low CFs, polarity-tolerant correlograms at high CFs, and a transition region in which both patterns as well as mixed patterns are found.
Elaboration of envelope coding between nerve and IC
Clearly, the temporal patterns in the auditory nerve provide a possible basis for the polarity-tolerant noise-delay curves in the IC, but there are also several differences. Compared with the IC (Fig. 2B, solid lines), the transition region in the auditory nerve is transposed upward in frequency, by ∼1 kHz, and is less dispersed. This probably reflects the reduced upper-frequency limit on phase locking found in second-order neurons projecting to the binaural coincidence detectors, when compared with their auditory nerve inputs (Joris et al., 1994), and possibly additional reductions in the upper limit of phase locking at the next integration stages in the medial superior olive (MSO) and IC. A second difference is the existence of an upper frequency limit in the IC, but not in the nerve, to the existence of polarity-tolerant patterns. Possibly the small representation of high frequencies in the MSO accounts for the absence of ITD sensitivity in cells with CF > 6.1 kHz.
A third difference is that polarity-tolerant patterns in the nerve appear wider and shallower than those in the IC (Fig. 1, compare C, F). Two measures of tuning were obtained for all polarity-tolerant responses (neurons with significant positive correlation in Fig. 2) in the nerve and IC. Response modulation, defined as (maximal response - minimal response)/(maximal response), quantifies the degree to which the response is modulated by changes in delay, and the width at half-height [(maximal response - minimal response)/2] quantifies the sharpness of tuning. To remove any influence of fine structure on these measures, they were taken from the sum of responses to correlated and anticorrelated stimuli (compare with Fig. 1, SUM). Correlograms in the auditory nerve were clearly less modulated (median, 0.37; n = 65) than IC noise-delay functions (median, 0.87; n = 41) (Mann-Whitney U test; U = 99; p << 0.001) (Fig. 4A). Tuning width was inversely related to CF (Fig. 4B), as expected from the increase in the bandwidth of frequency tuning with CF (Kiang et al., 1965; Evans, 1972; Rhode and Smith, 1985). Because of the shift in transition region between nerve and IC (Fig. 2B), there is only a narrow region (dashed vertical lines) over which comparisons can be made: over this region, the IC neurons are more narrowly tuned (median half-width, 630 μsec) than the nerve (median, 970 μsec) (U = 72; p << 0.001). Thus, although the temporal patterns in the nerve provide the basis for polarity tolerance, there is clearly an elaboration of envelope coding and ITD sensitivity at later stages.
Most psychophysical studies on human envelope ITD sensitivity use stimuli restricted to frequencies above the range of phase locking. Such stimuli generally result in weak lateralization, from which it is concluded that envelope ITDs are a subordinate cue. The results presented here show that ITD sensitivity in the IC of the cat is dominated by envelope features, generated by bandpass filtering in the cochlea, for a large fraction of cells extending over a wide range of CFs (1–6 kHz). This range includes approximately two octaves of the “phase-locking range,” which extends up to 4–5 kHz in the cat (Johnson, 1980). The results suggest that internally generated envelopes play a more important role in human lateralization than is commonly thought but over a frequency range that differs from where it is usually sought.
The human upper limit of behavioral sensitivity to fine structure, measured with tones, is ∼1.3 kHz (Zwislocki and Feldman, 1956), approximately one octave lower than in cats (2.8 kHz) (Jackson et al., 1996). Psychophysical studies of the relative weights of different sound localization cues often assess “envelope ITD sensitivity” by restricting stimulus energy to frequencies above the phase-locking range (Wightman and Kistler, 1992; Levine et al., 1993; Macpherson and Middlebrooks, 2002). Such stimuli usually indicate that the envelope ITD cue is weak, perhaps because the neural machinery to analyze the cue in that frequency range is limited, as suggested by the data presented here. However, the converse procedure, of restricting stimulus energy to the phase-locking range, does not remove envelope ITDs. Indeed, whereas the upper limit at which humans can detect fine structure is ∼1.3 kHz, the transition of binaural performance based on fine structure to that based on envelope starts at a much lower frequency and can be modeled assuming a synchronization low-pass filter with a cutoff frequency of ∼425 Hz (Bernstein and Trahiotis, 1996).
In conclusion, the upper limit on phase locking in the peripheral auditory system does not coincide with the limit at which lateralization and ITD sensitivity make a transition from being based on fine structure to being based on envelopes. It is interesting to observe that envelope information begins to dominate binaural performance near but below the frequency at which ITDs based on fine structure become an ambiguous cue attributable to spatial aliasing [∼800 Hz for humans (Blauert, 1983) and 1.4 kHz for the cat, depending on the subject's interaural distance].
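The dependence of the aliasing frequency on interaural distance can be made explicit with a back-of-the-envelope relation. It rests on an assumption that is not stated in the text: fine-structure ITDs are taken to become ambiguous once half the period of the tone equals the largest ITD the head can produce, written here as ITD_max (a symbol introduced only for this illustration). The ITD_max values below are simply those implied by the quoted frequencies, not measurements from the source.

```latex
f_{\mathrm{alias}} \approx \frac{1}{2\,\mathrm{ITD}_{\max}}
\quad\Longleftrightarrow\quad
\mathrm{ITD}_{\max} \approx \frac{1}{2 f_{\mathrm{alias}}}

\text{Human: } f_{\mathrm{alias}} \approx 800\ \mathrm{Hz}
\;\Rightarrow\; \mathrm{ITD}_{\max} \approx \frac{1}{2 \times 800\ \mathrm{Hz}} = 625\ \mu\mathrm{sec}

\text{Cat: } f_{\mathrm{alias}} \approx 1.4\ \mathrm{kHz}
\;\Rightarrow\; \mathrm{ITD}_{\max} \approx \frac{1}{2 \times 1400\ \mathrm{Hz}} \approx 357\ \mu\mathrm{sec}
```

Under this assumption a smaller head, with a smaller maximal ITD, pushes the aliasing frequency upward, consistent with the higher value quoted for the cat.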
This work was supported by National Institutes of Health/National Institute on Deafness and Other Communication Disorders Grant DC00116, Fund for Scientific Research (Flanders) Grants G.0297.98 and G.0083.02, and Research Fund Katholieke Universiteit Leuven Grant OT/01/42. Thanks to B. Delgutte, A. Recio, D. Tollin, M. van der Heijden, and T. C. T. Yin for comments and to T. C. T. Yin and Yin-Inn for support.
Correspondence should be addressed to Philip X. Joris, Laboratory of Auditory Neurophysiology, Campus Gasthuisberg O&N, K. U. Leuven, B-3000 Leuven, Belgium.
Copyright © 2003 Society for Neuroscience 0270-6474/03/236345-06$15.00/0