Our understanding of cochlear mechanics is impeded by the lack of truly panoramic data. Sensitive mechanical measurements cover only a narrow cochlear region, mostly in the base. The global spatiotemporal pattern of vibrations along the cochlea cannot be inferred from such local measurements but is often extrapolated beyond the measurement spot under the assumption of scaling invariance. Auditory nerve responses give an alternative window on the entire cochlea, but traditional techniques do not allow recovery of the effective vibration pattern. We developed a new analysis technique to measure cochlear amplitude and phase transfer of fibers with characteristic frequencies <5 kHz. Data from six cats yielded panoramic phase profiles along the apex of the cochlea for an ∼5 octave range of stimulus frequencies. All profiles accumulated systematic phase lags from base to apex. Phase accumulation was not gradual but showed a two-segment character: a steep segment (slow propagation) around the characteristic position of the stimulus, and a shallow segment (fast propagation) basal to it. The transition between the segments occurred in a narrow region and was smooth. Wavelength near characteristic position decreased from ∼3.5 to ∼1 mm for frequencies from 200 to 4000 Hz, corresponding to phase velocities (wavelength times frequency) of ∼0.7 to ∼4 m/s. The accumulated phase lag between the eardrum and characteristic position varied from ∼1 cycle at 200 Hz to ∼2.5 cycles at 4 kHz, invalidating scaling invariance. The generic character of our analysis technique and its success in solving the difficult problem of reconstructing the effective sensory input from neural recordings suggest its wider application as a powerful alternative to customary system analysis techniques.
The mammalian cochlea is more than a mechanoelectrical transducer. It maps the frequencies of the audible range onto different portions of its sensory tissue, creating a tonotopical organization that is preserved throughout the auditory pathway. Despite the impressive improvements in measurement techniques achieved over the past decades (for review, see Robles and Ruggero, 2001), our understanding of cochlear mechanisms is still incomplete. Direct measurements are difficult: the cochlea is vulnerable and poorly accessible. A major stumbling block is the scarcity of “panoramic” data, i.e., measurements spanning a sufficiently large portion of the intact cochlea to map out the spatiotemporal patterning evoked by sounds. Cochlear-mechanical measurements on sensitive preparations are restricted to a few isolated spots or a narrow range, which makes them unsuited for panoramic measurements.
Single-unit recordings from the auditory nerve (AN) provide an alternative window on cochlear mechanics. Their major advantage is the unrestricted access to all regions of a single, intact cochlea. For instance, Pfeiffer and Kim (1975) measured the response of a population of AN fibers from a single animal to a small set of single tones, thus obtaining a spatiotemporal overview of cochlear motion that cannot presently be matched by mechanical measurements. Ideally, AN responses should reveal the “effective stimulus,” i.e., the local vibrations stimulating individual sensory cells. Unfortunately, the transduction process distorts this information. Limited temporal resolution prevents the AN from coding the phase of high-frequency (>5 kHz) stimuli (Johnson, 1980b), and the limited dynamic range causes weak components of the effective stimulus to be “masked” by stronger ones. Moreover, the highly nonlinear character of the transduction process prevents a quantitative interpretation of tone responses in terms of cochlear frequency selectivity. These problems restrict the reach of straightforward approaches such as pure-tone stimulation.
An important method to measure cochlear transfer from AN responses is the Wiener kernel approach (for review, see Eggermont, 1993), a black box method using responses to wideband noise. Here we introduce a novel method (“first-order zwuis”) that uses tonal complexes. It is restricted to the phase-locking region, but, unlike the method of our previous study (Van der Heijden and Joris, 2003), it allows the absolute phase of components to be determined. It yields cochlear amplitude and phase transfer over the entire frequency region that drives a given low-frequency fiber, whereas Wiener kernels yield only a relatively narrow band around the characteristic frequency (CF) of a fiber. We measured transfer functions of AN fibers of the cat. In six cases, coverage of CFs was sufficient to reconstruct panoramic phase patterns in the apical region (CF <4 kHz) of single cochleas and to determine wavelength, propagation speed, and dispersion.
The two novel aspects of this study are the use of a tailored stimulus for the isolation of the linear part of the response and the panoramic synthesis of measurements from single cochleas.
Materials and Methods
Theory: response of the auditory nerve to tone complexes.
The response of AN fibers to low-frequency tones is phase locked: spikes tend to cluster around a certain phase value of the stimulus cycle. The degree of phase locking is usually quantified by the vector strength R, which is defined in terms of a vector average of phase values (Goldberg and Brown, 1969). Alternatively, R can be viewed as the normalized Fourier component at the signal frequency of the poststimulus time histogram (PSTH). Both descriptions are mathematically equivalent (Johnson, 1974). Statistical significance of R is evaluated using the Rayleigh test (Mardia and Jupp, 2000). The decision statistic of the Rayleigh test is NR², where N is the number of spikes, and a significance level of 0.001 is typically used as the criterion for significance of phase locking.
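As an illustration of these definitions, the vector strength and an approximate Rayleigh p value can be computed directly from spike times. The following Python sketch is not the analysis software used in this study; the function names are hypothetical, and the large-N approximation p ≈ exp(−NR²) is an assumption.

```python
import cmath
import math

def vector_strength(spike_times, freq):
    """Return (R, phase in cycles) of phase locking to `freq` (Hz)."""
    n = len(spike_times)
    # Each spike contributes a unit vector at its stimulus phase.
    mean_vec = sum(cmath.exp(2j * math.pi * freq * t) for t in spike_times) / n
    return abs(mean_vec), cmath.phase(mean_vec) / (2.0 * math.pi)

def rayleigh_p(r, n_spikes):
    """Approximate Rayleigh p value (assumed large-N form exp(-N R^2))."""
    return math.exp(-n_spikes * r * r)

# Hypothetical example: 50 spikes perfectly locked to a 100 Hz tone.
spikes = [k / 100.0 for k in range(50)]
r, phase = vector_strength(spikes, 100.0)
significant = rayleigh_p(r, len(spikes)) < 0.001
```

Spikes spread evenly over the stimulus cycle give R close to 0, whereas perfect locking gives R = 1.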
When multiple low-frequency tones are presented simultaneously, the response of a single AN fiber usually shows phase locking to several tones at the same time. This “multiple phase locking” can be found in several studies in which the response to harmonic tone complexes was analyzed (Young and Sachs, 1979; Evans, 1981; Horst et al., 1985). One may view such data as a “collection of separate R values,” but the identification of the R values with Fourier components suggests a more natural, spectral interpretation of multiple phase locking. Multiple phase locking simply reflects the fact that the frequencies present in the input (acoustic stimulus) show up as spectral peaks in the output (neural response). Here it is understood that the neural response is adequately represented by a train of standard impulses or, equivalently, by a PSTH with sufficiently narrow bins. That idealization renders the collection of R values mathematically equivalent to the amplitude spectrum of the response.
Figure 1 shows the magnitude spectrum of the PSTH of an AN fiber to a nonharmonic tone complex (details are provided below). Because the same number of spikes enters the computation of each spectral component, Rayleigh significance at a prescribed level (say p = 0.001) corresponds to the same critical R value for each frequency. We used this fact to normalize the spectrum of Figure 1 as follows: 0 dB (dashed line) corresponds to a Rayleigh significance of 0.001. This normalization convention turns the 0 dB line into the noise floor of the spectrum.
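The normalization to the Rayleigh criterion can be sketched as follows. This is a minimal illustration, assuming the large-N approximation p ≈ exp(−NR²) for the Rayleigh test; the function names are hypothetical.

```python
import math

def critical_r(n_spikes, p=0.001):
    """R value at which phase locking just reaches significance level p
    (assumes the large-N approximation p = exp(-N R^2))."""
    return math.sqrt(-math.log(p) / n_spikes)

def level_re_noise_floor(r, n_spikes, p=0.001):
    """Spectral magnitude in dB relative to the Rayleigh criterion (r > 0)."""
    return 20.0 * math.log10(r / critical_r(n_spikes, p))
```

Because the critical value depends only on the number of spikes and the chosen p, a single 0 dB line applies to all frequencies of the same recording.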
The spectral components at the stimulus frequencies are represented by filled circles and solid line in Figure 1. Most of them clearly exceed the noise floor, reflecting significant multiple phase locking. Although the stimulus was an equal-amplitude tone complex, the magnitudes of the spectral components in the PSTH show a bandpass character that is consistent with a CF of the fiber of 410 Hz. Thus, cochlear frequency selectivity is reflected in the PSTH spectrum. Below we will argue that the shape of the spectrum is in fact a faithful representation of the spectrum of the effective stimulus, which enables the assessment of cochlear filter characteristics from PSTH spectra like those of Figure 1.
The spectrum of Figure 1 also shows significant phase locking to nonstimulus components. The dominant type of significant, nonprimary components consists of second-order distortions of the stimulus, which are marked by squares in Figure 1. These second-order distortions occur at frequencies fk ± fl, where fk and fl are any two primary frequencies. Second harmonics (filled squares) at frequencies 2fk form a subset of the second-order distortions. The abundance and strength of second-order distortions are expected from the rectification of the effective stimulus by the inner hair cell and the absence of “negative spike rates” (Johnson, 1974). Third-order distortion components of the type 2fk − fl are marked by triangles in Figure 1. Only a few third-order components reach significance; this is typical of our data.
When studying cochlear transfer properties from PSTH spectra as shown in Figure 1, it is essential to be able to distinguish the linear part of the response (filled circles) from the nonlinear part (squares and triangles). The former primarily reflect the spectrum of the effective stimulus, whereas the latter primarily reflect nonlinear effects of the transduction process. Had we used harmonic tone complexes, in which all primary frequencies are multiples of the fundamental frequency, many distortion products would have coincided with primary stimulus components. Thus, for harmonic stimuli, the linear and nonlinear components of the response cannot generally be disentangled. We solved this problem by using irregularly spaced tone complexes instead of harmonic stimuli. This procedure is illustrated by the 700–900 Hz frequency region of Figure 1. In this spectral region, second-order distortions (squares) dominate over the linear components of the response (circles). Despite the dominance of distortions, the use of a tailored stimulus still allowed us to determine the linear part of the response (circles) in that region. This is achieved by choosing the primary frequencies in such a way that the frequencies of second- and third-order distortion products never coincide with any primary frequencies. We call such tone complexes zwuis stimuli (“zwuis” is a contraction of the Dutch words for “beats” and “noise”), because their properties are virtually identical to the zwuis stimuli introduced in Van der Heijden and Joris (2003) (see below, Stimuli and data collection). Because second- and third-order distortions are the most prominent distortions observed in the AN (Fig. 1), the use of zwuis stimuli affords a reliable isolation of the linear part of the response. 
The current analysis technique, which is based on the linear part of the response, will be called the “first-order zwuis method” to distinguish it from the zwuis method of our previous study, which is based on the second-order distortions (difference frequencies) of the response.
The nonlinearities of the transduction by inner hair cells produce new spectral components (which, because of the tailored stimulus, never coincide with the linear response components), but the relative strengths of the primary components are essentially unaffected by the transduction. In other words, the spectral shape of the effective stimulus is hardly affected by the nonlinearities of the transducer. The linear part of the response, i.e., the spectrum of the response restricted to the stimulus frequencies, is therefore an accurate representation of the spectrum of the effective stimulus. The mathematical reason for the robustness of the spectral shape of the linear part is the same as that behind the interpretation of reverse correlation (revcor) data: the presence of many input components makes the contribution by each of them small, justifying a linearization (Marmarelis and Marmarelis, 1978; De Boer, 1997; Van der Heijden and Joris, 2003).
We performed a numerical test of the robustness of the linear part of the spectrum against nonlinear transduction. The equal-amplitude zwuis stimulus underlying Figure 1 was passed through a bandpass filter, producing an artificial effective stimulus (AES), which was then “nonlinearly transduced” by a half-wave linear rectifier, followed by a hard clipper set at the root mean square of the AES. The final spectrum differed from that of the AES by the occurrence of second- and third-order distortions, but the shape of the linear part of the final spectrum, when restricted to those components within 30 dB from the peak component, was equal to that of the AES to an accuracy of 1 dB. Different choices of the nonlinear transducer affected the distortions, but the linear part of the spectrum was robust against such variations. Of course, a bandpass filter followed by a memoryless nonlinearity is an oversimplified model of cochlear transduction because it ignores any dynamical and frequency-dependent aspects of mechanical and transducer nonlinearities. On the other hand, the theoretical work by De Boer (1997) justifies the heuristic application of simple models to characterize the response of more complicated nonlinear systems to stationary wideband stimuli.
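A test of this kind can be sketched in a few lines of Python. The sketch below is an illustration only, not the code used for the test reported here: the primary frequencies, the bandpass gains, the sample rate, and all function names are hypothetical, and the recursion used to generate the primaries is an assumption. The function returns, for each primary, the change in level (in dB relative to the strongest primary) caused by the rectifier and clipper.

```python
import cmath
import math
import random

def zwuis_hz(n, m, k1):
    """Integer primaries (Hz) for a 1 Hz fundamental (assumed recursion)."""
    ks = [k1]
    for i in range(1, n):
        ks.append(ks[-1] + m + i)
    return [5 * k + 1 for k in ks]

def magnitude(signal, freq, fs):
    """Magnitude of the Fourier component of `signal` at `freq` (Hz)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * i / fs)
              for i, x in enumerate(signal))
    return abs(acc) / len(signal)

def linear_part_error(freqs, gains_db, fs=16000, seed=0):
    """Level change (dB re strongest primary) of each primary after transduction."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    gains = [10.0 ** (g / 20.0) for g in gains_db]
    # One full 1 s period of the artificial effective stimulus (AES).
    aes = [sum(g * math.cos(2.0 * math.pi * f * i / fs + p)
               for f, g, p in zip(freqs, gains, phases))
           for i in range(fs)]
    rms = math.sqrt(sum(x * x for x in aes) / len(aes))
    # Half-wave rectifier followed by a hard clipper at the RMS of the AES.
    out = [min(max(x, 0.0), rms) for x in aes]
    peak = gains_db.index(max(gains_db))
    a_in = [magnitude(aes, f, fs) for f in freqs]
    a_out = [magnitude(out, f, fs) for f in freqs]
    return [20.0 * math.log10(a_out[i] / a_out[peak])
            - 20.0 * math.log10(a_in[i] / a_in[peak])
            for i in range(len(freqs))]
```

Calling `linear_part_error(zwuis_hz(7, 40, 20), gains_db)` with a list of seven gains in dB yields the per-component deviations, which can then be compared with the 1 dB accuracy quoted above. With only a few components, the deviations may be larger than with the richer stimuli used in the experiments.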
We also obtained empirical data that justify the straightforward interpretation of the PSTH spectrum in terms of the spectrum of the effective stimulus. A seven-tone zwuis complex ranging from 470 to 680 Hz was presented, and the spectrum of the PSTH of an AN fiber (CF of 580 Hz) was determined. All tones except the central one at 566 Hz were presented at a constant, equal level of 40 dB sound pressure level (SPL) per component. The intensity of the central component was varied between presentations from 20 to 65 dB SPL. Because our method only measures the relative strengths of response components within the same presentation, a direct comparison of response magnitudes across presentations is meaningless. We therefore used the total power of the surrounding components as a reference for the varied component. That is, for each level of the central component, we determined from the PSTH spectrum the power of the response component at 566 Hz relative to the total power of the surrounding six components.
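The relative measure described above amounts to a simple power ratio. A minimal Python sketch, with a hypothetical function name and made-up amplitudes in the example, is:

```python
import math

def relative_level_db(amplitudes, index):
    """Power of component `index` re the summed power of all other components (dB)."""
    target = abs(amplitudes[index]) ** 2
    others = sum(abs(a) ** 2 for i, a in enumerate(amplitudes) if i != index)
    return 10.0 * math.log10(target / others)

# Hypothetical example: seven equal response components, central one at index 3.
equal = relative_level_db([1.0] * 7, 3)
```

For seven equal components, the central component lies 10·log10(1/6) dB below the summed power of the other six; doubling its amplitude raises the measure by about 6 dB.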
Figure 2 shows this relative response magnitude as a function of stimulus intensity. The relationship is linear over the whole range over which significant phase locking to all seven components could be obtained. Thus, despite the highly nonlinear effects of rectification by hair cells, saturation of transduction currents, etc., the relative magnitudes of stimulus components are preserved in a linear manner over a 45 dB range! The observed linearity confirms that the restricted PSTH spectra can be viewed as spectra of the effective stimulus before transduction and justifies their interpretation in terms of cochlear transfer functions.
Disregarding for a moment the need for tailored stimuli, the analysis technique is straightforward and familiar: one presents a broadband stimulus with a known spectrum and measures the spectrum of the response. A comparison of the input and output spectra is then used to characterize the system. This type of spectral analysis is routinely used in system identification, and, if a system is linear, the comparison of complex input and output spectra yields an exhaustive description of the input–output relationship of the system (Kay, 1988). The nonlinear character of cochlear mechanics and of mechano-neural transduction by the inner hair cells complicates the interpretation of AN spectra, but the use of zwuis stimuli allows one to extend the straightforward spectral analysis of linear systems to nonlinear responses like those of the AN. Limitations of the method are considered in Discussion.
We recorded from several hundred nerve fibers in six cats, using standard techniques described previously (Louage et al., 2004). Animals were placed on a heating pad in a sound-attenuated chamber; surgical preparation and recording were done under pentobarbital anesthesia. Micropipettes (3 M KCl) were inserted under visual control into the nerve trunk, exposed through a posterior fossa craniotomy. All procedures were approved by the K.U. Leuven Ethics Committee for Animal Experiments and were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals.
Stimuli were computed using custom software, run within Matlab (MathWorks, Natick, MA) on a personal computer, and played at a 60 kHz sample rate using a 16 bit digital-to-analog converter (PD1; Tucker-Davis Technologies, Alachua, FL). Programmable analog attenuators (PA5; Tucker-Davis Technologies) were used to control the intensity while maintaining the full 16 bit dynamic range. Sound was delivered over dynamic phones (supertweeter; Radio Shack, Fort Worth, TX) connected to hollow Teflon earpieces, which fit tightly in the transversely cut ear canal. The transfer function of the closed acoustic assembly was obtained via a probe whose tip was placed within 2 mm of the eardrum and that was coupled to a 12.7 mm (½ inch) condenser microphone and conditioning amplifier (Brüel and Kjær, Nærum, Denmark). All stimuli were compensated for this transfer function, and the stimuli were specified in SPL (decibels relative to 20 μPa). The neural signal was amplified and filtered (300 Hz to 3 kHz) (DAM 80; World Precision Instruments, Sarasota, FL). Spikes were converted to standard rectangular pulses with a custom-built peak detection circuit. These pulses were timestamped to an accuracy of 1 μs (ET1; Tucker-Davis Technologies).
Stimuli and data collection.
The search stimulus was a wideband (50–30,000 Hz) noise, 300 ms long and presented at 70 dB SPL. When encountering an AN fiber, its CF and spontaneous rate were determined using an automated threshold-tracking procedure. The zwuis tone complexes used for the main measurements had a duration of 45 s and were gated using 500 ms cos² ramps. Before each actual zwuis measurement, the threshold of a fiber to the tone complex was estimated audiovisually from the neural response to short stimulus presentations at varying SPL.
The zwuis stimuli consisted of 7–25 pure tones with an approximately regular frequency spacing. The primary frequencies f1, …, fN of a given complex were chosen in such a way that second-order and third-order distortion products never coincided with the primaries. Put mathematically,

$$f_k \pm f_l \neq f_m, \tag{1a}$$

$$f_k \pm f_l \pm f_m \neq f_n, \tag{1b}$$

with the implicit exception of trivialities such as f1 − f2 + f2 = f1. The inequality (Eq. 1b) implies that all differences fk − fl are unique, which is also the defining property of the zwuis stimuli introduced in our previous study (Van der Heijden and Joris, 2003). The current stimuli are therefore a specialization of these zwuis stimuli. The requirements stated in Equations 1a and 1b are met as follows. For a given N, the integer numbers k1, …, kN, defined recursively by

$$k_{i+1} = k_i + M + i, \qquad i = 1, \ldots, N-1,$$

have the property that all differences ki − kj are different, provided that M > ¾N². Given such a sequence ki, the frequencies

$$f_i = (5k_i + 1)\,\Delta$$

are readily checked to obey Equations 1a and 1b. The actual range of frequencies spanned by this sequence can be controlled by varying k1, M, and Δ. The spacing of the fi is approximately regular, with a typical distance of ∼5ΔM between adjacent components. The tone complex is periodic with fundamental frequency Δ, which is a small fraction of the primary spacing. An example is Δ = 1 Hz, N = 5, M = 20, and k1 = 40, resulting in the set of frequencies of 201, 306, 416, 531, and 651 Hz. This set is readily checked to obey the requirements laid down in Equations 1a and 1b and has a periodicity of 1 Hz. When using the zwuis stimuli to determine the spectral transfer of a nerve fiber, the parameters were chosen so as to yield a sufficient frequency range (∼5ΔNM), a dense enough spacing of components (∼5ΔM), and sufficiently regular spacing (small N/M). Furthermore, Δ was always chosen commensurate with the sample rate, so that one period of the stimulus contained an integer number of samples.
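The construction and the check of the zwuis requirements can be sketched in Python. The sketch is an illustration with hypothetical function names; it assumes the recursion k(i+1) = k(i) + M + i and the frequencies f(i) = (5k(i) + 1)Δ, which reproduce the example frequencies quoted above. Frequencies are handled as integer multiples of Δ so that all comparisons are exact.

```python
def zwuis_multipliers(n, m, k1):
    """Primary frequencies in units of delta, i.e. f_i / delta = 5 * k_i + 1."""
    ks = [k1]
    for i in range(1, n):
        ks.append(ks[-1] + m + i)  # assumed recursion k_{i+1} = k_i + M + i
    return [5 * k + 1 for k in ks]

def is_zwuis(freqs):
    """True if no second- or third-order distortion product hits a primary."""
    prim = set(freqs)
    for a in freqs:
        for b in freqs:
            # Second order: sums and differences of two primaries.
            if a + b in prim or abs(a - b) in prim:
                return False
            for c in freqs:
                # Third order, all signs positive.
                if a + b + c in prim:
                    return False
                # a + b - c may equal a primary only trivially (c == a or c == b).
                if c != a and c != b and a + b - c in prim:
                    return False
    return True

# Example from the text: delta = 1 Hz, N = 5, M = 20, k1 = 40.
example = zwuis_multipliers(5, 20, 40)   # [201, 306, 416, 531, 651]
```

Negative combinations such as fk − fl − fm need no separate test: their absolute values are of the form fl + fm − fk, and a coincidence with a primary would require a sum of three primaries to equal a fourth, which the all-positive check already excludes. A harmonic complex such as 100, 200, 300 Hz fails the check because 100 + 100 = 200.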
The components of a single zwuis stimulus either had equal amplitudes or had a spectral tilt imposed (see Results), in which case each next primary was attenuated by an equal amount of decibels. The components were given random phases. This served two purposes. First, it avoided special phase configurations that result in pulse-like waveforms, such as cosine phase. Second and most importantly, it is (in addition to the Rayleigh test) a safeguard against the over-interpretation of spurious spectral components. The phase of spectral components is computed relative to the random stimulus phase; therefore, spurious components in the PSTH spectrum (components that are not a reflection of phase locking to the stimulus components) will show an erratic phase behavior. Conversely, components that do show a systematic phase behavior are unlikely to result from high-order distortions or measurement noise (note that the phase of a distortion product is determined by the phases of the primaries whose interaction produces it). Spectral analysis of the data was performed during the experiments. In some cases, a measurement was repeated multiple times with exactly the same stimulus to lower the noise floor, and subsequent analyses were based on a pooling of spike times across measurements.
Spectra like those in Figure 1 were computed by first compiling a PSTH with a bin width equal to the 16.7 μs sample period of the stimulus. Time values were then wrapped according to the stimulus period 1/Δ, and a discrete Fourier transform was computed. Strictly speaking, the spectra thus obtained are cycle-histogram spectra (with periodicity Δ), not PSTH spectra, but the distinction is irrelevant for the present purposes (note that only those frequencies that are commensurate with Δ need to be considered). For the measurements reported here, 1 Hz ≥ Δ ≥ 0.25 Hz, and Δ was always commensurate with the sampling rate. In this way, the 45-s-long stimuli always contained at least 10 stimulus periods, and the PSTH spectra did not show any splatter across neighboring (i.e., Δ Hz apart) components (Fig. 1). Phase values reported are expressed relative to the primary phases.
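The wrapping and transform steps can be illustrated with a short Python sketch. It is not the analysis code used here: the function names are hypothetical, the Fourier sum is evaluated directly for one component at a time, and the sign convention of the phase is an arbitrary choice.

```python
import cmath
import math

def cycle_histogram(spike_times, delta, dt):
    """Histogram of spike times wrapped to the stimulus period 1/delta."""
    period = 1.0 / delta
    n_bins = int(round(period / dt))
    counts = [0] * n_bins
    for t in spike_times:
        idx = int((t % period) / dt)
        counts[min(idx, n_bins - 1)] += 1  # guard against rounding at the edge
    return counts

def component(counts, harmonic):
    """Normalized Fourier component at harmonic * delta: (magnitude, phase in cycles)."""
    n_bins = len(counts)
    acc = sum(c * cmath.exp(-2j * math.pi * harmonic * i / n_bins)
              for i, c in enumerate(counts) if c)
    return abs(acc) / sum(counts), cmath.phase(acc) / (2.0 * math.pi)
```

With Δ = 1 Hz and a bin width of 1/60,000 s, the component at a stimulus frequency of, say, 201 Hz is obtained as `component(counts, 201)`; its magnitude equals the vector strength in the limit of narrow bins.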
Spectra of AN responses to equal-amplitude tone complexes
Figure 3 shows a collection of amplitude and phase curves, obtained at low SPLs (<35 dB above threshold), that is representative of the different CF regimes within the phase-locking range. A linear frequency scale was used for both the amplitude curves (left column) and the phase curves (right column). Unlike the complete PSTH spectrum of Figure 1, here the display is restricted to stimulus frequencies and is further limited to those components that yield significant phase locking (p = 0.001). The amplitude curves are represented relative to their peak values. Phase curves are shown advanced in time by amounts indicated in the graph; this mode of display removes the steep average slopes indicative of overall group delay (Van der Heijden and Joris, 2003). The advance τadv was generated by adding to each phase value a value τadv × f, where f is the stimulus frequency of the component; τadv was chosen to optimally display dispersive effects in each panel. Before advancing the phase curve, it was unwrapped by minimizing the phase difference across neighboring components. The frequency spacing of the stimuli was always chosen dense enough to render this choice of unwrapping the only one that was consistent with realistic values of group delay (e.g., <20 ms for Fig. 3B). The slanted bars in the phase plots indicate the group delays that correspond to the slopes of the original, uncompensated phase curves.
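The unwrapping, the time advance, and the relation between slope and group delay can be sketched as follows. This Python sketch is an illustration with hypothetical function names; phases are expressed in cycles and frequencies in Hz, so that the negative slope of the phase curve is the group delay in seconds. Estimating the slope by a least-squares fit is an assumption made for this example.

```python
def unwrap_cycles(phases):
    """Unwrap phases (cycles) by minimizing the jump between neighboring components."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        out.append(out[-1] + step - round(step))  # remove whole-cycle jumps
    return out

def advance(freqs, phases, tau_adv):
    """Advance a phase curve in time by adding tau_adv * f to each phase value."""
    return [p + tau_adv * f for f, p in zip(freqs, phases)]

def group_delay(freqs, phases):
    """Group delay (s): negative least-squares slope of phase versus frequency."""
    n = len(freqs)
    f_mean = sum(freqs) / n
    p_mean = sum(phases) / n
    num = sum((f - f_mean) * (p - p_mean) for f, p in zip(freqs, phases))
    den = sum((f - f_mean) ** 2 for f in freqs)
    return -num / den
```

For a pure 5 ms delay sampled every 20 Hz, neighboring components differ by 0.1 cycle, so the unwrapping is unambiguous; advancing the unwrapped curve by 5 ms then yields a flat phase curve.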
For the lowest CFs (<200 Hz), amplitude curves have a steep low-frequency flank and a shallow high-frequency flank (Fig. 3A). The shallowness of the upper flank is illustrated by the amplitude at ∼600 Hz (almost 2 octaves above the CF of 170 Hz), which is only 10 dB below the peak at CF. The shallow upper flanks of low-CF fibers are in marked contrast with the steep upper flanks of high-CF tuning known from AN tuning curves and cochlear-mechanical data. The companion phase curve (Fig. 3B) has a steep low-frequency segment around CF (group delay of 7 ms) and a shallower segment above CF (group delay of 4.1 ms). The sharp transition between the segments, just above CF (here at ∼220 Hz), is typical of fibers having CFs below 200 Hz. In this frequency region, the phase curve is convex: its slope becomes less steep with increasing frequency. This type of phase behavior, with smaller group delays at higher frequencies, is called “anomalous dispersion” in the physics literature (Elmore and Heald, 1985). Convex phase curves of low-CF nerve fibers were reported previously by Pfeiffer and Molnar (1970) and Van der Heijden and Joris (2003). When CF is somewhat higher (200–500 Hz), the amplitude curves become more symmetrical. This trend is visible when comparing Figure 3C (CF of 310 Hz) to Figure 3A. The phase curve (Fig. 3D) shows the same two-segment character as that of the lowest-CF fibers, but the slopes reflect lower values of group delay (5.7 ms near CF; 3.9 ms above CF). For still higher CFs, the upper flank of the amplitude curves becomes steeper, causing an increasingly symmetric shape when plotted on a linear frequency scale (Fig. 3E) (CF of 620 Hz). The corresponding phase curve (Fig. 3F) retains the two-segment character of the lower-CF curves. The slopes are further reduced (4.5 ms near CF; 3.3 ms above CF), and the difference in slope between the two segments is smaller than for the lower-CF fibers.
With increasing CF, the trends of steeper upper flanks in the amplitude curves and of shallower and straighter phase curves continue. In some fibers with CFs of ∼1 kHz, this results in an almost symmetrical amplitude curve (Fig. 3G) (CF of 1050 Hz) and a phase curve that shows little dispersion (Fig. 3H) (group delay of 3.5 ms around CF and 3.1 ms above CF). For CFs of ∼2 kHz, the amplitude curve typically has a steep high-frequency flank (Fig. 3I) (CF of 2050 Hz). The corresponding phase curve (Fig. 3J) consists of a shallow segment below CF, a steeper segment around CF, and a shallower segment above CF, leading to a sigmoidal shape of the phase curve. The slopes of the segments are 2.1, 2.9, and 2.3 ms, respectively.
For CFs of ∼3 kHz and higher, the upper flank of the amplitude curve becomes increasingly steep (Fig. 3K) (CF of 3225 Hz); in the phase curve, the shallower part above CF disappears, leaving a concave phase curve (Fig. 3L) reminiscent of phase curves obtained from cochlear-mechanical measurements in the base of the cochlea. For CFs >3.5 kHz, the amplitude curves have a reduced dynamic range; their peaks are reduced because of the rapid decline in phase locking above 3 kHz. The resulting amplitude curves are strongly biased toward low frequencies and do not provide a reliable window on cochlear tuning. No individual curves for CFs >3.5 kHz are shown; the CF region of 3–5 kHz, however, is covered by the composite curves presented next.
The pairs of amplitude and phase curves of Figure 3 can be used to compute the impulse response (Van der Heijden and Joris, 2003). This analysis in the temporal domain is not pursued in the present study, but it is worth mentioning that the systematic change from convex to concave phase curves with increasing CF (Fig. 3, right column) is matched by a change in the direction of frequency chirps or “glides” in the impulse responses. At low CFs, the impulse responses show downward glides, whereas high-CF fibers show upward glides. This was first shown by Carney et al. (1999) using revcor data from the AN of the cat.
The data of Figure 3 illustrate a number of features generally found in AN fibers in the range of CFs covered by our techniques. However, these data by no means provide an exhaustive description of our findings. The most important restriction is the use of relatively low stimulus levels (within 35 dB above threshold). The effects of SPL on tuning, which can be quite pronounced (Evans, 1977; Recio-Spinoso et al., 2005), can readily be measured by our techniques but are outside the scope of the present study.
Increasing the spectral range by constructing composite curves
Those components in the effective stimulus (the stimulus after cochlear filtering) that are >20 dB below the dominant components usually fail to evoke significant phase locking; they are masked or overpowered by the stronger components (Fig. 1). Because of the bandpass character of cochlear transfer, the 20 dB dynamic range of AN coding restricts the spectral window of the measured transfer functions to a limited band around CF, as can be clearly seen in Figure 3. This limitation is overcome by the construction of “composite curves” from multiple measurements by using different, partially overlapping, zwuis stimuli (Van der Heijden and Joris, 2003). Figure 4 illustrates the construction of a composite curve from three pairs of amplitude and phase curves obtained from a single fiber (CF of 620 Hz). The stimuli were presented at ∼15 dB above threshold.
The amplitude curve covering the frequencies around CF (Fig. 4A, circles) was obtained using an equal-amplitude tone complex. As in Figure 3, the dynamic range was ∼20 dB. The low and high flanks of the amplitude transfer (Fig. 4A, triangles and squares) were obtained in two separate measurements using tilted stimulus spectra. Starting from a flat spectrum, the component closest to CF was attenuated by 20 dB, and each next component was attenuated less by a fixed amount of decibels in such a way that the component most remote from CF was unattenuated. The direction of this spectral tilt of the stimulus spectrum was chosen to oppose the spectral tilt in the response caused by cochlear filtering. The net effect is an equalization of the output spectrum and a subsequent reduction of the masking of weak components by stronger ones. Indeed, the resulting gain curves (Fig. 4A, triangles and squares), which were computed by correcting the response spectrum for the spectral tilt of the stimulus, show an improved dynamic range (40 and 30 dB, respectively) over the 20 dB typically obtained with flat input spectra.
The composite curves are constructed by combining the individual curves. The different phase curves (Fig. 4B, symbols) are consistent in their regions of overlap. The composite phase curve (Fig. 4B, solid curve) was obtained by a simple averaging of the phase values of the individual curves in the regions of their overlap. The individual amplitude curves (Fig. 4A), however, are not so easily combined because the amplitudes are only relative amplitudes (see discussion of Fig. 1): each individual amplitude curve has an unknown vertical offset. We aligned the individual curves by shifting them vertically by amounts that optimize, in a least squares sense, their mutual overlap. The resulting composite amplitude curve (Fig. 4C) again shows a good consistency across measurements, and its dynamic range is ∼50 dB. This composite curve covers the entire spectral range over which the fiber responded to stimuli not exceeding 75 dB SPL per tone; in that sense, it is an exhaustive determination of the tuning properties of the fiber.
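The least-squares alignment of two amplitude curves with a shared frequency region can be sketched in Python. The sketch is a pairwise simplification under stated assumptions, not the procedure used for Figure 4: curves are represented as hypothetical dictionaries mapping frequency (Hz) to level (dB), only frequencies present in both curves count as overlap, and overlapping values are averaged after alignment.

```python
def ls_offset(reference, curve):
    """Vertical shift (dB) of `curve` minimizing the squared mismatch with
    `reference` over their shared frequencies (the mean level difference)."""
    shared = sorted(set(reference) & set(curve))
    if not shared:
        raise ValueError("curves do not overlap")
    return sum(reference[f] - curve[f] for f in shared) / len(shared)

def merge(reference, curve):
    """Shift `curve` onto `reference` and average the overlapping values."""
    shift = ls_offset(reference, curve)
    composite = dict(reference)
    for f, level in curve.items():
        shifted = level + shift
        composite[f] = 0.5 * (composite[f] + shifted) if f in composite else shifted
    return composite

# Hypothetical example: a central curve and a high-frequency flank.
center = {500.0: -10.0, 600.0: 0.0, 700.0: -8.0}
high = {700.0: 2.0, 800.0: -10.0}
composite = merge(center, high)
```

For a single unknown offset, the least-squares solution is simply the mean level difference in the region of overlap; in the example, the flank is shifted down by 10 dB, which extends the composite to 800 Hz at −20 dB.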
The possibility to cover a large spectral range for each AN fiber, combined with the possibility to measure in a single ear many fibers having different CFs, allowed us to work toward a much more panoramic view of cochlear tuning than has been obtained with customary techniques. This is illustrated in Figure 5. The left column (Fig. 5A–D) shows a representative collection of composite amplitude curves obtained from different animals. The high-CF (>3 kHz) amplitude curves in Figure 5A–D were shifted downward to compensate for the low-frequency bias caused by the decline of phase locking; these shifts were chosen to align the low-frequency tails with those of the other curves. The companion phase curves are shown in the right column (Fig. 5E–H). The phase curves within each panel were aligned using the unwrapping procedure described in the next section. To facilitate comparison with cochlear-mechanical data, all phase curves in Figure 5 were compensated for an estimated 1 ms synaptic delay (Ruggero and Rich, 1987).
The enlarged spectral window afforded by the use of composite curves (Fig. 5) yields a more exhaustive assessment of the tuning properties at different CFs than the individual curves of Figure 3. Moreover, the composite curves of Figure 5 illustrate the variation between animals. Eventually, such panoramic data can lead to a detailed quantitative description of excitation across extended portions of the cochlea (“portrait of traveling waves”). The remainder of this study is concerned with a first step toward this goal: the analysis of the variation of phase across CFs and stimulus frequencies.
Panoramic phase analysis in the apex
By definition, phase is a cyclic quantity: it assumes values that lie on a circle. To analyze how phase varies continuously with an independent parameter, it is necessary to unwrap the phase values. This trivial manipulation is routinely applied when presenting phase curves. For instance, the individual phase curves in Figures 3–5 have been unwrapped across stimulus frequency. When attempting to arrive at a global picture of phase across the cochlea, one faces the less trivial problem of unwrapping phase with respect to two variables at the same time: CF and stimulus frequency (fstim). The problem is further complicated by the irregular sampling of CF values, a factor that is beyond the experimenter's control. Our solution of the two-dimensional unwrapping problem is based on visualizing the unprocessed, cyclic phase values.
Figure 6 depicts phase data from a single ear of a cat as a function of both CF (abscissa) and fstim (ordinate), measured at stimulus levels not exceeding 35 dB above threshold. Phase values are represented by a color code taken from the cyclic color map shown in the color bar next to the graph. Note that this is a representation of bare phase; no unwrapping has been applied. There is an obvious diagonal structure to these raw phase data: it is immediately clear to the eye which data points of similar color “belong together” and which are one or several full cycles apart.
The black lines in Figure 6 indicate two paths along which phase is approximately constant. Within the domain enclosed by this line pair, phase varies by much less than half a cycle. By manually inserting several such lines (lines not shown), we divided the graph into multiple domains of approximately constant phase. This partitioning of the data points reduced the two-dimensional unwrapping across neighboring data points to a one-dimensional unwrapping across neighboring domains. The average phase in each domain was computed by a vector average (Goldberg and Brown, 1969), and the domain averages were unwrapped with respect to domain order (i.e., multiples of one cycle were added to the individual domain averages in such a way that the phase difference between neighboring domains was minimized). Finally, unwrapped phases φu of the individual data points were computed by adding to each bare value φ an integer number n of cycles: φu = φ + n, where n was chosen to bring φu as close as possible to its unwrapped domain average.
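The three computational steps of this procedure (vector averaging within a domain, unwrapping of the domain averages, and assignment of an integer number of cycles to each data point) can be sketched in Python as follows. This is an illustrative sketch, not the original analysis code; it assumes that phases are expressed in cycles and that the manual partitioning into domains is already available as an integer domain index per data point. All function names are hypothetical.

```python
import numpy as np

def vector_average(phases_cyc):
    """Circular mean of phases (in cycles) via the mean unit vector."""
    z = np.exp(2j * np.pi * np.asarray(phases_cyc, dtype=float))
    return np.angle(z.mean()) / (2 * np.pi)

def unwrap_sequence(means_cyc):
    """Add whole cycles so that neighboring domain averages differ minimally."""
    out = [float(means_cyc[0])]
    for m in means_cyc[1:]:
        out.append(m - np.round(m - out[-1]))
    return np.array(out)

def unwrap_points(phases_cyc, domain_ids, unwrapped_means):
    """phi_u = phi + n, with integer n bringing phi_u closest to its domain average."""
    phases = np.asarray(phases_cyc, dtype=float)
    target = np.asarray(unwrapped_means)[np.asarray(domain_ids)]
    return phases - np.round(phases - target)
```

The vector average is used instead of an arithmetic mean because bare phases on either side of the wrap (e.g., 0.9 and 0.1 cycle) should average to a value near 0, not 0.5 cycle.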
The resulting unwrapped phases from Figure 6 as a function of CF and stimulus frequency are shown as a filled contour plot in Figure 7B. Phase between the individual data points was obtained by two-dimensional linear interpolation; no form of smoothing or running average was used. Contour lines are 0.5 cycle apart. Phase contour plots of five other animals are shown in Figure 7, A and C–F. The phase data of all six animals show a comparable systematic patterning. Phase is approximately constant along the CF = fstim diagonal. Above the diagonal, where fstim > CF, the contour lines are relatively dense, indicating a rapid change of phase with fstim/CF ratio. (Note that constant values of fstim/CF correspond to lines running parallel to the diagonal.) In this upper region, contour lines farther away from the diagonal tend to run more horizontally than those near the diagonal, particularly for CFs below 500 Hz.
Mathematically, the failure of the contour lines to run exactly parallel to the diagonal means that a given change in CF cannot be globally compensated by an equal change in fstim. In view of the logarithmic frequency axes, this amounts to the breaking of the so-called scaling invariance postulated by a class of cochlear models (Shera et al., 2000). In scaling-invariant models, phase only depends on the ratio fstim/CF, resulting in all contour lines running parallel to the diagonal. The low-frequency (<500 Hz) portions of Figure 7 in particular show a systematic deviation from such a scaling-invariant patterning.
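Stated in symbols, with Φ and Φ̃ denoting generic functions introduced here purely for illustration, scaling invariance requires that the phase φ depend on CF and fstim only through their ratio:

```latex
\varphi(\mathrm{CF}, f_{\mathrm{stim}})
  = \Phi\!\left(\frac{f_{\mathrm{stim}}}{\mathrm{CF}}\right)
  = \tilde{\Phi}\!\left(\log f_{\mathrm{stim}} - \log \mathrm{CF}\right),
\qquad
\varphi = \text{const}
  \;\Longleftrightarrow\;
  \log f_{\mathrm{stim}} = \log \mathrm{CF} + \text{const}.
```

On logarithmic axes, every phase contour of a scaling-invariant system is thus a straight line of unit slope, i.e., parallel to the CF = fstim diagonal.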
In the region below the diagonal, where fstim < CF, the contour lines become more widely spaced, indicating that phase varies slowly with fstim/CF ratio. The data appear noisier in this region of the (CF, fstim) area, and the variation across animals is larger. Note, however, that contour plots are overly sensitive to fluctuations in regions in which systematic variation is small, an effect comparable with the enhanced contrasts occurring when sunlight grazes a rough surface.
From the panoramic phase patterns of Figure 7, one can readily determine how the phase of single stimulus components changes with CF. This amounts to determining the phase profile along the cochlea of “traveling waves.” Figure 8 shows these phase profiles for all six animals. Each curve of Figure 8 was obtained by taking a horizontal section of a phase pattern of Figure 7 at a fixed stimulus frequency. Phase profiles are shown for stimulus frequencies ranging from 200 to 4000 Hz. Again, phase values of each curve were compensated for an estimated conduction and synaptic delay of 1 ms (Ruggero and Rich, 1987). Note that a different choice of this delay would not affect the shape of the individual curves because phase compensation is constant along the curve; the choice only affects the relative vertical positions of the curves. In Figure 8, CF was converted to cochlear position using an empirical cochlear map for the cat (Greenwood, 1990). In this way, each curve in Figure 8 shows the variation of the phase of a single stimulus component along the basilar membrane. The open circle on each phase curve indicates the characteristic position (CP) of the stimulus frequency, that is, the cochlear location that is most sensitive to that frequency.
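The remark about the delay compensation can be made concrete with a short Python sketch. It assumes that phase is expressed in cycles and that phase lags are negative, so that a pure delay τ contributes a lag of fstim × τ cycles that is removed by adding the same amount. The function name `compensate_delay` and the numerical profile in the usage note are hypothetical.

```python
import numpy as np

def compensate_delay(phase_cyc, f_stim_hz, delay_s=1e-3):
    """Remove the phase lag (in cycles) caused by a pure delay.

    Assumes lags are negative phases; the default of 1 ms follows the
    estimated delay used in the text.
    """
    return np.asarray(phase_cyc, dtype=float) + f_stim_hz * delay_s
```

For a profile at a fixed stimulus frequency, the correction is a single constant (0.2 cycle at 200 Hz, 4 cycles at 4000 Hz for a 1 ms delay), so the differences between neighboring points of a curve, and hence its shape, are unchanged; only the vertical position of the curve relative to curves at other frequencies shifts.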
The phase profiles of all six cats exhibit a similar systematic patterning. Among those phase profiles that span a sufficiently wide CF region, many are well approximated by two straight line segments: a shallow, almost horizontal, segment situated basal to the CP, and a steeper segment surrounding the CP. In most cases, the transition between the two segments (Fig. 8, filled diamonds) is quite abrupt. Formulated in terms of traveling waves, the shallow segments indicate rapid propagation from more basal parts of the cochlea, whereas the steep segments correspond to slow propagation near the CP. The slowing down of the wave takes place in a narrow region basal from the CP.
At first sight, it seems reasonable to identify these two segments with the “fast waves” and “slow waves” previously invoked to describe cochlear mechanical data (Cooper and Rhode, 1996; Olson, 1998). There are, however, some crucial differences. The mechanical data, which are obtained from a single longitudinal cochlear location, show an SPL-dependent interference between slow and fast waves. Furthermore, there are indications that the fast wave is really an artifact caused by opening the cochlea (Cooper and Rhode, 1996). In contrast, our data are panoramic (Fig. 8), were obtained from an intact cochlea, and portray the fast and slow segments as two ends of one and the same traveling wave. The transition between the two ends is relatively sharp but shows no signs of interference effects or discontinuities that one would expect if the slow and fast segments were really two independent “competing” waves. Instead, our data suggest that there is just one wave, which simply slows down on its way toward the apex. In view of these differences, we will avoid the terms “slow wave” and “fast wave” and use more neutral terminology such as “slow segment,” “slow propagation,” etc.
Assuming that all phase profiles converge at the basal end of the cochlea near the stapes, the data shown in Figure 8 suggest that the traveling waves experience a cumulative, stapes-to-CP phase lag that increases with stimulus frequency. This cumulative phase lag varies between ∼1 cycle (fstim of 200 Hz) and ∼2.5 cycles (fstim of 4 kHz). This patterning of cumulative phase lag is also visible in the composite phase curves of Figure 5, which were mutually aligned using the unwrapping procedure explained in the preceding paragraphs.
Note that the estimates of cumulative phase lag depend on the tentative 1 ms synaptic delay and on the additional assumption that the lowest stimulus frequency components suffer little phase lag when traversing the basal portion of the cochlea (Ruggero and Rich, 1983). The nonconstancy of cumulated phase lag with frequency is another demonstration of the failure of scaling invariance mentioned in the discussion of Figure 7 and is consistent with the theoretical analysis of Greenwood (1977) of AN data reported by Anderson et al. (1971).
The approximate two-segment character of most of the phase profiles allowed a straightforward evaluation of cochlear wavelength and phase velocity. For each phase profile, we determined by visual inspection the point at which the two segments meet (Fig. 8, filled diamonds) and used linear regression to compute the slopes of the steep segment apical from the meeting point (region of slow propagation) and the shallow segment basal from it (region of fast propagation). The slope measures change of phase with distance, so the wavelength λ is the inverse slope (distance per cycle). The phase velocity follows by cφ = fstim × λ. The estimates of phase velocity and wavelength near characteristic position (i.e., of the slow portion of the wave) are shown in Figure 9 for all six animals. Note that Figure 9 is not a dispersion graph in the usual sense (Elmore and Heald, 1985), because the different wavelengths are not evaluated at the same point; instead, for each stimulus frequency, λ and cφ are evaluated at a different cochlear region characteristic for that frequency (i.e., the region around CP). We extracted four wavelength estimates from the phase data of one cat reported by Pfeiffer and Kim (1975, their Fig. 2). They are reproduced in Figure 9A by symbols connected by a solid line.
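The computation of λ and cφ from one segment of a phase profile can be sketched as follows. This is a minimal illustration, not the original analysis code; it assumes that cochlear position is given in millimeters and phase in cycles, and that the points belonging to the segment have already been selected. The function name is hypothetical.

```python
import numpy as np

def wavelength_and_velocity(position_mm, phase_cyc, f_stim_hz):
    """Estimate wavelength (mm) and phase velocity (m/s) of one segment.

    The regression slope is the change of phase with distance (cycles/mm);
    the wavelength is its inverse, and the phase velocity is f_stim * lambda.
    """
    slope, _intercept = np.polyfit(position_mm, phase_cyc, 1)
    wavelength_mm = 1.0 / abs(slope)
    velocity_m_per_s = f_stim_hz * wavelength_mm * 1e-3  # mm -> m
    return wavelength_mm, velocity_m_per_s
```

As a synthetic check, a segment accumulating half a cycle of lag per millimeter at 1000 Hz gives λ = 2 mm and cφ = 2 m/s.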
Phase velocity of the slow portion of the waves (Fig. 9B) increases with stimulus frequency and varies from ∼0.5 m/s for 200 Hz stimuli to ∼5 m/s for 4 kHz stimuli. In this frequency range, wavelength varies between 3.5 and 1 mm (Fig. 9A). Consistency across animals of the data in Figure 9 is excellent for stimulus frequencies <3000 Hz and degrades somewhat above 3000 Hz. The estimates of phase velocity and wavelength at the high end (fstim >3000 Hz) might be affected by methodological limitations; when approaching the limits of phase locking, measurement of phase curves becomes less efficient and more time consuming, leading to a poorer coverage of CFs in the upper frequency range.
Wavelength and phase velocity of the fast portions of the wave follow from the slopes of the shallow segments basal from CP (region of fast propagation). These slopes are more difficult to extract from the phase profiles of Figure 8 than those of the steep segments (region of slow propagation). The main problem is the relative scarcity of data at stimulus frequencies remote from CF. For some of the phase curves, a shallow segment could not be identified with confidence, but for those curves for which a shallow, basal, segment was available, estimates of wavelength and phase velocity are shown in Figure 10. As expected, the data are more scattered than those of the region of slow propagation (Fig. 9). The estimated wavelengths (Fig. 10A) of the fast propagation of stimuli between 500 and 2000 Hz are in the order of 10–20 mm, with outliers having values >100 mm. Given that the total length of the basilar membrane of the cat is ∼22 mm (Greenwood, 1990), the fast propagation appears to be very fast indeed. The corresponding phase velocity in the fast region is in the order of 10–20 m/s (Fig. 10B). The contrast between the slow and fast propagation apparent from Figures 9 and 10 confirms the earlier observation from Figure 5 that most of the phase lag appears to be accumulated in a relatively restricted region around CP.
Many models of cochlear mechanics assume that the vibration pattern in the cochlea is well described by propagating fluid waves (Patuzzi, 1996). The ratio of wavelength and depth of the fluid in the scalae is an important parameter in such models. When the wavelength is large compared with the depth, a fluid-mechanical description in terms of “shallow waves” suffices. For the corresponding models, usually called “long-wave models” of the cochlea, a one-dimensional description of fluid motion is sufficient. In contrast, when the wavelength is small compared with the depth of the scalae, a description in terms of “deep waves” is needed, a “short-wave model” must be invoked, and a one-dimensional description is no longer valid. Lighthill (1981) provides the following criteria for deep waves, 2πd/λ > 1.5, and for shallow waves, 2πd/λ < 0.5, where d is the depth of the scalae. This depth is fairly constant in the apical turns of the cat cochlea covered by the present study: d ≈ 0.5 mm (Wysocki, 2001). Using this value, Lighthill's criteria for deep waves and for shallow waves become λ < 2.1 mm and λ > 6.3 mm, respectively. These critical values are indicated in Figures 9 and 10 by dashed lines. Above 2 kHz, slow propagation (Fig. 9) turns out to be mediated by deep waves. For lower frequencies, the wavelengths of slow propagation are in the intermediate range 2.1 < λ < 6.3 mm; because their values are closer to 2.1 mm than to 6.3 mm, waves in this range can be characterized as “quite deep.” For virtually all of the wavelength estimates of the fast portion of the waves (Fig. 10), λ > 6.3 mm, which makes them shallow waves.
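The arithmetic behind the two critical wavelengths, and the resulting classification, can be reproduced with a few lines of Python. The function names are hypothetical; the thresholds 1.5 and 0.5 and the depth d ≈ 0.5 mm are those quoted in the text.

```python
import math

def critical_wavelengths(depth_mm=0.5):
    """Return (deep-wave limit, shallow-wave limit) in mm for scala depth d."""
    deep_limit = 2 * math.pi * depth_mm / 1.5     # lambda below this: deep waves
    shallow_limit = 2 * math.pi * depth_mm / 0.5  # lambda above this: shallow waves
    return deep_limit, shallow_limit

def lighthill_regime(wavelength_mm, depth_mm=0.5):
    """Classify a wave by the dimensionless ratio 2*pi*d/lambda."""
    kd = 2 * math.pi * depth_mm / wavelength_mm
    if kd > 1.5:
        return "deep"
    if kd < 0.5:
        return "shallow"
    return "intermediate"
```

With d = 0.5 mm, `critical_wavelengths()` evaluates to approximately (2.09, 6.28) mm, which rounds to the 2.1 and 6.3 mm quoted above. A 1 mm wavelength is classified as deep, 3.5 mm as intermediate, and 15 mm as shallow.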
A final characterization of the phase profiles concerns the transition from fast propagation to slow propagation (Fig. 8, filled diamonds). Figure 11A shows the estimated location of the transition as a function of stimulus frequency. The different symbols indicate different animals. For reference, two dashed curves are drawn indicating CF = fstim (bottom curve) and CF = 2fstim (top curve). In Figure 11B, the same data are plotted but now with cochlear position converted to CF. The transition from fast propagation to slow propagation occurs “1 octave above the stimulus frequency,” or, more accurately, at a cochlear position having a CF of approximately twice the stimulus frequency. This frequency ratio appears to decrease with frequency, but it should be noted that, in the high-frequency region, the extraction of transition points from the phase profiles becomes less certain, because the fast portions of the waves are approaching the phase-locking limit (Fig. 8).
We presented a novel method to measure transfer characteristics of the auditory periphery based on a spectral analysis of AN responses to irregularly spaced tone complexes. We obtained detailed amplitude and phase-transfer functions for stimuli in the phase-locking range (<5 kHz) and determined panoramic phase maps covering large apical portions of the cochleas of six animals. We thus analyzed the propagation of stimulus components along the cochlea, determined the change of speed of propagation from fast (base) to slow (apex), and obtained estimates of wavelength, phase velocity, and cochlear location of the transition for stimulus frequencies between 100 and 5000 Hz.
The panoramic aspects of the present work may be viewed as a follow up of the pure-tone population study of Pfeiffer and Molnar (1970). Our work extends that study in the following respects: range of CFs, range of stimulus frequencies, quantitative detail (particularly the amplitude transfer), number of animals, and completeness of panoramic synthesis.
Methodological aspects: comparison with other measurement techniques
Among customary analysis techniques, revcor is most similar to our approach. Both revcor and first-order zwuis use wideband stimuli and characterize transfer properties in terms of an equivalent linear response. The caveats and limitations of such a linear systems approach to the highly nonlinear auditory periphery have been amply discussed in the context of revcor (for review, see Eggermont, 1993). Because the same considerations apply to the present study, we restrict the discussion of nonlinearities to two brief remarks. First, the effects of SPL on cochlear transfer can be accurately studied using our methods (Van der Heijden and Joris, 2006) but are beyond the scope of this report. Second, our restriction to intensities within 35 dB of threshold is motivated by our panoramic ambitions; higher SPLs sometimes cause quite drastic changes in phase transfer, thereby complicating across-CF phase analyses.
Revcor and first-order zwuis are both restricted to stimulus frequencies below the phase-locking limit. For the highest CFs, both techniques yield amplitude curves that are biased toward low frequencies, reflecting the decline of phase locking (Fig. 5). Revcor and single zwuis measurements have comparable dynamic ranges (∼20 dB) (Fig. 3), but the construction of composite curves (Figs. 4, 5) yields a vast improvement over revcor data. A known critique of revcor analysis is the coupling of higher-order nonlinearities to the first-order Wiener kernel (Johnson 1980a). The design of zwuis stimuli decouples first-order and third-order terms (see Materials and Methods). On the one hand, this is an advantage of the zwuis method. On the other, the relatively insignificant contribution of third-order terms to the AN spectrum in response to zwuis (Fig. 1) may well be interpreted as a hindsight justification for applying the revcor technique to AN measurements.
In revcor studies, amplitude and phase curves are obtained by Fourier transformation of impulse responses (Carney and Yin, 1988; Recio-Spinoso et al., 2005) (Fig. 10). This necessitates temporal windowing, which limits spectral resolution. Also, the choice of temporal window has a systematic effect on the derived amplitude curves. The practice of adapting the window to the measured revcor data therefore complicates across-fiber comparisons of amplitude curves. Zwuis is free from these complications because it is an inherently spectral method. Its spectral resolution is determined by the component spacing, which can be chosen arbitrarily fine. The importance of good spectral resolution is evident in Figure 3, in which many details (e.g., the abrupt phase transitions in Fig. 3D,J) occur at a fine scale (<CF/10).
Another important methodological aspect concerns the judgment of statistical significance. Revcor spectra may always be computed, but no objective criterion exists to disentangle the contributions of the frequency selectivity of a fiber from background noise. The noise floor in revcor spectra is judged a posteriori (e.g., where does the phase curve become erratic?), which may lead to over-interpretation or under-interpretation of data. Zwuis provides an a priori statistical criterion based on the Rayleigh test (Fig. 1), an accepted standard in temporal analyses of neural data.
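For readers unfamiliar with the Rayleigh test, the following Python sketch shows the standard computation of vector strength R for one stimulus component and the common large-sample approximation p ≈ exp(−nR²) for the probability of obtaining such an R from n spikes with uniformly distributed phases. It is an illustration only: the function names are hypothetical, and the significance criterion actually applied to the data is not restated here.

```python
import numpy as np

def vector_strength(spike_times_s, freq_hz):
    """Return (R, n): vector strength of spike times at one frequency."""
    phases = 2 * np.pi * freq_hz * np.asarray(spike_times_s, dtype=float)
    z = np.exp(1j * phases)
    return float(np.abs(z.mean())), len(z)

def rayleigh_p(R, n):
    """Large-sample approximation of the Rayleigh test p value."""
    return float(np.exp(-n * R**2))
```

A component would be accepted as significantly phase locked when `rayleigh_p(R, n)` falls below a criterion fixed in advance, which is what makes the test a priori, in contrast to judging the noise floor from the appearance of the curves.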
Single tones are often used to assess cochlear frequency selectivity, e.g., with threshold curves, but also with Fourier analysis (Pfeiffer and Kim, 1975). It is difficult, however, to assess cochlear amplitude transfer from pure-tone responses: rate saturation and differences across AN fibers in threshold, spontaneous rate, and dynamic range generally interact with, and obscure, the amplitude characteristics of cochlear filtering. Moreover, cochlear-mechanical nonlinearities manifest themselves more saliently in the case of tonal stimulation, whereas wideband stimulation has a linearizing effect (Marmarelis and Marmarelis, 1978). Tones and wideband stimuli thus provide different, complementary ways of probing spectral properties of the auditory periphery. The contrast can be expressed as follows: tones and wideband stimuli probe, respectively, frequency selectivity (specificity of the response to single components) and frequency resolution (ability to separate competing components). This distinction becomes crucial when evaluating the relevance of peripheral tuning to hearing. For instance, amplitude transfer measured by wideband techniques (revcor, zwuis) is probably more relevant to understanding auditory masking and source separation than, say, Q10 values of threshold curves.
The derivation of cochlear phase transfer from pure-tone responses seems straightforward and unproblematic, but the same nonlinearities that obscure the assessment of cochlear amplitude may also distort phase measurements. Cycle histograms to single tones often show large deviations from a sinusoidal shape attributable to rectification, thresholding, saturation, refractory effects, and peak splitting (Johnson, 1980b). Particularly at high SPLs, this may cause asymmetric deformations of the cycle histogram. The origin of these deformations is in the transduction stage; they do not reflect cochlear-mechanical features, yet they contaminate the measurement of the phase of the underlying effective stimulus.
Zwuis, like revcor, reduces these spurious effects. The presence of multiple stimulus components linearizes the effect of each individual component on the response. This linearization is illustrated in Figure 1: the vector strength R of the second harmonics (filled squares) is small (at least 12 dB down) compared with R of the corresponding primary frequencies. Linearization is also apparent in cycle histograms at the primary frequencies; their shape remains quite sinusoidal even at the highest SPLs tested (data not shown). This linearization renders the phase estimates insensitive to the nonlinearities of the transduction, thus improving their connection with cochlear transfer characteristics.
Panoramic phase pattern
The region within a single cochlea covered by our data (Fig. 8) is larger than in any mechanical measurements in a sensitive cochlea. Because of their panoramic character, our data are a physiological counterpart of Bekesy's (1960) measurements on cadaver cochleas. The much sharper frequency tuning of our measurements (Fig. 5) probably reflects the difference in physiological condition. The phase patterning in our data roughly agrees with Bekesy's findings and is strongly suggestive of traveling waves propagating from base to apex. Importantly, Bekesy's (1960) book contains only one figure (Fig. 11–58) that displays phase profiles (phase as function of cochlear position). The scarcity of data, limitation to very low frequencies (50–300 Hz), sketchy nature of the curves, and internal inconsistencies in the data (Dallos, 1973, p 166) make a detailed comparison doubtful. Moreover, the cochleas used by Bekesy were severely damaged by extensive removal of bone tissue, and the effective stimulus intensity was extremely high. Nevertheless, Bekesy's figure suggests a gradual change of slope unlike the localized transition in our phase profiles (Fig. 8). To our knowledge, the only other published phase profiles having sufficient range to elucidate the nature of the fast-to-slow transition in the apex are the AN data of Kim et al. (1980). The single-tone profiles in their Figures 3, 6, and 10, which cover only two stimulus frequencies (2.1 and 2.7 kHz), display a clear two-segment character similar to our Figure 8. Thus, the data of Kim et al. show a sharp transition in phase velocity rather than a gradually changing propagation speed.
Many theoretical models of the cochlea predict gradually sloping phase profiles (Dallos, 1973, p 161), but the apical response patterns of such models are not backed up by any panoramic data. Obviously, the only way to determine how phase varies along the cochlea is through panoramic measurements. Any extrapolations of local measurements to extended regions are based on untested assumptions. Likewise, panoramic data measured in the basal region (Ren, 2002) cannot be extrapolated to the apex.
Given the scarcity of mechanical data from the apex of sensitive cochleas, the relationship between basilar membrane motion and auditory nerve responses is less established in the apex than in the base. Many species without a cochlea have excellent sensitivity and sharp tuning at low frequencies, so it is entirely possible that responses of low-CF fibers are partly shaped by mechanisms local to inner hair cells, i.e., by processes that are not directly coupled to basilar membrane motion. Even if AN responses are exclusively determined by basilar membrane motion, our observations do not prove the exclusive traveling-wave character of that motion in the apex. That question requires scrutiny of the envelope of the waves and of group delays and energy flow (Ruggero, 1994). Regardless of future findings, descriptions of the apex as a scaled version of the base will fail, because they miss qualitative differences between apex and base. It is therefore important to study the apex in its own right. Fortunately, the poor accessibility of the apex to mechanical measurements is offset by the accurate temporal coding by the nerve fibers innervating it. The opportunities offered by this circumstance are far from exhausted.
This work was supported by Fund for Scientific Research–Flanders Grants G.0083.02 and G.0392.05 and Research Fund K.U. Leuven Grants OT/01/42 and OT/05/57. We thank two anonymous reviewers for helpful comments.
Correspondence should be addressed to Marcel van der Heijden, Laboratory of Auditory Neurophysiology, Campus Gasthuisberg O&N2, Herestraat 49, bus 1021, K.U. Leuven, B-3000 Leuven, Belgium.