Abstract
People with cochlear hearing loss have substantial difficulty understanding speech in real-world listening environments (e.g., restaurants), even with amplification from a modern digital hearing aid. Unfortunately, a disconnect remains between human perceptual studies implicating diminished sensitivity to fast acoustic temporal fine structure (TFS) and animal studies showing minimal changes in neural coding of TFS or slower envelope (ENV) structure. Here, we used general system-identification (Wiener kernel) analyses of chinchilla auditory nerve fiber responses to Gaussian noise to reveal pronounced distortions in tonotopic coding of TFS and ENV following permanent, noise-induced hearing loss. In basal fibers with characteristic frequencies (CFs) >1.5 kHz, hearing loss introduced robust nontonotopic coding (i.e., at the wrong cochlear place) of low-frequency TFS, while ENV responses typically remained at CF. As a consequence, the highest dominant frequency of TFS coding in response to Gaussian noise was 2.4 kHz in noise-overexposed fibers compared with 4.5 kHz in control fibers. Coding of ENV also became nontonotopic in more pronounced cases of cochlear damage. In apical fibers, more classical hearing-loss effects were observed, i.e., broadened tuning without a significant shift in best frequency. Because these distortions and dissociations of TFS/ENV disrupt tonotopicity, a fundamental principle of auditory processing necessary for robust signal coding in background noise, these results have important implications for understanding communication difficulties faced by people with hearing loss. Further, hearing aids may benefit from distinct amplification strategies for apical and basal cochlear regions to address fundamentally different coding deficits.
Significance Statement
Speech-perception problems associated with noise overexposure are pervasive in today's society, even with modern digital hearing aids. Unfortunately, the underlying physiological deficits in neural coding remain unclear. Here, we used innovative system-identification analyses of auditory nerve fiber responses to Gaussian noise to uncover pronounced distortions in coding of rapidly varying acoustic temporal fine structure and slower envelope cues following noise trauma. Because these distortions degrade and diminish the tonotopic representation of temporal acoustic features, a fundamental principle of auditory processing, the results represent a critical advancement in our understanding of the physiological bases of communication disorders. The detailed knowledge provided by this work will help guide the design of signal-processing strategies aimed at alleviating everyday communication problems for people with hearing loss.
Introduction
People with hearing loss face significant challenges understanding speech in daily life. While hearing aids improve speech perception in quiet environments by increasing audibility, these devices provide limited benefit under real-world conditions with background noise and reverberation (Duquesnoy, 1983; Woods et al., 2010). The physiological basis of this problem remains a topic of active debate with clear translational relevance.
The cochlea acts as a bank of bandpass auditory filters with center frequencies distributed across the frequency range of hearing. The auditory filtering process decomposes complex sounds, such as speech, into an array of narrowband component signals, each of which contains a slowly varying temporal envelope (ENV) and faster fluctuations in temporal fine structure (TFS). In the auditory nerve, TFS and ENV are encoded through neural phase locking, i.e., fluctuations in discharge rate synchronized to the temporal structure of the sound after cochlear filtering (Fig. 1). Whereas all auditory nerve fibers phase lock to ENV structure, TFS coding occurs primarily below 3–5 kHz (Johnson, 1980).
Recent behavioral research suggests that speech perception problems in people with hearing loss may be caused by impaired processing of TFS (Lorenzi et al., 2006, 2009). These studies show that while ENV-based speech is intelligible regardless of hearing status, TFS-based speech can be understood by normal-hearing (NH) listeners but not listeners with hearing loss. Moreover, the addition of TFS information to ENV-based speech in noise improves perception more in NH listeners than in listeners with hearing loss (Hopkins et al., 2008). While completely independent manipulation of TFS and ENV in these studies is questionable (Oxenham and Simonson, 2009; Swaminathan and Heinz, 2012; Shamma and Lorenzi, 2013), they nonetheless implicate a contribution of temporal processing deficits to perceptual impairment with hearing loss.
Surprisingly, neurophysiological studies in nonhuman animals have shown relatively few changes in temporal processing with cochlear damage (Henry and Heinz, 2013). In auditory nerve fibers, noise overexposure does not affect phase locking to the TFS of tones and amplitude-modulated tones in quiet environments, and in fact enhances ENV coding (Miller et al., 1997; Kale and Heinz, 2010, 2012; Henry et al., 2014). In contrast, TFS coding is reduced for tones in noisy environments (Henry and Heinz, 2012) and altered for synthetic vowels (i.e., reduced synchrony capture; Miller et al., 1997). Other studies of drug-induced damage have produced conflicting results, with one study showing a decrease in phase locking to tones (Woolf et al., 1981), but another showing no effect (Harrison and Evans, 1979).
The few changes in temporal coding uncovered so far have led to speculation that temporal-processing deficits might arise primarily during central processing (Moore, 2008). A contribution of cochlear pathophysiology remains possible, however, because existing studies largely employed narrowband stimuli with little similarity to speech. Wiener-kernel analyses of auditory responses to broadband Gaussian noise (van Dijk et al., 1994; van Drongelen, 2010) provide a rigorous, system-identification approach for quantifying coding of TFS and ENV. The first-order and second-order Wiener kernels, calculated through spike-triggered averaging of the stimulus waveform, can be used to describe the frequency tuning (carrier-frequency band) driving phase locking to TFS and ENV, respectively. Wiener-kernel studies in NH mammals (Lewis et al., 2002; Recio-Spinoso et al., 2005) show that cochlear fibers tuned to characteristic frequencies (CFs) below 3–5 kHz encode both the TFS and ENV of a narrow carrier-frequency band centered on CF (Fig. 2A). By contrast, fibers with higher CFs encode the ENV of the CF-centered carrier band but not its TFS (Fig. 3A), consistent with the roll-off in TFS coding with increasing frequency (Johnson, 1980). These detailed analyses provide a rigorous demonstration of auditory tonotopicity, a fundamental principle of most neural-coding theories.
The present study in chinchillas used Wiener-kernel analyses of auditory nerve fiber responses to Gaussian noise to quantify the effects of permanent, noise-induced hearing loss on coding of TFS and ENV along the frequency axis of the cochlea. Noise trauma caused marked distortions in tonotopic coding of temporal structure that appear likely to impair perception of complex sounds.
Materials and Methods
Animal procedures.
All procedures were performed in male chinchillas and approved by the Purdue Animal Care and Use Committee. The neurophysiological data presented here were collected from 12 NH control animals (100 fibers), six animals exposed to a 50 Hz band of Gaussian noise with a center frequency of 2 kHz for 4 h at 115 dB SPL (76 fibers), and 10 animals exposed to an octave band of Gaussian noise with a center frequency of 500 Hz for 2 h at 116 dB SPL (72 fibers). The two noise-exposure paradigms caused similar patterns of permanent, noise-induced threshold elevation as judged from tuning curves (Fig. 4A). Mean threshold elevations at CFs of 0.5, 1, 2, 4, and 8 kHz were 8, 20, 33, 25, and 25 dB, respectively, for the 2 kHz exposure and 20, 25, 26, 21, and 16 dB, respectively, for the 500 Hz exposure. Data from the two exposure paradigms were combined for this report. Noise exposures were performed in a sound-attenuating booth under anesthesia using either a pair of dynamic loudspeakers (Fostex FT28D, Fostex International; 2 kHz exposures) or a single enclosed woofer (Selenium 10PW3, Harman; 500 Hz exposures) suspended 25–30 cm above the animal's head. Anesthesia was induced with xylazine (1–2 mg/kg, s.c.) followed by ketamine (50–65 mg/kg, i.p.). Atropine (0.05 mg/kg, i.m.) was given to control mucous secretions and eye ointment was applied. Animals were held in position with a stereotaxic device, and body temperature was maintained at 37°C (TCAT2LV, Physitemp; or 50-7220F, Harvard Apparatus). Supplemental injections of ketamine (20–30 mg/kg, i.p.) were given as needed to maintain an areflexic state.
Neurophysiological data were recorded from auditory nerve fibers under anesthesia ≥3 weeks after the noise exposure using standard procedures in our laboratory (Kale and Heinz, 2010, 2012; Henry and Heinz, 2012). Previous research in cats shows that noise-induced threshold shifts are stable beginning 19 d following noise overexposure (Miller et al., 1963). Auditory nerve recordings were typically conducted within 8 weeks of the noise overexposure, with 28 weeks elapsing in one exceptional case. The neurophysiological recordings made 28 weeks after exposure (i.e., pure-tone tuning curves, Gaussian noise responses) were similar to those made within 8 weeks. Anesthesia was induced with xylazine and ketamine as described above, but maintained with sodium pentobarbital (∼15 mg/kg/2 h, i.v.). Physiological saline (1–2 ml/2 h, i.v.) and lactated Ringer's solution (20–30 ml/24 h, s.c.) were also given, and a tracheotomy facilitated breathing. Animals were positioned in a stereotaxic device (902, Kopf Instruments) in a sound-attenuating booth. The skin and muscles overlying the skull were transected to expose the ear canals and bullae, and both ear canals were dissected to allow insertion of hollow ear bars. The right bulla was vented through 30 cm of polyethylene tubing. A craniotomy was opened in the posterior fossa and the cerebellum partially aspirated and retracted medially to expose the trunk of the auditory nerve bundle. Acoustic stimuli were presented through the right ear bar with a dynamic loudspeaker (DT48, Beyerdynamic) and calibrated using a probe microphone placed within a few millimeters of the tympanum (ER7C, Etymotic). Neurophysiological recordings were made using a 10–30 MΩ glass microelectrode advanced into the auditory nerve by hydraulic microdrive (640, Kopf Instruments). Recordings were amplified (2400A, Dagan) and bandpass filtered from 0.03 to 6 kHz (3550, Krohn-Hite). Spikes were identified using a time–amplitude window discriminator (BAK Electronics) and timed with 10 μs resolution.
Single fibers were isolated by listening for spikes while advancing the electrode through the auditory nerve during pulsed acoustic stimulation with broadband noise. When a fiber was encountered, a tuning curve was recorded using an automated procedure that tracked, as a function of stimulus frequency, the minimum SPL of a 50 ms tone required to evoke ≥1 more spike than a subsequent 50 ms silent period (Chintanpalli and Heinz, 2007). CF was typically identified as the frequency of the tip of the tuning curve in both NH and noise-overexposed fibers. In a small number of noise-overexposed fibers lacking a clear tip (N = 7), CF was estimated as the frequency of the breakpoint in the high-frequency slope of the tuning curve because previous research shows that this value corresponds well with the CF before cochlear damage (Liberman, 1984). Next, a sequence of nine broadband Gaussian noise stimuli was presented repeatedly for ≤10 min at ∼10 dB above the threshold (for noise stimuli) until ∼20,000 total driven spikes were recorded. Noise stimuli were 1.7 s in duration with a bandwidth of 16.5 kHz and a silent interval of 1.2 s between stimuli.
Wiener-kernel analyses of neural responses.
The first-order Wiener kernel (h1) and second-order Wiener kernel (h2) were computed from first-order and second-order cross-correlations, respectively, between the noise stimulus waveform x(t) and the response train of N = ∼20,000 driven spikes. Only spikes occurring >20 ms after stimulus onset and before stimulus offset were included in the cross-correlations, which were calculated with a sampling period of 0.02 ms and maximum time lag τ of 10.2 ms (512 points) or 20.4 ms (1024 points; for CFs <3 kHz). The basic Wiener-kernel computations have been described previously in detail (Eggermont, 1993; van Dijk et al., 1994; Lewis et al., 2002; Lewis and van Dijk, 2004; Recio-Spinoso et al., 2005; Sneary and Lewis, 2007). In brief, h1(τ) is calculated as
h2(τ1, τ2) is calculated as
To facilitate comparison across fibers varying in mean driven rate and stimulus SPL, we normalized the Wiener kernels by multiplying h1 by
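As a hedged illustration of the spike-triggered computation described above, the following Python sketch estimates both kernels using the standard forms from the cited Wiener-kernel literature, h1(τ) = (N0/A)·R1(τ) and h2(τ1, τ2) = [N0/(2A²)]·[R2(τ1, τ2) − φxx(τ2 − τ1)], where R1 and R2 are the first-order and second-order spike-triggered averages of x(t), N0 is the mean driven rate, A is the noise power, and φxx is the stimulus autocorrelation. These symbols, the function name `wiener_kernels`, the use of the stimulus variance for A, and the omission of any normalization step are assumptions made for illustration; this is not the analysis code used in the study.

```python
import numpy as np


def wiener_kernels(stimulus, spike_idx, n_lags, fs):
    """Estimate first- and second-order Wiener kernels by spike-triggered averaging.

    stimulus  : 1-D array, Gaussian noise waveform x(t)
    spike_idx : integer sample indices of spikes (already restricted by the
                caller, e.g., to spikes >20 ms after stimulus onset)
    n_lags    : number of time lags (e.g., 512 points at a 0.02 ms period)
    fs        : sampling rate in Hz (50,000 Hz for a 0.02 ms period)
    """
    x = np.asarray(stimulus, dtype=float)
    spikes = np.asarray(spike_idx, dtype=int)
    # Keep only spikes preceded by a full n_lags window of stimulus.
    spikes = spikes[(spikes >= n_lags - 1) & (spikes < x.size)]
    lags = np.arange(n_lags)

    # Row i holds the stimulus preceding spike i: column k is x(t_i - k).
    segments = x[spikes[:, None] - lags[None, :]]
    n_spikes = segments.shape[0]

    n0 = n_spikes / (x.size / fs)  # mean driven rate (spikes/s)
    a = x.var()                    # assumption: noise power taken as the variance

    r1 = segments.mean(axis=0)             # first-order spike-triggered average
    r2 = segments.T @ segments / n_spikes  # second-order spike-triggered average

    # Stimulus autocorrelation phi_xx(tau2 - tau1), arranged as a Toeplitz matrix.
    phi = np.array([np.mean(x[k:] * x[:x.size - k]) for k in range(n_lags)])
    phi_xx = phi[np.abs(lags[:, None] - lags[None, :])]

    h1 = (n0 / a) * r1
    h2 = (n0 / (2.0 * a ** 2)) * (r2 - phi_xx)
    return h1, h2
```

The sketch stores one stimulus segment per spike, so memory grows as the number of spikes times the number of lags; for ∼20,000 spikes and 1024 lags, accumulating the averages in batches may be preferable.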
Results
Effects of noise overexposure on classical tuning curves
Auditory nerve fibers of chinchillas with noise-induced hearing loss (148 fibers, 16 animals) had classical (pure-tone based) tuning curves with higher thresholds and broader tuning bandwidth than control fibers (100 fibers, 12 animals; Figs. 2, 3, tuning curves). Threshold elevation in noise-overexposed fibers was quantified as the observed threshold at CF minus the mean threshold of NH control fibers at the same CF. A normalized index of tuning-curve bandwidth was calculated as the octave difference between the observed bandwidth 10 dB above threshold and the mean bandwidth of control fibers at the same CF (i.e., log2[BWi/BWNHmean(CFi)] for the ith fiber). Threshold elevation varied widely across the population of noise-overexposed fibers from ∼0 to 50 dB (Fig. 4A). Both threshold elevation and broadened tuning (Fig. 4B) were observed most consistently in fibers with CFs between 1 and 4 kHz, with some impaired fibers exhibiting bandwidth indices of ≤2 octaves (i.e., tuning curves four times broader than in controls). The observed effects of noise overexposure on tuning curves were consistent with mixed damage to cochlear inner and outer hair cells (Liberman and Dodds, 1984). The proportion of high spontaneous rate auditory nerve fibers (i.e., ≥18 spikes/s in the absence of acoustic stimulation) increased from 54% in the control population to 71.6% in the noise-overexposed population (χ2 = 8.095, df = 1, p = 0.0044).
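The bandwidth index and the spontaneous-rate comparison above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the fiber counts (54 of 100 control fibers; 106 of 148 noise-overexposed fibers) are inferred from the reported percentages rather than stated in the text, and a Pearson chi-square test without continuity correction is assumed.

```python
import math


def bandwidth_index(bw_fiber_hz, bw_nh_mean_hz):
    """Octave difference between a fiber's 10 dB bandwidth and the NH mean at the same CF."""
    return math.log2(bw_fiber_hz / bw_nh_mean_hz)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with df = 1.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, the chi-square survival function equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# A tuning curve four times broader than the control mean is 2 octaves broader.
index = bandwidth_index(4.0, 1.0)

# Assumed counts: rows = control, noise-overexposed; columns = high SR, low/medium SR.
stat, p_value = chi_square_2x2(54, 46, 106, 42)
```

With these assumed counts, the statistic and p value agree with the reported χ2 = 8.095 and p = 0.0044 to rounding.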
Tonotopic patterns of TFS and ENV coding in NH control fibers
The best frequency, amplitude, and bandwidth of TFS and ENV coding were calculated using Wiener-kernel analyses of auditory nerve fiber responses to broadband Gaussian noise (bandwidth, 16.5 kHz) presented ∼10 dB above threshold. For ENV coding, we use the term “best frequency” to refer to the dominant frequency of the carrier band driving phase locking to ENV structure. In control fibers, the best frequency of TFS coding was closely aligned with CF (Fig. 6A, black crosses). TFS coding was strongest in fibers with CFs <2 kHz and decreased with increasing CF (Fig. 6B, black crosses). Fibers with CFs >4–5 kHz showed no significant TFS coding (i.e., amplitude <3 SDs above the noise floor). The best frequency of ENV coding was also closely tuned to CF (Fig. 6D, black crosses). However, in contrast to TFS coding, ENV coding increased in amplitude with increasing CF (Fig. 6E, black crosses). Some fibers with CFs <2.5 kHz showed no significant phase locking to ENV. The 10 dB bandwidths of TFS and ENV coding (Fig. 6C,F, black crosses) were similar to each other and to the 10 dB tuning-curve bandwidths (Fig. 4B, black crosses), and increased with increasing CF. Patterns of temporal coding were consistent with previous Wiener-kernel analyses of auditory nerve fiber responses in NH chinchillas (Recio-Spinoso et al., 2005; Temchin et al., 2005). Finally, no reliable differences in temporal coding were observed between fibers with high and low/medium spontaneous rates. This result contrasts somewhat with previous findings of 10–20% lower TFS/ENV synchrony to amplitude-modulated tones in high than in low/medium spontaneous rate fibers (Joris and Yin, 1992), but note that our analysis used a different stimulus (Gaussian noise) presented at only a single SPL.
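One way to read a best frequency and a 10 dB bandwidth off a first-order kernel is from its magnitude spectrum. The sketch below is a simplified illustration under stated assumptions, not the procedure used in the study: the function name `kernel_tuning` is hypothetical, the 10 dB criterion is applied to the amplitude spectrum (20·log10), and the bandwidth is taken as the contiguous range of FFT bins around the peak that stays within 10 dB of it. The significance test against the noise floor is not included.

```python
import numpy as np


def kernel_tuning(h1, fs, drop_db=10.0):
    """Return (best_frequency_hz, bandwidth_hz) from the spectrum of a first-order kernel."""
    h1 = np.asarray(h1, dtype=float)
    mag = np.abs(np.fft.rfft(h1))
    freqs = np.fft.rfftfreq(h1.size, d=1.0 / fs)

    peak = int(np.argmax(mag))
    # Magnitude in dB relative to the peak; the floor avoids taking log of zero.
    mag_db = 20.0 * np.log10(np.maximum(mag, 1e-300) / mag[peak])

    # Walk outward from the peak while the spectrum stays within drop_db of it.
    lo = hi = peak
    while lo > 0 and mag_db[lo - 1] >= -drop_db:
        lo -= 1
    while hi < mag.size - 1 and mag_db[hi + 1] >= -drop_db:
        hi += 1
    return freqs[peak], freqs[hi] - freqs[lo]
```

Frequency resolution is limited by the kernel length: a 1024-point kernel sampled every 0.02 ms gives bins roughly 49 Hz apart, so both estimates are quantized to that spacing unless the spectrum is interpolated.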
Hearing loss distorts tonotopic coding of TFS and ENV in basal fibers
Noise-induced hearing loss had the most pronounced effects on temporal coding in fibers with CFs >1.5 kHz. In most of these basal fibers (44 of 73 fibers), noise overexposure introduced robust, abnormal phase locking to low-frequency TFS [i.e., >1 octave below CF; best frequency, 0.91 ± 0.22 kHz; 10 dB bandwidth of low-frequency TFS coding, 1.56 ± 0.64 kHz (mean ± SD); Figs. 3B,C, temporal coding, 6A]. TFS coding was not only tuned far below CF, but the amplitude of TFS coding in these fibers was generally greater than in controls (Fig. 6B) and increased with the degree of estimated cochlear damage (Fig. 7). Specifically, amplitude was positively associated with both the tuning-curve bandwidth index [linear regression (N = 44): R2 = 0.20, p = 0.002] and threshold elevation [linear regression (N = 44): R2 = 0.24, p = 0.001]. Note that previous Wiener-kernel studies in chinchilla show that NH auditory nerve fibers with high CFs do not encode low-frequency TFS at high stimulus levels, at least up to 80 dB SPL (Recio-Spinoso et al., 2005), which includes the range of sound levels used to stimulate the majority of the noise-exposed fibers in the present study (Fig. 8). Hence, dissociation of TFS coding is a consequence of the noise overexposure, and cannot be explained by the higher stimulus level required to activate these fibers (typically 55–80 dB SPL in noise-impaired fibers vs 35–65 dB SPL in controls).
Among basal fibers with TFS coding tuned to low frequency, ENV coding generally remained faithfully tuned within a half octave of CF (34 of 44 fibers), indicating a notable dissociation between the carrier-frequency tuning of phase-locked responses to TFS and ENV (Figs. 3B, temporal coding, 6A,D). The amplitude of ENV coding in this population was often greater than in controls, particularly for CFs near 1.5–2.5 kHz (Fig. 6E), while the 10 dB bandwidth of ENV coding was usually slightly broader (Fig. 6F). These fibers were associated with a broad range of estimated cochlear damage (Fig. 7, open triangles). The tuning-curve bandwidth index ranged from −0.25 to 2.04 octaves while threshold elevation ranged from 6.9 to 40.6 dB. This range of threshold elevation corresponds to the clinical definition of slight to mild hearing loss (Clark, 1981). CF was estimated based on the breakpoint of the high-frequency slope of the tuning curve in 3 of 34 fibers in this group.
In a smaller subgroup of basal fibers with TFS coding tuned to low frequency (10 of 44 fibers), tonotopic coding of ENV near CF was lost as the best frequency of ENV coding shifted to low frequency as well [best frequency, 0.99 ± 0.29 kHz; 10 dB bandwidth of low-frequency ENV coding, 2.40 ± 0.82 kHz (means ± SD); Fig. 3C, temporal coding]. These fibers were associated with especially strong phase locking to low-frequency TFS (mean amplitude ± SD, 0.336 ± 0.078) and estimates of more pronounced cochlear damage (Fig. 7, circles). The tuning-curve bandwidth index ranged from 1.55 to 2.09 octaves in this group while threshold elevation ranged from 21.2 to 43.8 dB. This range of threshold elevation corresponds to mild-to-moderate clinical hearing loss (Clark, 1981). CF was estimated based on the breakpoint of the high-frequency slope of the tuning curve in 4 of 10 fibers in this group.
The remaining noise-exposed basal fibers not exhibiting abnormally tuned TFS coding (29 of 73 fibers) showed different patterns, depending on CF. Fibers with relatively high CFs (4–10 kHz, eight fibers) exhibited no significant phase locking to TFS rather than TFS coding tuned near CF, similar to control fibers. These fibers, like control fibers, encoded stimulus ENV in a narrow carrier-frequency band near CF and were associated with estimates of little or no cochlear damage. The tuning-curve bandwidth index ranged from −0.02 to 0.77 octaves in these fibers while threshold elevation ranged from −4.0 to 22.3 dB. Fibers with relatively lower CFs (1.5–2.5 kHz, 21 fibers) phase locked faithfully to both TFS and ENV near CF, and were associated with a broad range of estimated hearing loss. The tuning-curve bandwidth index ranged from 1.03 to 2.13 octaves while threshold elevation ranged from 6.0 to 41.0 dB.
While enhancement of TFS coding amplitude with hearing loss was observed across a broad range of basal CFs (>1.5 kHz; Fig. 6B), enhancement of ENV coding was observed primarily in fibers with CFs from 1 to 3 kHz (Fig. 6E). Specifically, ENV coding amplitude was positively associated with threshold elevation in fibers with CFs from 1 to 3 kHz [linear regression (N = 97): R2 = 0.143, p < 0.001] and unassociated with threshold elevation in fibers with CFs >3 kHz [linear regression (N = 84): R2 = 0.045, p = 0.053]. These findings are consistent with those of a previous study showing maximum enhancement of amplitude modulation coding in noise-overexposed fibers with CFs near 2–2.5 kHz (Kale and Heinz, 2010).
Finally, it should be noted that while noise overexposure does not affect the fundamental ability of fibers to encode TFS frequencies ≤4–5 kHz (Miller et al., 1997; Kale and Heinz, 2010), it reduced the maximum dominant frequency of TFS encoded by the cochlea in response to broadband noise (Fig. 6A). The highest best frequency of TFS coding in the noise-damaged population was 2.4 kHz. This pattern contrasts with NH controls, in which fibers with CFs ≤4.5 kHz showed a tuned, phase-locked response to TFS at CF. Note that this downward shift of the upper limit for TFS coding of broadband noise contrasts with the lack of shift typically observed for phase locking to narrowband sounds (e.g., tones) in quiet environments (Harrison and Evans, 1979; Miller et al., 1997; Kale and Heinz, 2010). This contrast highlights the importance of studying the effects of cochlear hearing loss using complex sounds (Henry and Heinz, 2013), for which lower-frequency TFS can overwhelm the less-robust TFS coding near CF (due to the normal roll off in phase locking; Johnson, 1980) in noise-exposed basal fibers with a reduced tip-to-tail ratio in the tuning curve (e.g., from hypersensitive tails).
Hearing loss does not dissociate TFS and ENV coding in apical fibers
Unlike the basal fibers, noise-overexposed apical fibers with CFs <1.5 kHz (N = 75) nearly always phase locked to stimulus TFS and ENV in the same carrier-frequency band (Fig. 2B, temporal coding). The best frequency of temporal coding was generally near CF (Fig. 6A,D). Compared with control fibers, the amplitude of temporal coding in noise-overexposed fibers was similar or slightly greater (Fig. 6B,E), while the 10 dB bandwidth of coding was on average broader than normal for CFs >700 Hz (Fig. 6C,F).
Discussion
The results show that noise-induced hearing loss causes remarkable changes in the neural representation of TFS and ENV in relatively basal cochlear fibers with CFs >1.5 kHz. Whereas NH control fibers in this CF range encoded primarily ENV in the high-frequency part of broadband sounds, hearing loss typically introduced robust, dissociated phase locking to low-frequency TFS. With more severe cochlear damage, coding of ENV in the high-frequency part of the sound was lost as coding of all temporal features (TFS and ENV) shifted to lower frequencies. The primary effects of hearing loss in more apical fibers were broadened frequency tuning of both TFS coding and ENV coding, but without a significant loss in tonotopicity.
The present study is the first to show that hearing loss can dissociate the frequency tuning of phase-locked responses to TFS and ENV, thereby distorting the tonotopic representation of temporal acoustic structure. This result for complex sounds can be understood based on the combined effects of the roll-off in TFS phase locking (Johnson, 1980) and previously described changes with hearing loss in the relative sensitivity of individual fibers in the tip (i.e., at CF) and low-frequency (0.5–1 kHz) tail region of their pure-tone tuning curves (Liberman and Dodds, 1984). When sensitivity is high in the tip and low in the tail region, as in NH control fibers, the phase-locked response to a broadband sound is dominated by ENV in the frequency band passing through the tip but not low-frequency TFS because the insensitive tail effectively filters out low-frequency components of the sound. Phase locking to TFS at CF is minimal or insignificant due to the high CF of the fiber. As the relative sensitivity of the tail increases due to hearing loss (i.e., tail hypersensitivity; Liberman and Dodds, 1984), rejection of low-frequency components breaks down and an additional phase-locked response to low-frequency TFS passing through the tail occurs (Fig. 3B). Phase locking to ENV passing through the tip is favored over ENV passing through the tail as long as the tip is sufficiently more sensitive than the tail. When the sensitivity of the tail exceeds sensitivity of the tip, as occurs in some cases of severe cochlear damage (Liberman and Dodds, 1984), phase locking entrains to ENV in addition to TFS in the low-frequency range of the sound passing through the tail (Fig. 3C).
The prominent increase in phase locking to low-frequency TFS of broadband sounds reported here provides a general system-identification explanation for effects seen in previous studies of auditory nerve fiber responses to vowel stimuli in animals with cochlear damage. In NH cats, fibers with CFs near a formant (spectral envelope maximum) phase lock preferentially to the TFS of a single harmonic occurring near the formant (Young and Sachs, 1979; Sachs and Young, 1980). Following noise-induced hearing loss, however, fibers with CFs near a formant phase lock less selectively (e.g., to multiple harmonics with frequencies near lower formants; Miller et al., 1997), consistent with the loss of tonotopicity demonstrated here.
The changes in neural coding of TFS and ENV shown here can potentially explain the stronger negative impact of background noise on speech perception in people with hearing loss. Most people with hearing loss have little difficulty understanding speech under quiet conditions but struggle considerably under real-world listening conditions with background noise (Duquesnoy, 1983; Woods et al., 2010). In some cases, perceptual errors can be traced to a reduction in the ability to use high-frequency acoustic cues to discriminate consonants rather than a reduction in the ability to use lower-frequency cues to discriminate vowels (Owens et al., 1968; Dubno et al., 1982). How then might the high-frequency cues necessary for consonant identification (Fig. 9A) be degraded more by background noise in listeners with hearing loss than in NH listeners? Based on a simple model of cochlear filtering and temporal coding, NH auditory nerve fibers are predicted to retain a robust, ENV-based representation of high-frequency cues in the face of competing low-frequency background noise (Fig. 9B). Hypersensitivity of the tuning curve tail in noise-exposed fibers, in contrast, introduces strong phase locking to the low-frequency temporal structure of the background noise that can mask the representation of high-frequency consonants (Fig. 9C). In cases of more severe hearing loss, our results predict an even bleaker outcome for consonant identification in background noise because high-frequency ENV cues are not encoded at all. Instead, fibers of noise-damaged ears are expected to respond exclusively to the TFS and ENV of the low-frequency noise. Verifying these predictions should be a high priority for future research.
Changes in TFS coding with hearing loss were observed more frequently than changes in ENV coding and in less severe instances of cochlear damage. These findings can explain behavioral results showing that listeners with hearing loss can understand speech containing only ENV cues but, in contrast to NH listeners, struggle to understand speech containing only TFS cues (Lorenzi et al., 2006, 2009) and to use TFS cues to boost speech perception in fluctuating background noise (Hopkins et al., 2008). Perception of ENV-based speech may be close to normal in these listeners because, as shown here, ENV in noise-overexposed ears is most often encoded in the normal tonotopic fashion, with high-frequency ENV cues encoded in basal fibers and lower-frequency ENV cues encoded in more apical fibers. In contrast, TFS cues in noise-damaged ears are not encoded tonotopically. Instead, low-frequency TFS is encoded across a broad range of fibers regardless of CF and higher-frequency TFS information (2.5–5 kHz) is never encoded. This severely distorted and diminished representation of TFS may be sufficient to explain the inability of listeners with hearing loss to use TFS information to understand speech or improve perception of ENV-based speech in noise.
Because the mammalian cochlea shares many anatomical and physiological features across species, it is reasonable to assume that the effects of hearing loss on temporal processing documented here in chinchillas also occur in humans. The chinchilla's cochlea is shorter and probably more broadly tuned than the cochlea in humans (Shera et al., 2010; Joris et al., 2011), which raises the question of the precise quantitative translation of our results. Studies of otoacoustic emissions suggest that the transition between apical and more active basal cochlear physiology occurs at a lower CF in humans than in chinchillas (Shera et al., 2010). One possible implication of this difference is that the frequency above which temporal coding is severely degraded (i.e., 1.5 kHz in chinchillas) may be lower in humans (e.g., ≤1 kHz). In this sobering scenario, a greater proportion of the frequency bandwidth of speech would be subject to degraded tonotopic coding and the effects on perception compounded.
In conclusion, the changes in neural representation of TFS and ENV reported here are likely to have profound, negative consequences on speech perception in people with hearing loss. Other hypothesized changes in the peripheral auditory system are also thought to contribute to poor speech perception with cochlear damage, including reduced synchrony capture of vowel formants (Miller et al., 1997) and diminished phase-locking to TFS in background noise (Henry and Heinz, 2012). Broader auditory filtering might also contribute by introducing rapid fluctuations in TFS to auditory nerve fiber responses that cannot be decoded by sluggish central-processing mechanisms (Moore and Sek, 1996) and by distorting potentially valuable cross-CF differences in neural response phase (Shamma, 1985; Carney et al., 2002; Heinz et al., 2010). Finally, cochlear synaptopathy, with or without audiometric threshold shift (Kujawa and Liberman, 2009), may adversely affect speech perception by reducing the redundancy of neural coding.
Looking to the future, the neurophysiological changes with hearing loss shown so far present important challenges for efforts aimed at restoring speech perception in human listeners. For example, to what extent can high-pass filtering (Miller et al., 1999) and noise-reduction algorithms improve ENV coding in high-CF auditory nerve fibers by reducing their response to low-frequency TFS, and how can this performance be improved upon? Finally, the results of the current study highlight the fact that different strategies could be needed to address fundamentally different coding deficits between apical and basal cochlear regions.
Footnotes
This work was supported by National Institutes of Health Grants R01-DC009838 to M.G.H. and F32-DC012236 to K.S.H. We thank Mark Sayles for fruitful discussions about these data and analyses and for constructive comments on a previous version of this manuscript. We also thank Jon Boley for assisting with data collection.
The authors declare no competing financial interests.
Correspondence should be addressed to either of the following: Michael G. Heinz, Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907, mheinz@purdue.edu; or Kenneth S. Henry, Department of Biomedical Engineering, University of Rochester, 601 Elmwood Avenue, Box 603, Rochester, NY 14642, kenneth_henry@urmc.rochester.edu