Abstract
Listeners with sensorineural hearing loss (SNHL) struggle to understand speech, especially in noise, despite audibility compensation. These real-world suprathreshold deficits are hypothesized to arise from degraded frequency tuning and reduced temporal-coding precision; however, peripheral neurophysiological studies testing these hypotheses have been largely limited to in-quiet artificial vowels. Here, we measured single auditory-nerve-fiber responses to a connected speech sentence in noise from anesthetized male chinchillas with normal hearing (NH) or noise-induced hearing loss (NIHL). Our results demonstrated that temporal precision was not degraded following acoustic trauma, and furthermore that sharpness of cochlear frequency tuning was not the major factor affecting impaired peripheral coding of connected speech in noise. Rather, the loss of cochlear tonotopy, a hallmark of NH, contributed the most to both consonant-coding and vowel-coding degradations. Because distorted tonotopy varies in degree across etiologies (e.g., noise exposure, age), these results have important implications for understanding and treating individual differences in speech perception for people suffering from SNHL.
SIGNIFICANCE STATEMENT Difficulty understanding speech in noise is the primary complaint in audiology clinics and can leave people with sensorineural hearing loss (SNHL) suffering from communication difficulties that affect their professional, social, and family lives, as well as their mental health. We measured single-neuron responses from a preclinical SNHL animal model to characterize salient neural-coding deficits for naturally spoken speech in noise. We found the major mechanism affecting neural coding was not a commonly assumed factor, but rather a disruption of tonotopicity, the systematic mapping of acoustic frequency to cochlear place that is a hallmark of normal hearing. Because the degree of distorted tonotopy varies across hearing-loss etiologies, these results have important implications for precision audiology approaches to diagnosis and treatment of SNHL.
- auditory nerve
- distorted tonotopy
- frequency-following responses
- noise-induced hearing loss
- speech perception
- temporal coding
Introduction
Individuals with sensorineural hearing loss (SNHL) demonstrate speech-perception deficits, especially in noise, which are often not resolved, even with state-of-the-art hearing-aid strategies and noise-reduction algorithms (McCormack and Fortnum, 2013; Lesica, 2018). In fact, difficulty understanding speech in noise is the number-one complaint in audiology clinics (Chung, 2004). Although audibility is a factor contributing to these difficulties (Phatak and Grant, 2014), suprathreshold deficits associated with dynamic spectrotemporal cues substantially contribute as well, especially in noise (Festen and Plomp, 1983; Zeng and Turner, 1990). Unfortunately, neurophysiological studies of impaired speech coding in the auditory nerve (AN), the first neural site affected by cochlear components of SNHL (Trevino et al., 2019), have been primarily limited to synthetic vowels in quiet and have not included everyday sounds with natural dynamics.
Common suprathreshold factors hypothesized to underlie speech-perception-in-noise deficits following SNHL are reduced sharpness of cochlear frequency tuning, degraded temporal-coding precision, and cochlear synaptopathy. Reduced frequency-tuning sharpness, often observed in listeners with SNHL (Glasberg and Moore, 1986), limits the ability to resolve spectral components of speech and allows more noise into auditory filters (Horst, 1987; Moore, 2007). Reduced perceptual frequency selectivity parallels broader tuning observed in physiological responses following outer-hair-cell dysfunction (Liberman and Dodds, 1984a; Ruggero and Rich, 1991). A second hypothesized suprathreshold deficit is a loss of temporal-coding precision (Lorenzi et al., 2006; Moore, 2008; Halliday et al., 2019). While one study has reported a decrease in AN phase-locking following SNHL (Woolf et al., 1981), others did not find a degradation (Harrison and Evans, 1979; Kale and Heinz, 2010); however, these studies have been limited to laboratory stimuli (e.g., tones). Finally, cochlear synaptopathy progresses with age and noise exposure (Kujawa and Liberman, 2009; Fernandez et al., 2015) and has been hypothesized to underlie perceptual deficits in listeners even without audiometric loss.
Another neural suprathreshold factor that may contribute to speech-coding degradations is the change in on-frequency versus off-frequency sensitivity (i.e., reduced tip-to-tail ratio; TTR) observed in AN fibers following noise-induced hearing loss (NIHL). This phenomenon, which results from reduced tip sensitivity and/or tail hypersensitivity, has been well characterized with tonal frequency-tuning curves (FTCs; Liberman and Dodds, 1984a). These effects can produce tonotopic distortions with important implications for complex-sound processing but have only recently begun to be explored for nontonal stimuli such as broadband noise (Henry et al., 2016, 2019).
Several neurophysiological studies have investigated speech coding following SNHL but not with connected speech in noise, for which listeners with SNHL struggle the most (Young, 2012; Sayles and Heinz, 2017). NH AN speech-coding studies have predominantly used short-duration synthesized vowel-like and consonant-like stimuli in quiet (Young and Sachs, 1979; Sinex and Geisler, 1983; Delgutte and Kiang, 1984b,c), with the few exploring background noise limited to vowels (Sachs et al., 1983; Delgutte and Kiang, 1984a; Geisler and Gamble, 1989; Geisler and Silkes, 1991; Silkes and Geisler, 1991). A few studies of NH coding have used connected speech sentences but not in noise (Kiang and Moxon, 1974; Delgutte et al., 1998; Young, 2008). Speech-coding studies of hearing-impaired (HI) animals have been limited to vowel-like stimuli in quiet (e.g., Miller et al., 1997; Schilling et al., 1998). Connected speech differs from synthetic speech tokens in its dynamic structure, with most information contained in its time-varying properties, like formant transitions and spectrotemporal modulations (Elliott and Theunissen, 2009; Elhilali, 2019). Therefore, accurate characterization of how hearing loss affects the neural coding of the time-varying properties in connected speech is important but missing from our current understanding of the underlying mechanisms for why listeners with hearing loss struggle to understand speech in noise.
Here, we used a connected speech sentence and collected AN-fiber spike trains from anesthetized chinchillas with either NH or NIHL. The sentence was also mixed with speech-shaped noise at perceptually relevant signal-to-noise ratios (SNRs). We analyzed dynamic formant transitions in vowel responses using new nonstationary analyses based on frequency demodulation of alternating-polarity peristimulus-time histograms (PSTHs; Parida et al., 2021), and analyzed onset and sustained responses to fricatives and stop consonants. Our results provide insights into the physiological suprathreshold mechanisms that do and do not contribute to degraded connected-speech coding in noise, specifically highlighting the important role distorted tonotopy plays in increased noise susceptibility following SNHL. These findings have important implications for better understanding individual differences in speech perception in people with SNHL.
Materials and Methods
All code and data accompanying this paper can be found at the following link: https://github.com/HeinzLabPurdue/DT_SpINcoding-paper.
Ethics statement
All procedures described below followed PHS-issued guidelines and were approved by Purdue University Animal Care and Use Committee (Protocol No. 1111000123).
Animal model
Young male chinchillas (less than one year old, weighing between 400 and 700 g) were used in all experiments. A focus of the present study was the effect of noise exposure on the spread of AN-fiber thresholds, which was expected to affect the relative audibility of vowels and consonants. Because accurate quantification of the AN-fiber threshold distribution requires pooling across animals, only male chinchillas were used: sex differences in noise susceptibility (with females generally less susceptible than males) have been shown in chinchillas (Trevino et al., 2019), and pooling across sexes would have added an unrelated source of variability that would confound our estimate of the AN-fiber threshold distribution following noise exposure. While this design allowed us to address one of our specific hypotheses more accurately for the noise-exposure paradigm used here, it did not allow us to address the effects of variation in the degree of noise-induced damage (including because of sex differences).
Animals were socially housed in groups of two until they underwent any anesthetized procedure, after which they recovered in their own cage. All animals received daily nonauditory environmental enrichment including dietary treats and chewing toys. The animal facility was maintained in a 12/12 h light/dark cycle.
Noise exposure and noninvasive physiological recordings
Detailed descriptions of the noise-exposure and noninvasive physiological recording procedures are provided elsewhere (Parida and Heinz, 2021), with brief descriptions provided here. A single discrete noise exposure [116 dB SPL (C-weighted), 2-h duration, octave-band noise centered at 500 Hz] using an enclosed subwoofer (Selenium 10PW3, Harman; placed ∼30 cm above the animal's head) was used to induce NIHL. Noise levels were calibrated at the entrance of the ear canal using a sound-level meter (886-2, Simpson). Animals were allowed at least two weeks to recover following noise exposure before any physiological recordings were made to minimize temporary threshold shift effects (Miller et al., 1963). Animals were anesthetized using xylazine (2–3 mg/kg, s.c.) and ketamine (30–40 mg/kg, i.p.) for data recordings and noise exposure. The rectal temperature of all anesthetized animals was maintained at 37°C using a feedback-controlled heating pad (50–7053F, Harvard Apparatus). Atipamezole (0.4–0.5 mg/kg, i.p.) was used to facilitate faster recovery from anesthesia following noninvasive experiments.
Auditory brainstem responses (ABRs) to tone pips and speech-evoked ABRs (sABRs) were recorded using three subdermal needle electrodes in a vertical montage (vertex to mastoid, differential mode, common ground near animals' nose; Henry et al., 2011; Zhong et al., 2014). ABRs (frequency range = 0.3–3 kHz) and sABRs (30 Hz to 1 kHz) were bandlimited using analog filters (ISO–80, World Precision Instruments; 2400A, Dagan). Distortion-product otoacoustic emission (DPOAEs) level was measured using an in-ear microphone (Etymotic ER–10B, Etymotic Research), following in-ear calibration for each animal. Calibrated sound was presented using ER2 loudspeakers (Etymotic Research) for electrophysiological recordings. Sound presentation and data recordings were controlled using a custom-integrated system of hardware (Tucker-Davis Technologies; National Instruments) and software (MATLAB, The MathWorks). For the HI animals, these physiological recordings (i.e., ABR, DPOAE, and sABR) were obtained before as well as after (range = 14–42 d) the noise exposure.
Surgical preparation and single-unit recordings
Detailed surgical-preparation and neurophysiological-recording procedures are described by Henry et al. (2016) and are only briefly described here. Anesthesia was induced with the same doses of xylazine/ketamine as used for ABRs and was maintained with sodium pentobarbital (∼7.5 mg/kg/h, i.p.). Animals were supplemented with lactated Ringer's solution during experiments (∼1 ml/h), which typically lasted 18–24 h. A posterior fossa approach was employed for the craniotomy in the right ear, following venting of the bulla with 30 cm of polyethylene tubing to maintain middle-ear pressure.
Spike trains were recorded from single AN fibers using glass micropipettes (impedance between 10 and 50 MΩ). Recordings were amplified (2400A, Dagan) and filtered from 0.03 to 6 kHz (3550, Krohn-Hite). Isolated spikes were identified using a time-amplitude window discriminator (BAK Electronics) and stored digitally with 10-µs resolution. Experiments were terminated if sudden shifts in FTC threshold and tuning were observed for two or more AN fibers, following which a lethal dose of Euthasol (2 ml, i.p.; Virbac AH) was administered. Single-unit data are from 15 NH (286 AN fibers) and 6 HI (119 AN fibers) chinchillas. For HI animals, AN experiments were done between 20 and 61 d following noise exposure for five animals and after 339 d for one animal (with no obvious difference in responses compared with other HI animals). These intervals following noise exposure are all expected to be long enough to avoid any temporary threshold shifts (Miller et al., 1963).
Stimuli
Screening (ABR and DPOAE) experiments
For ABRs, tone pips (5-ms duration, 1-ms on and off ramps) ranging from 0.5 to 8 kHz (octave spaced) were presented from 0 to 80 dB SPL in 10-dB steps. A total of 500 repetitions of each polarity (positive and negative) were played for each condition. ABR threshold was calculated based on a cross-correlation analysis (Henry et al., 2011). An additional intermediate step (an odd multiple of 5 dB) was used near the preliminary ABR threshold estimate to fine-tune the final estimate. DPOAEs were measured for pairs of tones (f1 and f2).
sABR experiments
A naturally spoken speech sentence (list #3, sentence #1) from the Danish speech intelligibility test (CLUE; Nielsen and Dau, 2009) was used for sABR experiments (Fig. 1). The sentence was: “Den gamle mand smilede stort” (IPA phonetic transcription: dɛn kamlə mæːːˀn smiːləð̠˕ˠə s). The overall level was set to 70 dB SPL for both groups by adjusting the rms of the entire utterance. Both positive and negative polarities (500 repetitions/polarity) of the stimulus were used to allow estimation of envelope and temporal fine structure components from the sABR. Both envelope (sABRENV = average of sABRs to opposite polarities) and temporal fine structure (half the difference between sABRs to opposite polarities) amplitudes were comparable between the two groups (also see Parida and Heinz, 2021). This similarity in amplitude at 70 dB SPL despite hearing loss likely results from a loss of cochlear compression because of outer-hair-cell dysfunction (Ruggero and Rich, 1991) as well as central-gain related changes in the midbrain (Auerbach et al., 2014), which substantially contribute to the brainstem responses (King et al., 2016).
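The envelope and fine-structure decomposition described above reduces to a simple sum and half-difference of the two polarity responses. A minimal numpy sketch (array names are illustrative, not from the authors' code):

```python
import numpy as np

def env_tfs_decompose(resp_pos, resp_neg):
    """Decompose averaged responses to opposite stimulus polarities
    (e.g., sABRs or PSTHs) into an envelope-dominated component
    (polarity-invariant) and a fine-structure-dominated component
    (polarity-following)."""
    resp_pos = np.asarray(resp_pos, dtype=float)
    resp_neg = np.asarray(resp_neg, dtype=float)
    env = (resp_pos + resp_neg) / 2.0  # average: envelope (sABR_ENV)
    tfs = (resp_pos - resp_neg) / 2.0  # half-difference: fine structure
    return env, tfs
```

Because temporal fine structure inverts with stimulus polarity while the envelope does not, averaging cancels fine structure and the half-difference cancels the envelope.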
The speech sentence used in this study. A, Time-domain signal (left y-axis) with orthographic representation and time-varying stimulus-level function (in dB SPL, right y-axis) computed over 40-ms moving windows with 50% overlap. B, Spectrogram of the sentence in A. Red lines represent the first three formant trajectories (i.e., F1, F2, and F3).
sABRENV was used to analyze sABR onset responses. Only the onset response was considered to evaluate consonant coding because evoked responses like the sABR require synchronous activity (e.g., the onset) across populations of neurons. As sustained responses to fricatives lack a clear temporal pattern, they are rather weakly represented in the sABR (Skoe and Kraus, 2010). Onset strength was quantified as the peak-to-peak sABR amplitude in an onset window (see Fig. 8E). Note that sABRs consist of both transient ABR and sustained FFR components and are primarily of subcortical origin (Chandrasekaran and Kraus, 2010; Skoe and Kraus, 2010).
AN experiments
Monaural sound was delivered via a custom closed-field acoustic system. A dynamic loudspeaker (DT-48, Beyerdynamic) was connected to a hollow ear bar inserted into the right ear canal to deliver acoustic stimuli that were calibrated near the tympanic membrane. Calibration was done at the beginning of the experiment using a probe-tube microphone (ER-7C, Etymotic Research) that was placed within a few millimeters of the tympanic membrane.
Single AN fibers were isolated by advancing the electrode while playing broadband noise (20–30 dB re 20 μPa/√Hz; higher as needed for noise-exposed animals) as the search stimulus. Monopolar action-potential waveform shape was used to confirm that recordings were from AN-fiber axons (as opposed to the bipolar shapes exhibited by cell bodies in the cochlear nucleus). Before collecting spike-train data in response to speech and/or noise, all AN fibers were characterized as follows. An automated algorithm was used to estimate the FTC (Chintanpalli and Heinz, 2007). FTCs were smoothed by a three-point triangular window before estimating parameters such as characteristic frequency (CF), threshold, 10-dB quality factor (Q10), and tip-to-tail ratio (TTR).
For AN experiments, the same speech sentence was used as for the sABRs. The overall level was set to 65 dB SPL for NH chinchillas and 80 dB SPL for HI chinchillas (by normalizing the rms over the entire utterance). This spectrally flat gain of 15 dB was roughly based on the half-gain rule (Lybarger, 1978), which has been used in AN studies of NIHL (Schilling et al., 1998), and was confirmed in preliminary experiments to restore driven rates (i.e., total number of spikes within a stimulus window divided by window duration) for voiced portions of the sentence. Speech was also presented after mixing with frozen steady speech-shaped noise at three perceptually relevant SNRs (−10, −5, and 0 dB). Frozen noise was used instead of running noise to allow the same population-level analyses for both speech and noise. The speech-shaped noise was spectrally matched, using autoregressive modeling, to 10 sentences spoken by the same speaker. The two stimulus polarities were presented in an interleaved manner (25 trials per polarity for most AN fibers). Note that our sABR and AN-fiber results do not provide an exact comparison because these data were collected at somewhat different stimulus levels; however, as shown later, the finding of selective audibility deficits for low-intensity sounds (e.g., consonants) but not high-intensity sounds (e.g., vowels) holds for both datasets despite these level differences.
Spike-train temporal-precision analysis
The temporal precision of spike trains was measured using the Victor–Purpura (VP) distance metric (Victor and Purpura, 1996), which quantifies the dissimilarity between two spike trains. If a fiber responds precisely and consistently across stimulus trials, the VP distance between the spike trains from those trials will be small. The VP distance between spike trains from two different stimulus trials, X and Y, is the minimum total cost of transforming X into Y using three elementary operations: deleting a spike (cost 1), inserting a spike (cost 1), and shifting a spike in time by Δt (cost q|Δt|, where q is the time-shift cost parameter in units of 1/s).
For a given temporal resolution, the time-shift cost parameter was set to q = 2/resolution; a spike is shifted (rather than deleted and reinserted) only when the required shift is smaller than 2/q, so 2/q acts as the effective temporal resolution of the metric. Three resolutions (250, 10, and 0.67 ms) were used to span the syllabic, voice-pitch, and formant time scales of speech.
To compute the across-trial temporal precision for each AN fiber, VP distances were averaged across pairs of spike trains recorded on different trials of the same stimulus polarity; because the VP distance quantifies dissimilarity, temporal precision is inversely related to this average distance.
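The VP metric itself follows a standard dynamic-programming recursion (analogous to edit distance). A self-contained sketch of that standard algorithm, not the authors' implementation:

```python
import numpy as np

def victor_purpura_distance(x, y, q):
    """Victor-Purpura distance between two spike trains.

    x, y : sequences of spike times (s); q : time-shift cost (1/s).
    Insertions and deletions cost 1 each; shifting a spike by dt
    costs q*|dt|, so spikes are effectively matched only when they
    fall within 2/q of each other.
    """
    nx, ny = len(x), len(y)
    # dp[i, j] = distance between the first i spikes of x and first j of y
    dp = np.zeros((nx + 1, ny + 1))
    dp[:, 0] = np.arange(nx + 1)  # delete every spike of x
    dp[0, :] = np.arange(ny + 1)  # insert every spike of y
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            dp[i, j] = min(
                dp[i - 1, j] + 1.0,                              # delete x[i-1]
                dp[i, j - 1] + 1.0,                              # insert y[j-1]
                dp[i - 1, j - 1] + q * abs(x[i - 1] - y[j - 1]),  # shift
            )
    return dp[nx, ny]
```

At q = 0 the distance reduces to the difference in spike counts (pure rate code); as q grows, shifting becomes expensive and the metric becomes sensitive to fine spike timing.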
Spectral analyses of alternating-polarity PSTHs
Temporal and spectral analyses of spike-train responses were based on alternating-polarity PSTH analyses (Parida et al., 2021), with a PSTH bin width of 200 µs in all cases. By alternating-polarity PSTHs, we refer to the collection of PSTHs that can be derived from the PSTHs corresponding to positive and negative stimulus polarities, as defined in Parida et al. (2021). This collection includes the compound PSTH (Goblick and Pfeiffer, 1969) as well as the sum PSTH, which were used here to study TFS and onset coding, respectively. To emphasize temporal-fine-structure responses for voiced-speech coding in Figures 5–7, the difference PSTH, d(t), defined as half the difference between the PSTHs for the two stimulus polarities, was used; its spectrum is denoted D(f).
Low-frequency fractional power was quantified using a low-pass spectral window (cutoff = 400 Hz, 10th order).
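As a concrete illustration of this metric, the fraction of difference-PSTH power below a cutoff can be computed directly from the two polarity PSTHs. This sketch uses a hard spectral cutoff as a stand-in for the 10th-order low-pass window described above:

```python
import numpy as np

def low_freq_fractional_power(psth_pos, psth_neg, bin_width=200e-6, cutoff=400.0):
    """Fraction of difference-PSTH spectral power at or below `cutoff` Hz.

    psth_pos, psth_neg : PSTHs (spikes/bin) for the two stimulus
    polarities; bin_width in seconds (200 us, as in the paper).
    """
    # difference PSTH emphasizes temporal fine structure
    d = (np.asarray(psth_pos, float) - np.asarray(psth_neg, float)) / 2.0
    power = np.abs(np.fft.rfft(d)) ** 2
    freqs = np.fft.rfftfreq(len(d), d=bin_width)
    return power[freqs <= cutoff].sum() / power.sum()
```

Expressing the result as a fraction of total power, rather than absolute power, makes the metric insensitive to overall rate differences between fibers.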
Quantifying formant coding strength using the harmonicgram
Formant coding is traditionally quantified using the Fourier spectrum of the period histogram or the difference PSTH (Young and Sachs, 1979; Sinex and Geisler, 1983). These analyses provide sufficient spectrotemporal resolution for analyzing responses to stationary speech tokens like those used in many previous studies. To quantify power along dynamic formant trajectories in a nonstationary speech stimulus, the harmonicgram can be used, as it offers higher spectrotemporal resolution than the spectrogram (Parida et al., 2021).
Briefly, the harmonicgram was constructed as follows. The fundamental frequency contour of the sentence, F0(t), was estimated first. The difference PSTH was then frequency-demodulated along the trajectory of each harmonic of F0(t), and the time-varying power along each harmonic trajectory was computed; the collection of these harmonic power contours forms the harmonicgram.
To evaluate the coding strength of a formant, response power was summed along the harmonic trajectories closest to that formant's trajectory and expressed as a fraction of total response power.
For a given SNR condition, fractional power metrics were computed for responses to noisy speech and compared with the corresponding metrics for speech alone to quantify the degradation in formant coding caused by the added noise.
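The demodulation idea underlying the harmonicgram can be illustrated for a single time-varying trajectory. In this simplified sketch, a boxcar smoother stands in for the paper's actual filtering (Parida et al., 2021), and only one trajectory is handled rather than the full set of harmonics:

```python
import numpy as np

def power_along_trajectory(d, f_traj, fs, win_sec=0.02):
    """Average power of response d(t) that tracks a time-varying
    frequency trajectory f_traj(t) (Hz; same length as d).

    The response is multiplied by a unit phasor whose instantaneous
    frequency is -f_traj(t) (frequency demodulation), which shifts
    on-trajectory energy to near 0 Hz; a boxcar smoother then
    rejects energy that does not follow the trajectory.
    """
    d = np.asarray(d, float)
    phase = 2.0 * np.pi * np.cumsum(f_traj) / fs   # integrated frequency
    demod = d * np.exp(-1j * phase)
    n = max(1, int(round(win_sec * fs)))
    kernel = np.ones(n) / n
    baseband = np.convolve(demod, kernel, mode="same")
    return float(np.mean(np.abs(baseband) ** 2))
```

Because demodulation follows the trajectory itself, this approach retains sensitivity to rapidly changing components (e.g., formant transitions) that a fixed-window spectrogram would smear.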
Driven-rate analysis to evaluate consonant coding in quiet
Coding of a fricative (/s/) and two stop consonants (/d/ and /g/) was analyzed based on driven-rate analysis within different stimulus windows. For the fricative (see Fig. 8), two metrics were used: onset and sustained driven rates. Onset driven rate was estimated as the weighted rate over a window that had a value of 1 for the first 10 ms of the fricative and linearly dropped to 0 over the next 15 ms (i.e., total window duration = 25 ms). Sustained driven rate was estimated using a trapezoidal window that had a central duration of 30 ms and linearly dropped to 0 at the beginning and end of the fricative. These windows were based on a previous study with modifications to account for differences in fricative duration (previous study: 200 ms, current study: ∼100 ms; Delgutte and Kiang, 1984c).
Responses to two stop consonants, /d/ and /g/, were also analyzed (see Fig. 9). A stop consonant consists of a closure or “stop,” a release burst, and a voiced segment that includes formant transitions. Here, we analyzed responses only to the release burst. The driven rate was estimated over the burst period, which was ∼25 ms for both stop consonants.
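The weighting windows above are straightforward to construct. A sketch under one assumption not stated in the text: that the weighted spike count is normalized by the window's integral (its "effective duration") to yield a rate in spikes/s:

```python
import numpy as np

def onset_weights(t):
    """Onset window: weight 1 for the first 10 ms, then a linear
    fall to 0 over the next 15 ms (25 ms total); 0 elsewhere."""
    t = np.asarray(t, float)
    w = np.clip((0.025 - t) / 0.015, 0.0, 1.0)
    return np.where((t >= 0.0) & (t < 0.025), w, 0.0)

def sustained_weights(t, dur=0.100, center=0.030):
    """Trapezoidal window over a segment of duration `dur`: linear
    ramps from 0 at both segment edges to a flat center of length
    `center` (30 ms here, per the text)."""
    t = np.asarray(t, float)
    ramp = (dur - center) / 2.0
    w = np.minimum(np.minimum(t / ramp, (dur - t) / ramp), 1.0)
    return np.clip(w, 0.0, 1.0)

def weighted_driven_rate(spike_times, weight_fn, dur):
    """Weighted spike count divided by the window integral (assumed
    normalization), for spike times relative to segment onset."""
    grid = np.linspace(0.0, dur, 10001)
    effective_dur = np.trapz(weight_fn(grid), grid)
    return weight_fn(np.asarray(spike_times, float)).sum() / effective_dur
```

For example, `weighted_driven_rate(spikes, onset_weights, 0.025)` gives the onset driven rate for a list of spike times measured from the fricative onset.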
Coding-fidelity metrics for fricatives in noise based on correlation and driven-rate analyses
To evaluate the fidelity of the neural representation of the fricative /s/ in noise, correlations of the slowly varying envelope responses were quantified between speech-alone and noisy-speech conditions during the fricative segment. To confirm that these correlation values were not spurious correlations between speech and noise, the speech-alone and noisy-speech correlations were corrected by subtracting the correlation between response envelopes of speech-alone and noise-alone conditions for the same fricative window.
Response envelopes were obtained from single-polarity PSTHs using a low-pass filter (fourth order, cutoff = 32 Hz). Let the response envelopes for the speech-alone, noisy-speech, and noise-alone conditions be denoted by envS(t), envSN(t), and envN(t), respectively; the corrected correlation was computed as the correlation between envS(t) and envSN(t) minus the correlation between envS(t) and envN(t), each evaluated over the fricative window.
Consonant coding in noise was also evaluated using a driven-rate analysis. Driven rate for each SNR condition was estimated within the whole 100-ms fricative window for noisy speech and compared with the driven rate for speech alone within the same window.
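The envelope-correlation correction described above can be sketched as follows; an FFT-based spectral cutoff stands in for the fourth-order low-pass filter, and the variable names are illustrative:

```python
import numpy as np

def envelope(psth, fs, cutoff=32.0):
    """Slowly varying response envelope: keep only spectral
    components at or below `cutoff` Hz (a stand-in for the paper's
    4th-order low-pass filter)."""
    spec = np.fft.rfft(np.asarray(psth, float))
    freqs = np.fft.rfftfreq(len(psth), d=1.0 / fs)
    spec[freqs > cutoff] = 0.0
    return np.fft.irfft(spec, n=len(psth))

def corrected_envelope_correlation(env_s, env_sn, env_n):
    """Correlation between speech-alone and noisy-speech envelopes,
    corrected by subtracting the (chance) correlation between the
    speech-alone and noise-alone envelopes."""
    def pearson(a, b):
        a = np.asarray(a, float); b = np.asarray(b, float)
        a = a - a.mean()
        b = b - b.mean()
        return float(a @ b / np.sqrt((a @ a) * (b @ b)))
    return pearson(env_s, env_sn) - pearson(env_s, env_n)
```

Subtracting the speech-versus-noise correlation removes any correlation that could arise by chance between the speech and the frozen noise within the same window.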
Plosive coding in noise was not considered because plosives were completely masked by the noise, even at the highest SNR (0 dB) used here. Instead, inferences about plosive coding in noise based on formant transitions (as quantified in Fig. 7) are considered in Discussion.
Statistical analyses
Statistical analyses of group effects (i.e., hearing status) were performed in R (version 4.0.3) using linear mixed-effects models (lme4 package; Bates et al., 2014) and the car package (Fox and Weisberg, 2018). p-values and F-values were estimated using Type II Wald F tests (Kenward and Roger, 1997). Note that these p- and F-values correspond to partial regression coefficients of the predictors (i.e., they represent the significance of each predictor's contribution after partialling out all other predictors); this is important because these predictors (e.g., CF, FTC threshold, TTR, and tuning sharpness) can be correlated with one another.
The effects of various physiological mechanistic factors on speech-coding fidelity in Figure 6 (tonotopic coding), Figure 7 (formant coding in noise), and Figure 10 (fricative coding in noise) were also evaluated by including the following fixed-effect predictors: log-transformed CF, FTC threshold (in dB SPL), TTR (in dB), and local Q10.
Results
Chinchilla model of NIHL captures reduced audibility and degraded frequency selectivity
Mild-to-moderate hearing loss is the most prevalent type of hearing loss (Goman and Lin, 2016). To investigate the neural coding deficits these patients likely experience, we used an established noise-exposure protocol to induce mild-to-moderate SNHL in a chinchilla model (Kale and Heinz, 2010). Thresholds for ABRs to tone bursts increased by ∼20 dB following NIHL (Fig. 2A, statistics in legends). Similarly, DPOAE levels decreased by ∼15 dB (Fig. 2B), indicating the presence of substantial outer-hair-cell damage. These electrophysiological changes indicate a mild-to-moderate permanent hearing-loss model (Clark, 1981).
Chinchilla model of NIHL captures reduced audibility, reduced frequency selectivity, and expanded AN-fiber threshold distribution. A, ABR thresholds for HI chinchillas (red) were elevated by ∼20 dB relative to NH (blue). Thin lines with symbols represent individual animals (n = 9/6, NH/HI); thick lines represent group averages (main effect of group, F = 276.4, p < 2.2 × 10−16). B, DPOAE levels were reduced by ∼15 dB (F = 943.6, p < 2 × 10−16). C, AN-fiber thresholds were elevated for the HI group (F = 646.36, p < 2 × 10−16; n = 286/119, NH/HI). Symbols represent individual AN fibers; solid and dashed lines represent 50th and 10th percentiles, respectively, computed within octave bands for which there were greater than or equal to seven fibers in the group. D, Local Q10 values were reduced for HI fibers, indicating broadened frequency tuning. E, 10th and 50th percentiles of AN-fiber thresholds computed in octave-wide CF bands, showing a greater NIHL-related shift for the 50th than for the 10th percentile. F, Reanalysis of a previously published cat data set (Heinz et al., 2005), showing a similar expansion of the AN-fiber threshold distribution following NIHL.
Single-fiber thresholds were elevated by ∼25–35 dB for the most-sensitive AN fibers in the population (Fig. 2C,E). This threshold shift was accompanied by substantial near-CF broadening of tuning, as quantified by reductions in local Q10 (Fig. 2D).
NIHL expands AN-fiber threshold distribution: audiometric threshold shift underestimates average fiber-audibility deficit
Although audiometric thresholds are probably determined by the most-sensitive AN fibers in the population, suprathreshold speech perception in complex environments likely involves many fibers spanning a range of thresholds and not just the most sensitive ones (Young and Barta, 1986; Bharadwaj et al., 2014; but see Carney et al., 2016). Therefore, changes in AN-fiber threshold distribution could affect the coding of complex sounds. A traditional hypothesis, which assumes that following NIHL the high-threshold fibers function normally and the most-sensitive (low-threshold) fibers have elevated thresholds, predicts a compressed AN-threshold distribution (Moore et al., 1985; Ngan and May, 2001). Alternatively, across-fiber heterogeneity in susceptibility to acoustic trauma could lead to an expanded AN-fiber-threshold distribution following NIHL (Heinz et al., 2005), with a greater shift in the distribution mean than the shift in distribution minimum. If acoustic gain were based on the most-sensitive fibers (i.e., using the audiogram to compensate for hearing loss), then an expanded threshold distribution would lead to a selective “audibility” deficit for low-intensity stimuli (e.g., stop consonants) for many AN fibers, whereas a compressed distribution would not cause such a deficit (Fig. 3).
A schematic illustration shows that an expanded AN-fiber threshold distribution can lead to selective deficits in audibility for low-intensity stimuli (e.g., stop consonants). Thick lines represent threshold histograms for populations of 30,000 AN fibers under NH (blue) and HI (red) conditions. Top and bottom panels correspond to two hypothesized effects of NIHL on AN-fiber threshold distributions: compressed and expanded, respectively. These schematized data represent a 30-dB hearing loss (in all panels), as indicated by a 30-dB shift in the HI distribution minimum threshold. Triangles represent the level of a low-intensity stimulus (left) and high-intensity stimulus (right) for the NH and HI groups. For the HI group, a 30-dB gain was applied to compensate for hearing loss. Shaded regions represent the subpopulation in each group that is driven by the stimulus (i.e., “audible”). While there was no “audibility” deficit for the group with a compressed distribution for either stimulus, a substantial portion of the HI group with an expanded distribution was not driven by the low-intensity stimulus (unshaded region).
To evaluate these hypotheses, we estimated the 10th and 50th percentiles for our NH and HI threshold distributions in octave-wide CF bands (Fig. 2E). Our results showed a greater shift for the 50th percentiles than for the 10th percentiles following NIHL for all bands. We reanalyzed a more extensive previously published data set from cats (Heinz et al., 2005), which also showed an expansion (not compression) in AN-fiber threshold distribution following NIHL (Fig. 2F). These consistent results suggest that any audiometric indicator (i.e., based on the most-sensitive AN fibers) will underestimate audibility deficits for many AN fibers across the population.
Temporal-coding precision for connected speech in quiet was not degraded by NIHL
To test whether there was any degradation in the ability of AN fibers to precisely encode temporal information in response to connected speech, trial-to-trial precision was quantified using the VP distance metric (Victor and Purpura, 1996). Temporal precision is inversely related to VP distance and was computed for a range of temporal resolutions using the time-shift cost parameter (q) of the VP analysis to span the syllabic, voice-pitch, and formant time scales of speech (Rosen, 1992). The temporal precision of connected speech responses was not degraded following NIHL for any of the temporal resolutions considered (Fig. 4). In fact, there was a small but significant increase in precision for all three time-scale conditions. This increase in precision may arise because of overrepresentation of lower-frequency information associated with distorted tonotopy, where synchrony is stronger than at higher frequencies. Overall, these data provide no evidence for a degradation in the fundamental ability of AN fibers to precisely phase-lock to the temporal features in connected speech following NIHL.
Temporal-coding precision for connected speech in quiet was not degraded by NIHL. A, Across-trial precision is plotted versus discharge rate in response to a connected speech sentence in quiet at conversational levels. A time-shift cost (q) corresponding to 250-ms time scale was used to represent syllabic rate. Symbols represent individual AN fibers across all CFs. B, 10-ms time scale to emphasize voice-pitch coding. C, 0.67-ms time scale to emphasize speech-formant coding. A small but significant increase in temporal precision is observed for HI responses (group, F = 30.8, p = 5.4 × 10−8), which likely derives from increased responses to lower-frequency energy because of distorted tonotopy (CF × group interaction, F = 5.0, p = 0.025).
Changes in AN-fiber tuning following NIHL distort the normal tonotopic representation of naturally spoken vowels
Normal-hearing AN fibers are characterized by high sensitivity (low threshold) and spectral specificity (sharp tuning), which allows for selective responses to stimulus energy near their CF. These NH properties produce tonotopic responses to complex sounds, as demonstrated previously for synthetic steady-state vowels (Young and Sachs, 1979; Miller et al., 1997). Here, we evaluated the effects of NIHL on natural-vowel responses (Figs. 5, 6) by examining spectral estimates of single-fiber responses [based on the difference-PSTH spectrum, D(f)] to a quasi-stationary vowel segment of the sentence.
Changes in AN-fiber tuning following NIHL distort the normal tonotopic representation of naturally spoken vowels. A, Alternating-polarity PSTHs in response to a “quasi-stationary” segment from a natural sentence in quiet (stimulus: dark gray). Darker (lighter) color shades represent PSTHs for positive (negative, reflected across x-axis for display) stimulus polarity. Stimulus scaled arbitrarily. B, Spectrum of the stimulus segment (gray, left y-axis) with the fundamental frequency (F0) and the first and second formants (F1 and F2) indicated.
Distorted tonotopy enhances below-CF coding of voiced speech at the expense of near-CF representations, even when audibility is restored. A, Driven rates (DRs; solid trend lines) for NH and HI AN fibers were comparable in voiced portions of the connected speech sentence in quiet (group, F = 0.6, p = 0.44). SRs (dashed lines) were reduced for HI (F = 44.1, p = 1.5 × 10−10). Triangular-weighted trend lines (here and in the following figures) were computed for 2/3-octave averages. B, Fractional response power near CF [in dB re: total power based on the difference-PSTH spectrum, D(f)] was significantly reduced for HI primarily at lower (e.g., <3 kHz) CFs (group, F = 84.14, p < 2 × 10−16; group × CF interaction, F = 13.8, p = 2.4 × 10−4). C, Fractional response power in a LF band (<400 Hz) was enhanced for HI (group, F = 10.5, p = 1.3 × 10−3; group × CF interaction, F = 23.0, p = 2.5 × 10−6). D, Ratio of power near CF to power in the low-frequency band was significantly reduced for HI fibers (group, F = 36.86, p = 3.9 × 10−9; group × CF interaction, F = 26.7, p = 4.5 × 10−7). CF range: 0.6–5 kHz in all panels.
Distorted tonotopy enhances below-CF coding of voiced speech at the expense of near-CF representations, even when audibility is restored
To quantify the effects of distorted tonotopy on voiced-speech coding at the population level, fractional power near CF and at low frequency (LF) were quantified as metrics related to tonotopic coding (Fig. 6). Fractional power was used to minimize the effects of overall power and rate differences, if any. Despite the expected reduction in SR (Liberman and Dodds, 1984b), driven rates during voiced segments were similar for the two groups, likely because most AN fibers in both groups were saturated at the stimulus levels used (Fig. 6A). Therefore, there was no deficit in the total spike counts (i.e., driven rates) in response to voiced speech in our data.
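The fractional-power metrics can be sketched as follows (our simplified illustration; the ±1/2-octave near-CF band is an assumed choice, and the actual analysis was based on difference-PSTH spectra):

```python
import numpy as np

def fractional_power_db(psth, fs, cf_hz, lf_cut=400.0, half_octaves=0.5):
    """Fractional response power near CF and in a low-frequency band,
    in dB re: total power of the (difference-)PSTH spectrum.

    The +/- half-octave near-CF band width is an illustrative choice.
    """
    spec = np.abs(np.fft.rfft(psth - np.mean(psth))) ** 2
    freqs = np.fft.rfftfreq(len(psth), d=1.0 / fs)
    total = spec.sum()
    near = (freqs >= cf_hz * 2 ** -half_octaves) & (freqs <= cf_hz * 2 ** half_octaves)
    low = (freqs > 0) & (freqs < lf_cut)
    near_db = 10 * np.log10(spec[near].sum() / total)
    lf_db = 10 * np.log10(spec[low].sum() / total)
    return near_db, lf_db, near_db - lf_db  # last value: near-CF-to-LF ratio (dB)
```

Working with fractions of total response power makes the metric insensitive to overall rate differences between fibers, as intended.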
Near-CF power was significantly lower for HI fibers, particularly at lower (<3 kHz) CFs (Fig. 6B). To quantify the susceptibility of fibers with higher CFs (>0.6 kHz) to very LF stimulus energy, the power in a LF band (<400 Hz) was also quantified; this fractional LF power was significantly enhanced for HI fibers (Fig. 6C), and the ratio of near-CF to LF power was correspondingly reduced (Fig. 6D).
What physiological factors contribute to this distortion in tonotopic coding? To address this question, a mechanistic linear mixed-effects model was constructed with relative near-CF-to-LF power (Fig. 6D) as the response variable, and threshold, local Q10, and TTR (tip-to-tail ratio) as physiological predictors; of these, TTR was the dominant factor (see Discussion).
Temporal-place formant representation was more susceptible to background noise following NIHL
The effects of distorted tonotopy on formant coding were evaluated for connected speech in quiet and in noise using harmonicgram analyses (Parida et al., 2021). Coding strength for the first three formants was quantified as the fractional response power along each formant's trajectory, estimated by frequency demodulation of the alternating-polarity PSTHs.
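The frequency-demodulation idea behind the harmonicgram can be sketched as follows (a simplified, hypothetical illustration of the general approach, not the implementation of Parida et al., 2021): heterodyning the PSTH by a unit phasor that tracks a time-varying trajectory shifts energy following that trajectory down to 0 Hz, where it survives time averaging, whereas standard Fourier analysis smears a moving formant across many frequency bins.

```python
import numpy as np

def trajectory_power(psth, fs, traj_hz):
    """Response power in a PSTH along a time-varying frequency trajectory.

    Multiplying by exp(-i*phase(t)) shifts components that follow traj_hz
    down to 0 Hz; their strength is the squared magnitude of the time
    average (a simplified harmonicgram-style estimate).
    """
    phase = 2 * np.pi * np.cumsum(traj_hz) / fs
    demod = psth * np.exp(-1j * phase)
    return np.abs(demod.mean()) ** 2
```

For a chirp-like response, this power is large when traj_hz matches the true trajectory and near zero for a mismatched one, which is what makes the method suitable for dynamic formants.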
Temporal-place formant representation was more susceptible to background noise following NIHL. A–C, Temporal-coding strength for the first formant (F1) for the connected sentence in quiet and in noise.
Noise had a strong detrimental effect on the already degraded (reduced strength and nontonotopic) representations of the higher formants (F2 and F3) following NIHL.
To identify the contribution of various physiological factors to these formant-coding degradations, mixed-effects models were constructed for each formant fractional power response with TTR, threshold, and local Q10 as predictors.
Overall, these results show a degradation of higher-formant (F2 and F3) coding following NIHL, as well as an increased susceptibility of these representations to background noise.
Unlike voiced segments, driven rates for the fricative /s/ were not restored in HI fibers despite compensating for overall audibility loss
Fricatives constitute a substantial portion of phoneme confusions among listeners with SNHL (Bilger and Wang, 1976; Van de Grift Turek et al., 1980; Dubno et al., 1982). To explore potential neural bases of these deficits, neural responses to a fricative (/s/) were analyzed. Previous studies have reported robust fricative coding by NH AN fibers in terms of onset and sustained responses (Delgutte and Kiang, 1984c). NH fibers with higher CFs (i.e., near frequencies where /s/ has strong energy; Fig. 8A) showed a sharp onset response, followed by a robust sustained response (Fig. 8B). In contrast, HI fibers showed a substantial reduction in onset response (Fig. 8B), with less of an effect on the sustained response. This reduction in onset response contrasts with previous studies that reported increased onset responses to tones following NIHL; however, those results were for equal sensation level (Scheidt et al., 2010). In the high-frequency region (3–8 kHz) where the fricative had significant energy, driven onset rates were significantly lower for the HI population (Fig. 8C), whereas sustained rates were slightly, but significantly, reduced for the HI group (Fig. 8D). While both onset and sustained neural representations for both groups resembled the fricative spectrum, reduced rates for the HI group may lead to weaker salience, particularly because saturation rates (data not shown) and driven rates for voiced speech were similar for the two groups.
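The onset and sustained measures reduce to spike counts within response windows; a minimal sketch (the window placement and any parameter values here are illustrative):

```python
import numpy as np

def driven_rate(spike_times, window, n_trials, sr=0.0):
    """Driven rate (spikes/s re: spontaneous rate sr) in a response window.

    spike_times: spike times (s) pooled across n_trials stimulus repetitions;
    window: (t_start, t_end) in seconds (e.g., an onset or sustained window).
    """
    t0, t1 = window
    n_spikes = np.count_nonzero((spike_times >= t0) & (spike_times < t1))
    return n_spikes / (n_trials * (t1 - t0)) - sr
```

For Figure 8, a short window at frication onset and a later, longer window would yield the onset and sustained rates, respectively, with SR subtracted for display.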
Unlike voiced segments, driven rates for the fricative /s/ were not restored in HI fibers despite compensating for overall audibility loss. A, Spectrum of /s/ (gray) from the connected sentence and exemplar tuning curves. B, Time-domain waveforms of /s/ (gray) and alternating-polarity PSTHs of AN fibers shown in A. Same format and data-analysis parameters as Figure 5A. Cyan and magenta temporal windows denote the masks used to compute onset and sustained response rates. While these example AN fibers had comparable sustained rates, the HI onset response was substantially degraded. C, Driven onset rates for the HI population were significantly reduced compared with NH for all CFs. Same format as Figure 6A (group, F = 90.5, p < 2 × 10−16; group × CF interaction, F = 10−3, p = 0.98). D, Driven sustained rates were also reduced for the HI group for all CFs, including in the high-CF (>2.5 kHz) region where /s/ had substantial energy (F = 4.4, p = 0.037). SRs were subtracted from driven onset and sustained rates in panels C, D for display. E, Exemplar sABR data (sABRENV) from a NH and a HI chinchilla demonstrate a reduced onset response for /s/ following NIHL. Cyan window (20-ms) indicates the onset window in which peak-to-peak amplitude was computed. F, Distributions of peak-to-peak onset amplitudes show a significant reduction in sABR onset response for HI chinchillas (F = 5.8, p = 0.039). Asterisk denotes p<0.05.
To evaluate the noninvasive correlates of these single-fiber degradations in consonant coding, sABRs were recorded in response to the speech stimulus from the same cohort of animals. Representative sABRs from two animals (one from each group) replicate these fricative onset-response degradations (∼0.75 s in Fig. 8E). While the NH sABR had a sharp onset, the HI sABR lacked any clear onset response. In contrast, responses during voiced speech were comparable for the NH and HI examples (e.g., ∼0.6 s), mirroring the AN result that driven rates to voiced speech were similar for the two groups (Fig. 6A). Thus, despite comparable responses to voiced speech, there was a significant reduction in fricative onset response in the HI group (Fig. 8F).
Driven rates for stop consonants (/d/ and /g/) were also not restored despite overall audibility compensation
Stop consonants are among the most confused phonemes for listeners with SNHL (Bilger and Wang, 1976; Van de Grift Turek et al., 1980). The neural representations of two stop consonants (/d/ and /g/) present in the speech stimulus (Fig. 9A) were also evaluated using driven-rate analysis during the release burst. In response to /d/ and /g/, the NH AN fiber showed a strong onset response, followed by sustained activity that was well-above spontaneous activity (Fig. 9B,C). In contrast, for the HI AN fiber, driven activity was substantially reduced. Population results showed that driven rates during the burst portion for /d/ and /g/ were significantly reduced for the HI population relative to the NH population (Fig. 9D,E).
Driven rates for low-intensity stop consonants (/d/ and /g/) were also not restored despite overall audibility compensation. A, Spectra for stop consonants /d/ (dark gray) and /g/ (light gray) from the connected sentence in quiet. FTCs of representative NH and HI AN fibers. B, Alternating-polarity PSTHs for the two fibers in A in response to /d/. Same format and analysis parameters as Figure 8B. Driven burst rate was reduced for the HI fiber. Burst window (green) was 25 ms. C, Same layout but for /g/, which shows the same general effects as /d/. Burst window (green) was 25 ms. D, Driven rates in response to /d/ burst were significantly reduced in the HI population compared with the NH population (F = 64.6, p = 1.3 × 10−14). Same format as Figure 8C. E, Same layout and results as D but for /g/ (F = 34.5, p = 1.8 × 10−9). SRs were subtracted from driven onset and sustained rates in panels D, E for display. F, Difference in the power spectral density of /d/ and /g/. Dashed gray = zero line. G, Difference in the driven rates during the release burst of /d/ and /g/ (i.e., rates in D minus rates in E) for both groups. Dashed gray = zero line. H, Distributions of sABR peak-to-peak onset amplitudes in response to /d/ for chinchillas in both groups show a significant reduction for the HI group (F = 5.97, p = 0.037). I, Same layout as H but for /g/. The observed reduction in sABR onset amplitudes was not significant (F = 0.58, p = 0.47). CF range for statistical analyses: 0.5–8 kHz for /d/ and 0.5–3 kHz for /g/. Asterisk denotes p<0.05. ns, not significant.
A reduction in driven rates by itself does not necessarily mean poorer coding of the stimulus spectrum for the two phonemes. Therefore, we investigated how different the rate-CF profiles for /d/ and /g/ were for each group, and whether this difference followed the spectral properties of the stimuli. We estimated the difference in rates in response to /d/ and /g/ for individual AN fibers (Fig. 9G). While the difference profile for the NH group closely matched the difference in the power spectra of the two stimuli (Fig. 9F), the difference profile for the HI group was relatively less salient (i.e., closer to the zero line). Furthermore, the steeply changing spectral edge in the spectral difference (near 2.5 kHz in Fig. 9F) was well represented in the NH profile at the appropriate CF region. In contrast, it was shifted to higher CFs in the HI representation, consistent with distorted tonotopy effects.
Next, we evaluated the representation of the two stop consonants in sABRs. The sABR onset was significantly reduced for /d/ following NIHL (Fig. 9H), consistent with the universal reduction in burst rate across the whole CF range for HI fibers (Fig. 9D). The sABR onset response to /g/, however, was only slightly (not significantly) reduced for the HI group (Fig. 9I). Overall, these sABR data (and those from Fig. 8) suggest that degraded consonant representations persist even when vowel audibility is restored, despite central-gain-related changes that can occur in the midbrain (Auerbach et al., 2014).
Changes in tuning following NIHL eliminate the noise-resilient benefits of AN fibers with lower SR for fricative coding
As previously described, listeners with SNHL often have more difficulty identifying consonants in noisy environments than vowels. Here, we investigated the effect of background noise on the coding of the fricative /s/, which elicited robust sustained activity for both groups in the quiet condition (Fig. 8D). When the sentence is mixed with noise at a particular SNR, even a negative SNR, the resultant signal often has specific spectrotemporal regions with a favorable SNR (e.g., the high-frequency region for /s/; Fig. 10A). These high-SNR regions likely mediate robust speech perception in noise (Cooke, 2006). In our data, NH AN fibers that were narrowly tuned near this high-SNR region responded selectively to the fricative energy (Fig. 10B). In contrast, HI AN fibers showed reduced TTR (Figs. 10A, 5D). As a result, HI fibers tuned to higher frequencies responded poorly to fricative energy and strongly to LF energy in either the speech (e.g., voiced segments) or the noise (Fig. 10C).
Changes in tuning following NIHL eliminate the noise-resilient benefits of AN fibers with lower SR for fricative coding. A, Spectra for /s/ from the connected-speech sentence and the concurrent speech-shaped noise (N) segment for 0 dB overall SNR. Although overall SNR was 0 dB, a local high-SNR (10–15 dB) region occurs at high frequencies (>3 kHz). FTCs of two example AN fibers that schematize the increased deleterious effect of speech-shaped background noise following NIHL, especially compared with pink noise (pink line). B, PSTHs (gray) in response to speech-alone (S), noisy-speech (SN), and noise-alone for the NH fiber (SR = 0.2/s) in A. Thick black curves represent response envelopes. Dashed magenta lines indicate temporal region in the stimulus (green) containing /s/. C, Same layout as B but for the HI fiber (SR = 1.1/s). D–F, Speech-in-noise coding fidelity for /s/ at perceptually important SNRs, as quantified by the corrected correlation between responses to S and SN (minimum value set to 0.001 for display). Squares and asterisks correspond to AN fibers with low/medium SR (<18/s) and high SR (>18/s), respectively. For NH, AN fibers with low/medium SR show better coding in noise than high-SR fibers; however, the opposite is true following NIHL because the noise resilience of low/medium SR fibers was lost, resulting in overall degraded fricative fidelity (SR, F = 16.5, p = 6.2 × 10−5; group, F = 3.3, p = 0.07; group × SR interaction, F = 18.8, p = 2.0 × 10−5).
Fricative-coding fidelity was quantified by the corrected correlation between responses to speech-alone and noisy-speech (Fig. 10D–F). For NH fibers, those with low/medium SR showed better coding in noise than high-SR fibers; following NIHL, this noise-resilient benefit was lost, resulting in overall degraded fricative-coding fidelity for the HI group.
When fricative coding in noise was evaluated using the difference in driven rates for noisy-speech and noise-alone as the metric, similar results were observed (data not shown; SR, F = 19.4, p = 1.5 × 10−5; group, F = 4.6, p = 0.03; group × SR interaction, F = 11.9, p = 6.5 × 10−4). Note that these deficits in rate-place coding for the HI group do not simply reflect reduced rates in response to noisy-speech and noise-alone because these driven rates were actually slightly greater for the HI group (data not shown; noisy speech: group, F = 9.7, p = 0.002; SR, F = 69.5, p = 2.8 × 10−15; noise alone: group, F = 15.5, p = 10−4; SR, F = 91.4, p < 2.2 × 10−16), despite reduced SR (Fig. 6A) and reduced rate responses to the fricative alone (Fig. 8C,D). Rather, these deficits are consistent with increased masking by noise for HI high-CF AN fibers (including low SR fibers) because of distorted tonotopy.
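One plausible implementation of a noise-corrected correlation between speech-alone (S) and noisy-speech (SN) responses is sketched below (the split-half correction shown is a common convention and an assumption on our part; the paper's exact estimator may differ):

```python
import numpy as np

def corrected_correlation(trials_s, trials_sn):
    """Noise-corrected correlation between two sets of response trials.

    trials_*: (n_trials, n_bins) arrays of binned spike counts. The raw
    correlation of the mean PSTHs is divided by the geometric mean of each
    condition's split-half reliability (an illustrative estimator).
    """
    def r(x, y):
        return np.corrcoef(x, y)[0, 1]

    def split_half(trials):
        half = trials.shape[0] // 2
        return r(trials[:half].mean(0), trials[half:].mean(0))

    raw = r(trials_s.mean(0), trials_sn.mean(0))
    ceiling = np.sqrt(split_half(trials_s) * split_half(trials_sn))
    return raw / ceiling
```

Dividing by the split-half reliability prevents intrinsic trial-to-trial spiking variability from masquerading as poor stimulus coding.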
To identify the contribution of various physiological factors to fricative coding in noise, a mixed-effects model was constructed with coding fidelity as the response variable and TTR, threshold, local Q10, and SR as predictors.
Discussion
A common reaction from patients with SNHL is “I can hear you, but I cannot understand you,” especially in background noise, for both hearing-aid users and nonusers (Lesica, 2018). Several neurophysiological mechanisms are hypothesized to underlie these suprathreshold deficits. Here, we describe the first data, to our knowledge, characterizing SNHL effects on AN-fiber responses to a connected speech sentence in noise. These data elucidate physiological mechanisms that do and do not contribute to deficits in the peripheral coding of speech following SNHL. In particular, these data highlight the prominent role distorted tonotopy plays in degrading the coding and increasing the noise susceptibility of vowels and consonants.
Commonly hypothesized suprathreshold deficits were not the major factors degrading neural coding of connected speech following NIHL
Two suprathreshold deficits commonly speculated to affect speech perception in noise are reduced sharpness of cochlear frequency tuning (local Q10) and degraded temporal-coding precision. Our data suggest that neither was the major factor degrading the peripheral coding of connected speech in noise.
We found no degradation in temporal-coding precision of AN-fiber responses to connected speech (Fig. 4), consistent with the majority of neurophysiological studies that report intact phase-locking to pure tones in quiet following SNHL (Harrison and Evans, 1979; Miller et al., 1997). Thus, these connected-speech-coding data, along with most previous studies using laboratory stimuli, provide no evidence for a reduction in the fundamental ability of AN fibers to encode rapid acoustic fluctuations.
That being said, changes in phase-locking strength to complex sounds can be observed following SNHL [e.g., decreased balance of fine-structure-to-envelope coding resulting from enhanced envelope coding (Kale and Heinz, 2010), or decreased tonal phase-locking in noise because of broadened tuning (Henry and Heinz, 2012)]. Although these changes in the temporal-coding strength of complex signals may be perceptually relevant, these temporal-coding degradations occur because of factors other than a reduction in the fundamental ability of AN fibers to follow rapid fluctuations. It is also possible that the effects of synaptopathy and/or inner hair cell damage, i.e., fewer neurons coding temporal information (Kujawa and Liberman, 2009), could lead to poorer temporal information at the perceptual level that would not be apparent in peripheral responses.
Distorted tonotopy was the major factor in degraded coding and increased noise susceptibility for both vowels and consonants following NIHL
The primary factor affecting degraded coding of connected speech in noise was the loss of tonotopy, a hallmark of normal cochlear processing. For voiced-speech segments, where AN-fiber driven rates were comparable for the two groups, the spectral content in the responses differed substantially. TTR was the dominant physiological factor explaining the increased representation of LF energy at the expense of near-CF energy for voiced speech (Fig. 6). Our metric for dynamic formant coding required a technical advance in neural analyses using frequency demodulation applied to alternating-polarity PSTHs (Parida et al., 2021), because standard Fourier analysis blurs the coding of dynamic formants in speech. These single-fiber results are also consistent with our recent sABRs to connected speech, which showed overrepresented LF energy in HI sABRs (Parida and Heinz, 2021).
Another perceptually relevant effect of distorted tonotopy was the increased susceptibility to background noise following NIHL, which was seen for both voiced-speech (Fig. 7) and fricative coding (Fig. 10). Our data on fricative coding in speech-shaped noise provide insight into the real-world significance of this effect. The fricative /s/, which has primarily high-frequency energy, normally has a better-than-average SNR because of the steep spectral decay of speech-shaped noise and the tonotopic coding in the NH cochlea. Without this normal tonotopicity following SNHL, the SNR in AN fibers with CFs within the spectral band of the fricative is greatly reduced because of the substantial LF noise now driving the neural response. Based on our previous sABR results illustrating the dependence of distorted-tonotopy degree on spectral shape (Parida and Heinz, 2021), this deleterious effect on fricative coding is expected to be greater for speech-shaped noise than for white or even pink background noise. While many environmental sounds have a pink spectrum (3 dB/octave drop in spectrum level), the long-term speech spectrum has a downward spectral slope that is two to three times steeper (i.e., 6 to 9 dB/octave drop) than that of pink noise (Fig. 10A; French and Steinberg, 1947; Byrne et al., 1994). Thus, these results have important implications for listening amid multiple talkers, a condition of great difficulty for many listeners with SNHL (Festen and Plomp, 1990; Le Prell and Clavier, 2017).
It should be noted that the relatively flat hearing-loss pattern observed here is not typical of the clinical population, which usually has a high-frequency sloping component in addition to a flat hearing loss (Dubno et al., 2013; Parthasarathy et al., 2020). Therefore, these results may underestimate real-world deficits. On the other hand, the head-related transfer function for humans can boost the level at higher frequencies (>1 kHz) relative to lower frequencies, and may therefore partially offset the effects seen here (Moore et al., 2008). Finally, our data support hearing-aid algorithms that prescribe more gain at high frequencies because they counteract the effects of distorted tonotopy (Moore et al., 2010).
Noise-resistant ability of low/medium SR fibers for fricative coding was substantially degraded following NIHL
The statistical analyses of our fricative-in-noise data (Fig. 10) showed a significant effect of SR, and an interaction between group and SR. The main SR effect was not surprising given previous NH studies showing the superior coding-in-noise of low-SR fibers (Costalupes et al., 1984; Young and Barta, 1986), including for consonant-vowel tokens (Silkes and Geisler, 1991). The new insight provided by our data relates to the interaction between group and SR, which arises because NIHL caused a larger deficit in speech-coding fidelity in the low/medium-SR fibers than in the normally poorer high-SR fibers. Note that raising the fricative level may partly mitigate the deficits seen for low-SR fibers; however, it would not help with two key factors: (1) tuning sharpness, which is important for noise-resistant representation; and (2) increased forward masking by preceding low-frequency energy in both speech and noise (Glasberg et al., 1987; Fogerty et al., 2017). Overall, these data demonstrate that the normally robust low/medium-SR fibers can be preferentially degraded by hearing loss, similar to recent studies on cochlear synaptopathy with age and moderate noise exposure where low/medium-SR fibers are preferentially lost (Fernandez et al., 2015, 2020; Wu et al., 2019).
Audibility restoration is not the same for consonants and vowels because of expanded AN-fiber threshold distribution following NIHL
Psychoacoustic studies suggest that degraded fricative perception for listeners with SNHL depends on both a reduced ability to use formant transitions and reduced audibility of the frication cue (Zeng and Turner, 1990). As discussed for vowels, dynamic representations of transitions for higher formants (F2 and F3) were degraded and more susceptible to noise following NIHL.
The “audibility” of consonants, as indexed by driven rates (Figs. 8, 9), was often not restored, in contrast to vowels (Fig. 6A). Such divergent audiometric effects are consistent with psychoacoustic studies reporting similar audibility differences for consonants and vowels even after compensating for overall audibility (Phatak and Grant, 2014), and likely contribute to consonant confusions, which are common for both fricatives and stop consonants (Bilger and Wang, 1976).
Our data suggest that the expanded AN-fiber threshold distribution following NIHL (Fig. 2E,F) may contribute to differences in the audibility of consonants and vowels for amplified speech. Although across-animal variability in susceptibility to noise exposure could have partly contributed to this expansion, similarly expanded threshold distributions are observed following NIHL in individual cats (Heinz et al., 2005; their Fig. 9B). Thus, compensating for audiometric loss only guarantees restoration of audibility for the most sensitive AN fibers. Because elevation in the 10th percentile thresholds underestimated elevation in the 50th percentile, lower-amplitude consonants activated a smaller percentage of the population following NIHL, and thus were not as “audible” despite restoration of vowel audibility. It should be noted that the reduction in driven rates for consonants likely depends on the choice of sound levels used here, and these neural representations can be improved by using louder levels. For example, near-normal AN-fiber onset rates are seen following NIHL for tones at equal sensation levels (Scheidt et al., 2010). These results are consistent with psychoacoustic studies that report improved speech recognition by listeners with HI when consonants are selectively amplified (Kennedy et al., 1998; Shobha et al., 2008) and highlight the importance of compressive gain control in hearing aids (Jenstad et al., 2000; Souza, 2016).
Variations in the degree of distorted tonotopy across etiologies may contribute to individual differences in real-world speech perception
There is general consensus among researchers regarding the inadequacies of the audiogram in accounting for real-life perceptual deficits. Individual variability in speech perception, even among listeners with similar audiograms, likely stems from variations in suprathreshold deficits. Here, we demonstrate that distorted tonotopy is a significant factor in degraded coding and increased noise susceptibility of connected speech following NIHL. Because the degree of distorted tonotopy appears to vary across different etiologies (e.g., NIHL, age-related) even for similar hearing-loss degrees (Henry et al., 2019), it is likely that this variation contributes to individual differences in speech perception. The development of noninvasive diagnostics to identify distorted tonotopy (e.g., Parida and Heinz, 2021) is critical for determining the extent and perceptual relevance of this important physiological mechanism affecting the neural coding of speech.
Footnotes
This work was supported by the International Project Grant G72 from Action on Hearing Loss (United Kingdom) and by the National Institutes of Health Grant R01-DC009838. We thank Kenneth Henry and Hari Bharadwaj for their valuable feedback on an earlier version of this manuscript, Josh Alexander for stimulating discussions on human speech perception, and Suyash Joshi for help with phonetic transcription of the speech sentence.
The authors declare no competing financial interests.
Correspondence should be addressed to Michael G. Heinz at mheinz@purdue.edu