Abstract
The neural mechanisms that support the robust processing of acoustic signals in the presence of background noise in the auditory system remain largely unresolved. Psychophysical experiments have shown that signal detection is influenced by the signal-to-noise ratio (SNR) and the overall stimulus level, but this relationship has not been fully characterized. We evaluated the neural representation of frequency in rat primary auditory cortex by constructing tonal frequency response areas (FRAs) for different SNRs, tone levels, and noise levels. We show that response strength and selectivity for frequency and sound level depend on interactions between SNRs and tone levels. At low SNRs, jointly increasing the tone and noise levels reduced firing rates and narrowed FRA bandwidths; at higher SNRs, however, increasing the tone and noise levels increased firing rates and expanded bandwidths, as is usually seen for FRAs obtained without background noise. These changes in frequency and intensity tuning decreased tone level and tone frequency discriminability at low SNRs. By contrast, neither response onset latencies nor noise-driven steady-state firing rates meaningfully interacted with SNRs or overall sound levels. Speech detection performance in humans was also shown to depend on the interaction between overall sound level and SNR. Together, these results indicate that signal processing difficulties imposed by high noise levels are quite general and suggest that the neurophysiological changes we see for simple sounds generalize to more complex stimuli.
SIGNIFICANCE STATEMENT Effective processing of sounds in background noise is an important feature of the mammalian auditory system and a necessary feature for successful hearing in many listening conditions. Even mild hearing loss strongly affects this ability in humans, seriously degrading the ability to communicate. The mechanisms involved in achieving high performance in background noise are not well understood. We investigated the effects of SNR and overall stimulus level on the frequency tuning of neurons in rat primary auditory cortex. We found that the effects of noise on frequency selectivity are not determined solely by the SNR but depend also on the levels of the foreground tones and background noise. These observations can lead to improvement in therapeutic approaches for hearing-impaired patients.
Introduction
The auditory system shares with most sensory modalities a dynamic range that spans many orders of magnitude (Pollack and Pickett, 1958). When auditory signals such as speech must be processed in the presence of background noise, however, the presence of noise reduces the effective dynamic range and thereby decreases the ability of the system to detect and classify sounds. For a fixed signal level, increases in the noise level decrease the signal-to-noise ratio (SNR) and result in a progressive decline in speech reception performance (French and Steinberg, 1947; Pollack and Pickett, 1958; Studebaker et al., 1999; Dubno et al., 2005). Although intelligibility can sometimes be restored by increasing the speech level, speech at the same SNR is less intelligible at high speech and noise levels (French and Steinberg, 1947; Pollack and Pickett, 1958; Studebaker et al., 1999; Dubno et al., 2005).
The neural basis of the increasingly adverse effects of noise at high stimulus levels on signal processing is not well understood. Representations of signals in noise differ across auditory stations. Higher auditory stations appear to provide a more noise-robust representation of signals than do lower stations (Bar-Yosef and Nelken, 2007; Moore et al., 2013; Rabinowitz et al., 2013; Schneider and Woolley, 2013; Mesgarani et al., 2014). Most studies of noise effects on neural activity tested the level-dependent response to tones and other signals, such as vowels or amplitude modulation, in the presence of a constant noise level (Sachs and Young, 1979; Costalupes et al., 1984; Rees and Palmer, 1989) and reported the effects in terms of changes to rate-level functions (RLFs). In primary auditory cortex (A1), continuous white noise decreases tone-evoked response magnitudes and increases response thresholds as noise levels increase (Phillips, 1985; Phillips and Hall, 1986). At low noise levels, the noise-induced decrease in firing rate can be compensated for by proportional increases of the tone level at an ∼1:1 ratio (Phillips, 1985, 1990; Phillips and Cynader, 1985; Phillips and Hall, 1986; Liang et al., 2014). This proportional shift matches psychophysical noise effects at low-to-moderate stimulus levels (French and Steinberg, 1947). However, as noise levels rise above ∼55 dB SPL, psychophysical performance decreases nonlinearly (French and Steinberg, 1947; Studebaker et al., 1999; Dubno et al., 2005), which cannot be explained by linear threshold shifts alone because threshold shifts are thought to be determined by the SNR rather than the absolute noise level. Such effects may instead reflect mechanisms such as the nonlinear suppression of firing rate or changes in frequency tuning.
To investigate why noise can be more disruptive at high stimulus levels for a given SNR, we recorded frequency response areas (FRAs) at constant SNRs from A1 of anesthetized rats.
We focused on A1 since cortical responses appear to be more noise tolerant than those in peripheral stations (Rabinowitz et al., 2013; Schneider and Woolley, 2013). In addition, it has been shown that the characterization of A1 receptive fields requires a higher dimensionality than for most peripheral neurons (Atencio et al., 2008; Atencio and Schreiner, 2013). This broader processing capability of cortical neurons potentially enhances noise tolerance and, more generally, the ability to handle more complex auditory stimulus conditions. FRAs provide a comprehensive and intuitive characterization of basic spectral processing properties. They capture the frequency-dependent response of neurons across the dynamic range and can be characterized by basic tuning curve properties, such as characteristic frequency (CF), response threshold, bandwidth, latency, and firing rates. However, little is known of the influence that a fixed SNR may have on frequency tuning. We also tested speech reception performance at several SNRs and signal levels in adults with normal hearing to assess the potential psychophysical interactions between speech level and SNR. Firing rate, FRA bandwidth, and spike train discriminability for cortical neurons showed significant interactions between SNR and stimulus level, suggesting that these neural effects may be correlates of the observed psychophysical effects.
Materials and Methods
Surgical procedures.
All procedures were approved by the institutional care and use committee of the University of California, San Francisco. Twelve healthy adult female Sprague Dawley rats (age range, 54–94 d; median age, 70 d) were used in this study. Each rat was anesthetized using pentobarbital (50–80 mg/kg, i.p., supplemented as needed with 10–15 mg/kg, i.p.) supported with atropine (0.2 mg/kg, i.m.), dexamethasone (1 mg/kg, i.m.), meloxicam (2 mg/kg, i.m.), and bupivacaine (0.25%, s.c.). The body temperature of the rat was maintained at near 37.0°C with a homothermic blanket system (Baxter/YSI); a tracheotomy was performed to stabilize respiratory function; and a cisternal drain was introduced to increase brain stability. After positioning the rats in a head clamp, the right temporal muscle, cranium, and dura overlaying auditory cortex were removed. Finally, the cortex was covered with silicone oil.
Electrophysiological recording procedures.
All recording procedures were performed in a sound-isolated chamber (IAC Acoustics). Parylene-C-coated tungsten wire electrodes with an impedance of 0.8–1.0 MΩ (MicroProbes) were advanced orthogonal to the pia 500 ± 40 μm into the thalamorecipient layers of cortex (layer III–IV). The neural responses were amplified, filtered (0.6–3 kHz, notch at 60 Hz), displayed, and stored with hardware and software manufactured by Tucker-Davis Technologies. Initial multiunit mapping was used to identify the primary auditory cortex, which was targeted for data collection.
The stimuli we used were tones of different stimulus levels presented in white noise at fixed SNRs. The stimulus space for our experiments can be defined in terms of the following three orthogonal dimensions: tone frequency, tone level, and noise level (Fig. 1). Typically, the noise level is fixed, and the tone level is varied, or vice versa (Shetake et al., 2011; Rabinowitz et al., 2013; Schneider and Woolley, 2013). In such cases, the stimuli are sampled from slices along one dimension of the stimulus space (Fig. 1A,B). By contrast, our approach uses fixed SNRs, thus sampling from a set of diagonal slices of this stimulus space (Fig. 1C). Because we sampled a fixed range of tone levels, the maximum noise amplitude in each block is set by the choice of SNR (Fig. 1D). By taking a set of slices through this stimulus space, we can investigate the interactions between the SNR and overall stimulus level, and disentangle the effects of SNR from the effects of the absolute levels of the signal and noise.
The stimuli used in this study were 65 tones logarithmically spaced between 0.5 and 38.98 kHz (25 ms duration gated with 5 ms cosine ramps). White noise was generated randomly for each trial (1390B, General Radio Company) with calibrated amplitude (sound meter, linear weighting, Brüel & Kjær) to form SNRs of +43, +33, +23, +13, and +3 dB. We included a condition without noise for which the SNR was infinite (SNR+∞dB). Lower SNRs (<3 dB) could not be obtained across the full range of the FRAs due to signal distortions at the highest combined tone/noise levels. The stimuli were presented in closed field to the contralateral ear with a calibrated speaker (STAX; maximum stimulus, 85 ± 5 dB across the tested frequency range). On each trial, attenuation from 0 to 70 dB was applied to both the tone and noise signals to obtain the desired constant SNRs for tone levels of 15–85 dB SPL in 10 dB steps. Noise levels were calculated using linear frequency weighting of the spectrum from 0.5 to ∼40 kHz. Tone frequency and level were varied pseudorandomly within each SNR condition.
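The fixed-SNR stimulus grid can be made concrete with a short sketch. The Python fragment below is illustrative only and is not the code used in the study; all function and variable names are hypothetical. It assumes that the 65 tones are equally spaced on a log2 frequency axis and that, in decibels, the noise level equals the tone level minus the SNR.

```python
import math

def log_spaced_tones(f_min_khz=0.5, f_max_khz=38.98, n=65):
    """Return n tone frequencies (kHz) equally spaced on a log2 axis (assumed spacing)."""
    step = math.log2(f_max_khz / f_min_khz) / (n - 1)  # octaves between adjacent tones
    return [f_min_khz * 2 ** (i * step) for i in range(n)]

def noise_level_db(tone_level_db, snr_db):
    """Noise level that holds the SNR fixed: noise = tone - SNR (all in dB)."""
    return tone_level_db - snr_db

TONE_LEVELS_DB = list(range(15, 86, 10))  # 15-85 dB SPL in 10 dB steps
SNRS_DB = [43, 33, 23, 13, 3]             # the noiseless condition is omitted here

freqs = log_spaced_tones()
octaves_per_step = math.log2(freqs[1] / freqs[0])

# One "diagonal slice" per SNR: (tone level, noise level) pairs.
grid = {snr: [(t, noise_level_db(t, snr)) for t in TONE_LEVELS_DB]
        for snr in SNRS_DB}
```

Under these assumptions, adjacent tones are ∼0.098 octaves apart, so a window of ±0.5 octaves around any tone spans 11 tones, and the highest noise level in each slice is 85 dB minus the SNR.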
The trial structure for the experiments is illustrated in Figure 1C. Each trial begins with 275 ms of background noise, then a 25 ms tone is added. The noise continues for an additional 150 ms, then a new stimulus is selected pseudorandomly. As a result, all tones were presented with onsets spaced 450 ms (275 + 25 + 150 ms) apart. Noise attenuation was changed at least 275 ms before each tone presentation to allow the response to the change in noise level to reach a steady-state firing rate after any onset transient. Based on previous work in A1 of anesthetized cats, this should allow sufficient time for response changes reflecting the noise level adjustment to stabilize (Phillips, 1985). The duration of a single pass through a given SNR “slice” (Fig. 1C) was ∼4 min; all six SNR conditions required ∼24 min of recording time.
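The approximate block durations follow from the trial timing. As a rough check, the arithmetic can be sketched in Python, assuming that one pass presents each of the 65 frequencies once at each of the eight tone levels (15–85 dB SPL in 10 dB steps); the names are illustrative.

```python
N_FREQS = 65                       # tone frequencies per level
N_LEVELS = 8                       # 15-85 dB SPL in 10 dB steps
ONSET_SPACING_MS = 275 + 25 + 150  # noise lead-in + tone + trailing noise

def block_duration_min(n_conditions=1):
    """Recording time (minutes) for n_conditions passes through the FRA grid."""
    trials_per_pass = N_FREQS * N_LEVELS
    return n_conditions * trials_per_pass * ONSET_SPACING_MS / 1000 / 60

one_slice_min = block_duration_min()   # single SNR slice
all_slices_min = block_duration_min(6) # six SNR conditions
```

With these numbers, one slice takes 3.9 min and six slices take 23.4 min, in line with the ∼4 and ∼24 min stated above.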
Psychophysics.
Psychophysical studies were approved by the representative human institutional review board at the Hannover Medical School (approval number 6376). A total of 20 healthy and normal hearing subjects were included in the study (12 female, 8 male). Subject age ranged from 18 to 40 years (median age, 25 years). All subjects were native German speakers. Measurements were performed with conventional, calibrated audiometers and were conducted in clinically approved sound booths. The Freiburg Monosyllabic Word Test was conducted for the left ear at SNRs of 10, 5, 0, −5, and −10 dB, and at signal levels of 60, 50, 40, and 30 dB SPL. The subject was asked to repeat monosyllabic words after hearing them, and the examiner marked the correctly identified words. Ten different lists of 20 words were used to minimize the ability of the subject to learn the list across the different SNRs (Lehnhardt, 1987).
Data analysis.
All data analysis was performed with MATLAB (MathWorks). FRAs were constructed from the firing rate in a 50 ms window following tone onset at every tone frequency and level within each available SNR condition. Each tone/level combination in an FRA was presented once. The response threshold was determined by calculating the distribution of response rates within the 50 ms baseline window (Fig. 1C) immediately preceding the presentation of the tones for each distinct noise level. The threshold was set to 2 standard deviations (SDs) above the mean response rate within each SNR condition. Responses below threshold were set to zero to produce a thresholded FRA (tFRA). The tone response threshold was defined as the lowest nonempty row of the tFRA. The central frequency was defined for each row of the tFRA as the median of the nonzero frequency values for each row. Bandwidth was determined at each sound level by computing the range of frequencies with nonempty bins in the corresponding row of the tFRA. The level above threshold was determined independently for each SNR condition. RLFs were constructed as the average firing rate of frequencies within ±0.5 octaves of central frequency at each level (i.e., 11 responses were included in each average). Peristimulus time histograms (PSTHs) were calculated at 1 ms resolution. For tone responses, PSTHs were computed for tones at and above threshold within ±0.5 octaves of central frequency. Steady-state responses to noise were estimated using a 50 ms baseline window (Fig. 1C) beginning 225 ms after changing the noise level and ending just before the presentation of the tone. Steady-state responses, including PSTHs and average firing rates, were calculated separately for different SNR conditions, even when the noise levels were identical (e.g., 32 dB occurring in different contexts). Averages were taken over 65 repetitions of each noise and SNR condition.
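The thresholding and tuning measures described above can be sketched as follows. This is a minimal Python illustration rather than the MATLAB code used in the study; the function names and toy data are hypothetical. It assumes a single pool of baseline rates per FRA, the sample SD (n − 1 denominator), rows ordered from the lowest to the highest tone level, and bandwidth expressed in octaves.

```python
import math
from statistics import mean, median, stdev

def threshold_fra(fra, baseline_rates, n_sd=2.0):
    """Zero every bin that does not exceed mean + n_sd * SD of the baseline rates."""
    criterion = mean(baseline_rates) + n_sd * stdev(baseline_rates)
    return [[rate if rate > criterion else 0.0 for rate in row] for row in fra]

def tuning_summary(tfra, freqs_khz, levels_db):
    """Return (threshold in dB, CF in kHz, {level: bandwidth in octaves}).

    Rows of tfra must run from the lowest to the highest tone level.
    """
    threshold_db, cf_khz, bandwidth_oct = None, None, {}
    for level, row in zip(levels_db, tfra):
        responsive = [f for f, rate in zip(freqs_khz, row) if rate > 0]
        if not responsive:
            continue  # empty row: no suprathreshold response at this level
        if threshold_db is None:
            # Lowest nonempty row defines threshold; CF is its median frequency.
            threshold_db, cf_khz = level, median(responsive)
        bandwidth_oct[level] = math.log2(max(responsive) / min(responsive))
    return threshold_db, cf_khz, bandwidth_oct

# Toy FRA (3 levels x 5 frequencies); values are made up for illustration.
freqs = [1.0, 2.0, 4.0, 8.0, 16.0]
levels = [15, 25, 35]
fra = [
    [2.0, 1.0, 3.0, 2.0, 1.0],
    [1.0, 2.0, 40.0, 3.0, 2.0],
    [2.0, 30.0, 60.0, 35.0, 1.0],
]
baseline = [1.0, 2.0, 3.0, 2.0, 2.0]

tfra = threshold_fra(fra, baseline)
threshold_db, cf_khz, bandwidth_oct = tuning_summary(tfra, freqs, levels)
```

In this toy case, the 15 dB row is empty after thresholding, the threshold is 25 dB with a CF of 4 kHz, and the bandwidth grows from 0 to 2 octaves between 25 and 35 dB.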
We evaluated the discriminability of neural responses to different tonal stimuli using Euclidean distance-based spike train classifiers (Foffani and Moxon, 2004). Like similar techniques that have also been used to compute spike train distances (Victor and Purpura, 1996; van Rossum, 2001), these techniques can be used to provide an estimate of the lower bound of the mutual information between a set of stimuli and a set of neural responses to them (Schnupp et al., 2006). Because we recorded only a single trial at each frequency-level combination in the FRA, we combined responses across sets of similar parameter values into stimulus “classes.” For each recording site and SNR condition, stimuli were broken up into classes of different intensity levels within 0.5 octaves of central frequency (i.e., 11 responses in each class) or different frequency (0.5 octaves) × level (10 dB) bins (i.e., 10 responses in each class). The classifiers compare the spiking pattern on each individual trial against a set of PSTH “templates” constructed from PSTHs averaged across trials for each stimulus class. The temporal resolution for this analysis is defined by the choice of bin size applied to the spike train to be classified and the PSTH templates.
For each site and SNR condition, the bin size that maximized decoding performance was selected from a standard set (0.5, 1, 2.5, 5, 10, 20, and 50 ms). Because the analysis window duration was 50 ms, the 50 ms bin size corresponds to a simple firing rate code, since all spikes fall within a single bin. We used the optimal bin width for each site and condition to make our analysis more conservative. If there were systematic differences in the optimal bin width across SNR conditions, or different stimulus levels, for example, the use of a single temporal resolution for the analysis could advantage particular conditions or levels, resulting in spurious differences in decoding performance. This approach is standard in much of our past work using these techniques (Malone et al., 2007, 2010, 2013, 2014, 2015a,b).
To determine how the temporal resolution of the spike train classification process might affect our results, we compared population averages of decoding performance for bin sizes of 0.5, 1, 2, 5, 10, and 50 ms. The resulting performance curves (percentage correct) for discriminating signal intensity were relatively flat, with the largest deviations in performance occurring at 0.5 and 50 ms, which were typically associated with the worst discrimination. We observed significant main effects for bin size (one-way ANOVA: F(5,2172) > 50; p < 0.001). Rate classifiers based on a single analysis bin equal to the analysis window (e.g., 50 ms) performed significantly worse than classifiers at the optimum temporal resolution (Wilcoxon signed-rank test, p < 0.001; median, 21.6% vs 28.4%). We obtained a similar result when comparing discrimination using a fixed 5 ms bin for all sites against the rate classifier (Wilcoxon signed-rank test, p < 0.001; median, 26.1% vs 21.6%). Frequency discrimination performance produced a qualitatively similar pattern of results across the three different intensities tested (0–10, 20–30, and 40–50 dB above threshold), including significant main effects of bin size (three-way ANOVA: F(5,2012) > 20, p < 0.001). As was true for intensity discrimination, the smallest (0.5 ms) and largest (50 ms) bin sizes were consistently associated with the worst decoding performance at all tested SNRs.
All decoding was performed via complete cross-validation, such that each spike train to be classified was always excluded from the PSTH templates used for classification, and the spike trains from all other trials were included. Each spike train was assigned to the stimulus class that yielded the PSTH template that was “nearest” the spike train via Euclidean distance. Classifications based on this procedure were used to construct confusion matrices indicating how often the spike trains were classified as belonging to the stimulus class that had elicited them. To avoid comparing a spike train to a template that contains it, the contribution of the spike train to be classified was removed from its corresponding template before calculating the distances.
The results of this procedure were used to construct confusion matrices. The confusion matrices that were generated are normalized two-dimensional histograms whose columns correspond to the actual stimulus presented and whose rows correspond to the stimulus estimated by the classifier. Correct assignments of spike trains to the stimuli that elicited them are indexed by the concentration of values along the diagonal. Incorrect estimates fall outside of the diagonal in the column corresponding to the actual value.
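The template-matching procedure can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used in the study: the function names and toy spike times are hypothetical, ties are resolved in favor of the first class, every class is assumed to contain at least two trials, and each column of the confusion matrix is normalized by the number of trials in the corresponding actual class.

```python
import math

def bin_spikes(spike_times_ms, bin_ms, window_ms=50.0):
    """Count spikes in consecutive bins of width bin_ms over the analysis window."""
    n_bins = int(round(window_ms / bin_ms))
    counts = [0] * n_bins
    for t in spike_times_ms:
        if 0 <= t < window_ms:
            counts[min(int(t // bin_ms), n_bins - 1)] += 1
    return counts

def mean_vector(vectors):
    """Bin-by-bin average of equal-length count vectors (a PSTH template)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify_leave_one_out(trials_by_class):
    """Return confusion[estimated][actual] with columns normalized to sum to 1."""
    n = len(trials_by_class)
    counts = [[0] * n for _ in range(n)]
    for actual, trials in enumerate(trials_by_class):
        for i, trial in enumerate(trials):
            best, best_dist = None, math.inf
            for cls, members in enumerate(trials_by_class):
                # Exclude the trial being classified from its own class template.
                pool = members[:i] + members[i + 1:] if cls == actual else members
                dist = math.dist(trial, mean_vector(pool))
                if dist < best_dist:
                    best, best_dist = cls, dist
            counts[best][actual] += 1
    return [[counts[r][c] / len(trials_by_class[c]) for c in range(n)]
            for r in range(n)]

# Toy data: class 0 fires early, class 1 fires late (spike times in ms).
early = [[11, 13], [12], [14, 16]]
late = [[33, 36], [35], [31, 38]]
binned = [[bin_spikes(t, 10) for t in early], [bin_spikes(t, 10) for t in late]]
confusion = classify_leave_one_out(binned)
percent_correct = 100 * sum(confusion[k][k] for k in range(2)) / 2
```

Calling `bin_spikes` with a 50 ms bin returns a single count per trial, which reduces the classifier to the firing rate code described above.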
In cases where the stimuli vary along parameter axes with meaningful ordinal values (e.g., decibels or frequency), it is also possible to evaluate decoding performance in terms of the “error cost” (Malone et al., 2007) instead of the percentage of correctly decoded trials. In effect, this evaluates how close the estimate of the stimulus was to the actual stimulus. For example, 40 dB is a better estimate than 50 dB when the actual value of the stimulus is 30 dB. The cost function can be flexibly defined. In our implementation, correct estimates were assigned an error cost of zero; incorrect estimates were assigned a cost linearly proportional to the distance of the estimated value from the diagonal (i.e., a cost of 1 for values just off the diagonal, a cost of 2 for values removed two values from the diagonal, and so on). The error cost can be expressed as a fraction of the theoretically maximal error cost for a confusion matrix of equivalent size and assigned significance by comparison to a distribution of error costs generated by Monte Carlo techniques (Malone et al., 2007).
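A linear error cost of this kind can be sketched as follows. The Python fragment is illustrative only; the names are hypothetical, and the definition of the maximal cost (every trial assigned to the class farthest from the actual one, with each column of the confusion matrix summing to 1) is an assumption made for this example rather than a detail taken from Malone et al. (2007). The Monte Carlo significance test is not shown.

```python
def error_cost(confusion):
    """Sum of confusion[estimated][actual] weighted by distance from the diagonal."""
    n = len(confusion)
    return sum(confusion[est][act] * abs(est - act)
               for est in range(n) for act in range(n))

def max_error_cost(n_classes):
    """Assumed maximum: each column puts all its mass on the farthest row."""
    return sum(max(act, n_classes - 1 - act) for act in range(n_classes))

def relative_error_cost(confusion):
    """Error cost as a fraction of the assumed maximal cost."""
    return error_cost(confusion) / max_error_cost(len(confusion))

# Three ordered classes, e.g., 30, 40, and 50 dB; columns are the actual stimulus.
perfect = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
near_miss = [[0, 0, 0], [1, 1, 0], [0, 0, 1]]  # 30 dB always decoded as 40 dB
far_miss = [[0, 0, 0], [0, 1, 0], [1, 0, 1]]   # 30 dB always decoded as 50 dB
```

Both misclassification examples have the same percentage correct, but the near miss carries half the error cost of the far miss, which is the distinction the error cost is meant to capture.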
All linear regressions are ordinary least squares. In figures, shaded regions and error bars represent the mean ± SEM. ANOVA measures were used to identify significant main effects and interactions between stimulus level and SNR. In particular, a significant interaction term means that the effect of noise at a fixed SNR varies with stimulus level. Because the nonlinear effects of noise on speech intelligibility are a type of interaction between SNR and stimulus level, neural correlates of these psychophysical effects are likely to have interactions between SNR and stimulus level.
Results
Firing rates to CF tones diminished with joint increases in tone and noise levels
We constructed FRAs from 126 primary auditory cortex multiunit sites (each representing the responses of a handful of nearby neurons) for six positive SNR conditions (SNR+3 dB to SNR+43 dB in 10 dB steps, and a noiseless condition denoted as SNR+∞dB) for 25 ms tones by averaging firing rates within a 50 ms window beginning at tone onset. To maintain a fixed SNR, the amplitude of the noise was changed on each trial, 275 ms before presentation of the next tone (Fig. 1C). The FRAs were thresholded with respect to baseline activity to the noise (see Materials and Methods) to produce tFRAs delineating the frequency-level pairs that evoked robust spiking activity (Fig. 2A,B). The sampled characteristic frequencies (CF, the median responsive frequency at threshold) ranged from 0.6 to 36 kHz with a population median of 6.2 kHz. For comparison, the audiogram of the albino rat indicates improving thresholds from ∼750 Hz to the minimum of 0 dB SPL at 8 kHz, and rapidly worsening thresholds above ∼40 kHz to a limit of ∼64 kHz (Kelly and Masterton, 1977). Thus, the population median in our sample falls relatively close to the range of frequencies where rats are most sensitive.
Figure 1. Different experimental designs for tone-in-noise presentations in tone frequency, tone level, and noise level space. A, Fixed noise level. Left, The amplitude of the noise (gray area) and tone (black waveform) for a fixed noise level condition. Only the tones change amplitude between trials. Trial duration is 450 ms (orange), and is composed of an initial noise-only period (275 ms), a tone-in-noise period (25 ms), and a final noise-only period (150 ms). Right, Fixed noise experiments are slices along a single noise level in the tone level/tone frequency/noise level stimulus cube. B, Fixed tone levels. Left, The noise changes amplitude between trials but not the tones. Right, Fixed tone level experiments are slices along a single tone level in the stimulus cube. C, Fixed SNR. Left, Diagram as in A but for fixed SNR condition. Both tone and noise levels change between trials. The blue and red windows indicate time periods for analysis (50 ms). Right, Constant SNRs are diagonal slices along both tone and noise level in the stimulus cube. D, The relationship between tone level and noise level for a fixed SNR condition at the different SNR levels used in our study. The diagonal lines are the studied conditions in the tone level/noise level plane of the stimulus cube.
Figure 2. FRAs obtained for different SNR conditions. A, The unsmoothed FRA for an example site in the SNR+∞dB condition. Lighter colors represent higher firing rates, as defined in B. B, Top row, The smoothed FRAs for the same example site across several SNR conditions. A gray box (i.e., SNR+3 dB) indicates that no data were collected at that SNR. Bottom row, The results of the thresholding procedure indicate the frequency-level region that is considered responsive for subsequent analysis. C, D, Another example site, with panels as in A and B. E, Population average FRAs (N = 126 multiunit sites) aligned by characteristic frequency (CF) and threshold for each site.
Two example sites are illustrated in Figure 2A–D. In the first example, the site responds to a relatively wide range of frequencies at high intensities in the SNR+∞dB condition, and a relatively narrower range at lower intensities. The binarized tFRAs shown in black and white in the bottom row of Figure 2B illustrate these changes more clearly. As the SNR decreases and the noise becomes louder relative to the tones, the frequency response range at high tone levels (shown near the top of each FRA and tFRA) is severely restricted. The second example shows a similar trend, such that the response strength and bandwidth at high intensities decrease and eventually disappear as the SNR decreases (Fig. 2D). To demonstrate that these trends held for the population, we constructed population average FRAs by aligning the FRAs obtained at each site by the CF and the response threshold obtained without background noise (N = 126; Fig. 2E). FRAs were most strongly affected by the presence of background noise at high levels, which occur at high tone intensities in low-SNR conditions. This is evident by comparing the top rows of the population FRAs, obtained at 50 dB relative to tone threshold, across SNR conditions with the row for responses 10 dB above threshold. Although the latter set of rows is diminished at lower SNRs, the reduction is small compared with that observed for louder tones.
We quantified the changes in FRAs by examining RLFs. Because we sampled the FRA densely, we defined the RLF for this dataset as the average firing rate for tones within ±0.5 octaves of CF across intensities (Fig. 3A–D). The example site in Figure 2A fired more action potentials as the tone level increased at high SNRs (dark lines), but this site responded only weakly for tone levels >40 dB SPL when the noise levels were also high (i.e., at low SNRs; Fig. 3A, light lines). A second example site (Figs. 2C, 3B) responded well to all tone levels tested at high SNRs, but at low SNRs response strength decreased for tone levels above ∼50 dB SPL. These trends were present in the population average RLF (Fig. 3C) and are particularly clear when each RLF is aligned by threshold (Fig. 3D). In this case, the most adverse SNRs are associated with decreased responses as the tone and noise levels increase (e.g., >10 dB above threshold). As a result, the nonmonotonicity index increased as the SNR decreased (Fig. 3E).
Figure 3. RLFs for different SNRs. A, RLFs for same example as in Figure 2A. Lines represent different SNR conditions coded to colors in E and listed at the right. These colors are used throughout. B, RLF for same example as in Figure 2C. C, Population mean of RLFs. D, Population mean of RLFs after aligning the threshold of each RLF to 0 dB. E, Population mean nonmonotonicity index (0, monotonic increase of RLF; 1, highly nonmonotonic RLF). F, Population mean response thresholds. G, Population average slopes of firing rate near CF by the level above threshold for each SNR condition. Light gray lines are the individual slopes of each site across SNRs. The black dots are the population average slopes. The black line through these dots is the regression line across sites and SNRs. H, Population firing rates near CF by stimulus level above threshold (hatching of bars) and SNR condition (color of bars). The black line above the bars from each SNR condition corresponds to the average slope of the RLF from G.
Across the population, we found no consistent effect on the tone thresholds across the tested SNR conditions (Fig. 3F; one-way ANOVA: F(5,543) = 1.2, p = 0.31). Tone thresholds in the presence of noise remained similar to tone thresholds in the absence of noise, at least for the range of positive SNRs that we tested. Increasing the noise to any level up to the threshold of a site in the noiseless condition did not, on average, change the tone threshold. Thus, for positive SNRs, average tone response thresholds remained unaffected, although individual sites could show small threshold shifts in either direction.
Across the population, firing rates for CF tones depended on the stimulus level above threshold, the SNR, and on the interaction between stimulus level and SNR (Fig. 3H; two-way ANOVA: main effect of SNR, F(5,2587) = 14.17, p < 0.001; main effect of level above threshold, F(4,2587) = 4.89, p < 0.001; interaction term, F(20,2587) = 2.64, p = 0.002). Comparing the leftmost bar in each group of Figure 3H, firing rates near the threshold did not change significantly across SNR conditions (one-way ANOVA: F(5,543) = 1.12, p = 0.34). However, responses at levels well above threshold (i.e., the rightmost bars in each SNR-specific group; Fig. 3H) decreased significantly for the most adverse SNRs.
The interaction between stimulus level and SNR is evident when comparing the pattern of firing rate changes within each SNR condition across the different SNR conditions. Each set of histogram bars represents the population-averaged RLF relative to the threshold of each site. For SNR+∞dB (Fig. 3H, black bars), the population average firing rates increased ∼20% as the stimulus level increased. At SNR+13 dB or SNR+3 dB (Fig. 3H, light bars), by contrast, the firing rates decreased ∼50% as the stimulus level increased. In the ANOVA, the significant interaction term represents this effect (F(20,2587) = 2.64, p = 0.002). Thus, the effects of joint changes in tone and noise levels are different for different SNR conditions.
We estimated the slope of the line of best fit for the RLFs or firing rate growth function (Fig. 3A) for each site and SNR condition. Although the RLFs are rarely strictly linear, the slope of the line of best fit for the threshold-aligned population data (Fig. 3D) provides a compact estimate of how firing rates vary with stimulus level for a given SNR condition. We determined the slope of these functions for each recording site and plotted them as a function of SNR (Fig. 3G, gray lines). The mean RLF slopes for the different SNR conditions are indicated by the black dots fit by the black line in Figure 3G. Each dot in Figure 3G describes how the firing rate changes as the tone level increases within a given SNR condition. These slopes were used to draw the lines shown above each group of histograms in Figure 3H, and they match the population-averaged RLFs quite well. Decreasing the SNR resulted in a linear decrease in the RLF slopes from positive to negative values (Fig. 3G; r = −0.42, n = 541, p < 0.001).
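The slope estimate is an ordinary least-squares fit of firing rate against level above threshold. A minimal Python sketch is given below; the function name is hypothetical, and the two toy RLFs are invented numbers chosen only to show a positive slope in a noiseless-like case and a negative slope in a low-SNR-like case.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

levels_above_thr_db = [0, 10, 20, 30, 40]
rates_quiet_hz = [70, 72, 75, 78, 80]  # made-up RLF that grows with level
rates_noisy_hz = [70, 62, 55, 47, 40]  # made-up RLF that declines with level

slope_quiet = ols_slope(levels_above_thr_db, rates_quiet_hz)  # Hz per dB
slope_noisy = ols_slope(levels_above_thr_db, rates_noisy_hz)  # Hz per dB
```

In this toy example the slopes are 0.26 and −0.75 Hz/dB, so the sign of the slope alone distinguishes an RLF that grows with level from one that is suppressed at high levels.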
To clarify these results, we describe the implications of the population trends we have identified for a canonical cortical site that exemplifies them. We can infer from Figure 3F that the average site threshold is ∼30 dB SPL. We can estimate the shape of the average RLF for tones above threshold from Figure 3G. In the absence of noise, a 70 dB tone should evoke a firing rate of ∼80 Hz. For a CF tone presented at 70 dB (i.e., 40 dB above threshold), increasing the level of continuous white noise should decrease the response rate: when continuous noise is presented at 50 or 60 dB, the tone-evoked firing rate would decrease to ∼40 Hz. If, on the other hand, the tone level were decreased to near the threshold, the effect of noise at an identical SNR would diminish, and eventually disappear. For tones at 30 dB (i.e., at the threshold), noise at any level <30 dB would not affect the tone-evoked firing rate: regardless of the SNR, the canonical site always responds at ∼70 Hz. The firing rate of the canonical site to tones in noise would appear to be very robust at low stimulus levels across the range of tested SNRs (+3 to +43 dB). At high tone levels, by contrast, this robustness is lost for more adverse (but still positive) SNRs.
Noise at high intensities suppresses firing rates to CF tones more strongly than predicted by a 1:1 shift
A 1:1 shift of a tone RLF implies that when the background noise level increases by a certain amount, the RLF is shifted laterally, by the same amount, toward louder tones, provided the initial noise level is sufficient to have an appreciable effect (Phillips and Cynader, 1985; Phillips, 1985; Phillips and Hall, 1986; Liang et al., 2014). For simplicity, we use the term 1:1 even though the exact ratio of the compensatory shift appears to be nucleus and species specific, and ranges between ∼0.3 and 1.1 (Liang et al., 2014). The most relevant feature of the response is that the ratio of the shift is fixed at a constant value. For computational ease, however, we illustrate predictions based on a fixed 1:1 ratio.
An illustration of the predicted changes in firing rates for such a 1:1 shift is provided in Figure 4A. The sigmoidal RLF in the SNR+∞dB condition (Fig. 4A, black line at far left) begins to be shifted in a 1:1 fashion once the noise exceeds some threshold, which in this illustration is set to 30 dB SPL (Fig. 4A, red line). Fixed SNR conditions are represented as diagonal lines across the noise and tone level space. Once the noise exceeds the 30 dB threshold, joint increases in the tone and noise levels shift the tonal threshold by the same amount. As a consequence of the threshold shift, increasing the tone level does not increase the firing rate: the firing rate flattens (Fig. 4A, kink in the horizontal colored lines). The SNR condition determines where along the RLF this flattening occurs.
A, Theoretical predictions for a 1:1 threshold shift with SNR change on the growth of CF firing rate. As SNR decreases, the RLF flattens at lower intensities above threshold. B, Predicted population RLF at CF for different SNRs, assuming a 1:1 threshold shift. C, The observed change in CF firing rate in the SNR+3 dB condition reproduced from Figure 3H.
We illustrate the consequences of the firing rate flattening for the canonical cortical site, with the RLF taken from the SNR+∞dB condition of Figure 3F in Figure 4B. For simplicity, we assume that the response threshold of the canonical site to tones equals its suppression threshold, where noise begins to shift the RLF. In the SNR+∞dB condition (Fig. 4B, leftmost bars), increasing the tone level increases the firing rate of the site. At the other extreme, in the SNR+3 dB condition (Fig. 4B, rightmost bars), increasing the tone level above threshold does not increase the firing rate of the site because increasing the tone level by 10 dB also results in a 10 dB increase in threshold (Fig. 4A). The firing rate remains at the firing rate at threshold (Fig. 4B). The other SNR conditions demonstrate an intermediate effect: the firing rate still flattens, but, in more favorable SNR conditions, the flattening occurs at higher tone levels relative to threshold. This schematic provides a visual explanation of why the slopes of the lines in Figure 4A decrease with decreasing SNRs, but asymptote at zero. Of course, this model is only approximate since noise effects can be difficult to predict (Liang et al., 2014).
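The flattening predicted by a fixed-ratio shift can be made concrete with a short numerical sketch. The sigmoid parameters, function names, and the 30 dB thresholds below are illustrative assumptions chosen to mirror the canonical site, not values fitted to the recorded data.

```python
import math

def rlf(level_above_threshold_db, max_rate_hz=80.0, slope=0.15, midpoint_db=20.0):
    """Hypothetical sigmoidal rate-level function; all parameters are illustrative."""
    return max_rate_hz / (1.0 + math.exp(-slope * (level_above_threshold_db - midpoint_db)))

def shifted_rate(tone_db, noise_db, tone_threshold_db=30.0,
                 suppression_threshold_db=30.0, shift_ratio=1.0):
    """Firing rate when noise above the suppression threshold shifts the tone threshold.

    With shift_ratio = 1.0, every dB of noise above the suppression threshold
    raises the tone threshold by one dB (the 1:1 case).
    """
    shift_db = shift_ratio * max(0.0, noise_db - suppression_threshold_db)
    return rlf(tone_db - (tone_threshold_db + shift_db))

def fixed_snr_curve(snr_db, tone_levels_db):
    """Rates along a fixed-SNR diagonal, where noise level = tone level - SNR."""
    return [shifted_rate(tone, tone - snr_db) for tone in tone_levels_db]

tone_levels = [40, 50, 60, 70, 80]
for snr in (43, 23, 3):
    rates = fixed_snr_curve(snr, tone_levels)
    print(f"SNR+{snr} dB:", [round(r, 1) for r in rates])
```

Under these assumptions, once the noise exceeds the suppression threshold the effective level above threshold is pinned at the SNR value, so the rate along a fixed-SNR diagonal can plateau but cannot fall as tone and noise levels rise together.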
The predictions of ∼1:1 shifts illustrated in Figure 4B are quite different from what we actually observed (Fig. 3G). Instead of noise flattening the firing rate, the firing rate was suppressed to ∼50% below the rate near the threshold, particularly at the lowest tested SNR (Fig. 4C). As Figure 4 demonstrates, a 1:1 threshold shift would not decrease the firing rate to such a degree for joint increases in the tone and noise levels. Thus, the negative slopes of the firing rate versus tone level functions indicate that the response reduction is stronger than would be predicted by a 1:1 shift and suggest the operation of a more active suppression mechanism (Fig. 3G).
Frequency tuning bandwidths narrow with joint increases in tone and noise levels
To quantify FRA shape changes, we calculated the half-height bandwidth from 10 to 40 dB above threshold for the different fixed SNRs. Example iso-level frequency tuning functions at 10 and 40 dB above threshold for different SNRs are shown in Figure 5. At 10 dB above threshold, the site responded similarly across frequencies regardless of the SNR (Fig. 5A, top row, left column). At 40 dB above threshold, however, decreasing the SNR reduced the peak values of each iso-level tuning function with a concomitant reduction in the range of frequencies that elicited robust responses (Fig. 5A, bottom row). The second example site (Fig. 5B, middle column) was similar—changes across SNRs were small at 10 dB above threshold but large at 40 dB above threshold. Much of the decrease in the FRA bandwidths is explained by the reduced tone-evoked firing rates at high noise levels (Figs. 3, 5).
Frequency selectivity as a function of SNR. A, Smoothed iso-intensity frequency response profile across different SNRs (colored as in D and as listed at right) for an example site at 10 dB above threshold (top) or 40 dB above threshold (bottom). B, Another example site, as in A. C, Population average frequency responses across SNRs. Shaded area for one example condition (SNR, +23 dB), SEM. D, Population FRA bandwidths at different stimulus levels above threshold (hatching) and SNR conditions (color). Black line above the bars for each SNR condition corresponds to the mean bandwidth change with tone levels from E. E, Population average slopes of bandwidth by tone level above threshold for each SNR condition. Light gray lines are the individual slopes of each site across SNRs. Black dots are the mean slope within each SNR. The black line is the regression line through those dots across sites and SNRs.
Analogous response changes are evident in the population averages (Fig. 5C). Across the population, both the iso-level tuning curves and bandwidths changed very little at tone levels 10 dB above threshold as the SNR decreased. At higher tone levels above threshold, however, the iso-level tuning curves and, thus, their bandwidths, narrowed with decreasing SNRs (Fig. 5C, bottom row).
Statistical verification (two-way ANOVA) confirmed that FRA bandwidth depends on the tone level (main effect of level above threshold: F(3,2044) = 7.10, p < 0.001), SNR (main effect of SNR: F(5,2044) = 39.45, p < 0.001), and their interaction (interaction term: F(15,2044) = 6.81, p < 0.001). Bandwidths 10 dB above threshold did not change with the SNR (one-way ANOVA: F(5,543) = 1, p = 0.42), but bandwidths at higher intensities did depend on the SNR. The significant interaction term in the ANOVA indicates that the effects of tone level and SNR on frequency tuning bandwidths are not separable. At high SNRs, bandwidths 40 dB above threshold were about twice as wide as they were at 10 dB above threshold. At low SNRs, by contrast, bandwidths 40 dB above threshold were about half as wide as they were at 10 dB above threshold.
To investigate the interaction observed in the ANOVA further, the FRA bandwidths for different tone levels relative to threshold were grouped across SNR conditions and compared in Figure 5D. The format of this analysis is analogous to that described for firing rates in Figure 3H. The trends are also analogous: at high SNRs, the population-averaged bandwidths increase with increasing tone levels. At low SNRs, however, with much higher absolute noise levels, FRA bandwidths decrease when the tone levels increase.
For each site and SNR condition, we estimated the slope of the line of best fit for the bandwidth versus tone level functions, depicted as gray lines in Figure 5E. The population averages indicate that decreasing the SNR decreased the slope at most sites (Fig. 5E, black dots). The slope values corresponding to the black dots in Figure 5E were used to add the illustrative black lines in Figure 5D. We fit a line to the slope values across different SNR conditions and found that decreasing SNRs led to a nearly linear decrease in the slopes of the bandwidth versus tone level functions (Fig. 5E; r = −0.47, n = 531, p < 0.001). The negative slope of the line of fit in Fig. 5E indicates that FRAs narrow rather than expand when both the tone and noise are loud, as occurs for small positive SNRs. Similar to the effects observed for the RLFs, the bandwidth changes support the conclusion that the effects of fixed SNRs on frequency processing are not separable from changes in tone level.
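The per-site slope estimate described above amounts to an ordinary least-squares fit of bandwidth against tone level. The sketch below uses invented bandwidth values (in octaves) that only mimic the reported trend of roughly doubling at high SNRs and roughly halving at low SNRs; the helper name and the numbers are hypothetical.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Tone levels in dB above threshold, as used for the bandwidth analysis.
levels_db = [10, 20, 30, 40]

# Made-up bandwidths (octaves) for one site; NOT measured values.
bandwidth_by_condition = {
    "high SNR": [0.65, 0.85, 1.05, 1.25],
    "low SNR": [0.65, 0.55, 0.45, 0.35],
}

slopes = {
    condition: least_squares_slope(levels_db, bandwidths)
    for condition, bandwidths in bandwidth_by_condition.items()
}
for condition, slope in slopes.items():
    print(f"{condition}: {slope:+.3f} octaves/dB")
```

A positive slope corresponds to an FRA that widens with level, and a negative slope to one that narrows; collecting one slope per site and SNR condition gives the values that are then regressed against SNR.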
As we observed for firing rates in the previous section, changes in bandwidth are more pronounced than a fixed 1:1 shift in threshold would predict. For example, a 1:1 shift implies that the bandwidths in the SNR+3 dB condition should be identical across all tone levels. Instead, the bandwidths in the SNR+3 dB condition (yellow bars) decreased with stimulus amplitude.
Because we know both the average site threshold (Fig. 3E) and the average bandwidth near threshold (Fig. 5D), we can ask what the interaction between SNR and stimulus intensity implies for the canonical cortical site. For tones presented at 70 dB (i.e., 40 dB above the average threshold of 30 dB), increasing the noise would decrease the bandwidth. The bandwidth of the canonical site would be more than an octave wide in the absence of noise, but for noise levels approaching 60 dB, the bandwidth would shrink to less than half an octave. As the tone level decreases, however, the interaction with SNR diminishes and eventually disappears. For tones at 40 dB (i.e., 10 dB above threshold), the noise level is irrelevant provided it is <30 dB—the bandwidth will be constant at ∼0.65 octaves. Thus, the bandwidth of the canonical site appears to be very robust to changes in the SNR at low tone levels. To some extent, this is expected, since noise below the threshold for detection would not be expected to have much effect, similar to the way that the lines in Fig. 4A do not flatten until the threshold is reached. At levels well above tonal threshold, however, this apparent robustness is lost—the bandwidth of the canonical site substantially decreases for the different SNRs. Effectively, the frequency selectivity at low SNRs and high signal levels is increased, but sensitivity, indexed by firing rates, is reduced.
Information about tone level encoded by cortical spiking patterns decreases as tone and noise levels increase together
We used a spike train classifier to test whether decreasing the SNR decreased the discriminability of tone levels by evaluating the SNR-induced differences among spike trains elicited by different tone and noise levels (see Materials and Methods). Spike train distances were quantified as the Euclidean distances between vectors of binned spike counts representing the responses. We constructed PSTH templates from spike trains at each site, tone level, and SNR condition. Each tone level template was constructed from PSTHs compiled from responses to tones within ±0.5 octaves of center frequency. The bin size used for response decoding for each site and condition was optimized within each SNR condition to maximize classification performance (see Materials and Methods). Each spike train was classified by assigning it to the same tone level as the PSTH template to which it was closest in a response space with a dimensionality equal to the number of bins used to represent the spike trains and the PSTH templates. An example is shown in Figure 6A, where accurate predictions create the trio of white squares along the diagonal at 35, 25, and 15 dB SPL. By contrast, inaccurate classifications create the weakly structured pattern of colors in the left half of the confusion matrix corresponding to tone levels >45 dB SPL (Fig. 6A).
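The template-matching step can be sketched as a nearest-template classifier on binned spike counts. This is a minimal illustration, not the analysis code used for the paper: the 50 ms window, the fixed 5 ms bin, the leave-one-out handling of the test trial, and all function names are assumptions (the bin size was optimized per condition in the actual analysis).

```python
import math

def bin_spike_train(spike_times_ms, window_ms=50.0, bin_ms=5.0):
    """Convert spike times (ms after tone onset) into a vector of binned counts."""
    n_bins = int(round(window_ms / bin_ms))
    counts = [0] * n_bins
    for t in spike_times_ms:
        if 0.0 <= t < window_ms:
            counts[min(int(t // bin_ms), n_bins - 1)] += 1
    return counts

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (the PSTH template)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two binned response vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_levels(trials_by_level):
    """Assign each trial to the tone level with the nearest template.

    trials_by_level maps a tone level (dB) to a list of binned count vectors.
    Returns confusion[actual_level][predicted_level] as trial counts.
    """
    levels = sorted(trials_by_level)
    confusion = {a: {p: 0 for p in levels} for a in levels}
    for actual in levels:
        for i, trial in enumerate(trials_by_level[actual]):
            best_level, best_distance = None, float("inf")
            for candidate in levels:
                pool = trials_by_level[candidate]
                if candidate == actual:
                    # Assumption: exclude the test trial from its own template.
                    pool = pool[:i] + pool[i + 1:]
                if not pool:
                    continue
                distance = euclidean(trial, mean_vector(pool))
                if distance < best_distance:
                    best_level, best_distance = candidate, distance
            confusion[actual][best_level] += 1
    return confusion
```

The dimensionality of the response space equals the number of bins, so changing `bin_ms` trades temporal precision against the number of spikes per bin.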
A, Confusion matrix for tone level classification for an example site in the SNR+∞dB condition. Lighter colors indicate a higher proportion of spike trains from that actual decibel level (column) were classified as a given decibel level (row). The proportion color scale is defined at the right of B. B, Population average confusion matrices across different SNR conditions. C, Average percentage correct classification across SNR conditions. Dashed line indicates chance performance.
Population average confusion matrices (Fig. 6B) were constructed by averaging confusion matrices computed for individual recording sites. These showed a concentration of values (brighter colors) along the diagonal in the SNR+∞dB condition, but more distributed patterns of values for smaller SNRs. We quantified classification performance in terms of the percentage of spike trains that were successfully associated with the correct tone level by computing the percentage of values in the full matrix that fall along the diagonal. The worst discrimination performance was seen for low tone levels (lower right corner of confusion matrices) for all tested SNRs, as indicated by the spread of values in a “box” at the bottom right corner of each confusion matrix. More importantly, the average percentage of correctly classified tone levels across all tone levels significantly decreased as the SNR decreased (Fig. 6C; r = −0.32, n = 549, p < 0.001). Thus, SNRs significantly affect how well different tone levels embedded in noise can be decoded based on cortical spike trains. Similar results were obtained using a graded error cost metric (see Materials and Methods) in place of the binary (i.e., correct or incorrect) percentage correct metric (data not shown).
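The percentage correct measure reduces to the share of classified spike trains that land on the diagonal of the confusion matrix. The following sketch assumes a matrix of raw trial counts and, for the chance level, equal numbers of trials per class; the example matrix is invented for illustration.

```python
def percent_correct(confusion_counts):
    """Percentage of trials on the diagonal of a square confusion matrix.

    confusion_counts[i][j] is the number of spike trains whose actual class
    is j (column) and whose assigned class is i (row).
    """
    total = sum(sum(row) for row in confusion_counts)
    on_diagonal = sum(confusion_counts[i][i] for i in range(len(confusion_counts)))
    return 100.0 * on_diagonal / total

def chance_percent(n_classes):
    """Chance performance, assuming equally likely classes."""
    return 100.0 / n_classes

# Hypothetical counts for three tone levels, 10 trials each (columns sum to 10).
example = [
    [8, 1, 0],
    [2, 7, 3],
    [0, 2, 7],
]
print(round(percent_correct(example), 1), round(chance_percent(3), 1))
```

Because only the diagonal enters the calculation, the result is the same whether rows or columns index the actual class.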
Information about tone frequency encoded by cortical spiking patterns decreases as tone and noise levels increase
We used a spike train classifier to test whether decreasing the SNR also decreases the differences among spike trains for different tone frequencies, and to investigate whether these differences were influenced by tone level. The spike train classifier for tone frequency was constructed in a manner similar to the classifier for tone level. Instead of CF tones at different levels, however, the templates were constructed from blocks of tone frequencies spanning 0.5 octaves and intensities spanning 10 dB within each SNR condition (e.g., 0–10, 20–30, and 40–50 dB above threshold).
In principle, the classification performance for each site is limited by the frequency tuning bandwidth for a given tone level: tone frequencies that fail to elicit spikes because they are outside the response area of a given site cannot be discriminated because they share a common spiking pattern (i.e., an absence of spiking). Thus, the expected performance of this classifier should be correlated with FRA bandwidth when the tested frequency range exceeds that bandwidth. Because we have shown that bandwidth depends on the interaction of SNR and tone level (Fig. 5), we expect that spike train classification of tone frequency will also depend on the interaction of tone and noise level.
This effect can be seen in the example confusion matrices shown in Figure 7A. For tones 0–10 dB above threshold (Fig. 7A, top), only a small region between 8 and 32 kHz is classified successfully (brighter colors), although there are multiple errors. This region in the confusion matrix corresponds to the FRA for low-level tones. Outside of this region, responses were too weak or inconsistent for successful spike train classification (black regions). For tones 40–50 dB above threshold (Fig. 7A, bottom), spikes were elicited across a wider frequency range, reflecting a broader FRA bandwidth at those levels, and yielding better classification performance, which is reflected in a broader frequency range where values are clustered along the diagonal of the confusion matrix.
A, Confusion matrices for frequency classification for an example site in the SNR+∞dB condition at 0–10 dB above threshold (top) and 40–50 dB above threshold (bottom). Lighter colors indicate a higher proportion of spike trains from the actual frequency block (column) that were classified as a given frequency block (row). Proportions are defined by the color bar at the right of B. B, Population average confusion matrices across SNR conditions for the same levels as A. C, Average percentage correct classification across tone levels relative to threshold and SNR conditions. The dashed line indicates chance performance. The black lines above the bars from each SNR condition correspond to the average slope values from D (black dots). D, Population average slopes of bandwidth by intensity above threshold for each SNR condition. The light gray lines are the individual slopes of each site across SNRs. The black dots correspond to the mean slope in each SNR (see C). The black line is the regression line through those dots across sites and SNRs.
Given the significant interaction between SNR and tone level that we observed for tuning (Fig. 5), we expected to observe similar interactions for spike train discriminability due to SNR or tone level-induced changes in spectral tuning width. We evaluated the interaction between tone and noise levels by computing population-averaged confusion matrices (Fig. 7B). The weak performance of the classifier at very low frequencies (corresponding to the top two rows in each confusion matrix) is likely due to undersampling of this stimulus frequency range in the cortex when selecting sites for recording. In the SNR+∞dB and SNR+43 dB conditions (Fig. 7B, two leftmost columns), increasing the tone level from 0–10 dB above threshold (Fig. 7B, top) to 40–50 dB above threshold (Fig. 7B, bottom) increases decoding performance, as indicated by the increased brightness of diagonal values (Fig. 7B, bottom). By contrast, in the SNR+13 dB and SNR+3 dB conditions (Fig. 7B, two rightmost columns) joint tone and noise level increases did not improve discrimination. We quantified classifier performance by averaging the percentage correct across sites for each SNR condition and each set of levels above threshold (Fig. 7C). Classification performance depended on the stimulus level (two-way ANOVA: main effect of intensity above threshold, F(2,1427) = 13.64; p < 0.001), SNR (F(5,1427) = 30.20; p < 0.001), and their interaction (F(10,1427) = 5.88; p < 0.001). As the SNR decreases, the slopes of the percentage correct versus tone level functions decrease from positive values to zero or below (Fig. 7D; r = −0.30; n = 564; p < 0.001; these slope values were used to plot the lines in Fig. 7C). A similar pattern of results was observed when using the error cost metric instead of the percentage correct.
Thus, the spike train information about tone frequency identity that is available to the decoder fails to increase with increasing tone level if the noise levels are similarly high, echoing the prior pattern of results we observed for firing rates and tuning bandwidth.
Decreasing SNRs delays timing of onset responses independent of tone level
The preceding analyses of FRA bandwidth and firing rate were based on responses in a 50 ms window beginning at tone onset; to quantify how firing patterns changed over time relative to tone onset, we generated PSTHs across different SNR conditions based on responses to tones within ±0.5 octaves of CF for distinct tone levels from 10 to 40 dB above threshold. In these cases, however, we computed the firing rates in bins of only 1 ms, resulting in very high apparent firing rates in the examples shown in Figure 8A,B. Given this, it is perhaps more useful to think of the PSTHs as indicating the probability of a spike occurring in each bin, where 1000 Hz is a probability of 100%. Decreasing the SNR resulted in shifts to longer latencies and reduced firing rates, particularly when both tone and noise levels were well above threshold. Latency shifts are similar for low and high tone levels, whereas firing rate changes are expressed mainly at high tone levels (Fig. 8A,B, bottom). Similar patterns are evident in the population results (Fig. 8C).
Response timing. A, Smoothed PSTHs across SNRs (colored as in D and listed at right) for an example site at threshold (top) and 40 dB above threshold (bottom). The black bar above the PSTH indicates the tone presentation period. B, Another example site, as in A. C, Population average PSTHs across SNRs. Shading behind one example condition (SNR, +23 dB), SEM. Firing rates in A–C were calculated using 1 ms bins, such that 1000 Hz indicates that a spike occurred in that bin on every trial. D, Population onset latencies (at half-height of rising PSTH) by stimulus level above threshold (hatching) and SNR condition (color). Black line above the bars from each SNR condition corresponds to the average slope for each condition in E (black dots). E, Population average slopes of onset latencies by tone level above threshold for each SNR condition. The light gray lines are the individual slopes of each site across SNRs. Black dots indicate the mean slope in each SNR condition. The black line is the regression line through those dots across sites and SNRs.
PSTH peak response rates depended on the SNR, the tone level, and their interaction (two-way ANOVA: main effect of SNR, F(5,2587) = 42.48, p < 0.001; tone level above threshold, F(4,2587) = 9.11; p < 0.001; interaction, F(20,2587) = 6.03, p < 0.001). Thus, PSTH peak height depended on the tone level even when holding the SNR constant.
In contrast, the effect of SNR on response onset latencies estimated from the half-height of the PSTH was separable from the stimulus level. This is evident from the population-averaged latencies for different stimulus levels and SNRs (Fig. 8D). Within each SNR condition, the histogram patterns were similar, such that increasing the tone level decreased the latency (Fig. 8D). Decreasing the SNR shifted onsets to longer latencies, however, so the average height of the sets of histograms in Figure 8D increases from left to right. The latency effects of SNR and stimulus level are both statistically significant, but the interaction between them is not (two-way ANOVA: main effect of SNR, F(5,2570) = 40.73, p < 0.001; main effect of dB above threshold, F(4,2570) = 52.98, p < 0.001; interaction term, F(20,2570) = 1.24, p = 0.21). Analogously, the slopes of the latency versus tone level functions did not change with the SNR (Fig. 8E; r = 0.012, n = 541, p = 0.78). The statistically insignificant ANOVA interaction term and relative consistency of the slopes of the latency versus tone level functions across SNR indicate that the effects of stimulus intensity and SNR on response latency were separable.
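An onset latency defined at the half-height of the rising PSTH can be sketched as follows. The 1 ms bins follow the text, but the baseline handling, the linear interpolation between bins, the use of bin left edges as time stamps, and the function names are assumptions made for illustration.

```python
def psth_rate_hz(spike_times_by_trial_ms, window_ms=50.0, bin_ms=1.0):
    """Trial-averaged firing rate per bin, in Hz.

    With 1 ms bins, 1000 Hz means a spike fell in that bin on every trial.
    """
    n_bins = int(round(window_ms / bin_ms))
    counts = [0] * n_bins
    for trial in spike_times_by_trial_ms:
        for t in trial:
            if 0.0 <= t < window_ms:
                counts[min(int(t // bin_ms), n_bins - 1)] += 1
    n_trials = len(spike_times_by_trial_ms)
    return [c / n_trials / (bin_ms / 1000.0) for c in counts]

def half_height_latency_ms(psth_hz, bin_ms=1.0, baseline_hz=0.0):
    """Time at which the PSTH first reaches half of its peak above baseline.

    Bins are time-stamped at their left edge; the crossing time is linearly
    interpolated between the last bin below half-height and the first bin
    at or above it. Returns None if the PSTH never rises above baseline.
    """
    peak = max(psth_hz)
    if peak <= baseline_hz:
        return None
    half = baseline_hz + 0.5 * (peak - baseline_hz)
    for i, rate in enumerate(psth_hz):
        if rate >= half:
            if i == 0:
                return 0.0
            previous = psth_hz[i - 1]
            fraction = (half - previous) / (rate - previous)
            return (i - 1 + fraction) * bin_ms
    return None
```

In practice the PSTH would be smoothed first, as in the figures, so that the first half-height crossing reflects the rising edge of the response rather than a single noisy bin.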
Even though there was no change in the firing rate at tone levels near threshold (Fig. 3H, left bars in each histogram), as the SNR decreased, onset latencies increased relative to the no-noise condition, even for near-threshold responses, as is evident when comparing the leftmost bars in each set of histograms in Fig. 8D. At tone levels well above threshold, the same latency shift was observed (Fig. 8D, bars to the right in each histogram). Recall that decreasing the SNR significantly reduced tone-evoked firing rates (Fig. 3H, bars to the right in each histogram). The pattern of effects of the SNR on response latency differs from the pattern of its effects on firing rate. In addition, the foregoing separable effects of stimulus level and SNR on response latency differ from the response predicted by a simple threshold shift. A threshold shift model predicts a significant interaction term when noise increases thresholds that “flatten” responses arranged by tone level (Fig. 4A), but we observed no interaction for the effects of SNR and tone level on response latency.
As before, we estimated what the independent effect of SNR on response latency implies for the latency of the canonical site with a threshold of 30 dB SPL (Fig. 3F). Regardless of the amplitude of the tone, decreasing the SNR increases the response latency. Even for tones at or near threshold in SNR+43 dB noise (where the noise level is approximately −13 dB SPL), response latencies still increased. The observed effect of SNR on response latency, even at low overall noise levels, may be a reflection of long-term adaptive effects in the stimulus presentation scheme due to the random switching between high and low noise levels during the acquisition of an FRA at a fixed SNR.
Context modulates steady-state firing to background noise
We define the “steady-state” firing rate as the level of neural activity elicited by noise alone, after any transient changes in firing rate associated with switching the noise level are likely to have stabilized. For the no-noise condition, the steady-state firing rate is simply the spontaneous firing rate of the site. Of course, the background noise can often elicit significant responses, which is why we defined tone thresholds with respect to those responses in previous sections. To demonstrate that noise-driven responses have stabilized within the baseline window (Fig. 1A), we first demonstrated that the population firing rates in the interval 40–50 ms before the tone were no different from firing rates 0–10 ms before the tone (paired t test: mean difference, −0.0071 ± 0.14, n = 619, p = 0.96). This finding indicates a stable noise response behavior in the 50 ms preceding the tone onset.
Response rates increased with increasing noise level across different SNR conditions (Fig. 9A). Importantly, however, the magnitude of responses to noise also depended on the SNR condition, which can be observed by comparing responses to equivalent levels of noise (indicated by the abscissa) across the differently colored curves corresponding to the different SNR conditions. We analyzed the relationship between these changes in noise response rates using the methods used in prior sections. The effects of SNR and noise level did not have a significant interaction (two-way ANOVA from 22 to 42 dB of noise; main effect of noise level, F(2,1830) = 6.99, p < 0.001; main effect of SNR, F(5,1830) = 12.09, p < 0.001; interaction, F(10,1830) = 0.86, p = 0.86). Thus, the SNR condition influenced the overall strength of the noise response behavior but had little influence on the shape of the growth functions across different noise levels (Fig. 9A).
Noise-evoked firing rates. A, Mean steady-state firing rates (i.e., noise-only or spontaneous firing for the no-noise condition) across tone level and SNR conditions. The dashed horizontal line indicates the spontaneous firing rate in the no-noise condition. SNRs are colored after the bars in B. Inset, Average largest firing rate difference within an SNR condition for steady-state firing rates (white bar) compared with the largest difference within an SNR condition for CF tone-evoked firing rates (black bar). B, Mean steady-state firing rates for a 32 dB noise across different SNR conditions. Dashed line, Spontaneous firing rate. B corresponds to the vertical slice (dashed gray line) of A.
The mean firing rates for a fixed noise level of 32 dB SPL, for example, obtained from FRAs recorded in different SNRs showed a decrease of the response with decreasing SNR (Fig. 9B, corresponding to the vertical line in Fig. 9A). The response to a 32 dB noise in the +43 dB SNR condition (dark brown bar) had steady-state firing rates ∼50% higher than in the no-noise (i.e., spontaneous) condition, as indicated by the dashed black line in Figure 9B. However, the steady-state firing rate for 32 dB was reduced with decreasing SNRs, approaching that of the no-noise condition for the lowest tested SNRs (+13 and +3 dB). In the context of the experimental design, where the noise level was varied pseudorandomly on each trial for fixed SNR values and a fixed limit on the loudest tone level (85 dB), the largest SNR values, on average, are associated with the lowest noise levels. As a result, the trials with 32 dB noise in low-SNR conditions are statistically more likely to involve a reduction in the noise level relative to the previous trial. The reverse is true for the highest SNR values. For a given noise level, then, there is a structured bias in the expected noise level on the previous trials across SNR conditions. Because the same absolute noise level evokes different steady-state firing rates in different SNR conditions, some form of adaptation with a time constant on the order of the trial length (several hundreds of milliseconds; Malone et al., 2002; Bartlett and Wang, 2005) is necessary to account for these differences.
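The structured bias in the previous-trial noise level can be illustrated by enumerating the noise levels available within each fixed-SNR condition. The tone level set below (5 to 85 dB SPL in 10 dB steps) is a hypothetical choice that respects the 85 dB limit and yields a 32 dB noise level in every SNR condition; treating the previous trial as an independent, uniform draw from that set is also an assumption.

```python
# Hypothetical tone levels (dB SPL): 5, 15, ..., 85.
TONE_LEVELS_DB = list(range(5, 86, 10))

def noise_levels_for_snr(snr_db, tone_levels_db=TONE_LEVELS_DB):
    """Noise levels that accompany each tone level at a fixed SNR."""
    return [tone - snr_db for tone in tone_levels_db]

def previous_trial_bias(snr_db, reference_noise_db=32, tone_levels_db=TONE_LEVELS_DB):
    """Fractions of possible previous trials with louder or quieter noise.

    Returns (p_louder, p_quieter) relative to reference_noise_db, assuming the
    previous trial's level is drawn uniformly from the condition's level set.
    """
    noise = noise_levels_for_snr(snr_db, tone_levels_db)
    louder = sum(1 for n in noise if n > reference_noise_db)
    quieter = sum(1 for n in noise if n < reference_noise_db)
    return louder / len(noise), quieter / len(noise)

for snr in (43, 33, 23, 13, 3):
    p_louder, p_quieter = previous_trial_bias(snr)
    print(f"SNR+{snr} dB: previous noise louder {p_louder:.2f}, quieter {p_quieter:.2f}")
```

Under these assumptions, a 32 dB noise trial in the SNR+3 dB condition is usually preceded by louder noise, whereas in the SNR+43 dB condition it is usually preceded by quieter noise, which is the direction of bias that an adaptation account requires.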
Nevertheless, the observed noise condition-specific changes in steady-state firing rates should have no substantial effect on any of the preceding results for tone responses because both the average steady-state activity and changes in steady-state activity with SNR were small relative to the changes in tone-evoked responses. The largest difference between steady-state firing rates within an SNR condition was ∼6.5 Hz (Fig. 9A, inset), compared with the largest change in CF tone-evoked firing rates within an SNR condition of ∼38 Hz (Fig. 3F). As the changes in tone-evoked firing rates were >500% of the changes in noise-only firing rates, the changes in steady-state firing rates alone cannot account for the changes in tone-evoked firing rates.
Fixed SNR noise decreases human psychophysical speech reception performance most strongly at higher sound pressure levels
To test the generality of our neurophysiological account of level-dependent FRA changes for different SNRs in rats, we performed psychophysical experiments in humans on a more naturalistic discrimination task, the Freiburg Monosyllabic Speech Test. We presented this test at fixed SNRs at multiple joint speech and noise levels for 20 subjects with normal audiograms (Fig. 10A). For 60 dB speech, noise is presented at 50 and 70 dB, respectively, in the SNR+10 dB and SNR−10 dB conditions. This range of noise levels encompasses the noise levels previously reported to degrade speech reception (French and Steinberg, 1947). Because performance was at chance levels (0%) for all stimulus levels in the SNR−10 dB condition, this condition was not included in subsequent analyses.
Speech detection of human subjects with normal hearing at four different signal levels and five different SNRs. A, Population percentage correct by speech level and SNR condition. The black line above the bars from each SNR condition corresponds to the average slope from B. B, Population average slopes of percentage correct by speech level for each SNR condition. The light gray lines are the individual slopes of each subject across SNRs. Black dots are the average slope in each SNR condition. The black line is the regression line across subjects and SNRs.
This speech task is challenging, and performance depended on the speech level even in the most favorable tested condition (SNR+10 dB). For example, increasing the speech level from 40 to 60 dB improved performance by ∼20% (Fig. 10A, striped black bars). However, when the SNR was −5 dB, increasing the stimulus intensity over a similar range increased performance by only ∼5% (Fig. 10A, light brown bars). The performance improvements observed at positive SNRs were therefore not maintained to the same degree at more adverse SNRs, which entail higher absolute noise levels. Thus, human speech reception in noise is not as robust at unfavorable SNRs under conditions in which both the speech and noise levels are relatively high.
Similar to results for firing rates elicited by CF tones, bandwidths, and spike train discrimination of tone frequency, human psychophysical discrimination depended on stimulus level, SNR, and their interaction (Fig. 10A; two-way ANOVA: main effect of stimulus intensity, F(2,1427) = 13.64, p < 0.001; main effect of SNR, F(5,1427) = 30.20, p < 0.001; interaction term, F(10,1427) = 5.88, p < 0.001). Increasing the amplitude of the signal improved performance at high SNRs, but improved performance only slightly or not at all for low SNRs. The slopes of the percentage correct versus intensity functions significantly decreased as the SNR decreased (Fig. 10B; r = −0.72, n = 100, p < 0.001). Thus, our human psychophysical results complement our neurophysiological results showing that the effects of SNR are not separable from stimulus amplitude.
Discussion
Multiunit recordings from anesthetized rat A1 and human psychophysics demonstrate that the effects of background noise on neural and behavioral responses to simple (tones) and complex (speech) signals depend on both the SNR and the overall sound level. At unfavorable SNRs, increasing the absolute tone and noise levels changed FRAs by reducing firing rates for CF tones and narrowing FRA bandwidths; at higher SNRs, however, increasing tone and noise levels increased firing rates and expanded bandwidths, the behavior typically observed for FRAs obtained without background noise. Even at very favorable SNRs (e.g., +43 dB), the presence of noise resulted in significant FRA changes, though these effects required that the noise levels be sufficiently high to affect the response (i.e., the tone levels were quite high, given the large fixed SNRs). These cortical tuning changes also manifested in significant decreases in how effectively the identity of different tonal stimuli could be decoded, particularly at the least favorable SNRs when both the tone and noise levels were high. Speech discriminability performance in human psychophysics showed qualitatively similar trends (i.e., performance on a speech identification task depended on the interaction between overall sound level and SNR).
The presence of background noise strongly affected tone-evoked firing rates and FRA shapes. At the least favorable SNRs we tested (i.e., +13 and +3 dB), firing rates elicited by CF tones at levels above threshold were suppressed below the firing rates obtained near threshold (Fig. 3). The effective response bandwidths 40 dB above threshold were narrower than bandwidths 10 dB above threshold (Fig. 5), largely reflecting reductions in firing rates, such that fewer frequencies were associated with significant responses relative to the baseline firing rates measured in the background noise context. These bandwidth-related changes differed in high-SNR conditions, where joint increases in tone and noise levels generally increased firing rates to CF tones (Fig. 3) and increased bandwidths (Fig. 5), the pattern of results observed without noise. The interaction we observed between SNR and tone level could not be explained by noise-driven linear threshold shifts (Fig. 4); essentially, louder noises were not compensated for as effectively as quieter noises. Thus, measurements of system behavior across SNRs at a fixed tone or noise level cannot encompass the complexities of the behavior we observed neurophysiologically.
Although our data do not provide a direct link between reductions in the information about tone identity in the presence of loud background noise and decreases in speech recognition performance under similar conditions, the inseparable relationship between SNR and absolute level we demonstrate here for neurophysiology and psychophysics suggests a potential link. Given the obvious differences in the stimuli (tones vs speech) and the response measures (spiking patterns and behavioral reports of perceived words), the similarity in the relationships between SNRs and sound levels for the two experiments is remarkable. Decreasing the SNR decreased spike train decoding of tone frequency more strongly at levels well above the tone-in-noise threshold than at levels at or near it (Fig. 7). Human speech discrimination performance in noise at fixed SNRs depended similarly on the interaction between the speech level and SNR (Fig. 10). For example, increasing speech levels >40 dB increased speech discriminability at SNR +10 dB, but not at SNR −5 dB. Referenced to SNR values, the psychophysical performance we observed (which included negative SNRs) exceeded spike train decoding performance at all tested multiunit sites and stimulus levels. This result is not surprising, perhaps, since humans presumably use much larger neural populations to perform the task, and increased population sizes are commonly associated with increased decoding performance (Ince et al., 2013; Malone et al., 2015a). However, we should stress that the discrimination of static tone frequency and level is likely subserved by different mechanisms and structures than the recognition of words in noise.
In contrast to the inseparable interactions described in previous paragraphs, response onset latencies to tones and noise-driven steady-state firing rates did not demonstrate an interaction between absolute levels and SNRs. Increasing tone and noise levels resulted in shorter latency responses regardless of the SNR (Fig. 8). Decreasing the SNR resulted in longer latency responses (Fig. 8) at all absolute levels. Thus, not all features of cortical responses are inseparable with respect to absolute levels and the SNR.
Our findings cannot easily be explained by threshold-shift models of noise influences based on prior neurophysiological experiments. Previous studies in auditory cortex proposed that noise shifts neural response thresholds in a ∼1:1 fashion (Phillips, 1985, 1990; Phillips and Cynader, 1985; Phillips and Hall, 1986; Ehret and Schreiner, 2000; Liang et al., 2014). We observed a noise suppression greater than that predicted by a 1:1 shift (Fig. 4) for the loudest noises we tested. Importantly, our study used a wider range of noise levels than were typically used in most prior studies. Human psychophysical studies have shown the behavioral effects of noise to be similar to 1:1 shifts at low sound levels but to increase nonlinearly as the stimulus level increases above 50 dB (French and Steinberg, 1947; Studebaker et al., 1999; Dubno et al., 2005). EEG studies in humans have found limited support for a neural interaction between stimulus level and SNR (Whiting et al., 1998; Baltzell and Billings, 2014), but the studies either did not use an experimental design that enabled quantifying the interaction (Whiting et al., 1998) or only tested a small difference in stimulus levels (Billings et al., 2009; Baltzell and Billings, 2014). Rats perform similarly to humans for speech-in-noise tasks (Shetake et al., 2011), so providing a detailed description of the interaction between stimulus level and SNR at the neural level may aid in confirming corresponding neural correlates in humans, including those suggested by our own psychophysical results.
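As a minimal sketch of why a pure 1:1 threshold-shift account cannot by itself produce a level-by-SNR interaction, consider an idealized model in which the tone threshold rises by 1 dB for every 1 dB of noise above some noise floor, and the SNR is taken as tone level minus noise level in dB. The function names and parameter values below (`quiet_threshold_db`, `noise_floor_db`) are hypothetical and are not taken from the cited studies. Under these assumptions, once the noise exceeds the floor, the tone level relative to the shifted threshold depends only on the SNR.

```python
def shifted_threshold_db(noise_db, quiet_threshold_db=10.0, noise_floor_db=20.0):
    """Tone threshold under an idealized 1:1 shift model.

    Below noise_floor_db the noise has no effect; above it, each 1 dB of
    added noise raises the threshold by 1 dB. Parameter values are
    hypothetical, chosen for illustration only.
    """
    return quiet_threshold_db + max(0.0, noise_db - noise_floor_db)


def level_re_threshold_db(tone_db, noise_db):
    """Tone level relative to the noise-shifted threshold (dB)."""
    return tone_db - shifted_threshold_db(noise_db)


snr_db = 13.0
# Jointly raise tone and noise levels while holding the SNR fixed.
noise_levels_db = [30.0, 40.0, 50.0, 60.0]
margins_db = [level_re_threshold_db(noise + snr_db, noise)
              for noise in noise_levels_db]
```

In this idealized model the margin above threshold is the same at every noise level, so a response that changes with overall level at a fixed SNR, as in Fig. 4, points to something beyond a 1:1 shift.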
Tones or speech in noise are not the only stimuli for which neural and behavioral responses are determined by the interaction between stimulus level and other parameters. In the auditory system, binaural interactions (Semple and Kitzes, 1993), stimulus history effects (Malone and Semple, 2001; Malone et al., 2002; Bartlett and Wang, 2005; Scholl et al., 2008), and PSTH shape (Malone et al., 2007, 2010) all depend on the stimulus level. In the visual system, an analogous interaction has been observed for the perception of gratings as a function of the interaction among contrast, luminance, and motion (Gepshtein et al., 2013). Cortical and behavioral responses across stimuli and sensory modalities are not defined only by which stimulus is presented but also by the strength, or intensity, of presentation.
Several potential mechanisms for the interaction between stimulus level and SNR are possible. Subcortical auditory stations respond nonlinearly to stimuli in noise (Costalupes et al., 1984; Gibson et al., 1985), so the changes we observed in the cortex may result from changes in subcortical activity alone or in combination with cortical effects (Barbour and Wang, 2003; Kvale and Schreiner, 2004; Dean et al., 2005; Rabinowitz et al., 2011, 2013).
Context-dependent changes to the functional architecture could support the concept of dynamic neural reweighting of excitatory and inhibitory strengths within receptive fields, which is likely invoked in response to figure–ground separation (Calford et al., 1993), and in reaction to mild (Rajan, 1998; Rajan and Irvine, 1998) or severe (Irvine and Rajan, 1997) hearing loss. However, synaptic inhibition is less likely to underlie the nonlinear interaction because, at least at low noise intensities, fast-spiking inhibitory interneurons and pyramidal neurons respond similarly to tones in noise, suggesting that noise suppresses responses before the recruitment of cortical inhibition (Liang et al., 2014). Discriminating between these mechanisms will require determining the input currents in the cortex and subcortical stations.
The average response properties reported here may obscure some local diversity in noise influences. Background noise effects can differ systematically between functionally distinct A1 neuron types (Ehret and Schreiner, 1997, 2000). Therefore, our reported findings—based on multiunit recordings—should only be considered at a population level.
Although the auditory system is remarkably capable of handling a range of signal and noise levels, noise at high levels is more difficult to overcome, even at similar or identical SNRs. Therefore, the best sound level for speech discrimination depends on the SNR. In the absence of noise, increasing the speech level from low to moderate values can increase discrimination. At high SNRs, when the noise is low, jointly increasing the speech and noise levels increased discrimination; at low SNRs, when the noise is high, jointly increasing the speech and noise levels did not (Figs. 7, 10) and instead degraded the neural representation of sounds (Figs. 3, 5). Therefore, noise at high levels is more difficult to overcome for a given SNR.
We conducted our study in humans and rats with normal hearing, but speech-in-noise discrimination is more difficult for humans with reduced dynamic ranges, such as patients with hearing loss or presbycusis. Further work is necessary to understand how the interaction between speech level and SNR changes following a reduction in dynamic range. By understanding the basis by which noise disrupts neural representations, we may uncover better strategies to improve speech understanding in noise for the hearing impaired, and inspire changes to audiologic practices that assume the effects of a range of SNRs can be generalized across varying absolute sound levels.
Footnotes
*M.J.T. and B.A.S. are joint lead authors.
This research was supported by Deutsche Forschungsgemeinschaft Grant TE 845/1-1 (M.J.T.); National Institutes of Health (NIH) Grants F31-DC-012719 (B.A.S.), RO1-DC-011843 (B.J.M.), and RO1-DC-002260 (C.E.S.); Hearing Research Inc. (San Francisco, CA); and the John C. and Edward Coleman Memorial Fund.
The authors declare no competing financial interests.
Correspondence should be addressed to Christoph E. Schreiner, 675 Nelson Rising Lane, Room 514C, Center for Integrative Neuroscience, School of Medicine, University of California, San Francisco, San Francisco, CA 94158-0444. chris{at}phy.ucsf.edu