Abstract
Neurons in the auditory system respond to recent stimulus-level history by adapting their response functions according to the statistics of the stimulus, partially alleviating the so-called “dynamic-range problem.” However, the mechanism and source of this adaptation along the auditory pathway remain unknown. Inclusion of power-law dynamics in a phenomenological model of the inner hair cell (IHC)–auditory nerve (AN) synapse successfully explained neural adaptation to sound-level statistics, including the time course of adaptation of the mean firing rate and changes in the dynamic range observed in AN responses. A direct comparison between model responses to a dynamic stimulus and to an “inversely gated” static background suggested that AN dynamic-range adaptation largely results from the adaptation produced by the response history. These results support the hypothesis that the potential mechanism underlying the dynamic-range adaptation observed at the level of the auditory nerve is located peripheral to the spike generation mechanism and central to the IHC receptor potential.
Introduction
Although rate-level functions of individual auditory neurons show a restricted dynamic range (20–40 dB) (Sachs and Abbas, 1974), the human auditory system encodes sound levels with remarkable accuracy over a wide range of sound intensities (100–120 dB) (Viemeister, 1988). The natural acoustic environment is made up mostly of transients rather than constant stimuli. Thus, to encode efficiently using only firing rates, a neural system must change its coding strategy as the level distribution of stimuli changes. Recently, studies in the auditory midbrain (Dean et al., 2005, 2008) and cortex (Watkins and Barbour, 2008) as well as in the peripheral auditory system (Wen et al., 2009) have shown that neurons respond to recent stimulus history by adapting their rate-level functions according to the statistics of the stimulus level, substantially improving the precision of the neural population code near the region of most commonly occurring sound levels. As a result, the effective dynamic range of neurons is extended. However, the origin and mechanisms underlying adaptation of the dynamic range remain unknown.
To examine the detailed dynamics of dynamic-range adaptation, Dean et al. (2008) studied the responses of guinea pig inferior colliculus (IC) neurons using a paradigm in which the stimulus switched repeatedly between two distributions of sound levels differing in mean level. They observed that a prominent component of adaptation occurs rapidly with a time course of several hundred milliseconds; adaptation to an increase in mean level occurs more rapidly than to a decrease in mean level. The same paradigm in the auditory nerve (AN) of cats showed a similar time course of adaptation (B. Wen, personal communication). In general, the magnitude of adaptation is weaker in AN fiber responses than in the IC. However, both studies reported that the adaptation in rate-level functions occurs within ∼1 s, which clearly has implications for real-world listening conditions.
At the level of the AN, the source of adaptation in discharge rate is believed to be associated mainly with the inner hair cell (IHC)–AN synapse (Furukawa et al., 1978; Moser and Beutner, 2000; Goutman and Glowatzki, 2007). To capture multiple time courses of adaptation observed at the level of the AN, Zilany et al. (2009) recently developed a phenomenological AN model with an IHC–AN synapse section that has both exponential and power-law adaptation (PLA) functions. Although the source of power-law adaptation is not known, it was included in the synapse section of the model for simplicity. This PLA AN model accurately predicts AN responses to a wide variety of stimuli (both simple and complex) spanning the dynamic range of hearing. The PLA AN model also captures those phenomena (e.g., rapid onset adaptation and slow recovery after the stimulus offset, responses to forward masking paradigms, adaptation to increments and decrements in the amplitude of an ongoing stimulus) that have direct implications for dynamic-range adaptation. In this study, the PLA AN model was used to test the hypothesis that power-law-like adaptation in the periphery could successfully account for the amount and dynamics of the dynamic-range adaptation observed in the AN.
Materials and Methods
In this study, the responses of model AN fibers to dynamic stimuli, whose sound-level statistics varied with time, were simulated. The simulated responses were then compared to physiological data obtained using the same stimulus paradigms [IC: Dean et al. (2008); AN: Wen et al. (2009)].
Stimuli.
Briefly, a 5 min duration continuous tone or white Gaussian noise (bandwidth ∼25 kHz) had levels that were set every 50 ms to a new value randomly chosen from a defined distribution (as in Fig. 1A). The range of sound levels was 0–80 dB sound pressure level (SPL) for tones and 20–100 dB SPL for noise, in steps of 2 dB. The distribution of sound levels had a high probability region (HPR) of 12 dB, from which the levels were drawn with an overall probability of 0.8, and the remaining levels were selected with an overall probability of 0.2. An example sequence of sound levels is shown in Figure 1B for the distribution in Figure 1A. Rate-level functions were computed from the corresponding responses of the model AN fiber, and were fitted with the five-parameter model of Sachs and Abbas (1974) and Winslow and Sachs (1988). According to this model, the average discharge rate r (spikes/s) of an AN fiber is expressed as a function of sound pressure, P (in pascals):
where Rmin and Rmax represent the minimum and maximum firing rates, respectively, N is an exponent of sound pressure denoting the steepness of the growth in firing rate, and θ1 and θ2 are parameters describing the rate function's position along the level axis.
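The level sequence described under Stimuli can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the high-probability region (HPR) is assumed to include both of its edge levels (42 and 54 dB SPL for a mean of 48 dB SPL), and the tone range of 0–80 dB SPL is used as the default.

```python
import numpy as np

def hpr_level_distribution(hpr_mean_db, lo_db=0, hi_db=80, step_db=2,
                           hpr_width_db=12, p_hpr=0.8):
    """Return (levels, probabilities) for one HPR distribution."""
    levels = np.arange(lo_db, hi_db + step_db, step_db)
    # Assumption: the HPR includes both edge levels (e.g., 42 and 54 dB SPL).
    in_hpr = np.abs(levels - hpr_mean_db) <= hpr_width_db / 2
    # Overall probability p_hpr is shared equally inside the HPR,
    # and the remainder is shared equally among all other levels.
    probs = np.where(in_hpr,
                     p_hpr / in_hpr.sum(),
                     (1.0 - p_hpr) / (~in_hpr).sum())
    return levels, probs

def draw_level_sequence(hpr_mean_db, duration_s=300.0, epoch_s=0.05,
                        seed=0, **kwargs):
    """Draw one level per 50 ms epoch for a 5 min stimulus."""
    levels, probs = hpr_level_distribution(hpr_mean_db, **kwargs)
    n_epochs = int(round(duration_s / epoch_s))
    rng = np.random.default_rng(seed)
    return rng.choice(levels, size=n_epochs, p=probs)

levels, probs = hpr_level_distribution(48)
seq = draw_level_sequence(48)
frac_in_hpr = np.mean((seq >= 42) & (seq <= 54))
print(len(seq), round(float(frac_in_hpr), 3))
```

For the noise stimuli, the same sketch would be called with `lo_db=20` and `hi_db=100`.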
Figure 1. Stimulus paradigm. A, An example probability distribution of sound levels for the dynamic stimuli with HPR mean at 48 dB SPL. The distribution spans an 80 dB range in steps of 2 dB and contains a 12-dB-wide HPR (from 42 to 54 dB SPL) in which levels occur with an overall probability of 0.8. B, Sound level as a function of time drawn from the distribution shown in A. The sequence is shown only for 20 s. Sound levels were varied every 50 ms with no silent interval between consecutive levels of the stimulus. C, Variation of sound levels with time for the baseline paradigm. Here sound levels were drawn from a uniform distribution, and there was a 300 ms silent period between consecutive stimuli. D, Sequence of sound levels drawn from a switching stimulus. Sound levels were chosen from one distribution for 5 s before switching to the other one, giving a switching period of 10 s. The two distributions had HPR means at 75 and 51 dB SPL. Two cycles of the sequence are shown.
To quantify the amount of dynamic-range adaptation, responses were simulated for four (noise) or five (tone) different distributions with non-overlapping HPRs. For comparison, baseline rate-level functions were also constructed; noise or tone bursts (50 ms duration) in this case were separated by a 300 ms silent period, and the levels were chosen randomly from a uniform distribution (10 repetitions of each level). The range of sound levels was matched to the span of levels used for the dynamic tone or noise stimulus (Fig. 1C).
To facilitate the study of the detailed dynamics of adaptation, the stimulus was abruptly switched between two different distributions of sound levels differing in mean level (75 vs 51 dB SPL). The sound levels were chosen from one distribution for 5 s before switching to the other, which produced a switching period of 10 s. The sequence of sound levels for two cycles is shown in Figure 1D. Rate-level functions corresponding to each individual distribution were computed, and the time courses of adaptation of the mean firing rate and also of the dynamic range were examined.
Model of the auditory periphery.
The auditory-periphery model used in this study to simulate the responses to the above stimuli was developed by Zilany et al. (2009). Figure 2 shows a schematic diagram of the PLA AN model. Each section of the model, motivated by relevant physiological studies (mostly from cats), provides a phenomenological description of the major functional components of the auditory periphery, from the middle ear to the AN fiber.
Figure 2. Schematic diagram of the model for the auditory periphery [Zilany et al. (2009), their Fig. 2, reprinted with permission]. A, The input to the model is an instantaneous pressure waveform of the stimulus and the output is a series of AN spike times. Middle-ear filtering is followed by a signal-path (C1) filter and a parallel-path (C2) filter. The gain and bandwidth of the C1 filter are controlled by a feedforward control-path output. The IHC output drives the synapse model and the spike generator. OHC, Outer hair cell; LP, low-pass filter; NL, static nonlinearity; INV, inverting nonlinearity. COHC and CIHC are scaling constants that specify OHC and IHC status, respectively. In this study, all model responses were for the healthy cochlea (i.e., COHC = 1, CIHC = 1). B, IHC–AN synapse model: exponential adaptation [3-store diffusion model by Westerman and Smith (1988)] followed by parallel power-law adaptation (slow and fast) models. Fractional Gaussian noise is added to the slow power-law adaptation path to produce the desired distribution of SRs with only three true SR fibers (low, medium, and high). For details about the model components, see Zilany et al. (2009).
The input to the middle ear is the instantaneous pressure waveform of the stimulus (in pascals), sampled at 100 kHz. The middle-ear filter is followed by three parallel filter paths: the C1 and C2 filters in the signal path and the broad-band filter in the control path. The feedforward control path regulates the gain and bandwidth of the C1 filter (analogous to basilar membrane filtering) to account for several level-dependent properties in the cochlea (Zhang et al., 2001; Bruce et al., 2003). Based on Kiang's two-factor cancellation hypothesis (Kiang, 1990), the output of the C2 filter is phase-shifted by 180°; this signal then provides the input to the C2 transduction function. The combined response of the two transduction functions following the C1 and C2 filters provides the input to a seventh-order IHC low-pass filter (Zilany and Bruce, 2006, 2007). The IHC output drives the model for the IHC–AN synapse, and finally the discharge times are produced by a renewal process that includes refractory effects (Carney, 1993). This model captures most of the AN nonlinearities (e.g., nonlinear tuning, two-tone suppression, shift in the best frequency with level, C1/C2 transition at high levels, adaptation) reported in the literature. The model responses were validated against measured AN responses to stimulus paradigms with a wide range of frequencies and intensities spanning the dynamic range of hearing.
One of the important AN nonlinearities relevant to the present study is the adaptation in the discharge rate of an AN fiber. Adaptation to sustained tones in mammalian AN fibers involves at least three timescales: rapid adaptation on the scale of milliseconds, short-term adaptation on the scale of several tens of milliseconds (Westerman and Smith, 1984), and slow adaptation on the scale of seconds (Kiang, 1965). In addition, after the stimulus offset, the time constant of recovery of the spontaneous rate (SR) is scaled according to the duration and level of the stimulus (Kiang, 1965). Similarly, in forward masking paradigms, the time constant of recovery of the probe responses depends on the duration and level of the masker stimulus (Young and Sachs, 1973; Harris and Dallos, 1979).
In general, the responses of a model with exponential adaptation to a unit step function settle to a steady-state value with a fixed time constant, regardless of the stimulus timescale, and the time course of recovery to SR after the stimulus offset is governed by the same time constant (Zhang and Carney, 2005). Thus, pure exponential adaptation cannot fully account for the observed timescales described above, particularly those dependent on the level and duration of the stimulus. On the other hand, power-law adaptation, which is characterized by an adaptation of response that follows a fractional power of time or frequency rather than an exponential decay (Chapman and Smith, 1963), does not have a fixed time constant. In fact, if a conventional time constant is forced on the data, the value depends on the duration of the responses being fit (Drew and Abbott, 2006). To illustrate a general model of power-law adaptation, suppose a stimulus s(t) generates a response r(t) that feeds back into an integrator I(t). The integrator suppresses the response such that the adapted output, r(t) = max[0, s(t) − I(t)], and
$$I(t) = \alpha \int_0^t \frac{r(t')}{t - t' + \beta} \, dt'$$
where α is a dimensionless constant, and β is a parameter with units of time (Drew and Abbott, 2006). The suppressive effect on the response, I(t), is shaped by past responses in a cumulative fashion, in which past responses are “forgotten” over a time course determined by the power law; this time course is intermediate between perfect integration (in which past responses are never forgotten) and exponential processes (in which they are forgotten over a fixed time course) (Drew and Abbott, 2006).
Mathematically, the long tail of the power-law kernel, f(t), provides a longer memory for past responses than does exponential adaptation. In the case of exponential adaptation,
the equivalent of α (in power-law adaptation) has units of frequency (1/τa, where τa is the time constant in seconds); thus, the transition between transient and sustained responses in exponential processes is fixed in time (Drew and Abbott, 2006). In contrast, the dimensionless constant α in power-law dynamics controls the amount of adaptation, and there is no well defined transition between transient and sustained responses. In fact, power-law dynamics can be approximated by a combination of a large number of exponential processes with a range of time constants (Thorson and Biederman-Thorson, 1974). Adaptation often shows power-law-like dynamics over longer timescales, implying the coexistence of multiple timescales in a single adaptive process (La Camera et al., 2006). Thus, power-law dynamics possess the properties that can potentially account for adaptation timescales that depend on the level and duration of the stimulus.
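The contrast between the two kinds of memory can be illustrated numerically. The sketch below is not the synapse model of Zilany et al. (2009). It assumes a discretized feedback integrator with a power-law kernel of the form 1/(t + β) and, for comparison, an exponential kernel exp(−t/τ). All function names and parameter values (α = 0.2, β = 10 ms, gain = 10 s⁻¹, τ = 50 ms, a tone drive of 1.0 and a spontaneous drive of 0.2) are hypothetical choices made only for illustration.

```python
import numpy as np

def power_law_adapt(s, dt, alpha, beta):
    """r[i] = max(0, s[i] - I[i]), with I a power-law-weighted sum of past r."""
    r = np.zeros(len(s))
    for i in range(len(s)):
        lags = np.arange(i, 0, -1) * dt            # t - t' for each past sample
        integ = alpha * np.sum(r[:i] / (lags + beta)) * dt
        r[i] = max(0.0, s[i] - integ)
    return r

def exponential_adapt(s, dt, gain, tau):
    """Same feedback structure, but past responses decay exponentially."""
    r = np.zeros(len(s))
    integ = 0.0
    decay = np.exp(-dt / tau)
    for i in range(len(s)):
        r[i] = max(0.0, s[i] - integ)
        integ = integ * decay + gain * r[i] * dt   # gain has units of 1/s
    return r

def rate_after_offset(model, duration_s, probe_s=0.1, dt=1e-3,
                      drive=1.0, spont=0.2, **params):
    """Response probe_s after the offset of a tone of the given duration."""
    n_on = int(round(duration_s / dt))
    n_off = int(round(0.3 / dt))
    s = np.concatenate([np.full(n_on, drive), np.full(n_off, spont)])
    r = model(s, dt, **params)
    return r[n_on + int(round(probe_s / dt))]

durations = [0.05, 0.5, 1.0]
pl = {d: rate_after_offset(power_law_adapt, d, alpha=0.2, beta=0.01)
      for d in durations}
ex = {d: rate_after_offset(exponential_adapt, d, gain=10.0, tau=0.05)
      for d in durations}
for d in durations:
    print(f"{d:5.2f} s  power-law: {pl[d]:.4f}  exponential: {ex[d]:.4f}")
```

With these assumed values, the power-law response 100 ms after offset is expected to depend on the tone duration, whereas the exponential response should be essentially the same once the tone is several time constants long.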
Although many biological systems exhibit power-law rather than exponential dependence on time, in some cases power-law adaptation alone underestimates the amount of adaptation at short times (Drew and Abbott, 2006), and the model requires additional exponential adaptation components with small time constants to fully explain the behavior over short timescales. To include all of the timescales observed at the level of the AN, the IHC–AN synapse model has power-law adaptation following short-term exponential adaptation.
Westerman and Smith's (1988) three-store diffusion model was used to implement exponential adaptation in the synapse model. The onset response of the model AN fiber is thus governed by exponential adaptation with two time constants (rapid and short-term: 2 and 60 ms, respectively). The other parameters of the three-store diffusion model were set to produce spontaneous activity in the absence of a stimulus and rate saturation at moderate to high stimulus levels (Zhang et al., 2001). Two parallel power-law functions (slow and fast) follow the exponential adaptation in the synapse model (Fig. 2). The parameters of the slow power-law function were chosen to improve the long-term dynamics of the model AN fiber responses (e.g., recovery after the offset of a tone, responses to long duration tones, etc.) without significantly affecting the onset dynamics set by the exponential adaptation.
Several studies have demonstrated that the process of AN short-term adaptation is “additive” in nature (Smith and Zwislocki, 1975; Smith, 1977); that is, the change in firing rate (using a window of ∼10 ms or longer) in response to an increment/decrement in stimulus level does not greatly depend on the time between onset and the subsequent increment/decrement. Smith et al. (1985) further showed that this property of “additivity” also holds if the incremental responses were analyzed with a very small window length of ∼1–2 ms centered on the increment in the response. Generally, models with exponential adaptation exhibit “additivity” only for window lengths of ∼10 ms or longer [Zilany et al. (2009), their Fig. 9]. Consequently, the slow power-law function, which closely follows the onset dynamics set by exponential adaptation, fails to account for incremental “additivity” over small timescales. A second power-law function with faster adaptation was therefore introduced in the AN model (Zilany et al., 2009); this function adapts quickly, is very responsive to increments in amplitude of an ongoing stimulus, and remains mostly unresponsive over the remaining duration of the stimulus. Thus the change in discharge rate in response to an increment remains almost the same regardless of the delay between stimulus onset and the increment [Zilany et al. (2009), their Fig. 9], therefore demonstrating “additivity” in the model responses.
In general, it is the slow power-law function that is mainly responsible for the overall adaptation in the responses of the model AN fibers and thus contributed most to the responses simulated in the present study. The fast power-law component contributed very little to the overall responses of the model AN fiber, except in the increment/decrement paradigm described above. The complete AN model developed in the previous study, with both power-law components, was used without change in the simulations presented here, except where noted (e.g., to compare responses for a model with power-law and exponential adaptation to one with only exponential adaptation).
Although individualized sets of model parameters might be required to predict individual AN fiber responses accurately, the goal of this study was to determine a single parameter set that qualitatively accounted for a wide range of response properties of AN fibers (Zilany et al., 2009). The selection of parameters of the power-law functions was challenging and complicated by the fact that, in contrast to exponential adaptation, power-law adaptation has no well-separated transient and sustained responses. However, two particular datasets were used to set the parameters of the slow power-law function; both of them required adaptation with longer memory, and thus had relevance to the power-law dynamics. The first one was the offset responses to a pure-tone stimulus across several sound levels (Kiang, 1965), and the other was the responses to a probe in a forward-masking stimulus paradigm (Harris and Dallos, 1979). Once the parameters for the slow power-law component were set, the parameters of the fast power-law function were then chosen by qualitatively matching the model responses with the physiological data for the increment/decrement paradigm (Smith et al., 1985). The responses of the model fibers to all of the above stimulus paradigms were compared to their physiological counterparts in Zilany et al. (2009).
Fractional Gaussian noise (fGn) added to the slow power-law adaptation path results in the desired distribution of SRs; although only three spontaneous rates are specified in the model (low, medium, and high), the variation in rate over time resulting from the fGn yields the appropriate distribution of SRs (Jackson and Carney, 2005). In this study, the same fixed set of model parameters that was used to capture a wide range of AN responses to various stimulus paradigms (Zilany et al., 2009) was used, and the responses were simulated for single AN fibers with characteristic frequencies (CFs) and spontaneous rates selected to match those in the physiological studies used for comparison. The PLA AN model is described in more detail in Zilany et al. (2009). Model code is available at the following website: www.bme.rochester.edu/carney.
Results
In this section, simulated responses of model AN fibers are presented for the stimulus paradigms described above. In addition, model responses to several simple stimuli are shown to demonstrate the dynamics of adaptation, particularly during the period after the stimulus offset.
Dynamics of recovery of spontaneous activity: effects of duration and level
At the offset of a tone pip, the AN discharge rate may drop below the SR, sometimes to the point where there is a cessation of firing, followed by a recovery with a time course on the order of several tens of milliseconds (Smith, 1977; Harris and Dallos, 1979). The magnitude of reduction in rate and the exact nature of recovery depend on the intensity and duration of the preceding stimulus (Yates et al., 1985), previous response history, and the fiber's SR (Relkin and Doucet, 1991). In general, low-spontaneous-rate (LSR) fibers show a longer recovery from prior stimulation than high-spontaneous-rate (HSR) fibers (Relkin and Doucet, 1991).
To illustrate the dependence of the dynamics of recovery after stimulus offset on stimulus parameters, Figure 3 shows the output of the PLA AN model in response to tones with various durations and levels. To avoid fluctuations in the output and to emphasize the relevant response details for these simulations, fractional Gaussian noise was not included in the model for this illustration. Note that for the purpose of illustration, the output of the synapse is shown here, rather than the output of the discharge generator; however, for all other figures, the output of the discharge generator is illustrated. Figure 3A shows the responses of the model to the tone stimulus (tone at CF = 10 kHz, 12 dB above threshold) with different durations, but with a fixed interstimulus interval of 200 ms. Note that the next repetition of the signal was always delivered after the interstimulus interval of 200 ms, regardless of the recovery from adaptation during the preceding silent period. The rationale behind using a fixed interstimulus interval was to examine whether signal duration had any effect on the relatively steady-state part of the PLA model responses, in addition to the expected effect on the dynamics of recovery during the interstimulus interval period. The signal durations were 50, 100, 200, 500, and 1000 ms. Responses to 10 repetitions of the stimulus were averaged. The dotted line indicates the SR of the fiber. For signals with durations less than ∼200 ms, a 200 ms silent interval was adequate for near-complete recovery to SR, whereas longer duration signals required longer interstimulus intervals for complete recovery from adaptation. Because power-law adaptation has a long memory for past responses, the accumulated suppressive effects are stronger for longer duration signals. 
Thus, the time course of recovery after signal offset for the PLA model varied according to the duration of the signal, although there was no substantial difference in the time course of adaptation at the onset. In contrast, the recovery to SR in the exponential adaptation model (results not shown) occurs over a constant time period regardless of the duration of the signal, although the amount of adaptation during the signal may vary, especially for stimuli shorter in duration than the exponential time constant. Note that the relatively steady-state part of the PLA model response was noticeably reduced in response to longer duration signals because the responses did not fully recover from adaptation during the interstimulus interval. This effect of stimulus history on the average response rates underlies the dynamic-range adaptation illustrated below.
Figure 3. Illustrations of the effects of duration and level of the signal on the recovery to spontaneous activity. A, Model synapse output in response to the tone at CF (10 kHz) with durations varying from 50 ms to 1 s, but with a fixed interstimulus interval of 200 ms. The stimulus level was 12 dB above threshold. Dotted line represents the SR of the fiber. The time course of recovery is scaled according to the duration of the stimulus. B, Response of the synapse model to a tone at CF with levels varying from 30 to 80 dB SPL in steps of 10 dB. The CF of the fiber was 550 Hz, and it had a medium SR. The duration of the tone was 50 ms with an interstimulus interval of 400 ms. Only the recovery part of the response (i.e., 50 to 450 ms) is shown. Dotted line indicates the SR of the fiber. Again, the time course of recovery is scaled according to the level of the stimulus.
Figure 3B shows the responses of the synapse model to a tone (tone at CF = 550 Hz, medium SR fiber) with levels varying from 30 to 80 dB SPL in steps of 10 dB. The duration of the tone signal was 50 ms, and the stimulus was repeated every 450 ms (i.e., the silent period was 400 ms). Model responses were averaged for 10 repetitions of the stimulus. To show the dynamics of recovery after stimulus offset, responses are shown only for the 50–450 ms time window. The time course of recovery after stimulus offset was also scaled here according to the level of the stimulus, because higher-level signals had stronger suppressive effects on the responses after the stimulus offset. Further, responses to lower-level signals recovered quickly to SR, and thus the subsequent response to the stimulus was not affected by previous stimulation history. On the other hand, responses to higher-level signals did not fully recover by the end of the interstimulus interval, and the subsequent response was thus affected.
Figure 4A shows the poststimulus time histograms of an HSR AN fiber with CF equal to 1.82 kHz on the left, and an LSR AN fiber with CF equal to 10.34 kHz on the right (from Kiang, 1965). The stimulus was a 500 ms tone followed by a 500 ms silent period, and the histograms are shown for 120 repetitions of the stimulus. The responses of the model (i.e., discharge generator output) to the same stimulus paradigm are shown in Figure 4B. In general, model responses closely resembled the physiological data, including the observation that the time course of recovery for the LSR fiber was longer than that of the HSR fiber. However, in contrast to the physiological responses, the steady-state rate of the model LSR fiber was lower than that of the model HSR fiber. The difference in rates is due to the fact that for both model fibers, responses were simulated to tones (at CF) for a level of 25 dB SPL, although the threshold of the model HSR fiber is ∼15 dB lower than that of the model LSR fiber.
Figure 4. Poststimulus time histograms of two AN fibers with CFs of 1.82 kHz (HSR) and 10.34 kHz (LSR). A, The physiological data from cat [Kiang (1965), their Fig. 6.1, reprinted with permission]. B, The model responses to the same paradigm. The stimulus was a 500 ms tone at CF followed by an interstimulus interval of 500 ms. The responses are shown for 120 repetitions of the stimulus. Model histograms are shown for a tone level of 25 dB SPL [Zilany et al. (2009), their Fig. 5, reprinted with permission].
Responses of an AN fiber to stimuli with different sound-level statistics
Figure 5 shows the responses of a high SR AN fiber (CF at 550 Hz) to pure tones at CF and noise stimuli for various stimulus-level distributions with different HPR levels as well as for the baseline level distribution. Figure 5A shows the physiological data from a cat AN fiber to tone stimuli, B and C represent the AN model responses to tones for a synapse model with only exponential adaptation and with power-law functions following exponential adaptation, respectively, and D shows the model responses to broadband noise stimuli. Because the threshold of the model fiber was substantially lower than that of the physiological AN fiber, an additional stimulus-level distribution with HPR mean at 24 dB SPL was simulated. The rate-level functions in the top row were fitted with the five-parameter model of Sachs and Abbas (1974) and Winslow and Sachs (1988). For both physiological data and model responses, the baseline paradigm showed the highest firing rates at all levels, and the rate decreased as the HPR mean level increased. The rate decrement for the synapse model with only exponential adaptation was asymmetric; lower sound levels showed less decrement from the baseline firing rate compared to the decrement at higher sound levels. This observation is consistent with the fact that adaptation from the preceding sound level mostly affects the onset response of the subsequent stimulus (for the model with only exponential adaptation, the time constant of recovery is very small compared to the duration of the signal of 50 ms). Because the onset responses were more prominent for higher levels than for levels near or below threshold, higher-level responses showed more suppression than lower-level responses for the model with only exponential adaptation. Note that it was not possible to make analogous comparisons using a model with only power-law adaptation because such a model does not exhibit appropriately saturating rate-level curves.
Figure 5. Responses of an HSR AN fiber with CF of 550 Hz to stimuli with differing HPR levels. A, The physiological data from cat [Wen et al. (2009), their Fig. 2, reprinted with permission]. B–D, The responses of the AN model to tone (B, C) and broadband noise (D) stimuli. B and C represent the responses of the AN with a synapse model having only exponential adaptation and power-law following exponential adaptation, respectively. Shown are rate responses to four (A) or five (B, C) different sound-level distributions for tones with HPR means of 24, 36, 48, 60, and 72 dB SPL, and for noise stimuli (D) with HPR means of 48, 60, 72, and 84 dB SPL. Because the threshold of the model fiber is substantially lower than the threshold of the physiological fiber, model responses were simulated to tones with an additional sound-level distribution with HPR mean of 24 dB SPL. For comparison, the baseline rate-level function is also shown. Upper panels, Dots represent measured (A) and simulated (B–D) rates; lines, rate-level functions fitted with the five-parameter model. Black dots and lines indicate baseline responses; the rate-level functions depicted by other colors are for distributions with different HPRs marked by the same color on the level axis with a large circle indicating the mean. Middle row, Rate-level functions normalized by the maximum rate of the corresponding fitted functions. Vertical dashed line indicates the sound level at which neural firing rate reaches 50% of the maximum value. Lower panels, The sound levels corresponding to 50% rate evaluated in the middle row plotted as a function of HPR mean level. Solid line indicates the least-squares fit, and the dashed line represents the slope of 1. Slope of the fitted function is also shown.
Each rate-level function was normalized to the minimum and maximum of the corresponding fitted curve, as shown in Figure 5 (middle row). The rate-level functions were shifted toward the HPR levels of the corresponding distribution, implying an adaptation in the dynamic range of the neuron in addition to the classical firing-rate adaptation phenomenon, which is characterized by a reduction in rate without a change in the operating point. However, pure exponential adaptation in the synapse model showed only classical firing-rate adaptation, with no noticeable shift in the dynamic range (Fig. 5B). Thus, the adaptation in dynamic range observed in Figure 5C mostly resulted from the power-law adaptation in the model.
The sound level that elicits 50% of the normalized rate (L50) is shown in the lower panels of Figure 5 as a function of mean HPR level. For physiological data and model responses (except with only exponential adaptation in the synapse), the dynamic range shifted almost linearly with the mean of the HPR. The slope of this function quantifies the strength of adaptation in the dynamic range of the fiber. Similar to the results reported in Wen et al. (2009), rate-level functions of the model fiber to tones exhibited a higher slope than the corresponding responses to broadband noise stimuli. The rate of shift produced by the AN model was smaller than the rate of shift reported in the physiological example shown here (Wen et al., 2009); recall that the model was not fine-tuned to match the responses of any particular AN fiber. The model, nevertheless, captured the main characteristics of the responses to the dynamic stimuli. In general, model responses were similar across CFs and SRs (the model responses in Fig. 6 are shown for a fiber with different CF and SR than in Fig. 5). The model rate of shift to CF tones (0.20 dB/dB) was in the lower range of the physiological data (which varies from 0.16 to 0.47 dB/dB), whereas the shifts in the model responses to broadband noise (0.19 dB/dB) were near the middle of the range of AN data (0.05–0.39 dB/dB).
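The computation of L50 and of the rate of shift (in dB/dB) can be sketched as follows. This is only an illustration under stated assumptions: a logistic curve stands in for the fitted five-parameter rate-level function, the curves are synthetic with a built-in shift of 0.2 dB/dB, and the function names are hypothetical.

```python
import numpy as np

def l50_from_curve(levels_db, rate):
    """Level at which the min-max normalized rate reaches 0.5."""
    norm = (rate - rate.min()) / (rate.max() - rate.min())
    # Linear interpolation; assumes the curve increases monotonically.
    return float(np.interp(0.5, norm, levels_db))

def logistic_rate(levels_db, midpoint_db, slope_db=5.0,
                  r_min=50.0, r_max=250.0):
    """Hypothetical stand-in for a fitted rate-level function (spikes/s)."""
    return r_min + (r_max - r_min) / (
        1.0 + np.exp(-(levels_db - midpoint_db) / slope_db))

levels_db = np.arange(0.0, 102.0, 2.0)
hpr_means = np.array([24.0, 36.0, 48.0, 60.0, 72.0])

# Synthetic curves whose midpoints move by 0.2 dB per dB of HPR mean.
true_shift = 0.2
midpoints = 30.0 + true_shift * (hpr_means - hpr_means[0])
l50 = np.array([l50_from_curve(levels_db, logistic_rate(levels_db, m))
                for m in midpoints])

# Least-squares line through (HPR mean, L50); its slope is the rate of shift.
slope, intercept = np.polyfit(hpr_means, l50, 1)
print(f"rate of shift: {slope:.3f} dB/dB")
```

A slope of 1 would correspond to a dynamic range that tracks the HPR mean exactly, and a slope of 0 to no dynamic-range adaptation.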
Precision of level coding for a model AN fiber (550 Hz, HSR) in response to broadband stimuli for four distributions of sound levels with HPR means at 48, 60, 72, and 84 dB SPL. A, Slopes of the fitted rate-level functions from Figure 5D. B, SDs of rates across repetitions (20 outside HPR levels and 388 for HPR levels) as a function of sound level. Dots represent computed SDs, and the solid lines indicate the smoothed curves using a 7-point moving average method. C, The sensitivity index δ′ at each level was obtained by dividing the slope of the mean firing rate by the corresponding smoothed rate SD. D, Neural JNDs at HPR mean level (filled circles) and minimum JNDs (open squares) for four distributions of sound levels as a function of HPR mean level. Neural JND at HPR mean of 84 dB SPL was ∼50 dB (not shown).
To quantify the precision of level coding based on the discharge rates of a single AN fiber, Colburn et al. (2003) used a sensitivity index referred to as the “sensitivity per decibel” (δ′ in dB⁻¹), which is defined as the slope of the mean firing rate divided by the SD of rate, assuming that no substantial change occurs in the SD for the small increment in level. Figure 6 illustrates the different stages of computation of δ′ from the responses of an AN fiber (550 Hz, HSR) to broadband stimuli. The rate-level functions of this fiber are shown in Figure 5D. The slopes were computed from the fitted rate-level functions for four distributions of sound levels and are shown as a function of sound level in Figure 6A. In general, the magnitude of the peak slope decreased with HPR mean level, and the location of the peak was also shifted toward the HPR levels. Figure 6B shows the SD of rates as a function of sound level. The SDs were smoothed using a 7-point moving average method. Trends in the rate SDs were similar to those for mean rate, that is, SDs grew quickly with sound level and then saturated at higher levels. Although the dynamic range of rate SDs was shifted toward the HPR levels, there was no substantial difference across sound-level distributions when plotted against mean firing rate rather than sound levels (results not shown). For each distribution, the sensitivity index was computed at each level using smoothed SDs, and the resulting δ′ is shown as a function of sound level in Figure 6C. The level at which the peak of the sensitivity occurred was shifted toward the HPR mean level. Based on the rate information available in the responses of a single AN fiber, the neural just-noticeable difference (JND) in level can be approximated by the reciprocal of δ′ (Colburn et al., 2003). Figure 6D shows the minimum neural JND in level (the inverse of the peak of δ′) and the JND at HPR mean level (the inverse of δ′ at HPR mean level) as a function of HPR mean level.
The responses to two sound-level distributions with HPR means at 48 and 60 dB SPL showed neural JNDs (at their respective HPR mean levels) close to the minimal JNDs, implying that these dynamic stimuli would be optimally coded by this particular AN fiber. On the other hand, the neural JNDs were very large at HPR means of 72 and 84 dB SPL because, in response to these stimuli, the shift in the dynamic range was not sufficient and also the mean firing rates were saturated within the HPR. These results are consistent with comparable analyses of the AN responses (Wen et al., 2009). Although the minimal neural JNDs for both measured and model AN fibers were large compared to psychophysical JNDs (Viemeister, 1983, 1988), it is important to note that the peripheral auditory system is clearly a multiple-channel structure, and thus cross-channel combination of information might be required to relate psychophysical level discrimination to peripheral physiology.
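The computation of δ′ and the neural JND described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the slope is taken here by numerical differentiation rather than from a fitted rate-level function, the edge handling of the 7-point moving average (repeating the end values) is an assumption, and the function names and the synthetic rate and SD curves are hypothetical rather than taken from the study.

```python
import numpy as np

def moving_average(x, window=7):
    """Centered moving average; the ends are padded by repeating edge values."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def sensitivity_per_db(levels_db, mean_rate, rate_sd, window=7):
    """delta' (1/dB): slope of the mean rate divided by the smoothed rate SD."""
    slope = np.gradient(np.asarray(mean_rate, dtype=float),
                        np.asarray(levels_db, dtype=float))
    return slope / moving_average(rate_sd, window)

def neural_jnd(delta_prime):
    """Neural JND (dB), approximated by the reciprocal of delta'."""
    return 1.0 / delta_prime

# Synthetic example (hypothetical values, for illustration only).
levels = np.arange(0.0, 101.0, 2.0)                               # dB SPL
mean_rate = 10.0 + 190.0 / (1.0 + np.exp(-(levels - 50.0) / 5.0))  # spikes/s
rate_sd = np.sqrt(mean_rate)                                       # assumed SD
dprime = sensitivity_per_db(levels, mean_rate, rate_sd)
min_jnd = neural_jnd(dprime.max())
print(f"peak delta' at {levels[np.argmax(dprime)]:.0f} dB SPL, minimum JND {min_jnd:.2f} dB")
```

Evaluating `neural_jnd` at the HPR mean level instead of at the peak of δ′ gives the second quantity plotted in Figure 6D; the two agree only when the peak sensitivity falls at the HPR mean.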
Responses to switching stimulus
Figure 7 shows the responses of a model AN fiber (medium SR and CF at 10 kHz) to the switching stimulus, in which the stimulus-level statistics were abruptly varied between two distributions. The sound levels spanned from 10 to 96 dB SPL and the two distributions of noise stimuli with HPR means at 75 and 51 dB SPL were alternated every 5 s. To study the dynamics of adaptation of the mean firing rate, discharge rates were averaged across all switching periods and are shown in Figure 7A. The time constant was determined separately from each half-period of the switching stimulus by fitting an exponential function. After switching, adaptation in mean firing rate occurred within several hundred milliseconds. Also, adaptation to an increase in mean sound level occurred more rapidly than to a decrease in mean sound level (137 ms vs 294 ms), consistent with observations from the physiological studies (Dean et al., 2008; B. Wen, personal communication). AN responses to a simple increment/decrement paradigm also show similar behavior (Smith et al., 1985).
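The single-exponential fit used to estimate a time constant from each half-period can be sketched as follows. This is a minimal illustration, assuming a scan over candidate time constants with linear least squares for the amplitude and offset; the fitting procedure actually used in the study is not specified here. The function name `fit_single_exponential`, the 10 ms bin width, the candidate grid, and the synthetic traces (generated from the reported 137 ms and 294 ms values with hypothetical rates and noise) are assumptions for illustration.

```python
import numpy as np

def fit_single_exponential(t, rate, taus):
    """Fit rate(t) ~ amplitude * exp(-t / tau) + offset.

    Each candidate tau is tried in turn; for a fixed tau the amplitude and
    offset are the solution of a linear least-squares problem. The tau with
    the smallest squared error is returned.
    """
    t = np.asarray(t, dtype=float)
    rate = np.asarray(rate, dtype=float)
    best = None
    for tau in taus:
        design = np.column_stack([np.exp(-t / tau), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(design, rate, rcond=None)
        sse = float(np.sum((design @ coef - rate) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(tau), float(coef[0]), float(coef[1]))
    _, tau, amplitude, offset = best
    return tau, amplitude, offset

# Synthetic half-period traces (hypothetical rates and noise).
rng = np.random.default_rng(0)
t = np.arange(0.0, 5.0, 0.01)             # one 5 s half-period, 10 ms bins
taus = np.arange(0.01, 1.0, 0.001)        # candidate time constants (s)
up = 120.0 + 60.0 * np.exp(-t / 0.137) + rng.normal(0.0, 2.0, t.size)
down = 90.0 - 50.0 * np.exp(-t / 0.294) + rng.normal(0.0, 2.0, t.size)
tau_up, _, _ = fit_single_exponential(t, up, taus)
tau_down, _, _ = fit_single_exponential(t, down, taus)
print(f"tau after increase: {1e3 * tau_up:.0f} ms, after decrease: {1e3 * tau_down:.0f} ms")
```

Scanning tau keeps the sketch dependent on NumPy alone; a nonlinear least-squares routine would serve the same purpose.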
Responses of the model AN fiber (CF at 10 kHz, medium SR) to the switching stimulus. A, Time course of adaptation in mean firing rate. Mean firing rate of the fiber averaged over 72 cycles of the switching stimulus (period 10 s). The red and blue lines indicate the single exponential fits to the firing rates during the 75 dB and 51 dB half-periods of the switching stimulus, respectively; the time constant of the fit is shown above the curve. B, Rate-level functions for the distributions with HPR means at 75 and 51 dB SPL. These rate-level functions were obtained for four consecutive 300 ms time epochs immediately after the switch (denoted by solid lines with symbols) and also from the last 3 s of the corresponding 75 dB (red solid line) and 51 dB (blue solid line) half-periods of the switching stimulus. The black solid line shows the baseline rate-level function. C, Illustration of the slow component of adaptation. Firing rates were evaluated over each switching period and are shown here as a function of time. The dotted line represents the single exponential fit to the firing rates, and the time constant is shown above the curve.
Figure 7B shows the rate-level functions of the fiber corresponding to each distribution of sound levels of the stimulus; for comparison, the baseline rate-level function is also presented. For consistency with the physiological studies (Dean et al., 2008), rate-level functions of the model fiber were obtained during the final 3 s of 75 dB and 51 dB half-periods of the switching stimulus. To reveal how quickly dynamic-range adaptation occurred, rate-level functions of the model fiber were also obtained for four consecutive 300 ms time epochs immediately after the switch for both distributions of sound levels. These 300 ms rate-level functions are shown only for the HPR levels due to the lack of a sufficient number of repetitions for the sound levels outside the HPR (during a 300 ms epoch, there were ∼50 repetitions of every sound level from the HPR compared to only ∼3 repetitions outside the HPR). For both distributions, the dynamic-range adaptation occurred within ∼1 s after the switch. This observation has implications for real world listening conditions; if the dynamic-range adaptation were too slow, neural responses would not adjust in time to achieve the improvement in coding of the HPR levels. In contrast to the asymmetry reported in the time course of adaptation of the mean firing rate, there was no substantial difference in the time course of adaptation of the rate-level function for the switches in the level distribution in the upward or downward direction, consistent with physiological results (Dean et al., 2008; B. Wen, personal communication).
Finally, to investigate whether the model responses had any slow components of adaptation, the responses of the model are plotted as a function of time (Fig. 7C). Each point in the curve represents a rate response measured over a switching period (10 s). A slow component of adaptation was evident in the model AN responses, whereas in the IC, only 36% of the recorded neurons showed this long-term adaptation (Dean et al., 2008). Also, the time constant of this slow adaptation was smaller in the IC than in the AN model responses.
Dynamic-range adaptation for static versus dynamic stimuli
Smith (1977) observed a constant decrease in AN firing rate without any noticeable shift in the dynamic range in response to a brief probe tone preceded by a fixed-level adapting tone. The stimulus paradigm included a long silent interval (∼1–2 s) between each presentation of the stimulus pair to avoid any long-term adaptation effects. However, other AN studies (Costalupes et al., 1984; Gibson et al., 1985) that used a continuous background stimulus reported a decrease in firing rate along with a substantial shift in the dynamic range. This shift in the dynamic range was attributed to two-tone suppression based on the observation that in a simultaneously gated background stimulus (with a long silent interval between presentations), the CF tone rate-level function produced a similar shift in the dynamic range with no substantial decrease in maximum firing rate (Costalupes et al., 1984). However, a small but significant shift in the dynamic range and a decrease in firing rate were also observed in response to an “inversely gated” background stimulus (i.e., similar to the baseline paradigm except that the silent period was replaced by the static background). Gibson et al. (1985) quantified the strength of this dynamic-range adaptation by the rate of shift (L50 slope, dB/dB), and the values were largely within the range reported for the dynamic stimuli (Wen et al., 2009). This observation raises the possibility that the same mechanism may underlie the dynamic-range adaptation for both dynamic stimuli and an “inversely gated” static background.
The stimuli used in this study had no silent periods between successive levels of the stimulus, and the sound levels varied rapidly (every 50 ms). Therefore, in addition to the adaptation produced by the response history (i.e., short- and long-term adaptations), the rapid modulation of the stimulus amplitude could also contribute to the observed dynamic-range adaptation. To examine the contributions of these two mechanisms, AN model responses were simulated for an “inversely gated” static background. This paradigm effectively removed the rapid modulation of the stimulus amplitude (50 ms vs 350 ms). For comparison with the rate-level functions for the dynamic stimuli, the background levels were chosen as the mean of the HPR levels (36, 48, 60, and 72 dB SPL). Figure 8 shows the rate-level functions for the dynamic stimuli with different HPRs (dotted line, fitted functions from the top panel of Fig. 5C) and also for the stimuli with an “inversely gated” static background of similar mean levels (solid lines with symbols). The rate-level functions were essentially the same, suggesting that the decrease in firing rate and the shift in the sensitivity produced by dynamic stimuli resulted mainly from the adaptation produced by previous responses (due to the power-law adaptation).
Comparison of the AN model responses to dynamic vs static stimuli. Responses were simulated for an HSR AN fiber with CF at 550 Hz. Black thick line indicates the fitted rate-level function for the baseline paradigm. Dotted lines are the fitted rate-level functions for the dynamic stimuli with HPR means at 36, 48, 60, and 72 dB SPL (from Fig. 5C). Solid lines with symbols represent responses to static stimuli, in which the stimulus paradigm was the same as the baseline paradigm, except the silent period was replaced by the mean levels of the HPRs.
The property of AN “additivity” refers to the fact that the change in firing rate (over a window of ∼10 ms or longer) in response to an increment/decrement in stimulus level does not greatly depend on the time between the onset and the subsequent time of the change in level. This property must hold true for the AN responses in the present study because the rate responses were computed over a window of 50 ms. The AN model used in this study also successfully captured this phenomenon [Zilany et al. (2009), their Fig. 9]. Thus, it is unlikely that the fluctuation of sound levels, at least for the stimulus durations used in this study, contributed to dynamic-range adaptation. This argument is further supported by the observation that, despite the rapid modulations of stimulus amplitude, the responses of the model fiber with only exponential adaptation did not show any dynamic-range adaptation other than classical firing-rate adaptation.
Discussion
In this study, AN responses to dynamic stimuli with different HPRs were simulated in an effort to identify the potential source and mechanisms that underlie the adaptation in dynamic range. AN fibers, regardless of their CFs or SRs, show dynamic-range adaptation by shifting their rate-level function toward the most frequently occurring sound levels for both CF tone and broadband noise stimuli. The adjustment in the dynamic range significantly improves the coding precision of the HPR levels of the dynamic stimulus (Dean et al., 2005; Wen et al., 2009). Adaptation of the mean firing rate was faster than adaptation of the dynamic range; however, these time courses were independent of the durations over which sound levels (50 ms) or sound-level statistics (5 s) varied. Inclusion of power-law-like dynamics in the phenomenological model of the IHC–AN synapse successfully explained much of the dynamic-range adaptation, including the time course of adaptation.
Implications for the dynamic-range problem
Because natural stimuli vary over a wide range of timescales, it is difficult to predict what stimulus duration or level will be encountered in any given situation. The best way to deal with such variety is to let the temporal statistics of the stimuli encountered set the dynamics of adaptation. Power-law adaptation, in which the influence of past activity decays but is not forgotten, inherently possesses this flexibility, and thus the recovery time is not fixed but is scaled with the duration and strength of the stimulus.
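The contrast between a fixed recovery time and one that scales with stimulus duration can be illustrated numerically. The sketch below is not the synapse model itself: it assumes a constant unit response during a stimulus of duration T and compares the decay, after stimulus offset, of an adaptation term built from a power-law kernel 1/(t + β) with one built from an exponential kernel exp(−t/τ). The kernel forms, the values β = 0.01 s and τ = 0.06 s, and all function names are assumptions chosen for illustration; the two durations echo the 50 ms and 5 s timescales of the stimuli.

```python
import numpy as np

def power_law_trace(t, T, beta):
    """Adaptation term at time t after offset for a power-law kernel.

    Integral over s in [0, T] of 1 / (t + s + beta), i.e. the summed
    influence of a constant unit response that lasted T seconds.
    """
    return np.log((t + T + beta) / (t + beta))

def exponential_trace(t, T, tau):
    """Same quantity for an exponential kernel exp(-(t + s) / tau)."""
    return tau * np.exp(-t / tau) * (1.0 - np.exp(-T / tau))

def half_recovery_time(trace, T, param, t_max=100.0, n=2_000_001):
    """First time after offset at which the trace falls to half its offset value."""
    t = np.linspace(0.0, t_max, n)
    a = trace(t, T, param)
    return float(t[np.argmax(a <= 0.5 * a[0])])

for T in (0.05, 5.0):  # stimulus durations in seconds
    t_pl = half_recovery_time(power_law_trace, T, 0.01)
    t_ex = half_recovery_time(exponential_trace, T, 0.06)
    print(f"T = {T:4.2f} s: power-law {1e3 * t_pl:6.1f} ms, exponential {1e3 * t_ex:6.1f} ms")
```

Under these assumptions the exponential half-recovery time equals τ ln 2 for any T, whereas the power-law half-recovery time works out to sqrt(β(T + β)) and therefore grows with the duration of the preceding stimulus.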
Although the shifts in the AN dynamic range seem inadequate to provide robust coding of changes in sound levels over a wide range, higher auditory centers can further extend the sensitivities of the neurons (Dean et al., 2008; Watkins and Barbour, 2008). The rate-level functions of AN fibers are universally monotonic, whereas nonmonotonic rate-level functions arise centrally. Neurons with monotonic and nonmonotonic rate-intensity functions may use different strategies for encoding a wide range of input levels. Cortical neurons with monotonic rate-intensity functions exhibit dynamic-range adaptation and are thus able to encode higher sound levels at the expense of encoding low sound levels, whereas nonmonotonic neurons in auditory cortex (which are intensity tuned) maintain fidelity for encoding relatively lower sound levels (Watkins and Barbour, 2008). Together these neurons can represent a wider range of sound levels spanning the dynamic range of mammalian hearing.
In contrast to the adaptation in an individual neuron, Pouille et al. (2009) argued that the recruitment of different sets of neurons as a function of the input strength would enable the population as a whole to represent a wider input range. They found that the amplitude of the EPSC necessary for rodent hippocampal pyramidal cells to reach the threshold for an action potential was dynamic and increased with the strength of the input. This dynamic response property was achieved instantaneously through a feedforward inhibitory circuit, rather than relying on the previous history of the network through a negative feedback mechanism (Pouille et al., 2009).
Mechanisms for dynamic-range adaptation
Although power-law dynamics are increasingly common in descriptions of sensory adaptation, their physical basis remains unknown. There is no direct evidence to conclusively establish the mechanism underlying AN dynamic-range adaptation; however, a number of potential sources can be speculated on the basis of the characteristics of adaptation. Stimulation of medial olivocochlear (MOC) efferents (Guinan, 2006) and two-tone suppression (Javel, 1981; Delgutte, 1990) can shift the dynamic range of AN fibers; however, they are discounted as potential sources because of their dependence on cochlear amplifier gain (Patuzzi, 1996; Guinan, 2006), which varies across CF (Cooper and Yates, 1994). Also, the effects of MOC efferents that directly modulate the outer-hair-cell activity are generally weak under anesthesia. With regard to the time course of adaptation, MOC efferent effects increase and decay with mean time constants of ∼277 and ∼159 ms, respectively (Backus and Guinan, 2006), which are within the range of dynamics reported for the AN and IC responses to a switching stimulus. However, the AN and IC data showed the opposite trend, with a significantly faster adaptation to an increase in mean level than to a decrease in mean level. The lateral olivocochlear efferents, which also modulate AN fiber activity, are a less likely mechanism because of their sluggish dynamics (operation in the range of minutes) (Guinan, 2006).
At the level of the AN, the source of adaptation is generally associated with the IHC–AN synapse complex, although adaptation has also been reported in the voltage responses of IHCs (Kros and Crawford, 1990) and in the AN fiber membrane (Zhang et al., 2007). Several studies of IHCs have found that the exocytosis of vesicles from the ribbon synapse exhibits multiple kinetic components (Moser and Beutner, 2000; Spassova et al., 2004; Nouvian et al., 2006). Although the depletion of a “readily releasable” pool of neurotransmitters occurs on a timescale similar to AN short-term adaptation, longer depolarization can also yield slower kinetic components of exocytosis. Using simultaneous recordings from IHCs and AN fiber terminals, Goutman and Glowatzki (2007) observed that during 1 s depolarizations, the time course of transmitter release was better described by three exponential transient components (with time constants of ∼2, ∼18, and ∼176 ms) in addition to a sustained component. Spassova et al. (2004) also argued that adaptation and recovery from adaptation of sound-evoked chick cochlear nerve discharges follow time courses similar to the exhaustion and recovery of the “readily releasable” pool. On the other hand, endocytosis (i.e., recycling of vesicles) that ensures the subsequent supply of vesicles occurs on a longer timescale (∼7.5 s) (Moser and Beutner, 2000). The presence of multiple contributing processes (e.g., depletion of “readily releasable” pool, replenishment from cytoplasmic pool, endocytosis, etc.) with a diverse range of time constants is consistent with the success of power-law dynamics in describing the overall adaptation.
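The link between many processes with diverse time constants and power-law behavior can be illustrated with a purely mathematical sketch. It uses the identity 1/x = ∫₀^∞ exp(−xk) dk, discretized over logarithmically spaced decay rates k, to show that a weighted sum of exponentials reproduces a power-law kernel 1/(t + β) over several decades of time. This is not a model of the ribbon synapse: the kernel form, β = 0.01 s, the range of rates, and the function name are assumptions made for illustration only.

```python
import numpy as np

def power_law_from_exponentials(t, beta=0.01, k_min=1e-4, k_max=1e4, n=400):
    """Approximate 1 / (t + beta) by a weighted sum of n decaying exponentials.

    The decay rates k (1/s) are log-spaced between k_min and k_max, so the
    corresponding time constants 1/k span many orders of magnitude. The
    weights k * du follow from the change of variable k = exp(u).
    """
    t = np.asarray(t, dtype=float)
    u = np.linspace(np.log(k_min), np.log(k_max), n)
    du = u[1] - u[0]
    k = np.exp(u)
    weights = k * du
    return np.exp(-np.outer(t + beta, k)) @ weights

t = np.logspace(-2, 1, 50)                 # 10 ms to 10 s
approx = power_law_from_exponentials(t)
exact = 1.0 / (t + 0.01)
rel_err = np.max(np.abs(approx - exact) / exact)
print(f"largest relative error: {100.0 * rel_err:.2f}%")
```

A small number of components with widely separated time constants gives a coarser version of the same behavior, which is one reason multi-exponential descriptions of release and a power-law description of overall adaptation need not conflict.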
In addition to the presynaptic mechanisms, postsynaptic receptor desensitization can also contribute to adaptation (Raman et al., 1994; Goutman and Glowatzki, 2007). Spike-rate adaptation was also observed in AN fiber responses to stimulation by a cochlear implant using high-rate pulse trains (Zhang et al., 2007). Some studies suggest that the site of power-law adaptation is located in the conversion of the receptor potential into action potentials (French and Torkkeli, 2008). In the cockroach tactile spine, French (1984) observed no detectable adaptation in the receptor potential, whereas power-law adaptation exists in the spike trains of the associated somatosensory neurons (Chapman and Smith, 1963). Even direct electrical stimulation, which bypasses the mechanotransduction stage, produced the same power-law adaptation (French, 1984), suggesting that postsynaptic membrane dynamics might underlie the observed adaptation.
Most of the IHC studies and the electrical stimulation experiments mentioned above recorded the responses to simple stimuli and investigated the behavior at the onset or during the stimulus period. However, there is an abundance of experimental AN data describing adaptation to various acoustic stimulus features, such as responses after stimulus offset, forward-masking, and increment/decrement paradigms. Recordings from different sites using these paradigms would help to elucidate the degree of contribution by synaptic and membrane mechanisms to the adaptation observed with acoustic excitation.
Footnotes
This research was supported by National Institutes of Health–National Institute on Deafness and Other Communication Disorders Grant R01-01641. We thank Dr. Paul Nelson and the members of the Carney laboratory for comments on an earlier version of the manuscript. The suggestions of two anonymous reviewers were invaluable in improving this manuscript.
Correspondence should be addressed to Muhammad S. A. Zilany, Department of Neurobiology and Anatomy, University of Rochester, 601 Elmwood Avenue, Box 603, Rochester, NY 14642. msazilany{at}gmail.com