The Journal of Neuroscience, December 17, 2003, 23(37):11489-11504


Behavioral/Systems/Cognitive
Naturalistic Auditory Contrast Improves Spectrotemporal Coding in the Cat Inferior Colliculus

Monty A. Escabí,1 Lee M. Miller,3 Heather L. Read,2 and Christoph E. Schreiner4

1Department of Electrical and Computer Engineering and Biomedical Engineering Program and 2Department of Psychology, University of Connecticut, Storrs, Connecticut 06269-2157, 3Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720, and 4W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, California 94143


    Abstract
 
Statistical analysis of natural sounds and speech reveals logarithmically distributed spectrotemporal modulations that can cover several orders of magnitude. By contrast, most artificial stimuli used to probe auditory function, including pure tones and white noise, have linearly distributed amplitude fluctuations with a limited average dynamic range. Here we explore whether the operating range of the auditory system is physically matched to the statistical structure of natural sounds. We recorded single-unit and multi-unit neuronal activity from the central nucleus of the cat inferior colliculus (ICC) in response to dynamic spectrotemporal sound sequences to determine whether ICC neurons respond preferentially to linear or logarithmic spectrotemporal amplitudes. We varied the intensity, dynamic range, and contrast statistics of these sounds to mimic those of natural and artificial stimuli. ICC neurons exhibited monotonic and nonmonotonic contrast dependencies with increasing dynamic range that were independent of the stimulus intensity. Midbrain neurons had higher firing rates and higher receptive field energies and showed a net improvement in spectrotemporal encoding ability for logarithmic stimuli, with an increase in the mutual information rate of ~50% over linear amplitude sounds. This efficient use of logarithmic spectrotemporal modulations by auditory midbrain neurons reflects a neural adaptation to structural regularities in natural sounds and likely underlies human perceptual abilities.

Key words: contrast; modulation depth; inferior colliculus; spectrotemporal; reverse correlation; mutual information; natural sounds


    Introduction
 
A central hypothesis of sensory coding asserts that sensory systems efficiently make use of statistical structure inherent in natural signals. The possibility that sensory systems are adapted for encoding natural signals has been a topic of discussion since the early work of Barlow (1953, 1961). Recent work has revealed that natural visual (Ruderman and Bialek, 1994; Dong and Atick, 1995; Ruderman, 1997) and acoustic signals (Voss and Clarke, 1975; Attias and Schreiner, 1998; Nelken et al., 1999; Lewicki, 2002) show robust statistical properties such as scale-invariant contrast statistics and a 1/f modulation spectrum. Although numerous studies have looked at these statistical characteristics of natural signals, only a few studies have addressed how such statistics can be used for efficient sensory coding (Rieke and Bodnar, 1995; Dan et al., 1996; Attias and Schreiner, 1998; Nelken et al., 1999; Stanley et al., 1999). Direct application of information theoretic approaches has revealed that sensory neurons respond most efficiently to sensory signals with natural statistics, although the exact mechanisms enabling such efficient processing have not been established.

In natural vision and hearing, our senses are exposed to stimuli that span many orders of magnitude in their mean and instantaneous intensities. Measurements of the spectral, spatial, or temporal fluctuations in the local energy of the sensory signal are typically represented by the modulation index for sounds or by the contrast for visual images. Both of these measures rely on the peak-to-peak amplitude excursion of the sensory signal as the relevant signal parameter and do not fully account for the intermediate amplitudes of the sensory waveform. Temporal modulations in natural sounds and spatial fluctuations in natural scenes, however, cover several orders of magnitude and therefore are represented best by the log-amplitude transform (Ruderman and Bialek, 1994; Attias and Schreiner, 1998).

Spatial and temporal energy gradations represent much of the information-bearing components of sensory signals, and we therefore expect that sensory systems efficiently make use of spectrotemporal information found in natural sounds and spatiotemporal information found in natural scenes. Considering the rules for scaling in natural sounds and visual scenes, and the logarithmic Weber's law scaling for intensity and luminance discrimination (Weber, 1834; Fechner, 1860; Miller, 1947; Harris, 1963; Jesteadt and Wier, 1977; Florentine et al., 1987), one hypothesis is that sensory systems are attuned to logarithmic modulations. We therefore would like to determine whether the log-transform signal expressed in units of decibels, 20 · log10(s(t)), is potentially more important than the corresponding linear amplitude auditory signal, s(t).

Numerous studies have addressed the neuronal representation of time-varying sounds, although these have traditionally focused on linear amplitude excursions. Studies on sinusoidal amplitude modulation (AM) have demonstrated that phase-locking sensitivity improves with increasing modulation index in the inferior colliculus (ICC) and neuronal firing rates increase monotonically, although these can saturate with as little as 20% modulation index (Rees and Moller, 1983; Krishna and Semple, 2000). There is some evidence, however, indicating that neurons are also sensitive to the higher-order moments in temporally modulated amplitudes in acoustic signals, even at near 100% modulation depth where firing rates appear to be fully saturated. First, auditory neurons in the cochlea and throughout the entire auditory pathway are exceptionally sensitive to the velocity and acceleration profiles of temporally ramped stimuli (Heil, 1997a,b; Heil and Irvine, 1997). First-spike timing precision and trial-to-trial reproducibility improve with increasing velocity or acceleration of the temporal acoustic waveform. Second, time reversal of ramped auditory stimuli produces a shift in the perceived quality and intensity (Irino and Patterson, 1996; Akeroyd and Patterson, 1997) that is reflected in the response of primary auditory cortex neurons (Lu et al., 2001), although these sounds have identical peak-to-peak contrast and energy spectrum. Finally, neurons in the auditory midbrain are sensitive to higher-order moments of the modulation waveform, such as the skewness and kurtosis, and appear to respond preferentially to synthetic sounds with naturally matched temporal modulations (Attias and Schreiner, 1998). These studies provide evidence that the entire distribution of amplitudes is critical for the neuronal representation, and the peak-to-peak values alone do not account for a significant fraction of observed neuronal responses.

Evidence for the relevance of the log-transform amplitude modulations comes from neurophysiology studies on the representation of sound intensity. Peripheral and central auditory neurons typically respond with an operating range of 30-50 dB and can show monotonic or nonmonotonic rate-level dependencies in central stations (Evans and Whitfield, 1964; Palmer and Evans, 1982; Ehret and Merzenich, 1988; Eggermont, 1989; Sutter and Schreiner, 1995). Psychophysical evidence further suggests that loudness perception and just-noticeable difference limens for intensity discrimination follow logarithmic (e.g., Weber's Law) relationships (Miller, 1947; Stevens, 1957; Harris, 1963; Jesteadt and Wier, 1977; Florentine et al., 1987). It is therefore likely that the operating range of auditory neurons is also used to process fine spectrotemporal information found in natural sounds.

We tested whether the operating range of single neurons is suited for encoding spectrotemporal information found in natural signals by comparing the neuronal representation of log-amplitude spectrotemporal modulations, 20 · log10(S(t,f)) (units of decibels), with the corresponding linear amplitude spectrotemporal excursions, S(t,f). Statistical analysis of natural sounds shows that natural signals follow logarithmic scaling laws, having an effective dynamic range that is comparable with the intensity operating range of single neurons for pure tones. Neuronal activity in the cat ICC to logarithmic rippled noise (RN) signals (Escabí and Schreiner, 2002) was marked by improvement in spectrotemporal processing ability, including higher spike rates and increased mutual information rates. These findings suggest that the operating range of the auditory system is matched to the spectrotemporal amplitude statistics of natural sensory stimuli.


    Materials and Methods
 
Natural sound analysis. We studied the spectrotemporal modulations in natural sounds to identify potential differences among various classes of natural sounds and to determine whether neurons in the CNS preferentially encode signals with similar statistical properties. The ensemble of natural sounds included animal vocalizations (64.6 min), continuous running speech (74.0 min), and environmental sounds (51.1 min). As a control, white noise (10 min) was also analyzed using an identical analysis procedure. No attempt was made to limit the sounds to any particular subcategory or species. All vocalizations and environmental sounds were obtained from commercially available compact disk media from the Macaulay Library of Natural Sounds at Cornell University (Storm, 1994a,b; Emmons et al., 1997). Human speech was obtained from a radio broadcast reproduction of the William Shakespeare play Hamlet (Shakespeare, 1992). All sounds were sampled at a rate of 44.1 kHz and 16-bit resolution.

Sounds were initially decomposed by a bank of tonotopically arranged filters into a spectrotemporal representation that mimics the spectral decomposition performed by the cochlea. Filter center frequencies were arranged according to the frequency position function of the cochlea over a range covering 250 Hz to 14 kHz, and filter bandwidths were selected according to the perceptual critical bandwidths (Greenwood, 1990). Sound waveforms were decomposed according to:

$$s_k(t) = h_k(t) * s(t) \qquad (1)$$
where hk(t) is the impulse response of the k-th filter channel centered about the frequency fk, * is shorthand for the convolution operator, and s(t) is the sound waveform. Spectrotemporally compact B-spline filters (Roark and Escabí, 1999) were chosen for this analysis to minimize cross talk across adjacent filter channels and across separate time instants. For the purpose of the statistical analysis, filter bandwidths overlapped by 50%. To increase the display resolution in Figure 1, however, filters were overlapped by 90%.



 
Figure 1. Analysis of the spectrotemporal contrast statistics of natural sounds and white noise. Cochlear model output representation showing the spectrotemporal modulations of a short segment of human conversational speech (A, B), animal vocalizations (E, F), environment background sounds (running water) (I, J), and white noise (M, N). Color scales are expressed either as a linear amplitude spectrotemporal modulation (normalized for a maximum value of 1) (A, E, I, M) or as the corresponding log-amplitude spectrotemporal modulation pattern (zero mean) expressed in units of decibels (B, F, J, N). The corresponding acoustic sound pressure waveforms are shown for each sound segment above each spectrotemporal envelope in black. The log-transformed spectrotemporal envelope expands the visible dynamic range of the stimulus, revealing detail that is not visible in the linear amplitude spectrotemporal modulation envelope. For each of the sound ensembles, the amplitude values from each spectrotemporal envelope were converted into a probability histogram for both the linear (C, G, K, O) and log-transform spectrotemporal envelope (D, H, L, P). The linear amplitude distribution of all natural sounds (C, G, K) shows an exponential-like distribution in which most of the stimulus components fall at low amplitude values. By comparison, the linear amplitude distribution of white noise (O) covers much of the linear amplitude dimension. The corresponding log-amplitude distributions exhibit a bell-shaped profile in which speech had the broadest dynamic range (D) and white noise covered the narrowest range of amplitude values (P).

 
The temporal waveform at the output of each filter was processed to extract the relative temporal modulations in each filter channel. First, the time-waveform of the k-th filter output, sk(t), was decomposed into a temporal envelope:

$$e_k(t) = \left| s_k(t) + j \cdot H[s_k(t)] \right| \qquad (2)$$
where H[·] is the Hilbert transform operator (Hilbert, 1912) and ek(t) is the temporal modulation envelope of the k-th frequency channel centered about a frequency of fk. Next, the temporal envelope of each filter channel was low-pass filtered to limit the modulations across each filter channel to a maximum rate of 100 Hz:

$$S_k(t) = e_k(t) * h_{100}(t) \qquad (3)$$
where h100(t) is the impulse response of a low-pass B-spline filter with cutoff frequency of 100 Hz and Sk(t) is the band-limited temporal modulation envelope centered about a frequency of fk. This filtering was necessary so that all spectral channels have identical modulation bandwidths. A modulation bandwidth of 100 Hz was chosen because the cochleotopic filter bank decomposition contained filter bandwidths of ~100 Hz at frequencies below 1 kHz, and therefore the outputs for these frequency channels did not contain modulations above 100 Hz. We treat Sk(t) as a two-dimensional function S(t,fk), the spectrotemporal envelope, which displays the energy modulations of each natural sound waveform as a function of frequency and time.
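The envelope extraction step can be illustrated with a minimal Python sketch that takes the magnitude of the analytic signal of a single channel output. It is an illustration under stated assumptions and not the analysis code used in the study: the function names (`dft`, `idft`, `hilbert_envelope`) are hypothetical, the transform is a naive DFT suitable only for short signals, the test signal is a toy amplitude-modulated tone, and the B-spline filter bank and the 100 Hz low-pass stage are not reproduced.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def hilbert_envelope(x):
    """Magnitude of the analytic signal x + j*H[x], built in the frequency domain."""
    n = len(x)
    X = dft(x)
    # Keep DC (and Nyquist when n is even), double positive frequencies,
    # and zero the negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for k in range(1, n // 2):
            gain[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    analytic = idft([X[k] * gain[k] for k in range(n)])
    return [abs(z) for z in analytic]


# Toy channel output (assumed): carrier at DFT bin 16 with a slow modulation at bin 2.
N = 128
true_env = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * t / N) for t in range(N)]
s_k = [true_env[t] * math.cos(2 * math.pi * 16 * t / N) for t in range(N)]
e_k = hilbert_envelope(s_k)
max_error = max(abs(a - b) for a, b in zip(e_k, true_env))
```

For this toy signal the recovered envelope `e_k` matches the imposed modulation to within floating-point error, because the carrier and its sidebands fall on exact DFT bins.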

We were interested in the average statistical characteristics of the relative spectrotemporal fluctuations of each signal. After the filter bank decomposition, the envelopes were therefore rescaled according to a linear amplitude convention:

$$S_{Lin}(t,f_k) = \frac{S(t,f_k)}{S_{Max}} \qquad (4)$$
where SMax = max[S(t,fk)] is the maximum amplitude value of the spectrotemporal envelope, S(t,fk). This rescaling limits the maximum amplitude of the spectrotemporal modulations to 1, consistent with traditional definitions of the analytic modulation signal (Hilbert, 1912). The rescaling also preserves the relative excursions and spectrotemporal interrelationships across each frequency band.

Psychophysics of loudness perception and intensity discrimination suggest that relative amplitude fluctuations described using decibel amplitude may be a more appropriate representation of the acoustic waveform. We therefore also considered a log-transform version of the spectrotemporal envelope:

$$S_{dB}(t,f_k) = 20 \cdot \log_{10}\big(S(t,f_k)\big) - \mu_{dB} \qquad (5)$$
where µdB is the mean value of 20 · log10(S(t,fk)). This transformation removes the mean spectrotemporal level while preserving the variance and expresses the spectrotemporal envelope in units of decibels.
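The two amplitude conventions can be made concrete with a short Python sketch. The envelope values below are hypothetical and chosen only so that the arithmetic is easy to follow; the function names are likewise illustrative.

```python
import math


def linear_envelope(S):
    """Rescale so that the largest envelope value is 1 (linear amplitude convention)."""
    s_max = max(max(row) for row in S)
    return [[v / s_max for v in row] for row in S]


def decibel_envelope(S):
    """20*log10(S) with the mean decibel level removed (zero-mean dB envelope)."""
    db = [[20.0 * math.log10(v) for v in row] for row in S]
    n = sum(len(row) for row in db)
    mu_db = sum(sum(row) for row in db) / n
    return [[v - mu_db for v in row] for row in db]


# Hypothetical envelope: 2 frequency channels x 4 time samples, strictly positive.
S = [[0.02, 0.2, 2.0, 0.2],
     [0.002, 0.02, 0.2, 0.02]]
S_lin = linear_envelope(S)   # peak value is 1
S_db = decibel_envelope(S)   # mean over all pixels is 0 dB
```

A tenfold amplitude step, such as 0.2 to 2.0 in the first channel, becomes a 20 dB step in `S_db` wherever it occurs on the amplitude axis, whereas in `S_lin` the same ratio between small values is nearly invisible.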

Analysis of the spectrotemporal envelope statistics consisted of measuring the amplitude distribution function for both the linear and decibel amplitude envelopes. Pixel values obtained from the spectrotemporal envelope were compounded into an amplitude distribution function, p[SLin] and p[SdB], where p[·] is the amplitude distribution function and SLin = SLin(t,fk) and SdB = SdB(t,fk) are shorthand for the linear and decibel spectrotemporal envelopes, respectively. Examples of the corresponding ensemble distributions for speech, vocalizations, background sounds, and white noise are illustrated in Figure 1. Additional analysis also consisted of estimating standard statistical measures directly from SLin and SdB (see Table 1). These included the modulation index, β = (SMax - SMin)/SMax, and the contrast, C = (SMax - SMin)/(SMax + SMin), as well as the waveform SDs, 90th percentile range, and skewness.
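As a worked example of these summary measures, the following Python sketch computes the modulation index, the contrast, and a normalized amplitude histogram from a list of envelope pixel values. The pixel values and the bin layout are assumptions made for illustration.

```python
def modulation_index(values):
    """beta = (S_max - S_min) / S_max."""
    s_max, s_min = max(values), min(values)
    return (s_max - s_min) / s_max


def contrast(values):
    """C = (S_max - S_min) / (S_max + S_min)."""
    s_max, s_min = max(values), min(values)
    return (s_max - s_min) / (s_max + s_min)


def amplitude_histogram(values, n_bins, lo, hi):
    """Fraction of pixel values falling in each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * n_bins)
        counts[min(idx, n_bins - 1)] += 1   # place v == hi in the last bin
    return [c / len(values) for c in counts]


# Hypothetical linear-amplitude pixel values from a spectrotemporal envelope.
pixels = [0.1, 0.25, 0.5, 1.0]
beta = modulation_index(pixels)      # (1.0 - 0.1) / 1.0
C = contrast(pixels)                 # (1.0 - 0.1) / (1.0 + 0.1)
p_lin = amplitude_histogram(pixels, n_bins=4, lo=0.0, hi=1.0)
```

Both β and C depend only on the two extreme values, which is why they cannot distinguish envelopes that share a peak-to-peak range but differ in the shape of their amplitude distribution.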


 
Table 1. Summary statistics for the natural sound ensembles and white noise comparing the linear versus the log amplitude analysis

 
Electrophysiology. A detailed account of our experimental methods has been reported previously (Escabí and Schreiner, 2002). Briefly, cats (n = 4) were initially anesthetized with a mixture of ketamine HCl (10 mg/kg) and acepromazine (0.28 mg/kg, i.m.). After an intravenous infusion line was inserted, a surgical state of anesthesia was induced with ~30 mg/kg Nembutal and maintained throughout the surgery with supplements. Body temperature was measured and maintained with a heating pad at ~37.5°C. An incision was made in the intercartilaginous area of the trachea, and a tracheotomy tube was inserted. After a craniotomy was performed, the ICC was exposed by removing the overlying cerebrum and part of the bony tentorium using a dorsal approach. On completion of the surgery, the animal was maintained in an areflexive state of anesthesia via continuous infusion of ketamine (2-4 mg · kg⁻¹ · hr⁻¹) and diazepam (0.4-1 mg · kg⁻¹ · hr⁻¹) in lactated Ringer's solution (1-4 mg · kg⁻¹ · hr⁻¹). The state of the animal was monitored (heart rate, breathing rate, temperature, and reflexes) throughout the experiment, and the infusion rate was adjusted according to physiologic criteria. Every 12 hr the cat received an injection of dexamethasone (0.14 mg/kg, s.c.) and atropine (0.04 mg · kg⁻¹ · d⁻¹, s.c.). All surgical methods and experiment procedures followed National Institutes of Health and United States Department of Agriculture guidelines and were approved by the committee on animal research, University of California, San Francisco.

Data were obtained from single units (su) and multi-units (mu) in the ICC. One or two closely spaced parylene-coated tungsten microelectrodes (Microprobe Inc., Potomac, MD; 1-3 MΩ at 1 kHz) were advanced with a hydraulic microdrive (David Kopf Instruments, Tujunga, CA). Electrode penetration trajectories were at ~20-30° relative to the sagittal plane and approximately orthogonal to the isofrequency band lamina. Action potential traces were recorded onto a digital audio tape (CDAT16; Cygnus Technologies, Delaware Water Gap, PA) at a sampling rate of 24.0 kHz (41.7 µsec resolution) for off-line analysis. Off-line analysis consisted of digital bandpass filtering (0.3-10 kHz) all spike trains and individually spike sorting the action potential traces using a Bayesian spike-sorting algorithm (Lewicki, 1994).

Acoustic stimuli. Our analysis of natural sounds suggests that spectrotemporal fluctuations in natural sounds have a broad dynamic range and that these are most appropriately described by the decibel amplitude variable. Therefore, we hypothesized that sensory neurons in the CNS should respond best to sounds that efficiently cover the decibel amplitude dimension.

One approach for testing this hypothesis is to compare the responses to natural sounds with those to altered natural sounds. This approach may be limited by the high dimensionality and the correlations present in natural sounds that may prevent us from measuring true contrast effects. Although we currently know very little about the statistical properties of natural sounds, it is generally agreed that these are structurally complex and exhibit spectral and temporal correlations over a wide range of scales (Voss and Clarke, 1975; Attias and Schreiner, 1998; Theunissen et al., 2000). Therefore, modifying the contrast of a natural sound directly could potentially modify some alternate dependent variable. We therefore used synthetic stimuli with logarithmically matched and unmatched contrast statistics to study the efficiency of neuronal coding in the inferior colliculus.

The synthetic sounds consist of RN stimuli (Escabí and Schreiner, 2002) that are compatible with reverse correlation and can be used to estimate the spectrotemporal receptive field (STRF) of a neuron. These stimuli dynamically activate the sensory epithelium in the cochlea and allow us to estimate neuronal preferences in an unbiased manner. The spectrotemporal envelope of this stimulus is shown in Figure 2. It has noise-like properties with energy fluctuations that span a temporal modulation range of 0-350 Hz and spectral modulations from 0 to 4 cycles per octave. Signals were generated by modulating individual sinusoidal carriers of frequency, fk, and random phase, φk:

$$s(t) = \sum_{k} S(t,X_k) \cdot \sin(2\pi f_k t + \varphi_k) \qquad (6)$$
over a range of 0.5-20 kHz by the stimulus spectrotemporal envelope S(t,Xk). Here Xk = log2(fk/f0) is an octave frequency axis and the carrier spacing was set to ΔX = 0.0231 octaves.
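The carrier layout and the modulation step can be sketched in Python as follows. Several details are assumptions made for illustration: the lowest carrier is placed at f0 = 500 Hz, the carriers are summed as sines, the sampling rate of 44.1 kHz is borrowed from the natural sound recordings, and a flat placeholder envelope stands in for the ripple noise envelope, which is not generated here.

```python
import math
import random

F0 = 500.0        # lowest carrier in Hz (assumed to sit at the 0.5 kHz band edge)
F_MAX = 20000.0   # upper band edge in Hz
DELTA_X = 0.0231  # carrier spacing in octaves


def carrier_frequencies():
    """Carriers f_k = F0 * 2**(k * DELTA_X) that fit between F0 and F_MAX."""
    n = int(math.floor(math.log2(F_MAX / F0) / DELTA_X)) + 1
    return [F0 * 2.0 ** (k * DELTA_X) for k in range(n)]


def synthesize(envelope, freqs, phases, fs):
    """Sum of sinusoidal carriers, each scaled by its envelope row envelope[k][t]."""
    n_samples = len(envelope[0])
    out = []
    for t in range(n_samples):
        time = t / fs
        out.append(sum(envelope[k][t] * math.sin(2 * math.pi * freqs[k] * time + phases[k])
                       for k in range(len(freqs))))
    return out


rng = random.Random(0)
freqs = carrier_frequencies()
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]   # random carrier phases
fs = 44100.0                                                # assumed sampling rate
n_samples = 64
flat_envelope = [[1.0] * n_samples for _ in freqs]          # placeholder envelope
waveform = synthesize(flat_envelope, freqs, phases, fs)
```

With these assumptions the 0.5-20 kHz range (about 5.3 octaves) holds 231 carriers.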



 
Figure 2. Spectrotemporal envelope and contrast statistics of the Lin- and Log-ripple noise test stimulus. The RN spectrotemporal envelope has random intensity modulations along the temporal and frequency stimulus dimensions (A, F). Log- and Lin-RN have identical spectrotemporal features and differ only in their amplitude statistics (A, F; both shown on a linear amplitude color scale). Temporal cross section of the Lin sound on a linear amplitude axis (B), Lin sound on a decibel axis (D), Log sound on a linear axis (G), and Log sound on a decibel axis (I). The amplitude distributions of each sound (Lin- and Log-RN) are shown, respectively (far right), on a linear (C and H, respectively) and decibel (E and J, respectively) amplitude axis. The Lin-RN follows a uniform linear amplitude distribution, whereas the Log-RN is uniformly distributed on a log-amplitude axis.

 
To address our initial hypothesis of which stimulus dimension is most important (linear or decibel amplitude), the amplitude statistics of the RN spectrotemporal envelope were designed either on a decibel or linear amplitude axis, without modifying the spectrotemporal content. First we created a generic RN spectrotemporal envelope, Sg(t,Xk), that was used to construct all of the sampled acoustic waveforms. From this we constructed five RN sounds that differed only in the contrast statistics of their envelope: sLin(t), s15(t), s30(t), s45(t), and s60(t). Subscripts denote the type of spectrotemporal contrast statistic: Lin designates an RN with linearly distributed amplitude statistics (see Fig. 2A-E). Numerical values designate the dynamic range (in decibels) for RN sounds with logarithmically distributed (Log) contrast statistics (see Fig. 2F-J). The latter sounds therefore had contrast statistics that could cover several orders of magnitude, as is evident for all natural sounds. Because sounds were constructed using an identical generic envelope (Sg(t,Xk)) by applying a nonlinear transformation, all sound sequences had identical spectrotemporal content and differed only in their contrast (amplitude) statistics.

The generic ripple noise envelope has uniformly distributed amplitude statistics in the interval 0-1. Decibel distributed sounds were constructed by applying the transformation:

$$S(t,X_k) = 10^{\,\left(M \cdot S_g(t,X_k) - M\right)/20} \qquad (7)$$
to the generic envelope, where M designates the dynamic range of the envelope in units of decibels (M assumes values of 15, 30, 45, or 60 dB). The decibel envelope for this sound, SdB(t,Xk) = 20 · log10(S(t,Xk)) = M · Sg(t,Xk) - M, has a uniform amplitude distribution in the interval [-M, 0] dB (see Fig. 2J).
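A minimal Python sketch of this transformation is given below. It draws independent uniform samples in place of the generic envelope (an assumption, since the actual ripple noise envelope is spectrotemporally correlated) and checks that the resulting decibel envelope stays within [-M, 0] dB with an SD close to M/√12.

```python
import math
import random


def log_rn_envelope(s_g, m_db):
    """Map a generic envelope value in [0, 1] to a linear amplitude spanning m_db decibels."""
    return 10.0 ** ((m_db * s_g - m_db) / 20.0)


rng = random.Random(1)
generic = [rng.random() for _ in range(20000)]   # stand-in for S_g, uniform on [0, 1)
M = 30.0                                         # dynamic range in dB
s_log = [log_rn_envelope(u, M) for u in generic]
s_db = [20.0 * math.log10(v) for v in s_log]

mean_db = sum(s_db) / len(s_db)
sd_db = math.sqrt(sum((v - mean_db) ** 2 for v in s_db) / len(s_db))
# For a uniform distribution of width M, the SD is M / sqrt(12), about 8.66 dB for M = 30.
```

Changing `M` rescales the decibel excursions without altering where in time or frequency the peaks and troughs of the envelope occur.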

To determine whether linear or logarithmic modulations preferentially excite sensory neurons, we designed a control stimulus with linear amplitude modulations (as shown in Fig. 1 for white noise). The Lin-RN sound covered a similar range of modulation amplitudes as the Log-RN. The Lin-spectrotemporal envelope is designated as:

$$S_{Lin}(t,X_k) = \beta \cdot S_g(t,X_k) + (1 - \beta) \qquad (8)$$
where the modulation index of β = 1 - 10^(-30/20) = 0.968 was chosen so that the Lin-RN has the same modulation index as the 30 dB Log-RN sound (i.e., the maximum and minimum amplitude values are identical; minimum = 10^(-30/20), maximum = 1). These sounds thus are matched at their extremes and differ only in the shape of their amplitude distribution. The Lin-RN has a uniform amplitude distribution in the interval 10^(-30/20) to 1 (see Fig. 2B,C). To facilitate comparisons, we point out that the 30 dB and Lin amplitude distributions have similar low-order statistics (see Table 2). These include their SDs measured for S(t,Xk) (σLin = β/√12 = 0.28 and σ30 = 0.23) and for 20 · log10(S(t,Xk)) (σ30 = 8.66 dB and σLin = 6.71 dB).
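Several of the quoted values follow in closed form from the uniform distributions involved, as the Python sketch below illustrates. It assumes an ideal uniform generic envelope on [0, 1]; the helper names are hypothetical, and the decibel SD of the Lin-RN is obtained from the standard integrals of ln(s) and ln(s)² over a uniform interval.

```python
import math


def lin_rn_envelope(s_g, beta):
    """Map a generic envelope value in [0, 1] onto the interval [1 - beta, 1]."""
    return beta * s_g + (1.0 - beta)


def db_sd_of_uniform(a, b):
    """SD of 20*log10(S) when S is uniform on [a, b] with 0 < a < b (closed form)."""
    def int_ln(s):        # antiderivative of ln(s)
        return s * math.log(s) - s

    def int_ln_sq(s):     # antiderivative of ln(s)**2
        return s * math.log(s) ** 2 - 2.0 * s * math.log(s) + 2.0 * s

    width = b - a
    m1 = (int_ln(b) - int_ln(a)) / width          # E[ln S]
    m2 = (int_ln_sq(b) - int_ln_sq(a)) / width    # E[(ln S)^2]
    return (20.0 / math.log(10.0)) * math.sqrt(m2 - m1 * m1)


beta = 1.0 - 10.0 ** (-30.0 / 20.0)               # modulation index, about 0.968
sigma_lin = beta / math.sqrt(12.0)                # linear SD of the Lin-RN, about 0.28
sigma_30_db = 30.0 / math.sqrt(12.0)              # decibel SD of the 30 dB Log-RN, about 8.66
sigma_lin_db = db_sd_of_uniform(1.0 - beta, 1.0)  # decibel SD of the Lin-RN, about 6.7
```

The decibel SD of the Lin-RN is smaller than that of the 30 dB Log-RN even though both share the same extremes, because a uniform linear distribution places few samples at the low amplitudes that dominate the decibel axis.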


 
Table 2. Low- and high-order statistics of the RN envelope

 
Stimulus presentation. All experiments were conducted in a sound-attenuating chamber (IAC, Bronx, NY) with stimuli delivered via a closed, binaural electrostatic speaker system (Stax). Stimuli were presented binaurally with an independent RN sound sequence for each ear. This allowed us to compute independent STRFs for the contralateral and ipsilateral ears (Escabí and Schreiner, 2002). After single units and multi-units were obtained for pure tones and white noise, a pseudorandom sequence of four 15 sec ripple noise segments (60 sec total at each condition) was presented at five intensities (in 10 or 15 dB steps) and five contrast conditions (Log-RN at 15, 30, 45, or 60 dB, and the Lin-RN). The mean firing rate was measured for each condition and a contrast-intensity response function, R(C,SPL), was approximated by a 4 × 5 matrix of mean firing rates. For visualization purposes (see Figs. 3, 4), the contrast-intensity response matrices were interpolated using the interp2 function (cubic interpolation) in MATLAB (Mathworks Inc.); however, all of the subsequent analysis was performed on the original 4 × 5 response matrix.



 
Figure 3. Contrast versus intensity response curves of nine single units. The ripple noise stimulus was presented in pseudorandom order for Lin, 15, 30, 45, and 60 dB and at five intensity conditions (intensity spacing of 10 or 15 dB) for a possible 25 combinations. Surface plots depict the measured spike rate as a function of the stimulus contrast and intensity parameters. Spike rates often increased monotonically with increasing contrast (dynamic range) parameter (A-C) and were typically weakest for the Lin-RN. Other neurons displayed nonmonotonic contrast response curves (D-F) in which the mean spike rate was greatest for contrast values of either 30 or 45 dB. The remaining neurons either had a decreasing monotonic response curve (I) or displayed no statistically significant contrast dependency (G, H).

 



 
Figure 4. Separability of the contrast-intensity response function. Representative contrast-intensity response curves of a contrast-monotonic (A) and nonmonotonic (C) single neuron. Separable approximations (B, D) closely match the true response curves of A and C. In both cases high separability index values are obtained (0.98 for B and 0.99 for D). The separable response components for contrast and SPL are depicted above and to the left of the separable response curves of B and D. Histogram showing the separability index of n = 63 single units and n = 40 multi-units (E). All neurons had a very high separability index, indicating that the response rate can be expressed as an independent function of contrast and intensity.

 
We characterized the contrast-response curves of each neuron along the maximum SPL contour according to the shape of the contrast-response curve as increasing-monotonic, nonmonotonic, decreasing-monotonic, or independent of contrast. As a criterion, we searched for statistically significant changes (increase or decrease) in firing rate at each contrast condition. Estimates of the mean firing rate over a 60 sec time window were bootstrapped for each contrast condition, M = 15, 30, 45, or 60 dB, to determine the variability of the data. The significance probability was determined numerically for p < 0.05 by finding the tail probabilities of the overlapping firing rate distributions across different contrast conditions. Neurons were identified as contrast nonmonotonic whenever the measured firing rates for the 15 and 60 dB conditions were statistically smaller than for 30 or 45 dB contrast. Neurons were identified as monotonic when their mean firing rates showed a statistically significant increase or a statistically significant decrease with contrast.
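One way such a bootstrap comparison could be set up is sketched below in Python. The text does not specify how the 60 sec window was resampled, so the scheme here is an assumption: spike counts in sixty 1 sec bins are resampled with replacement, and the tail probability is taken as the fraction of paired bootstrap estimates in which one condition fails to exceed the other. The spike counts are synthetic.

```python
import random


def bootstrap_rates(counts, bin_sec, n_boot, rng):
    """Resample per-bin spike counts with replacement; return bootstrap mean rates (spikes/s)."""
    n = len(counts)
    rates = []
    for _ in range(n_boot):
        resample = [counts[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(resample) / (n * bin_sec))
    return rates


def prob_not_greater(rates_a, rates_b):
    """Fraction of paired bootstrap estimates in which condition A does not exceed B."""
    pairs = list(zip(rates_a, rates_b))
    return sum(1 for a, b in pairs if a <= b) / len(pairs)


rng = random.Random(42)
# Synthetic spike counts in sixty 1 sec bins for two contrast conditions (hypothetical data).
counts_15 = [rng.randint(3, 9) for _ in range(60)]
counts_30 = [rng.randint(10, 20) for _ in range(60)]

boot_15 = bootstrap_rates(counts_15, 1.0, 1000, rng)
boot_30 = bootstrap_rates(counts_30, 1.0, 1000, rng)
p_value = prob_not_greater(boot_30, boot_15)   # tail probability for "30 dB > 15 dB"
significant_increase = p_value < 0.05
```

Applying the same comparison to each pair of contrast conditions would yield the pattern of significant increases and decreases used to sort neurons into the four response classes.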

A nonrepeating 18 min segment of the RN was presented at key locations of the contrast-intensity response curve: Lin and 30 dB conditions, Lin and 60 dB conditions, or Lin, 30, and 60 dB conditions. This was used to estimate the STRF of each neuron at multiple-contrast operating conditions. Finally, at 25 recording sites, the mutual information rate of each neuron was estimated from the response rastergrams to a 5 sec segment (repeated 150 times) of the ripple noise (see Mutual information; see Fig. 8).



 
Figure 8. Spiking pattern and response reproducibility as a function of contrast for two single neurons. A 5 sec segment of the Lin, 30 dB, and/or 60 dB ripple noise was presented. Rastergrams and PSTHs show 125 response traces to the ripple noise: Lin, 30 and 60 dB for neuron 1 (A, shown top to bottom, respectively) and for neuron 2 (B, shown top to bottom, respectively). Each spike is shown as a single dot (bin width, 1 msec). The PSTH for each condition is shown above the corresponding rastergrams (shown on identical amplitude scales for Lin, 30 dB, and 60 dB). Driven firing rates and response reproducibility improve for the 30 and 60 dB RN relative to Lin-RN. Higher peak-to-trough amplitude modulations of the driven spike rate for 30 and 60 dB indicate that stimulus encoding is improved for the Log condition.

 
Spectrotemporal receptive field. Contralateral and ipsilateral STRFs are computed by averaging the pre-event spectrotemporal envelope of the contra- and ipsi-stimulus at the time instant of each neural spike, tn (47 µsec resolution):

$$\mathrm{STRF}(\tau,X_k) = \frac{1}{\sigma_s^2 \, T} \sum_{n=1}^{N} \tilde{S}(t_n - \tau, X_k) \qquad (9)$$
Here T corresponds to the experimental recording time in seconds, N is the number of spikes, τ is the temporal delay of the stimulus relative to the neural event time (0-100 msec), S̃(t,Xk) is the zero-mean spectrotemporal envelope for the contra- or ipsi-stimulus, and σs² is the envelope variance.

All of the analysis was performed so that the spectrotemporal envelope used in Equation 9 corresponds precisely with the stimulus dimension under consideration. For instance, if the Lin-RN sound was presented (Eq. 8), the linear-amplitude zero-mean spectrotemporal envelope:

$$S_{\mathrm{Lin}}(t, X_k) = \sigma \cdot g(t, X_k) \qquad (10)$$

was used to compute the STRF of the neuron, where σ² = β²/12 (β = 1 − 10^(-30/20)) is the variance of the envelope and g is the zero-mean, unit-variance generic spectrotemporal RN envelope. Alternately, if a logarithmically distributed RN envelope was used, e.g., M = 30 dB, the corresponding zero-mean decibel envelope was used in the analysis:

$$S_{\mathrm{dB}}(t, X_k) = \sigma \cdot g(t, X_k) \qquad (11)$$

where σ² = M²/12 dB² is the signal variance for the Log sound. This procedure assured that in both instances the stimulus spectrotemporal envelopes used for the reverse correlation were identical in all respects except their variance. Unfortunately, this meant that the STRF units were distinctly different for the Lin- and Log-spectrotemporal envelopes (Eqs. 10 and 11, respectively). These are given as output/input where the output units are spikes per second for either case but the input units are dB for the Log envelope and unitless for the Lin-envelope. Therefore, an alternate normalization was preferred for the STRF in which we removed the input stimulus dimensions by multiplying by the average input signal: STRF_r = σ · STRF. This rate-normalized STRF is given in units of spikes per second and corresponds to the average output produced for the average input (Escabí and Schreiner, 2002). Furthermore, after normalizing Equation 9 in this manner, both rate-normalized STRFs are no longer confounded by any stimulus-dependent aspects and are now described by the same equation:

$$\mathrm{STRF}_r(\tau, X_k) = \frac{1}{T} \sum_{n=1}^{N} g(t_n - \tau, X_k) \qquad (12)$$
Throughout this report, the rate-normalized STRF is used to facilitate comparisons.
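The cancellation behind the rate normalization can be written out step by step. This is a sketch under two assumptions drawn from the definitions in this section: that the STRF is the sum of zero-mean envelope values at the spike times divided by σ²T, and that the analyzed envelope equals σ · g, with g the unit-variance envelope. N denotes the number of spikes.

```latex
\begin{aligned}
\mathrm{STRF}_r(\tau, X_k) &= \sigma \cdot \mathrm{STRF}(\tau, X_k) \\
  &= \sigma \cdot \frac{1}{\sigma^{2} T} \sum_{n=1}^{N} \sigma \, g(t_n - \tau, X_k) \\
  &= \frac{1}{T} \sum_{n=1}^{N} g(t_n - \tau, X_k)
\end{aligned}
```

Because σ cancels, the result carries units of spikes per second regardless of whether the envelope was expressed in linear or decibel amplitude.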

Statistically significant STRF. Statistically significant regions of the STRF were determined by considering a null condition in which N randomly chosen Poisson spikes are put through Equation 9 (Escabí and Schreiner, 2002). The amplitude distribution of this control-STRF was derived in closed form, and significance was tested for all STRFs at p < 0.002. Because the amplitude distribution of the control-STRF quickly approached a Gaussian distribution (for as few as N = 50 spikes), the significant STRF was obtained by keeping all values that exceeded 3.09 SDs of the control noise STRF and setting all other values to zero (e.g., actual significance p < 0.0019 for N = 50).
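The link between the 3.09 SD cutoff and the quoted significance level can be checked numerically. The sketch below assumes a Gaussian null distribution and a two-sided test on the magnitude of each STRF value; the noise SD is passed in as a number rather than derived in closed form, and the function names are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided tail probability of a standard normal beyond |z| SDs."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def significant_strf(strf, noise_sd, z=3.09):
    """Keep values whose magnitude exceeds z noise SDs; set the rest to zero."""
    return [[v if abs(v) > z * noise_sd else 0.0 for v in row] for row in strf]

p_cutoff = two_sided_p(3.09)             # approximately 0.002
masked = significant_strf([[0.5, -4.0], [3.2, 1.0]], noise_sd=1.0)
```

With a noise SD of 1, only the values -4.0 and 3.2 in the toy array survive the threshold.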

STRF similarity index. We compared STRF shapes across multiple conditions with the STRF correlation coefficient or similarity index (SI) (DeAngelis et al., 1999; Reich et al., 2000). For two experiment conditions A and B, we consider the statistically significant vectorized RFs, which consist of all pixels of STRF_A and STRF_B (determined for both the contra- and ipsi-STRFs) that exceed a significance test (p < 0.002) for condition A or B. The STRF similarity index is then computed as:

$$\mathrm{SI} = \frac{\langle \mathrm{RF}_A, \mathrm{RF}_B \rangle}{\|\mathrm{RF}_A\| \cdot \|\mathrm{RF}_B\|} \qquad (13)$$

where RF_A and RF_B are the significant vectorized binaural-STRFs for condition A and B, respectively, ⟨·,·⟩ corresponds to the vector inner product, and ||·|| designates the vector norm operator. The SI quantifies the STRF shape differences or similarity independently of STRF amplitude and assumes numerical values between -1 and 1, where 0 indicates no similarity, 1 indicates that the STRFs have identical shape, and -1 indicates that the STRFs have identical shape but differ by a sign inversion.
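A minimal sketch of the similarity index follows, assuming each STRF is a NumPy array and that significance is supplied as a boolean mask per condition. The function and variable names are illustrative only.

```python
import numpy as np

def similarity_index(strf_a, strf_b, sig_a, sig_b):
    """Normalized inner product over pixels significant in condition A or B."""
    mask = sig_a | sig_b                  # union of the significant pixels
    rf_a = strf_a[mask]                   # vectorized significant RF, condition A
    rf_b = strf_b[mask]                   # vectorized significant RF, condition B
    return float(np.dot(rf_a, rf_b) / (np.linalg.norm(rf_a) * np.linalg.norm(rf_b)))

toy = np.array([[0.0, 1.0, -2.0], [3.0, 0.0, 0.5]])
toy_mask = np.abs(toy) > 0.75
si_scaled = similarity_index(toy, 3.0 * toy, toy_mask, toy_mask)   # same shape, larger amplitude
si_flipped = similarity_index(toy, -toy, toy_mask, toy_mask)       # same shape, inverted sign
```

Scaling an STRF leaves the index at 1 and inverting its sign gives -1, which matches the amplitude independence described above.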

Rate and magnitude disparity index. Two metrics were designed that allowed us to evaluate firing rate and STRF energy differences independently of the STRF shape. First we computed a rate disparity index (RDI):

(14)
where λ_A and λ_B are the measured firing rates for conditions A and B, respectively, and s = sign(λ_A − λ_B). The magnitude of the RDI is numerically equivalent to the percentage of change in firing rate referenced on condition A if s > 0 and B if s < 0. Its sign, s, tells us which condition, A or B, had a higher firing rate: λ_A > λ_B if s > 0 and λ_A < λ_B if s < 0.

Differences in the driven neuronal activity between two stimulus conditions were quantified by measuring the percentage of change in the STRF energy, which we measured as an STRF magnitude disparity index (MDI):

(15)
where E_A and E_B are the significant binaural rate-normalized STRF energies for conditions A and B, respectively (Escabí and Schreiner, 2002). The STRF energy measures phase-locked activity (units of spikes per second) that is captured by the STRF of the neuron. Therefore, the MDI measures changes in phase-locked or stimulus-driven neuronal activity where the sign, s, designates which condition was stronger and the magnitude of the MDI designates the percentage of change in driven activity.

Mutual information. A 5 sec segment of the RN stimulus was presented for 150 trials. Response traces were recorded for each trial, and the reliability of the spike train was determined by measuring the mutual information rate (de Ruyter van Steveninck et al., 1997; Strong et al., 1998). The first 25 traces were discarded for all neurons to minimize the effects of adaptation. Each spike trace was digitized at a sampling resolution of Δt = 1 msec, and the spike train entropy was determined by measuring the probability distribution, P(W), of possible N-bit words, W (also tested for Δt = 2 and 5 msec). A search through the whole experiment was conducted to determine the word distribution, P(W). Using the distribution of N-bit words, the spike train entropy is determined as:

$$S_{\mathrm{total}} = -\sum_{W} P(W) \log_2 P(W) \qquad (16)$$

This measure provides a theoretical upper limit on the amount of information that a spike train can convey and does not account for the possibility of internal noise. To determine the noise inherent within the response, the noise entropy was computed by determining the trial-to-trial reliability of the response (e.g., the entropy in the spike train that does not convey any viable information about the stimulus). At any given time instant, t, the conditional probability distribution of obtaining a given N-bit word was computed, P(W|t). The noise entropy was then determined as:

$$S_{\mathrm{noise}} = \left\langle -\sum_{W} P(W \mid t) \log_2 P(W \mid t) \right\rangle_t \qquad (17)$$

where ⟨·⟩_t is the conditional ensemble expectation computed over all time. The information that the spike train contains about the stimulus (i.e., the mutual information) is determined by subtracting these two quantities: I = S_total − S_noise. This procedure was bootstrapped across different stimulus segments, word lengths (T = 5, 6, 8, 10, 15, 20, 40, 80, 160, and 200 msec), and data fractions (80, 50, 33, and 25%). The mutual information and error bounds were then extrapolated (using 80, 50, 33, and 25% data fractions and T = 5-15 msec) for the infinite data case according to the procedure of Strong et al. (1998). The algorithm was calibrated with fly visual data from Borst (2003).
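The word-counting step of this estimate can be sketched as follows. It assumes binary spike trains binned at Δt, words taken from a window that slides one bin at a time, and entropies expressed in bits per word; it omits the bootstrapping and infinite-data extrapolation described above. All names are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy (bits) of the distribution implied by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def word_information(trials, n_bits):
    """Total entropy, noise entropy, and their difference for N-bit words.

    trials : list of equal-length lists of 0/1, one list per stimulus repeat
    n_bits : word length N in bins
    """
    n_time = len(trials[0])
    starts = range(0, n_time - n_bits + 1)
    all_words = Counter()
    noise_sum = 0.0
    for t in starts:
        # Distribution of words across trials at this time instant: P(W|t).
        words_at_t = Counter(tuple(trial[t:t + n_bits]) for trial in trials)
        all_words.update(words_at_t)       # pooled over time and trials: P(W)
        noise_sum += entropy_bits(words_at_t)
    s_total = entropy_bits(all_words)
    s_noise = noise_sum / len(starts)      # average over time
    return s_total, s_noise, s_total - s_noise

# A perfectly reproducible response: every trial is the same pattern.
pattern = [0, 1, 0, 0, 1, 1, 0, 1]
s_total, s_noise, info = word_information([pattern] * 10, n_bits=2)
```

For a perfectly repeatable response the noise entropy is zero, so all of the word entropy counts as information. Dividing by the word duration (N · Δt) converts bits per word into a rate.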

Spectrotemporal phase-locking index. We measured the contribution of single action potentials by considering how each neuronal spike contributes to the STRF construction procedure (Eq. 12) with a spectrotemporal phase-locking index (PLI) (Escabí and Schreiner, 2002). The PLI metric allows us to measure the degree of alignment of action potentials to the spectrotemporal envelope of the sound and provides a measure of the fraction of the spikes of the neuron that contributes to the STRF construction process.

The theoretical basis for the PLI metric is a tight temporal spike alignment to on- and off-features in the RN sound that will lead to optimal buildup of the STRF (Eq. 12). This will produce a large peak-to-peak STRF amplitude. If the alignment of action potentials and stimulus features is poor, the resulting STRF peak-to-peak amplitude will be small. On the basis of this fact, the phase-locking index is defined as the measured peak-to-peak amplitude of the STRF normalized by the theoretical maximum attainable peak-to-peak amplitude for a perfectly phase-locking neuron:

$$\mathrm{PLI} = \frac{\max(\mathrm{STRF}_r) - \min(\mathrm{STRF}_r)}{\sqrt{12} \cdot \lambda} \qquad (18)$$

where max(STRF_r) − min(STRF_r) is the measured peak-to-peak amplitude for the rate-normalized STRF (Eq. 12), λ is the mean firing rate, and √12 · λ is the maximum theoretical value for Equation 12 (Escabí and Schreiner, 2002). The PLI is bounded between 0 for no evident phase locking (no measurable STRF) and 1 for perfect phase locking.
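The normalization by √12 · λ can be illustrated with a small sketch. It assumes that the rate-normalized STRF is the sum of unit-variance envelope values at the spike times divided by the recording time, and that the envelope is uniformly distributed, so a unit-variance envelope spans −√3 to +√3. A neuron whose every spike coincides with the envelope maximum at one delay and the minimum at another would then reach a peak-to-peak amplitude of 2√3 · λ = √12 · λ. The function name and toy values are hypothetical.

```python
import numpy as np

def phase_locking_index(strf_r, rate):
    """Peak-to-peak amplitude of the rate-normalized STRF over sqrt(12) * rate."""
    return float((strf_r.max() - strf_r.min()) / (np.sqrt(12.0) * rate))

rate = 20.0                                            # spikes per second (toy value)
ideal = np.array([[np.sqrt(3.0) * rate, 0.0],          # every spike on the envelope maximum
                  [0.0, -np.sqrt(3.0) * rate]])        # and on the envelope minimum
pli_ideal = phase_locking_index(ideal, rate)
pli_flat = phase_locking_index(np.zeros((2, 2)), rate)
```

The idealized STRF gives a PLI of 1 and an empty STRF gives 0, the two bounds stated above.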


    Results
We studied the amplitude distributions of spectrotemporal modulations in natural sounds to determine whether sensory neurons respond preferentially to sounds with a natural-like dynamic range. First, we quantitatively measured the second-order modulations of various natural sound ensembles. We next tested the representation of linear and logarithmic spectrotemporal modulations to determine whether the amplitude statistics and the dynamic range of the stimulus significantly improve neuronal encoding in the ICC. We designed synthetic RN stimuli that uniformly cover the linear or decibel dimension over a predetermined range of values and are matched to the statistical structure of various artificial stimuli and natural sounds. This approach allowed us to closely match a number of low-order statistics of the linear and log-transform stimulus, such as the SD, modulation depth, and contrast, while allowing us to independently modify the shape of the amplitude distribution and its higher-order moments (e.g., its skewness, kurtosis, and log-transform SD). STRF and information theoretic approaches were then used to compare neuronal encoding abilities for both logarithmic- and linear-ripple noise stimuli and for various dynamic range conditions.

Spectrotemporal modulations and contrast in natural sound
The statistical properties of the spectrotemporal modulations in natural sounds were determined by analyzing an extensive database that included speech, animal vocalizations, environmental sounds, and white noise. Animal vocalizations included sound emissions from a host of domestic (cat, horse, dog, etc.) and nondomesticated (bats, primates, birds, large cats, frogs, etc.) animals. Environmental background sounds were selected from inanimate sources such as running water, wind, and thunder. The sample also included unvocalized sound emissions from animals such as a woodpecker hammering, footsteps in leaves, and buzzing from a swarm of bees. Human speech was conversational, obtained from long, continuous segments of a reproduction of the play Hamlet. All sounds were decomposed into a cochleotopic representation that mimics the output of the cochlea (Fig. 1). This spectrotemporal stimulus pattern provides a pictorial representation of the spectral and temporal modulations that are present in each signal.

Figure 1 illustrates representative sound segments from each of the studied stimulus ensembles. The spectrotemporal modulations in each sound are plotted either as a linear amplitude variable (Fig. 1A,E,I,M), normalized to an amplitude range between 0 and 1, or as a log-transform variable expressed in amplitude units of decibels (Fig. 1B,F,J,N). Visually, much of the detail of the linear amplitude spectrotemporal envelope is obscured because most of the stimulus modulations are localized about amplitudes near zero. In contrast, the log-transform spectrotemporal envelope expands the effective dynamic range of each stimulus and expresses the modulations as proportional amplitude, so that much of the structural detail is readily visible. The perceptual difference in the visual representation of these stimuli is consistent with psychophysics of intensity and luminance discrimination, both of which follow Weber's law scaling (Weber, 1834; Fechner, 1860; Miller, 1947; Harris, 1963; Jesteadt and Wier, 1977; Florentine et al., 1987).

We computed the spectrotemporal amplitude distribution of each signal by collapsing all of the pixel values of the spectrotemporal envelopes into a binned histogram. These are shown for both the linear and decibel spectrotemporal envelopes. As observed from the linear spectrotemporal envelopes, the distribution of spectrotemporal amplitudes in natural signals is highly skewed toward values near zero (Fig. 1C,G,K; Table 1). By comparison, white noise has amplitude fluctuations that cover a larger extent of the linear amplitude space (Fig. 1O). Thus, although all natural sounds exhibit spectrotemporal modulations that span nearly the entire range of linear amplitude values (from 0 to 1; all sounds had modulation depths >99.99%), the measured SDs were typically small, indicating that the average amplitude modulations of these signals spanned only a limited region of the linear amplitude space and were skewed toward zero value (Table 1). This conflicting assessment of the spectrotemporal modulations in natural signals indicates that typical measures such as the traditionally defined contrast, C = (S_Max − S_Min)/(S_Max + S_Min), or the modulation index, β = (S_Max − S_Min)/S_Max, are inappropriate because they only account for the maximum and minimum stimulus intensities (S_Max and S_Min, respectively) and do not take into account the distribution of intermediate amplitude values.
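The limitation of extreme-value measures can be seen in a toy comparison. The two envelopes below are synthetic assumptions, not data from the study: both span the same minimum and maximum, but one is evenly spread and the other is concentrated near its floor.

```python
import numpy as np

def contrast(s):
    """Traditional contrast: (max - min) / (max + min)."""
    return (s.max() - s.min()) / (s.max() + s.min())

def modulation_index(s):
    """Modulation index: (max - min) / max."""
    return (s.max() - s.min()) / s.max()

floor = 10 ** (-30 / 20)                       # about 0.032
x = np.linspace(0.0, 1.0, 1001)
flat = floor + (1.0 - floor) * x               # evenly spread amplitudes
skewed = floor + (1.0 - floor) * x ** 16       # most values sit near the floor

same_contrast = np.isclose(contrast(flat), contrast(skewed))
same_beta = np.isclose(modulation_index(flat), modulation_index(skewed))
sd_ratio = skewed.std() / flat.std()           # well below 1
```

Both envelopes yield identical contrast and modulation index, yet the skewed one has a much smaller SD, which is the discrepancy that motivates looking at the full amplitude distribution.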

The log-transform spectrotemporal envelope histogram expands the observable range of the signal and therefore overcomes many of the limitations observed for the linear amplitude variable. The decibel amplitude modulations in natural sounds follow a roughly symmetric bell-shaped distribution as measured from the low skewness values (Fig. 1D,H,L; Table 1). As is evident in the example spectrotemporal envelopes, the measured SD and 90th percentile range of speech, vocalizations, and environmental sounds spanned an extensive range of values (Table 1). Vocalizations and speech have the broadest distributions as measured from their SD and 90th percentile range, whereas environmental sounds covered a narrower range of amplitudes. White noise, by comparison, spanned the narrowest amplitude range.

These findings show that vocalizations and environmental sounds contain logarithmically distributed spectrotemporal modulations with an effective dynamic range (i.e., 90th percentile range; ~30 dB for environmental sounds and ~50 dB for vocalizations and speech) that is closely matched to the intensity operating range of neurons in the auditory pathway (Evans and Whitfield, 1964; Palmer and Evans, 1982; Eggermont, 1989). We therefore postulate that auditory neurons use such information for efficiently encoding and detecting spectrotemporal features in natural signals. As is evident from Figure 1, vocalizations and speech dynamically change over time and exhibit short periods of high-energy and low-energy comodulated activity followed by quiet or background activity (Nelken et al., 1999). Background sounds by comparison are usually stationary over the time scales that are relevant for intensity discrimination and loudness perception (Green et al., 1957; Stephens, 1973) and generally have a much narrower dynamic range. These differences in the effective dynamic range and the time-varying structure between vocalization and environmental sounds thus may be important for signal segregation and may facilitate the detection of vocalizations in high levels of background noise.

Contrast and intensity response characteristics
To test the possibility that the operating range of the central auditory system is matched to efficiently process natural acoustic stimuli, we designed naturalistic RN stimuli that mimic the logarithmic amplitude fluctuations observed in natural sounds and a control stimulus with linearly distributed amplitude fluctuations (see Materials and Methods) (Fig. 2) similar to those found in common experimental stimuli. Although the distribution of logarithmic modulations in the Log-RN does not exhibit long tails as evident in all natural sounds (Fig. 1), these sounds have spectrotemporal amplitudes that efficiently cover the decibel amplitude space (Fig. 2I,J), an exponential-like linear amplitude distribution (Fig. 2H), and envelope SDs within the range observed for natural sounds (see Materials and Methods) (Tables 1, 2). For both conditions, the modulations in the RN sound covered an unbiased range of temporal (0-350 Hz range) and spectral (0-4 cycles per octave) modulations, making it a suitable test stimulus for measuring spectrotemporal receptive fields in the ICC (Escabí and Schreiner, 2002; Qiu et al., 2003). The spectrotemporal content of all sounds was held fixed, and the amplitude distribution of each sound was varied independently. The naturalistic RN has spectrotemporal intensity fluctuations that uniformly covered a dynamic range of 15, 30, 45, or 60 dB (shown for 45 dB in Fig. 2I,J). The linearly distributed control sound had amplitude fluctuations that uniformly covered a predefined linear amplitude range from 10^(-30/20) = 0.032 to 1 (modulation index = 0.968) (Fig. 2B,C). Both the naturalistic (Log) and artificial (Lin) stimuli had identical spectrotemporal envelope content and differed only in their amplitude statistics (see Materials and Methods) (Fig. 2A,F).
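The two amplitude distributions can be compared with a short simulation. This sketch draws independent samples only; it does not reproduce the spectrotemporal structure of the ripple noise, and placing the Log envelope between -M and 0 dB is an assumption made for illustration. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def lin_envelope(n, floor_db=30.0):
    """Amplitudes uniform on a linear axis between 10^(-floor_db/20) and 1."""
    floor = 10 ** (-floor_db / 20)
    return rng.uniform(floor, 1.0, n)

def log_envelope_db(n, m_db):
    """Amplitudes uniform on a decibel axis over a dynamic range of m_db."""
    return rng.uniform(-m_db, 0.0, n)   # assumed to peak at 0 dB

lin = lin_envelope(100_000)
log_db = log_envelope_db(100_000, 30.0)
log_lin = 10 ** (log_db / 20)           # the Log samples viewed on a linear axis

lin_sd = lin.std()                      # close to 0.968 / sqrt(12)
log_sd_db = log_db.std()                # close to 30 / sqrt(12) dB
```

Viewed on a linear axis, the Log samples pile up near zero (their median is far below that of the Lin samples), which corresponds to the exponential-like linear amplitude distribution described above.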

Recordings were performed on n = 63 su and n = 40 mu in the ICC. Sound segments (15 sec) were presented in a pseudorandom order for the different contrast conditions and for 5 rms sound pressure levels (SPLs) extending over a range of 50 or 75 dB (step size of 10 or 15 dB, respectively). Intensity- versus contrast-response curves were derived for each neuron by measuring the mean spike rate at all operating conditions (see Materials and Methods) (Fig. 3). As expected, neurons showed monotonic or nonmonotonic response characteristics as a function of SPL (Evans and Whitfield, 1964; Ehret and Merzenich, 1988; Eggermont, 1989; Sutter and Schreiner, 1995). Similar dependencies were observed for the contrast axis. Response characteristics can be increasing-monotonic with increasing dynamic range (Fig. 3A-C), tuned (Fig. 3D-F), decreasing-monotonic (Fig. 3I), or independent (Fig. 3G,H) of the stimulus contrast statistics. For reference, results for the Lin stimulus are shown alongside the Log contrast conditions.

Increasing-monotonic units (n = 37 mu + su) showed a significant increase in firing rate (p < 0.05) with increasing contrast dynamic range. In such cases the mean spike rate was typically minimal for the Lin-RN and 15 dB Log-RN and maximal for the 60 dB Log-RN (firing rate increase over Lin: average = 168%, median = 78%). For all neurons, the mean spike rates were similar for Lin-RN and 15 dB Log-RN (p > 0.1). Hence, the minor differences between these two stimulus conditions were biologically insignificant. On increasing the dynamic range above 15 dB, spike rates increased monotonically (Fig. 3A: λ_15 = 0.10 spikes/sec and λ_60 = 0.95 spikes/sec, p < 0.0001; B: λ_15 = 0.03 spikes/sec and λ_60 = 3.36 spikes/sec, p < 0.0001; C: λ_15 = 0.31 spikes/sec and λ_60 = 2.15 spikes/sec, p < 0.0001; taken for the intensity with maximum response).

Nonmonotonic contrast response curves were seen in ~46% of the sites (p < 0.05; n = 47 mu + su) (Fig. 3D-F). Responses were minimal for 15 dB-RN and maximal for Log-RN with a dynamic range of 30 or 45 dB. On increasing the dynamic range to 60 dB, the responses of nonmonotonic neurons were suppressed. On average, a 34% (multi-unit = 27%) decrease in firing rate was observed for the 60 dB contrast condition (su median = 25%; mu median = 26%; not significantly different, χ² (ν = 5) = 3.75; p = 0.44). The single neuron depicted in Figure 3D has a significant reduction (91%; p < 1 × 10^-6) in firing rate (λ_30 = 9.7 spikes/sec and λ_60 = 0.9 spikes/sec). Although the observed nonmonotonic relationships were statistically significant, we point out that reductions in firing rate were usually small. The neurons shown in Figure 3, E and F, had a reduction of 49% (λ_30 = 14.7 spikes/sec and λ_60 = 7.48 spikes/sec; p < 2 × 10^-6) and 16% (λ_30 = 39.0 spikes/sec and λ_60 = 33.8 spikes/sec; p < 0.001), respectively. Only four single neurons and three multi-units showed a significant decrease in firing rate to less than half of their maximum response amplitude. Other neurons showed a decreasing trend in firing rates (n = 7) with increasing contrast (Fig. 3I) (λ_15 = 2.3 vs λ_60 = 0.75 spikes/sec; p < 1 × 10^-6) or showed no statistically significant response pattern (n = 12) (Fig. 3G: λ_15 = 3.6 vs λ_60 = 4.5 spikes/sec, p > 0.35; H: λ_15 = 7.7 vs λ_60 = 6.7 spikes/sec, p > 0.4).

Independence of response to intensity and contrast
The contrast-intensity response curves of Figure 3 demonstrate that, in principle, stimulus intensity and contrast can be encoded by the mean firing rate characteristics of individual neurons. The hypothesis that intensity is partly encoded by the mean firing rate of single neurons is consistent with this observation. What is presently not clear is how spectral and temporal fluctuations (which are themselves a form of intensity at very fine spectral and temporal scales) associated with the contrast characteristics of the ripple sound are jointly encoded with intensity by individual or populations of neurons. It is possible that neuronal responses to intensity (SPL) and contrast covary or, alternatively, are processed independently of each other. To determine which of these two possibilities is consistent with the observed data, we determined whether the intensity-contrast rate-level functions are separable for these two parameters.

Intensity-contrast response curves (for logarithmically distributed RN only) were decomposed using a singular value decomposition procedure (Strang, 1988). This procedure decomposes the contrast-intensity response curve into a weighted sum of functions that are each independent products of the contrast (C) and intensity (SPL) parameters. Mathematically the response function can be expressed as:

$$R(C, \mathrm{SPL}) = \sum_{k} \gamma_k \cdot u_k(C) \cdot v_k(\mathrm{SPL}) \qquad (19)$$

where R(C,SPL) is the contrast-intensity response curve, γ_k is the k-th singular value, and u_k(C) and v_k(SPL) are functions of contrast and intensity, respectively. If the contrast-intensity response curve is strictly a separable function of SPL and C, it is expected that the above sum degenerates into a single term. For this unique scenario, the response of the neuron is expressed by the first term in the sum R(C,SPL) = γ_1 · u_1(C) · v_1(SPL).

A separable approximation of the contrast-response curve was obtained by considering only the first singular value: R̂(C,SPL) = γ_1 · u_1(C) · v_1(SPL). The separable approximation and the true contrast-intensity response curves are depicted in Figure 4 for two single neurons. In both cases the separable approximation captures most of the detail of the true response function, thus supporting the idea that contrast and intensity are processed independently.

A direct measure of separability is provided by considering the relative strength of the first singular value to the higher-order singular values. Thus we devise a separability index:

(20)
which consists of the ratio of the first singular value to the weighted sum of all the squared singular values (N = 4 because the measured contrast-intensity response function consists of a 4 x 5 matrix; four contrast versus five intensity conditions). This measure quantifies the overall fraction of the contrast-intensity response curve accounted for by the separable approximation, R̂(C,SPL). Values near zero indicate that the contrast-intensity response curve is strongly nonseparable, whereas values near unity indicate that the response curve is fully separable. The examples of Figure 4 exemplify this point. Both response curves are in close agreement with their separable approximations and consequently the measured separability index values are near unity (Fig. 4A,B: 0.98; C, D: 0.99). Across the population of neurons (n = 63 su and n = 40 mu), the separability index was exceptionally high (Fig. 4E) (mean value = 0.99 ± 0.01; mean ± SD), suggesting that contrast-response characteristics are independent of SPL.
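The decomposition and a separability measure can be sketched with NumPy. The index used here, the squared first singular value divided by the sum of all squared singular values, is an assumption about the form of the index chosen for illustration; the response matrices are synthetic.

```python
import numpy as np

def separability(response):
    """Rank-1 (separable) approximation and an energy-fraction separability index.

    response : array (n_contrast, n_intensity) of mean firing rates
    """
    u, gamma, vt = np.linalg.svd(response, full_matrices=False)
    approx = gamma[0] * np.outer(u[:, 0], vt[0])        # first term of the SVD sum
    index = gamma[0] ** 2 / np.sum(gamma ** 2)          # assumed form of the index
    return approx, float(index)

# A perfectly separable 4 x 5 response: product of a contrast and an intensity profile.
contrast_profile = np.array([1.0, 2.0, 4.0, 3.0])
intensity_profile = np.array([0.5, 1.0, 2.0, 3.0, 1.0])
separable = np.outer(contrast_profile, intensity_profile)
approx_sep, index_sep = separability(separable)

# A strongly nonseparable response: each contrast drives a different intensity.
_, index_nonsep = separability(np.eye(4, 5))
```

The separable matrix is recovered exactly by its first term and scores 1. The nonseparable example scores 0.25, the floor of this particular index for a matrix with four singular values.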

Effects of envelope statistics on spectrotemporal coding
It is conceivable that the auditory system uses the range and shape of the contrast distribution as a secondary acoustic cue. Individual neurons can show nonmonotonic rate response curves to Log-contrast fluctuations that are independent of intensity, reflecting a contrast range sensitivity or even selectivity. For most neurons, the mean response rates were considerably larger for the naturalistic Log-RN than for the control Lin-RN, indicating sensitivity to the shape of the contrast distribution.

Do individual neurons use the dynamic range characteristics in natural sounds to faithfully encode fine spectral and temporal sound components? Can individual neurons more accurately detect specific acoustic features under such naturalistic contrast conditions?

To address these questions, we computed the STRF at different operating points of the contrast-intensity response curve (see Materials and Methods). RN stimuli were presented at identical rms intensity and two or more contrast conditions (Lin vs 30, Lin vs 60, 30 vs 60, or Lin vs 30 vs 60). Figure 5 shows STRFs and the corresponding contrast-intensity response curves for three typical neurons. STRFs were computed at the operating points depicted by the circles on the contrast-intensity response curve (red = Lin, green = 30 dB, and blue = 60 dB). For all conditions, the shape of the STRF is qualitatively similar, indicating that the neuron is responding to similar sound features during all contrast conditions. The mean firing rate and STRF amplitude of the neuron, however, are significantly stronger (p < 0.01) for the Log- than for the Lin-RN stimulus. Comparing the contrast-intensity response curves with the STRF, it is noted that the differential strength of the STRF (units of spikes per second) is increased at contrast operating points where the mean spike rate is likewise increased. This observation indicates that the neuron uses the increased spike rate to encode phase-locked activity with respect to the stimulus spectrotemporal envelope. This response enhancement is typical for the majority of neurons.



Figure 5. Relationship between the contrast-intensity response curve and the STRF. The contrast-intensity response curve is shown for a contrast nonmonotonic unit (A) and two contrast monotonic neurons (E, I). STRFs were computed at the contrast-intensity operating points designated by the colored circles (red = Lin, green = 30 dB, and blue = 60 dB). B-D show the STRFs for the contrast nonmonotonic neuron depicted in A. Both the mean firing rate and STRF amplitude covary, following a similar nonmonotonic relationship with contrast. The STRF energy for the monotonic neuron depicted in E increases monotonically with increasing contrast (F-H). The neuron of I did not respond to the Lin (J) condition but responded with increased efficacy to the 30 and 60 dB conditions (K, L, respectively). Red contours designate statistically significant regions of the STRF; p < 0.002.

 
It appears that changing the contrast operating point of the RN input alters the relative amplitude of the STRF and leaves its shape unaffected, suggesting that the neuron responds to similar sound components but with increased or decreased efficacy. To quantify this effect, we measured amplitude and shape differences of the STRF as a function of the contrast and intensity operating point. We considered three metrics that independently quantify STRF shape, amplitude, and firing rate differences. First we computed the STRF similarity index (DeAngelis et al., 1999; Reich et al., 2000; Escabí and Schreiner, 2002). This metric takes values between -1 and +1 and is numerically equivalent to the Pearson correlation coefficient. Next we measured the percentage of change in firing rate and STRF energy with the rate (RDI_A,B) and magnitude disparity index (MDI_A,B). These metrics quantify changes in firing rate and STRF energy, respectively, between two experiment conditions, A and B, in which the magnitude designates the percentage of change and the sign (+ or -) designates which condition is stronger (A or B, respectively).

Typical neurons depicting differences in STRF shape, firing rate, and STRF energy are shown in Figure 5. The neuron depicted in Figure 5E-H has similar STRFs (SI60,30 = 0.97; SI60,Lin = 0.92; SI30,Lin = 0.91) for all conditions tested. Therefore this neuron responded to identical spectrotemporal sound patterns at all operating points. Despite the similarity in spectrotemporal shape, its RDI and MDI indicate that the neuron responded with a higher spike rate (RDI30,Lin = 390%; RDI60,Lin = 543%) and stronger differential response strengths (MDI30,Lin = 395%; MDI60,Lin = 468%) for the 30 and 60 dB conditions (compared with Lin). Neurons with nonmonotonic contrast dependencies (Fig. 5A-D) typically show higher spike rates for 30 dB Log-RN compared with Lin-RN (RDI30,Lin = 35%); however, their spike rates are typically higher for 30 dB than for 60 dB (RDI60,30 = -43%). The STRF energy of the neuron is likewise greater for the 30 dB contrast than for Lin-RN or 60 dB-RN (MDI30,Lin = 72%; MDI60,Lin = 2.3%; MDI60,30 = -42%). Other neurons (Fig. 5I-K) responded weakly to the Lin-RN and therefore did not produce statistically significant STRFs for this condition. SI values comparing Lin- versus Log-RN for this example were small (SI30,Lin = 0.21 and SI60,Lin = 0.28); however, SI values between the 30 and 60 dB condition were much higher (SI60,30 = 0.88). MDI (MDI30,Lin = 4,558% and MDI60,Lin = 18,120%) and RDI values (RDI30,Lin = 5,589% and RDI60,Lin = 22,130%) were large, suggesting that the neuron responded efficiently to the Log- but not to the Lin-RN.

Similarity index population data are shown in Figure 6 for n = 57 single neurons and n = 75 multi-units. Multi-unit and single unit data showed similar trends and therefore were pooled together for all conditions (30 vs Lin; 60 vs Lin; 60 vs 30). Most neurons had high SI values (mean SI = 0.77; median SI = 0.87) across all conditions, supporting the initial observations (Fig. 5) that most neurons responded to similar spectrotemporal sound features for both Lin- and Log-RN. Other neurons (12 single units and 7 multi-units) had low SI values (SI < 0.5). Inspection of the data revealed that these neurons had statistically significant STRFs (p < 0.002) for the 30 and 60 dB conditions, but not for the Lin-RN because of an insufficient number of action potentials (Fig. 5J-L).


