The Journal of Neuroscience, May 15, 2002, 22(10):4114-4131

Nonlinear Spectrotemporal Sound Analysis by Neurons in the Auditory Midbrain

Monty A. Escabí1, 2 and Christoph E. Schreiner1

1 W. M. Keck Center for Integrative Neuroscience and University of California San Francisco/University of California Berkeley Joint Bioengineering Graduate Group, University of California, San Francisco, California 94143, and 2 Department of Electrical and Computer Engineering, Biomedical Engineering Program, University of Connecticut, Storrs, Connecticut 06269


    ABSTRACT

The auditory system of humans and animals must process information from sounds that dynamically vary along multiple stimulus dimensions, including time, frequency, and intensity. Therefore, to understand neuronal mechanisms underlying acoustic processing in the central auditory pathway, it is essential to characterize how spectral and temporal acoustic dimensions are jointly processed by the brain. We use acoustic signals with a structurally rich time-varying spectrum to study linear and nonlinear spectrotemporal interactions in the central nucleus of the inferior colliculus (ICC). Our stimuli, the dynamic moving ripple (DMR) and ripple noise (RN), allow us to systematically characterize response attributes with the spectrotemporal receptive field (STRF) methods to a rich and dynamic stimulus ensemble. Theoretically, we expect that STRFs derived with DMR and RN would be identical for a linear integrating neuron, and we find that ~60% of ICC neurons meet this basic requirement. We find that the remaining neurons are distinctly nonlinear; these could either respond selectively to DMR or produce no STRFs despite selective activation to spectrotemporal acoustic attributes. Our findings delineate rules for spectrotemporal integration in the ICC that cannot be accounted for by conventional linear-energy integration models.

Key words: inferior colliculus; spectrotemporal; receptive field; nonlinear; ripple; naturalistic; reverse correlation


    INTRODUCTION

The central nucleus of the inferior colliculus (ICC) is an obligatory station in the lemniscal auditory system that receives convergent inputs from numerous brainstem structures and sends its highly processed outputs to the auditory thalamus and, subsequently, to the primary auditory cortex. Neurons in the ICC are sensitive to systematic manipulations of temporal, spectral, binaural, and intensity stimulus attributes (Rees and Møller, 1983, 1987; Schreiner et al., 1983; Langner and Schreiner, 1988; Schreiner and Langner, 1988; Irvine and Gao, 1990; Kuwada et al., 1997; Ramachandran et al., 1999; Krishna and Semple, 2000). These properties have been studied extensively with pure tones, modulated tones, and noise stimuli; however, the overall capabilities of the ICC for processing dynamic, spectrally complex acoustic stimuli remain unknown. Clearly, because natural sounds have structurally rich acoustic spectra and can simultaneously vary along spectral, temporal, intensity, and aural acoustic dimensions, it is essential to understand how these are jointly processed and represented within the ICC.

The concept of a stimulus-response function or receptive field (RF) is a mathematical construct that describes the stimulus features that are encoded by a sensory neuron. A widely used RF description that measures the response of a neuron to pure tones of varying frequency and sound pressure level (SPL) is the frequency-tuning curve (FTC; Schreiner and Langner, 1988; Nelken et al., 1997; Ramachandran et al., 1999). Although this descriptor continues to be important, it cannot characterize the dynamic behavior of a neuron in response to an arbitrary, spectrally complex, time-varying stimulus. Consequently, secondary analyses are often used that measure the ability of a neuron to respond to other stimulus aspects, such as the ability to follow successively presented stimuli of different rates (Rees and Møller, 1983, 1987; Schreiner et al., 1983; Møller and Rees, 1986; Langner and Schreiner, 1988; Eggermont, 1999; Krishna and Semple, 2000).

Recently, the use of reverse correlation techniques to estimate the spectrotemporal receptive field (STRF) in the auditory system (Aersten et al., 1980; Yeshurun et al., 1985; Eggermont, 1993; Nelken et al., 1997; de Charms et al., 1998; Klein et al., 2000; Theunissen et al., 2000; Depireux et al., 2001; Miller et al., 2001, 2002) has allowed scientists to overcome some of the practical limitations posed by conventional auditory RFs and the stimuli used to derive them (e.g., pure tones and modulated tones). The STRF describes the stimulus-response function of an auditory neuron along both the spectral and temporal acoustic dimensions, to a rich stimulus ensemble, and makes no assumptions about independence of spectral and temporal response attributes.

Most RF methods, including the STRF procedure, operate under the assumption that the system under investigation integrates information, be it acoustic or visual, in an approximately linear manner. This requires that the spiking output of a sensory neuron be described as a linear or quasilinear function of its inputs. Although this is often a reasonable assumption, it may not always hold. For instance, direct STRF (referred to as spatiotemporal receptive field for visual neurons) approaches are readily applicable for simple cells in the primary visual cortex (Jones and Palmer, 1987; DeAngelis et al., 1993, 1999; Victor and Purpura, 1998; Anzai et al., 1999; Reich et al., 2000) but fail for visual complex cells and neurons outside of V1 (Emerson et al., 1987; Szulborski and Palmer, 1990; Livingstone et al., 2001). Other stimulus-dependent limitations are observed for sensory neurons in acoustically specialized animals, where central sensory neurons are often highly nonlinear and specifically tuned to behaviorally relevant vocalizations (Suga and Jen, 1976; Suga et al., 1978; Margoliash, 1983; Doupe, 1997; Portfors and Wenstrup, 1999; Theunissen et al., 2000).

Theoretically, the STRF procedure requires the use of white noise as a probing stimulus. Practically, however, because sensory neurons in central stations respond to a limited range of spectrotemporal (spatiotemporal) modulations and are often inhibited by white noise, it is necessary to synthesize acoustic or visual sequences that are optimized for any particular station (de Charms et al., 1998; Klein et al., 2000). Often this is achieved with randomly arranged spectrotemporal tone pips in the auditory system (de Charms et al., 1998; Theunissen et al., 2000) and spatiotemporally interleaved bars or spots of light in the visual system (Emerson et al., 1987; DeAngelis et al., 1993, 1999; Anzai et al., 1999; Reich et al., 2000). Recently, some of the stimulus-dependent limitations associated with such stimuli have been overcome with the use of natural sounds (Theunissen et al., 2000) in the avian auditory cortex homolog.

In this study, we recorded single-unit activity from neurons in the ICC of cats in response to dynamic spectrotemporally complex stimulus sequences. Our synthetic stimuli, the dynamic moving ripple (DMR) and ripple noise (RN), are designed to stringently satisfy a number of theoretical requirements for use with the reverse correlation STRF methods. Furthermore, these sounds share various properties with natural sounds that allow us to overcome some of the practical limitations of white noise, randomly interleaved tone pips, and other synthetic reverse correlation stimuli. Compared with natural signals, these stimuli offer the advantage that they can be parametrically manipulated, allowing for a systematic assessment of nonlinear response characteristics within the ICC. Our findings demonstrate the presence of distinct spectrotemporal nonlinearities in the ICC and identify possible mechanisms used for complex sound analysis, source segregation, and signal detection.


    MATERIALS AND METHODS

Surgical preparation

Cats were initially anesthetized with a mixture of ketamine HCl (10 mg/kg) and acepromazine (0.28 mg/kg, i.m.). After an intravenous infusion line was inserted, a surgical state of anesthesia was induced with ~30 mg/kg Nembutal and maintained throughout the surgery with supplements. Body temperature was measured with a rectal probe and maintained with a heating pad at ~37.5°C. An incision was made in the intercartilaginous area of the trachea, and a tracheotomy tube was inserted. After performing a craniotomy, the ICC was exposed by removing the overlying cerebrum and part of the bony tentorium using a dorsal approach. On completion of the surgery, the animal was maintained in an areflexive state of anesthesia via continuous infusion of ketamine (2-4 mg · kg⁻¹ · h⁻¹) and diazepam (0.4-1 mg · kg⁻¹ · h⁻¹) in lactated Ringer's solution (1-4 mg · kg⁻¹ · h⁻¹). The state of the animal was monitored (heart rate, breathing rate, temperature, and periodically checked reflexes) throughout the experiment, and the infusion rate was adjusted according to physiological criteria. Every 12 hr, the cat received an injection of dexamethasone (0.14 mg/kg, s.c.) to prevent brain edema and atropine to reduce salivation (0.04 mg · kg⁻¹ · d⁻¹, s.c.). All surgical methods and experimental procedures followed National Institutes of Health and US Department of Agriculture guidelines and were approved by the committee on animal research of the University of California, San Francisco.

Neuronal recording

Data were obtained from n = 81 single units in the central nucleus of the inferior colliculus of three anesthetized cats. One or two closely spaced parylene-coated tungsten microelectrodes (Microprobe Inc., Potomac, MD; 1-3 MΩ at 1 kHz) were advanced with a hydraulic microdrive (David Kopf Instruments, Tujunga, CA). Action potential traces were recorded onto a digital audiotape (CDAT16; Cygnus Technologies, Delaware Water Gap, PA) at a sampling rate of 24.0 kHz (41.7 µsec resolution) for off-line analysis. Off-line analysis consisted of digital bandpass filtering (0.3-10 kHz) and individually spike sorting the action potential traces using a Bayesian spike-sorting algorithm (Lewicki, 1994).

Electrode penetration trajectories were at ~20-30° relative to the sagittal plane. Electrodes were initially advanced through the external nucleus and onto the central nucleus while audiovisually determining single neuron and multiunit characteristic frequencies (CFs). The boundary between the external and central nucleus of the inferior colliculus (IC) was confirmed physiologically (Merzenich and Reid, 1974) by a reversal or discontinuity in the CF trend and by monotonically increasing CFs as a function of depth (over a range of ~1-20 kHz and ~1.5-5.0 mm relative to the surface of the IC), consistent with the central nucleus. All electrode recordings throughout the remainder of the experiment were taken from this physiologically defined region. Except for the depth and CF constraints, recording locations were randomly distributed within the ICC.

Acoustic stimuli

RN and DMR stimulus waveforms were designed on a digital computer using the MATLAB (Mathworks) programming environment. The spectrotemporal envelopes shown in Figure 1C,D define the energy modulations, in time and frequency, that are used to modulate a bank of sinusoidal carriers of frequencies $f_k$. As with natural signals, the envelope of these sounds is time-varying and probes spectral and temporal neuronal response preferences. Furthermore, analogous to various classes of natural signals (Fig. 1A,B), these sounds have unique short-term statistics (Fig. 2D,E) and yet their long-term statistics are identical (Fig. 2D,E, far right; see Stimulus correlation statistics). Therefore, both sounds satisfy the necessary requirement for use with the reverse correlation procedure that we use to estimate auditory spectrotemporal receptive fields (see Spectrotemporal receptive field).

Dynamic moving ripple envelope. The DMR envelope is designed as a dynamic sinusoidal grating on an octave frequency and decibel amplitude axis. Two parameters define the DMR envelope: the instantaneous ripple density, $\Omega(t)$, defines the number of spectral peaks per octave at a given time instant; and $F_m(t)$ defines the instantaneous modulation rate. The DMR spectrotemporal envelope is expressed as:
$$S_{\mathrm{DMR}}(t, X_k) = \frac{M}{2} \cdot \sin\left[2\pi\Omega(t)X_k + \Phi(t)\right], \tag{1}$$
where M = 30 or 45 is the modulation depth of the envelope in decibels, $X_k = \log_2(f_k/f_1)$ is the octave frequency axis relative to the lowest stimulus frequency ($f_1$ = 500 Hz), and $\Phi(t) = \int_0^t F_m(\tau)\,d\tau$ controls the time-varying temporal modulation rate, $F_m(t)$. Spectral [$\Omega(t)$] and temporal [$F_m(t)$] parameters are independent and slowly time-varying random processes (maximum rates of change, 1.5 Hz for $F_m$ and 3.0 Hz for $\Omega$). The time rate of change of both parameters was heuristically chosen so that they coincide with the observed range of values for similar acoustic features in speech and vocalizations (Greenberg, 1998). To guarantee that the stimulus space was covered in a statistically unbiased manner, both parameters were designed with uniformly (flat) distributed amplitudes in the intervals 0-4 cycles per octave for $\Omega$ and -350 to +350 Hz for $F_m$.
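As an illustration of Equation 1, the following minimal Python sketch evaluates the DMR envelope at a single time instant. The function names (`dmr_envelope`, `octave_axis`) and the example values are hypothetical and not taken from the original MATLAB code; the phase $\Phi(t)$ is passed in directly rather than integrated from $F_m(t)$.

```python
import math

def dmr_envelope(omega, phase, x_k, depth_db=30.0):
    """Eq. 1 at one instant: (M/2) * sin(2*pi*Omega*X_k + Phi), in dB."""
    return 0.5 * depth_db * math.sin(2.0 * math.pi * omega * x_k + phase)

def octave_axis(freqs_hz, f1_hz=500.0):
    """X_k = log2(f_k / f_1): octave distance from the lowest carrier."""
    return [math.log2(f / f1_hz) for f in freqs_hz]

# Snapshot across frequency for a ripple density of 1 cycle/octave and
# zero phase (illustrative values only, not the experimental settings).
x_axis = octave_axis([500.0, 594.6, 707.1, 840.9, 1000.0])
snapshot = [dmr_envelope(1.0, 0.0, x) for x in x_axis]
```

With M = 30 dB the envelope stays within ±15 dB, and one full ripple cycle spans one octave when $\Omega$ = 1 cycle per octave.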

The time-varying stimulus parameters were generated in the MATLAB programming environment. First, the parameters were generated as a random sequence of normally distributed samples (randn function in MATLAB) using a sampling rate of 3 Hz for $F_m(t)$ and 6 Hz for $\Omega(t)$. These sequences had maximum frequency contents of 1.5 and 3 Hz, respectively (because the maximum signal frequency is half of the sampling frequency). To generate the acoustic sound waveforms at a sampling rate of 44.1 kHz (Eq. 4) it was necessary to resample both of the parameter signals to an equivalent sampling rate. Therefore, we upsampled both signals to 44.1 kHz using a cubic interpolation procedure (interp1 function in MATLAB; "cubic" option; upsampling factor, 14,700 for modulation rate and 7350 for ripple density). Next we needed to convert the parameter amplitudes from a normal to a uniform distribution so that the probability of occurrence of each parameter is statistically unbiased within the selected intervals. This normalization was performed with the error function:
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt.$$
This function converts normally distributed amplitudes to uniformly distributed amplitudes over the interval -1 to +1; a subsequent linear rescaling maps the amplitudes to the selected interval. This operation had only a subtle effect on the spectrum of these signals and is necessary to guarantee that the signal parameters are statistically unbiased (flat distribution) within their predefined range.
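The normal-to-uniform conversion can be sketched in Python as follows. This is an illustration under stated assumptions, not the original MATLAB code: the helper name `normal_to_uniform` is hypothetical, the cubic upsampling step is omitted, and the argument of erf is scaled by 1/√2 (an assumption; the text does not give the scaling) so that a unit-variance normal sample maps exactly onto a uniform distribution.

```python
import math
import random

def normal_to_uniform(z, lo, hi):
    """Map a unit-variance normal sample to a uniform value in [lo, hi].

    Assumption: erf(z / sqrt(2)) = 2 * Phi(z) - 1 is uniform on (-1, 1)
    for standard normal z; the scaling is not stated in the text.
    """
    u = math.erf(z / math.sqrt(2.0))          # uniform on (-1, 1)
    return lo + (u + 1.0) * 0.5 * (hi - lo)   # linear rescaling

rng = random.Random(0)  # fixed seed so the sketch is repeatable
fm = [normal_to_uniform(rng.gauss(0.0, 1.0), -350.0, 350.0)
      for _ in range(20000)]                  # modulation rate, Hz
omega = [normal_to_uniform(rng.gauss(0.0, 1.0), 0.0, 4.0)
         for _ in range(20000)]               # ripple density, cycles/octave

# Fraction of modulation rates in the lowest quarter of the range;
# for a flat distribution this should be close to 0.25.
frac_low_quarter = sum(1 for v in fm if v < -175.0) / len(fm)
```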

Ripple noise envelope. The RN envelope is first generated as a linear superposition of L = 16 independently chosen DMR envelopes, $S_{\mathrm{DMR}_l}(t, X_k)$:
$$\tilde{S}_{\mathrm{RN}}(t, X_k) = \frac{1}{\sqrt{L}} \sum_{l=1}^{L} S_{\mathrm{DMR}_l}(t, X_k), \tag{2}$$
where the sum is normalized so that the SDs of the RN and DMR are identical. Although this guaranteed that the average contrast of the DMR and RN envelopes be the same (i.e., identical SD), the RN amplitude distribution had long tails and resembled a Gaussian distribution, whereas the DMR envelope is approximately uniform and confined to the interval [-M/2, M/2]. Instances at the high- and low-intensity tails of the distribution of the RN envelope can therefore potentially activate undesirable intensity- and contrast-dependent nonlinearities. We overcame this possibility by compressing the RN envelope so that its amplitude statistics resemble those of the DMR. The compressed RN envelope is given by:
$$S_{\mathrm{RN}}(t, X_k) = f\left[\tilde{S}_{\mathrm{RN}}(t, X_k)\right], \tag{3}$$
where $f(x) = M/2 \cdot \mathrm{erf}(x/\sigma_{\mathrm{DMR}})$ and erf(·) is the error function. This envelope covers a relative intensity range of [-M/2, M/2] dB as for the DMR envelope. This procedure allows us to isolate spectrotemporal nonlinearities from intensity- or contrast-dependent ones. A second concern was that the erf(·) function significantly distorts the RN envelope by introducing high-frequency envelope modulation components, and this in turn could compromise experimental results. We found, both analytically (data not shown) and through simulation, that the ripple spectrum and spectrotemporal autocorrelation (see Fig. 2D,E, far right; shown for compressed RN) of the uncompressed and compressed RN were in close agreement (2.1% rms error for both the ripple spectrum and autocorrelation).
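A minimal Python sketch of Equations 2 and 3 at a single time-frequency point is given below. The function names are hypothetical, and the 16 DMR values are stand-ins drawn with independent random phases rather than full DMR envelopes; $\sigma_{\mathrm{DMR}} = M/\sqrt{8}$ is the SD of a sinusoid of amplitude M/2.

```python
import math
import random

def compress(x, depth_db, sigma_dmr):
    """Eq. 3 nonlinearity: f(x) = (M/2) * erf(x / sigma_DMR)."""
    return 0.5 * depth_db * math.erf(x / sigma_dmr)

def rn_envelope(dmr_values, depth_db):
    """Combine L DMR envelope values at one (t, X_k) point into one RN value."""
    n = len(dmr_values)
    s_tilde = sum(dmr_values) / math.sqrt(n)   # Eq. 2: sum scaled by 1/sqrt(L)
    sigma_dmr = depth_db / math.sqrt(8.0)      # SD of (M/2) * sin(.)
    return compress(s_tilde, depth_db, sigma_dmr)

# Toy data: 16 sinusoid samples with independent random phases stand in
# for 16 independently generated DMR envelopes at the same point.
rng = random.Random(1)
M = 30.0
rn = [rn_envelope([0.5 * M * math.sin(rng.uniform(0.0, 2.0 * math.pi))
                   for _ in range(16)], M)
      for _ in range(5000)]
rn_mean = sum(rn) / len(rn)
```

Because erf is bounded by ±1, every compressed value falls inside [-M/2, M/2] dB, matching the DMR range.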

Acoustic waveform. From the DMR and RN spectrotemporal envelopes, the acoustic sound pressure waveforms, s(t), are constructed by modulating L = 230 sinusoidal carriers that are added together:
$$s(t) = \sum_{k=1}^{L} S_{\mathrm{Lin}}(t, X_k) \cdot \sin(2\pi f_k t + \phi_k), \tag{4}$$
where $\phi_k$ is a randomly chosen phase (0-2π), which gives s(t) a noise-like character, and $S_{\mathrm{Lin}}(t, X_k)$ is a transformed version of the DMR or RN envelopes that describes the amplitude modulations in linear amplitude units. The linear envelope is bounded between $10^{-M/20}$ and 1. It is related to the decibel envelopes by (here we use $S_{\mathrm{dB}}$ in place of $S_{\mathrm{DMR}}$ and $S_{\mathrm{RN}}$):
$$S_{\mathrm{Lin}}(t, X_k) = 10^{\frac{S_{\mathrm{dB}}(t, X_k) - M/2}{20}}. \tag{5}$$
Frequency carriers are geometrically spaced at a resolution of 43 carriers per octave: $f_k = \alpha \cdot f_{k-1}$ ($\alpha$ = 1.01617) over a range of 5.32 octaves (500-20,000 Hz). Although the resultant power spectrum is not flat, this guarantees that the primary sensory epithelium is uniformly excited and equal energy is provided per unit octave.
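The carrier spacing and the decibel-to-linear conversion of Equation 5 can be checked numerically. The sketch below uses only the constants quoted in the text; the variable and function names are hypothetical.

```python
import math

ALPHA = 1.01617      # carrier spacing ratio quoted in the text
F1_HZ = 500.0        # lowest carrier frequency
N_CARRIERS = 230

# f_k = alpha * f_(k-1), starting from f_1 = 500 Hz
carriers_hz = [F1_HZ * ALPHA ** k for k in range(N_CARRIERS)]
carriers_per_octave = 1.0 / math.log2(ALPHA)

def db_to_linear(s_db, depth_db):
    """Eq. 5: S_Lin = 10 ** ((S_dB - M/2) / 20)."""
    return 10.0 ** ((s_db - 0.5 * depth_db) / 20.0)

# Bounds of the linear envelope for M = 30 dB
upper = db_to_linear(15.0, 30.0)    # envelope peak maps to 1
lower = db_to_linear(-15.0, 30.0)   # envelope trough maps to 10 ** (-M / 20)
```

With these constants the spacing comes out at roughly 43 carriers per octave, and the highest carrier lies just below 20 kHz.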

Sound presentation. All recordings were made with the animal in a sound-shielded chamber (IAC, Bronx, NY), with stimuli delivered via a closed, binaural speaker system (electrostatic diaphragms from Stax). Single neurons or clusters of neurons were initially isolated audiovisually by presenting pure tones, white noise, or both. FTCs were derived in two of the three experiments with a pseudorandom sequence of pure tones presented at 15 intensities and 45 geometrically spaced frequencies. In one experiment, rate-level functions were measured with the RN stimulus as a function of SPL and contrast. After these initial tests, DMR and RN stimuli were presented binaurally with an independent sound sequence for each ear. The DMR stimulus was presented for 10-20 min, followed by 10-18 min (full length presented for ~95% of the recording sites; identical stimuli for all experiments) of the RN at ~30-70 dB/carrier greater than the neuron response threshold (as determined by the FTC or rate-level functions). Because the RN and DMR stimuli are each composed of 230 sinusoid carriers, the effective SPLs were 10 · log10(230) = 23.6 dB greater than these values (i.e., ~53-93 dB greater than threshold; SPL range, 75 ± 19 dB SPL, 64 ± 19 dB/one-third octave, or 51 ± 19 dB/carrier). Both RN and DMR were presented at identical intensities and contrast so that they covered an identical range of amplitudes and fall well within the intensity response area of the neuron. Sixteen neurons were also tested with a short 5 sec segment of the DMR and RN that was presented 40 consecutive times. This was used to construct response rastergrams for each stimulus (see Fig. 10). Finally, for six neurons that did not respond to the RN, the DMR stimulus was again presented at the end of the recording session to verify that the given neurons were still responsive and to verify the stability of the electrode placement.
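The 23.6 dB offset between the per-carrier level and the overall level follows from summing 230 carriers of equal power. A short Python check (the function name is hypothetical):

```python
import math

def effective_gain_db(n_carriers):
    """Level increase from summing n equal-power carriers: 10 * log10(n)."""
    return 10.0 * math.log10(n_carriers)

gain_db = effective_gain_db(230)
# Per-carrier range of ~30-70 dB above threshold, shifted by the gain
low_db, high_db = 30.0 + gain_db, 70.0 + gain_db
```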

Stimulus correlation statistics

The long-term and instantaneous spectrotemporal correlation statistics of the RN and DMR stimulus constitute an essential aspect of the stimulus design and the experimental approach. These were evaluated in closed form and rigorously tested via simulation. Only a brief account is provided.

A spectrotemporal Gaussian window, $w_i(t, X)$, of SD $\sigma_x$ = 0.5 octaves and $\sigma_t$ = 5, 10, or 20 msec and centroid about $t = t_i$ was used to localize the RN or DMR spectrotemporal envelope, S(t, X). The instantaneous spectrotemporal autocorrelation function was obtained by evaluating the localized autocorrelation:
$$R_{ss}(\tau, \xi \mid t_i) = E\left[S(t, X)\,w_i(t, X)\,S(t-\tau, X-\xi)\,w_i(t-\tau, X-\xi)\right], \tag{6}$$
where the expectation operator, E[·], is taken with respect to time, t, and the spectral distance variable, X [Eqs. 1, 3 are substituted for S(t, X)]. The variable $t_i$ corresponds to the time instant when the autocorrelation is evaluated, and $\tau$ and $\xi$ correspond to the temporal lag and spectral displacement, respectively.

In closed form the solutions for the RN and DMR are given by:
$$R_{\mathrm{DMR}}(\tau, \xi) = \sigma_{\mathrm{DMR}}^2 \cdot \cos(2\pi\Omega_i\xi + 2\pi F_{m,i}\tau) \cdot R_{ww}(\tau, \xi) \tag{7}$$

$$R_{\mathrm{RN}}(\tau, \xi) = \left[\sigma_{\mathrm{RN}}^2 \cdot \mathrm{sinc}(2\Omega_{\mathrm{Max}}\xi) \cdot \mathrm{sinc}(2F_{\mathrm{Max}}\tau) + e(\tau, \xi)\right] \cdot R_{ww}(\tau, \xi), \tag{8}$$
where $\sigma_{\mathrm{DMR}}^2 = M^2/8$ and $\sigma_{\mathrm{RN}}^2 = M^2/12$ are the variance of the DMR and RN, respectively, and $R_{ww}(\tau, \xi)$ is the autocorrelation function of the Gaussian window (which is itself a Gaussian window of SD $\sqrt{2}\sigma_x$ and $\sqrt{2}\sigma_t$). The parameters $\Omega_i = \Omega(t_i)$ and $F_{m,i} = F_m(t_i)$ are the instantaneous DMR parameters evaluated at $t_i$. Because the stimulus parameters dynamically vary with time at a nominal rate of 3 and 1.5 Hz (Fig. 2A,B), the DMR instantaneous spectrotemporal autocorrelation likewise varies with time (Fig. 2D). Accordingly, its spectrotemporal envelope is nonstationary at these time scales. The term $e(\tau, \xi)$ is a spectrotemporal noise term, and the parameters $\Omega_{\mathrm{Max}}$ = 4 cycles per octave and $F_{\mathrm{Max}}$ = 350 Hz are the maximum ripple parameters.

The long-term autocorrelation for both sounds was obtained by performing a time average of the instantaneous autocorrelation: $R_{SS}(\tau, \xi) = E[R_{SS}(\tau, \xi \mid t_i)]$ (E[·] is now evaluated with respect to $t_i$). The autocorrelation is identical in form for both sounds:
$$R_{\mathrm{DMR}}(\tau, \xi) = R_{\mathrm{RN}}(\tau, \xi) = \sigma_S^2 \cdot \mathrm{sinc}(2\Omega_{\mathrm{Max}}\xi) \cdot \mathrm{sinc}(2F_{\mathrm{Max}}\tau) \cdot R_{ww}(\tau, \xi). \tag{9}$$
The autocorrelations differ only in the SD, by a multiplicative factor of ~20% (RN, $\sigma_S = \sigma_{\mathrm{RN}} = M/\sqrt{12}$ dB; DMR, $\sigma_S = \sigma_{\mathrm{DMR}} = M/\sqrt{8}$ dB).
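To make Equation 9 concrete, the sketch below evaluates the long-term autocorrelation and the SD ratio between the two stimuli. It is illustrative only: the function names are hypothetical, sinc is taken as the normalized form sin(πx)/(πx), and the window autocorrelation is assumed to be scaled to 1 at zero lag.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def window_acf(tau, xi, sigma_t, sigma_x):
    """Gaussian with SDs sqrt(2)*sigma_t and sqrt(2)*sigma_x.
    Assumption: scaled to 1 at zero lag."""
    return math.exp(-tau ** 2 / (4.0 * sigma_t ** 2)
                    - xi ** 2 / (4.0 * sigma_x ** 2))

def long_term_acf(tau, xi, sigma_s, f_max=350.0, omega_max=4.0,
                  sigma_t=0.010, sigma_x=0.5):
    """Eq. 9, with tau in seconds and xi in octaves."""
    return (sigma_s ** 2 * sinc(2.0 * omega_max * xi)
            * sinc(2.0 * f_max * tau) * window_acf(tau, xi, sigma_t, sigma_x))

M = 30.0
sigma_dmr = M / math.sqrt(8.0)    # about 10.6 dB
sigma_rn = M / math.sqrt(12.0)    # about 8.7 dB
sd_ratio = sigma_dmr / sigma_rn   # sqrt(1.5), roughly a 20% difference
```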

Spectrotemporal receptive field

STRFs are computed by averaging the pre-event spectrotemporal envelope. For a sequence of N neural events at times $t_n$ (sampled at 41.7 µsec resolution), contralateral and ipsilateral STRFs are obtained as [here we use $S_{\mathrm{dB}}(t, X_k)$ in place of $S_{\mathrm{DMR}}(t, X_k)$ or $S_{\mathrm{RN}}(t, X_k)$]:
$$\mathrm{STRF}(\tau, X_k) = \frac{1}{\sigma_S^2 \cdot T} \cdot \sum_{n} S_{\mathrm{dB}}(t_n - \tau, X_k), \tag{10}$$
where T is the experimental recording time in seconds, $\tau$ is the temporal delay of the stimulus relative to the neural event time (0-100 msec), and $\sigma_S^2$ is the variance of the decibel spectrotemporal envelope for the DMR or RN. During the DMR and RN stimulus presentation, independent sound sequences were binaurally presented to each animal. This allowed us to independently estimate the contralateral and ipsilateral STRFs by substituting the contralateral and ipsilateral spectrotemporal envelopes into Equation 10 (Marmarelis and Naka, 1974).
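Equation 10 amounts to a spike-triggered sum of the envelope preceding each spike. A minimal Python sketch on a discretely sampled envelope follows; the function name, the list-of-lists layout, and the toy numbers are assumptions made for illustration and are not taken from the original analysis code.

```python
def strf_estimate(envelope_db, spike_idx, n_lags, sigma_s, duration_s):
    """Eq. 10 on a sampled envelope.

    envelope_db[t][k] is S_dB at time sample t and carrier k; spike_idx
    holds spike times as sample indices. Returns strf[lag][k], where lag
    counts samples before each spike.
    """
    n_chan = len(envelope_db[0])
    strf = [[0.0] * n_chan for _ in range(n_lags)]
    for t_n in spike_idx:
        for lag in range(n_lags):
            if t_n - lag < 0:
                break                     # pre-event window runs off the start
            row = envelope_db[t_n - lag]
            for k in range(n_chan):
                strf[lag][k] += row[k]
    scale = 1.0 / (sigma_s ** 2 * duration_s)
    return [[v * scale for v in row] for row in strf]

# Toy example: 4 time samples, 2 carriers, spikes at samples 1 and 3.
toy_envelope = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
toy_strf = strf_estimate(toy_envelope, [1, 3], n_lags=2,
                         sigma_s=1.0, duration_s=2.0)
```

Passing the contralateral or the ipsilateral envelope as `envelope_db` would yield the corresponding STRF for the same spike train.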

Stimulus envelopes were sampled at 4.0 kilosamples/sec (temporal) and 43 samples per octave (spectral). The STRF is formally given in units of spikes per second per decibel. We use a rate-normalized version of the STRF, $\mathrm{STRF}_r(\tau, X_k) = \sigma_S \cdot \mathrm{STRF}(\tau, X_k)$, which corresponds to the average driven output produced at time 0, in units of spikes per second, for the average differential stimulus (decibels) presented within the receptive field of the neuron.

Statistically significant STRF

We devised a procedure for measuring the statistically significant STRF by considering a null condition in which N randomly chosen spikes are put through Equation 10. This procedure consists of adding random sound waveforms to construct a control STRF from which statistical significance can be determined. Solutions for this procedure were derived analytically in closed form (data not shown). The distribution of amplitudes for the control STRF quickly approached a normal distribution (with as few as N = 50 spikes). Therefore, a simplification was made in which we determined the two-tailed probability of exceeding a threshold relative to the control STRF under the assumption of a normal distribution. The statistically significant portion of the STRF (p < 0.002) is obtained by keeping all values of the STRF that exceed 3.09 SD of the control noise STRF and setting all other values to 0. Analytically this is expressed as $|\sigma_S^2 \cdot T \cdot \mathrm{STRF}(\tau, \xi)/\sqrt{N}| > 3.09 \cdot \sigma_S$. No smoothing was performed before or after thresholding. This procedure was tested against the analytically derived solutions, and we found that actual significance values were always slightly smaller (e.g., actual significance value of p < 0.0019 for N = 50).
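The thresholding rule can be sketched in Python as below. The function names are hypothetical, and the sketch implements only the normal-approximation criterion quoted above, not the closed-form analytical solution (which is not shown in the text).

```python
import math

def two_tailed_p(z):
    """Two-tailed probability of a standard normal falling beyond +/- z."""
    return math.erfc(z / math.sqrt(2.0))

def significant_strf(strf, n_spikes, sigma_s, duration_s, z=3.09):
    """Keep STRF values satisfying
    |sigma_S^2 * T * STRF / sqrt(N)| > z * sigma_S; set the rest to 0."""
    scale = sigma_s ** 2 * duration_s / math.sqrt(n_spikes)
    limit = z * sigma_s
    return [[v if abs(v * scale) > limit else 0.0 for v in row]
            for row in strf]

p_value = two_tailed_p(3.09)   # close to 0.002
# Toy values: with N = 100, sigma_S = 1, T = 10 the scale factor is 1,
# so only entries with magnitude above 3.09 survive.
masked = significant_strf([[0.1, 5.0], [-4.0, 3.0]], n_spikes=100,
                          sigma_s=1.0, duration_s=10.0)
```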

To determine relative significance of STRFs, on an equal spike basis, we further evaluated significance by recomputing all STRFs using 100 action potentials and determining all pixel values that exceeded the p < 0.002 confidence intervals. For these pixel values, the average and maximum signal-to-noise ratio (SNR) was computed. Average and peak SNRs were computed as:
$$\mathrm{SNR}_{\mathrm{Mean}} = \sqrt{E\left[\mathrm{STRF}_{100}^2\right]/\sigma_{100}^2}$$
and
$$\mathrm{SNR}_{\mathrm{Max}} = \max\left(\left|\mathrm{STRF}_{100}\right|\right)/\sigma_{100},$$
where $\sigma_{100}$ is the SD of the noise control STRF derived for 100 random spikes. Thus, for any given pixel, the SNR determines the number of SDs by which STRF pixels stand out above the noise.

Null hypothesis

Response nonlinearities are tested against the expected results for an ideal linear model neuron. Given that the long-term spectrotemporal autocorrelation functions for the DMR and RN are identical, it follows that for a purely linear neuron $\mathrm{STRF}_{\mathrm{DMR}} = \mathrm{STRF}_{\mathrm{RN}}$ (for proof, see Appendix A). Significant differences between the RN and DMR STRFs can be attributed to response nonlinearities. To quantify response differences, we use the statistically significant portion of the STRFs and use this to compute a number of response metrics for the DMR and RN: similarity index, rate and magnitude disparity index, and the phase-locking index (see below).

Quantifying DMR and RN response differences

Neural responses for DMR and RN were compared in three complementary ways. First the STRF similarity index (SI; DeAngelis et al., 1999; Reich et al., 2000) was used to quantify shape differences between $\mathrm{STRF}_{\mathrm{DMR}}$ and $\mathrm{STRF}_{\mathrm{RN}}$. Using the STRF pixel values that exceeded the statistical significance threshold of p < 0.002 for either condition, we treated the STRFs as vectors (including significant contralateral and ipsilateral pixels). The vectorized RFs were then used to evaluate the similarity index:
$$\mathrm{SI} = \frac{\langle \mathrm{RF}_{\mathrm{DMR}}, \mathrm{RF}_{\mathrm{RN}} \rangle}{\|\mathrm{RF}_{\mathrm{DMR}}\| \cdot \|\mathrm{RF}_{\mathrm{RN}}\|}, \tag{11}$$
where $\mathrm{RF}_{\mathrm{DMR}}$ and $\mathrm{RF}_{\mathrm{RN}}$ are the significant STRFs, $\langle \cdot, \cdot \rangle$ is the vector inner product, and $\|\cdot\|$ designates the vector norm operator. The SI is numerically identical to the Pearson correlation coefficient.
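Equation 11 is a normalized inner product between the two vectorized receptive fields. A minimal Python sketch, with a hypothetical function name and toy vectors, is shown below; it does not handle the case in which either vector has zero norm, where the index is undefined.

```python
import math

def similarity_index(rf_dmr, rf_rn):
    """Eq. 11: inner product of the two vectors divided by their norms."""
    dot = sum(a * b for a, b in zip(rf_dmr, rf_rn))
    norm_dmr = math.sqrt(sum(a * a for a in rf_dmr))
    norm_rn = math.sqrt(sum(b * b for b in rf_rn))
    return dot / (norm_dmr * norm_rn)

rf = [1.0, -2.0, 0.5, 3.0]
si_same = similarity_index(rf, [2.0 * v for v in rf])     # same shape, scaled
si_flipped = similarity_index(rf, [-v for v in rf])       # inverted shape
si_orthogonal = similarity_index([1.0, 0.0], [0.0, 1.0])  # no overlap
```

The index depends only on shape: scaling one STRF leaves it at 1, inverting it gives -1, and non-overlapping fields give 0.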

We devised two metrics to evaluate differences in firing rate and driven activity independently of STRF shape. First we computed the rate disparity index (RDI):
$$\mathrm{RDI} = s \cdot \left[\left(\frac{r_{\mathrm{DMR}}}{r_{\mathrm{RN}}}\right)^{s} - 1\right] \cdot 100\%, \tag{12}$$
where $s = \mathrm{sign}(r_{\mathrm{DMR}} - r_{\mathrm{RN}})$, and the mean spike rates for each condition are $r_{\mathrm{DMR}}$ and $r_{\mathrm{RN}}$. The magnitude of the RDI is numerically equivalent to the percent change in firing rate between DMR and RN. Its sign tells us which condition, DMR or RN, had a higher firing rate (+, DMR; -, RN). To quantify differences in driven activity, we used a third metric, the magnitude disparity index (MDI). The MDI is identical in form to the RDI, where the mean firing rates, $r_{\mathrm{DMR}}$ and $r_{\mathrm{RN}}$, are replaced by the rate-normalized STRF energies, $E_{\mathrm{DMR}}$ and $E_{\mathrm{RN}}$, for the corresponding conditions. Here the STRF energy is computed as:
$$E = \sqrt{\sum_{k=1}^{L} \int \mathrm{STRF}_r(\tau, X_k)^2 \, d\tau}. \tag{13}$$
Because the response of the neuron could be fractionally distributed between the contralateral and ipsilateral ears, the energy of the contra- and ipsi-STRFs was measured independently, and the cumulative sum was taken as:
$$E_{\mathrm{Total}} = \sqrt{E_c^2 + E_i^2},$$
where $E_c$ and $E_i$ are the contra- and ipsi-STRF energies. The STRF energy measures phase-locked activity (units of spikes per second) and is equivalent to the average phase-locked output for a linear integrating neuron (for proof, see Appendix B).
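The behavior of the disparity index in Equation 12 is easiest to see with numbers. In the sketch below (function names hypothetical, rates chosen for illustration), the same function serves for the RDI when given mean rates and for the MDI when given STRF energies.

```python
import math

def disparity_index(a_dmr, a_rn):
    """Eq. 12: s * ((a_DMR / a_RN) ** s - 1) * 100, with
    s = sign(a_DMR - a_RN). Inputs are rates (RDI) or energies (MDI)."""
    s = (a_dmr > a_rn) - (a_dmr < a_rn)      # sign as +1, 0, or -1
    return s * ((a_dmr / a_rn) ** s - 1.0) * 100.0

def total_energy(e_contra, e_ipsi):
    """Combined contra- and ipsilateral STRF energy, sqrt(Ec^2 + Ei^2)."""
    return math.hypot(e_contra, e_ipsi)

rdi_dmr_higher = disparity_index(20.0, 10.0)   # DMR rate twice the RN rate
rdi_rn_higher = disparity_index(10.0, 20.0)    # RN rate twice the DMR rate
rdi_equal = disparity_index(10.0, 10.0)        # identical rates
```

The exponent s makes the index symmetric: a doubling of rate gives ±100% regardless of which stimulus drove the neuron harder.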

Phase-locking index

The phase-locking index (PLI) quantifies the ability of a neuron to phase lock to the spectrotemporal envelope. This metric is obtained by dividing the peak-to-peak STRF amplitude (in spikes per second) by the mean spike rate, r:
$$\mathrm{PLI} = \frac{\max(\sigma_{S} \cdot \mathrm{STRF}) - \min(\sigma_{S} \cdot \mathrm{STRF})}{r} \cdot \frac{1}{\Delta}, \tag{14}$$
and normalizing this quantity by a theoretically derived factor, Δ, that corresponds to the theoretical maximum peak-to-peak rate-normalized STRF amplitude (confining this index to the range of 0-1). For the DMR, Δ = √8, and for the RN, Δ = √12 (for proof, see Appendix C).
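Equation 14 can be sketched in a few lines. The names below are hypothetical, and `sigma_s` is assumed to be the scale factor that converts the STRF to units of spikes per second; only the two normalization constants come from the text.

```python
import math

# Theoretical maximum peak-to-peak amplitude for each stimulus (from the text).
DELTA = {"DMR": math.sqrt(8), "RN": math.sqrt(12)}


def phase_locking_index(strf, sigma_s, mean_rate, stimulus):
    """Peak-to-peak scaled STRF amplitude over mean rate, normalized by Delta."""
    if mean_rate <= 0:
        raise ValueError("mean spike rate must be positive")
    scaled = [sigma_s * v for row in strf for v in row]
    peak_to_peak = max(scaled) - min(scaled)
    return peak_to_peak / mean_rate / DELTA[stimulus]
```

Because Δ differs between the two sounds, the same peak-to-peak amplitude and mean rate yield a smaller PLI for RN than for DMR.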

Frequency domain analysis: ripple transfer function and conditioned response histogram

As an alternative to the STRF, we further evaluated neuronal response preferences to DMR and RN in the frequency domain. These approaches are useful, because they can be used to quantify neural responses as a function of ripple frequency and temporal modulation rate parameters.

The ripple transfer function (RTF) is one such descriptor. It is obtained directly from the STRF by performing a two-dimensional Fourier transform on the statistically significant STRF (p < 0.002), discarding the phase, and keeping the magnitude (see Fig. 5A,B). From the RTF, the best ripple density and best modulation rate parameters were determined for all phase-locking neurons. These were taken at the location of the peak amplitude in the magnitude response. In instances in which two responses were observed (for negative and positive modulation rates), the secondary response was selected only if its response magnitude exceeded 50% of the maximum response magnitude. Positive (negative) modulation rates designate downward (upward)-going stimulus features; however, because the STRF is a time-reversed version of the best stimulus of the neuron, this convention is flipped for the neuron and its RTF (positive, upward sweep; negative, downward sweep).
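The RTF computation described above amounts to taking the magnitude of a two-dimensional Fourier transform and locating its peak. The sketch below uses a naive discrete Fourier transform so that it runs with the standard library alone; it is only practical for small arrays, and a fast transform would be used in practice. The function names and the convention that rows index frequency channels and columns index time delays are assumptions, and converting bin indices to cycles per octave and hertz is left out.

```python
import cmath
import math


def ripple_transfer_function(strf):
    """Magnitude of the 2-D DFT of an STRF (rows: frequency, columns: delay)."""
    n_rows, n_cols = len(strf), len(strf[0])
    rtf = [[0.0] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            acc = 0j
            for m in range(n_rows):
                for n in range(n_cols):
                    phase = -2.0 * math.pi * (u * m / n_rows + v * n / n_cols)
                    acc += strf[m][n] * cmath.exp(1j * phase)
            rtf[u][v] = abs(acc)  # discard the phase, keep the magnitude
    return rtf


def peak_bin(rtf):
    """(row, column) index of the largest magnitude in the RTF."""
    bins = ((u, v) for u in range(len(rtf)) for v in range(len(rtf[0])))
    return max(bins, key=lambda uv: rtf[uv[0]][uv[1]])
```

For a real-valued STRF the magnitude is symmetric about the origin, so each peak has a mirror image at the opposite spectral and temporal frequency; this is why only the sign of the modulation rate relative to the ripple density carries sweep-direction information.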

Although this approach was successfully applied for many neurons, other neurons did not show statistically significant STRFs; therefore, it was impossible to estimate their RTFs directly. We therefore approximate the probability distribution function of observing a given set of parameters given a spike at time tn, P(Fm, Ω|tn), by performing a spike-triggered average with respect to the time-varying DMR parameters, Ω(t) and Fm(t):
$$P_{kl} = \sum_{n=1}^{N} I[k\Delta F_{m} \le F_{m}(t_{n}) \le (k+1)\Delta F_{m}] \cdot I[l\Delta\Omega \le \Omega(t_{n}) \le (l+1)\Delta\Omega], \tag{15}$$
where Pkl is the discrete version of P(Fm, Ω|tn), and I[·] is the indicator function. The indicator function takes a value of unity whenever the condition inside its argument is satisfied. Otherwise, it assumes a value of 0. Thus for any given bin of Pkl, this conditioned response histogram (CRH) is incremented by +1 if and only if the instantaneous parameters, Fm(tn) and Ω(tn), fall within the required intervals, kΔFm ≤ Fm(tn) ≤ (k + 1)ΔFm and lΔΩ ≤ Ω(tn) ≤ (l + 1)ΔΩ, at the time of the neuronal spike, tn (see Fig. 5C,D). Bin width resolutions of ΔFm = 15-35 Hz and ΔΩ = 0.2-0.4 cycles per octave were used. The exact position used to estimate the parameters relative to the neuronal spike time, tn, did not alter the resulting histogram (tested for a time lag of 0-50 msec), because the parameters vary at a slow rate (1.5 and 3 Hz) compared with the integration time of ICC neurons (usually tens of milliseconds).
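The histogram of Equation 15 can be sketched as a counter keyed by bin indices (k, l). In this illustration the parameter trajectories are passed as hypothetical callables `fm_of_t` and `omega_of_t`, and each spike is assigned to exactly one bin by using half-open intervals, a small departure from the closed intervals written in the equation.

```python
import math
from collections import Counter


def conditioned_response_histogram(spike_times, fm_of_t, omega_of_t, d_fm, d_omega):
    """Count spikes by the DMR parameters present at each spike time.

    Returns a Counter mapping (k, l) to the number of spikes for which
    k*d_fm <= Fm(t_n) < (k+1)*d_fm and l*d_omega <= Omega(t_n) < (l+1)*d_omega.
    """
    crh = Counter()
    for t_n in spike_times:
        k = math.floor(fm_of_t(t_n) / d_fm)        # may be negative (upward sweeps)
        l = math.floor(omega_of_t(t_n) / d_omega)
        crh[(k, l)] += 1
    return crh
```

Because only the slowly varying parameters are tabulated, and not the stimulus waveform itself, small shifts in spike time leave the bin assignment unchanged.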

As for single units, it was also useful to characterize population responses in the frequency domain, and we therefore extended these methods to include population statistics. By averaging the RTFs of individual neurons, we estimated the population ripple transfer function (pRTF) for those neurons with significant STRFs. To avoid biasing the pRTF because of systematic differences in firing strength, the RTFs of individual neurons were equally weighted so that the cumulative area of each was exactly 1.

For neurons that did not produce statistically significant STRFs, a modified approach was applied. We normalized the CRH of each neuron so that its cumulative sum was exactly 1. An average was then taken over the entire population, thereby producing the "population" CRH (pCRH). To facilitate comparisons, the pCRH was interpolated using the interp2 function (spline option) in MATLAB to identical resolution as for pRTF.
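The population averages described in the two paragraphs above share one pattern: normalize each neuron's map to unit total, then average across neurons. A minimal sketch follows, assuming all maps are already sampled on a common grid and approximating "area" by the sum over bins; the interpolation step is not shown, and the function names are illustrative.

```python
def normalize_to_unit_sum(grid):
    """Scale a 2-D map (RTF or CRH) so that its entries sum to 1."""
    total = sum(v for row in grid for v in row)
    if total <= 0:
        raise ValueError("map must have a positive total")
    return [[v / total for v in row] for row in grid]


def population_average(grids):
    """Equal-weight average of unit-normalized maps (pRTF or pCRH)."""
    normalized = [normalize_to_unit_sum(g) for g in grids]
    n = len(normalized)
    rows, cols = len(normalized[0]), len(normalized[0][0])
    return [[sum(g[i][j] for g in normalized) / n for j in range(cols)]
            for i in range(rows)]
```

Normalizing first prevents a neuron with a high firing rate from dominating the population map.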


    RESULTS

We studied 81 single neurons with the intent of understanding how dynamic spectrotemporal signals are processed within the central nucleus of the inferior colliculus. Specifically, we address whether single neurons integrate spectrotemporal information according to a linear integration model and whether dynamic stimulus aspects significantly affect neuronal encoding. Our complex stimuli constitute an integral part of the experimental protocol, and we fully characterize several pertinent properties of the stimulus ensembles. By design, both test sounds have identical average statistics and, therefore, equally sample the relevant spectrotemporal stimulus dimensions for this study. As a first-order test of evaluating spectrotemporal response nonlinearities, we compute and compare the spectrotemporal receptive field for each sound type. We also characterize higher-order response attributes that are not directly accessible with the STRF descriptor.

Stimulus statistics: average versus dynamic spectrotemporal characteristics of the dynamic moving ripple and ripple noise

To test the possibility that individual auditory neurons in the ICC are selective for structural features prevalent in natural sounds (Fig. 1A,B), complex broadband stimuli (Fig. 1C,D) were designed that allow us to systematically identify nonlinear processing capabilities of auditory neurons. These stimuli fulfill a number of theoretical and ecological constraints: first, both sounds were designed to stringently meet a number of necessary requirements for use with the STRF. Second, both sounds incorporate a number of pertinent acoustic stimulus attributes that are prevalent in various natural signals [e.g., spectral energy peaks, frequency modulation (FM) sweeps, and temporal modulations] and that determine important perceptual qualities (Plomp, 1970, 1983; Van Veen and Houtgast, 1983).



Figure 1.   Synthetic sound sequence used for reverse correlation analysis (C, D) and some corresponding natural sound counterparts (A, kitten vocalizations; B, babbling brook). The DMR (C) is designed to mimic spectral profiles created by formants (spectral energy peaks) and temporal modulations in speech production and animal vocalizations. The ripple density parameter, Ω(t), corresponds to the number of energy peaks (cycles per octave) along the spectral axis at time t. The temporal modulation rate, Fm(t), describes the repetition rate of the envelope in hertz. The second stimulus, the RN (D), has noise-like properties that uniformly cover the ripple dimensions. The DMR and RN are shown for a maximum temporal modulation rate of 70 Hz, although a value of 350 Hz was used for the experiments.

The DMR stimulus (Fig. 1C) is an extension of the rippled spectrum noise used to characterize spectral and temporal response properties in the ferret and cat auditory cortex (Schreiner and Calhoun, 1994; Kowalski et al., 1996; Klein et al., 2000). This sound is constructed so that its spectrotemporal envelope is dynamic and coherently modulated ("structured") in time and frequency. As for speech and animal vocalizations (Fig. 1A), the DMR has strong short-time spectrotemporal correlations. These are determined by two independent parameters that vary randomly in time: the temporal modulation rate, Fm(t), and ripple density, Ω(t) (see Materials and Methods; Figs. 1C and 2). The temporal modulation parameter determines the number of onsets and offsets per unit time (units of hertz) (Fig. 1C, top right). At any given time, the DMR sound produces a sinusoidal energy excitation pattern along the sensory epithelium, where the number of peaks per octave frequency is determined by the ripple density at that instant (Fig. 1C, top right). To efficiently excite neurons in the range characteristic for vocalizations, these parameters continuously vary at a nominal rate of 3 Hz (ripple density) (Fig. 2A) and 1.5 Hz (temporal modulation rate) (Fig. 2B) (in speech, for instance, similar features change at a rate of ~2-8 Hz; Greenberg, 1998).



Figure 2.   Stimulus dynamics and spectrotemporal correlation statistics of the DMR and RN. The DMR parameter trajectories Ω(t) (A; ripple density, 0-4 cycles per octave) and Fm(t) (B; modulation rate, -350 to 350 Hz) are shown for a short 15 sec segment. The spectrotemporal parameters efficiently cover the ripple space (C; shown for the 15 sec segments of A, B). The instantaneous correlation function of the DMR (D) and RN (E) are shown for three distinct time instants, t1-t3 [D; left to right, Ω(t1) = 1 cycle per octave; Fm(t1) = 0 Hz; Ω(t2) = 2 cycles per octave; Fm(t2) = 150 Hz; Ω(t3) = 0.15 cycles per octave; Fm(t3) = -60 Hz]. The RN instantaneous correlation function consists of a narrow central peak and a noisy surround (E). The global autocorrelation is identical for both sounds, consisting of an impulse-like central peak of width 3 msec and one-fourth octave (D, E, far right).

By averaging 16 independently chosen DMR envelopes, we designed a second stimulus, the RN. This sound is locally weakly correlated ("unstructured"), resembling background and environmental noises such as wind and rain (Fig. 1B). Visually, its spectrotemporal envelope (Fig. 1D) has a noisy profile both along time and along the spectral axis and lacks coherent modulations as present in the DMR and many vocalization sounds (Voss and Clarke, 1975; Attias and Schreiner, 1998; Nelken et al., 1999; Theunissen et al., 2000).

To characterize and compare the instantaneous versus the average behavior of these stimuli and their suitability for the reverse correlation method, the spectrotemporal autocorrelation function was evaluated for each stimulus. Dynamic properties were evaluated over short intervals of 10, 20, and 40 msec, which are comparable with integration times for ICC neurons. Global correlation statistics were evaluated for the ensemble as a whole (consisting of a 20 min continuous sound segment; see Materials and Methods). Both the local (shown for 10 msec analysis interval) and global spectrotemporal autocorrelations are depicted in Figure 2D,E.

The local autocorrelation depicts the spectrotemporal modulations that are present at a given time instant over a 10 msec segment. For the DMR stimulus, these take the form of tapered oscillations at a characteristic ripple density, modulation rate, and frequency sweep direction (Fig. 2D). Comparing the DMR and RN, it is clear that the local stimulus statistics are markedly different. Although the DMR has strong local correlations over the defined 10 msec intervals, the RN lacks any definitive spectral and temporal oscillations (Fig. 2E). Accordingly, its local autocorrelation is qualitatively similar at all time instants, consisting of a narrow central peak with a noisy surround. Therefore, the RN appears to be stationary or locally time-invariant. By comparison, the DMR has local envelope statistics that are dynamic; that is, they continuously vary with time.

By averaging the instantaneous autocorrelation function over all 10 msec time instants, it is possible to characterize the average statistics for the DMR and RN stimulus ensembles, which are identical (Fig. 2D,E, far right). In both cases, the average spectrotemporal autocorrelation assumes a narrow impulse-like character, which is the essential requirement for deriving receptive fields with the reverse correlation method (Eggermont, 1993; Klein et al., 2000).

Linear spectrotemporal receptive fields for DMR and RN

Neuronal data were evaluated by computing the STRF for neurons in the ICC and comparing neuronal responses to the spectrotemporally structured (DMR) and unstructured (RN) sounds. The STRF is a mathematical construct that describes the integrating area of the neuron along time and along the sensory epithelium (i.e., the frequency axis) and that depicts the spectrotemporal arrangement of neuronal excitation (red domains) and inhibition (blue domains). Figure 3 illustrates the spike-triggered average procedure we use to derive STRFs in response to DMR and RN. The STRF procedure requires that the probing stimulus have an unbiased modulation spectrum (both in time and along the sensory epithelium) or, equivalently, an impulsive spectrotemporal autocorrelation function that fully covers the physiologically relevant limits. Both the RN and DMR were designed with this constraint in mind; by limiting the temporal modulation rate to 350 Hz and the ripple density to 4 cycles per octave, we should be able to characterize 90-95% of the neurons in the ICC (Langner and Schreiner, 1988; Krishna and Semple, 2000) without biasing their STRFs.



Figure 3.   Spike-triggered average and the STRF. At each instant of an action potential, the pre-event sound segment (up to 100 msec before spiking) is extracted and averaged for the entire stimulus ensemble. Red regions indicate stimulus patterns that were likely to be present whenever a neural response occurred at delay of 0. Blue indicates stimulus patterns that tended to be off at a moment before spike initiation. Functionally, these are interpreted as excitation (red) and inhibition (blue).
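The spike-triggered average illustrated in Figure 3 can be sketched as follows. This is a simplified version that only averages the pre-spike envelope segments; any mean subtraction, scaling, or significance testing described in Materials and Methods is omitted. The names and the sample-index representation of spike times are assumptions for the example.

```python
def spike_triggered_average(envelope, spike_indices, n_lags):
    """Average the stimulus envelope preceding each spike.

    envelope[k][t] is the envelope of frequency channel k at sample t.
    Returns sta[k][lag], the mean envelope value `lag` samples before a spike.
    Spikes without a full pre-event window are skipped.
    """
    n_samples = len(envelope[0])
    usable = [t for t in spike_indices if n_lags - 1 <= t < n_samples]
    if not usable:
        raise ValueError("no spikes with a complete pre-event window")
    sta = []
    for channel in envelope:
        row = [sum(channel[t - lag] for t in usable) / len(usable)
               for lag in range(n_lags)]
        sta.append(row)
    return sta
```

With a 100 msec pre-event window, `n_lags` would be the number of envelope samples spanning 100 msec at the chosen sampling rate.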

This essential property, which makes the RN and DMR stimuli suitable for reverse correlation, also permits the identification of spectrotemporal response nonlinearities. Given that both stimuli have identical low-order statistics (matched in intensity, contrast, and average envelope modulations), it is expected that a linear integrating neuron would have an average neural response that is similar for the RN and DMR conditions. That is, because both the RN and DMR stringently satisfy the necessary requirements for reverse correlation, we expect that STRFDMR = STRFRN if the neuron behaves as a linear integrator (see Materials and Methods and Appendix B for proof). By comparing DMR and RN responses, we find that 60% (n = 49) of the neurons in our ICC sample met this requirement (Fig. 4). For reference, pure tone FTCs are shown alongside the RN and DMR STRFs when available (Fig. 4A,D). A red bar designates the mean sound pressure level (per one-third octave) for DMR and RN.



Figure 4.   Spectrotemporal receptive fields of neurons that responded to DMR and RN. Neurons were tested with pure tones (A, D, left column), DMR (B, E, G, I, middle column), and the RN (C, F, H, J, right column) stimuli (individual neurons are shown by row). Frequency-tuning curves depict the frequency versus intensity response area of a neuron (A, D). The red horizontal line designates the mean stimulus level (per one-third octave) used for RN and DMR. STRFs have similar shapes (similarity index: B, C, 0.94; E, F, 0.76; G, H, 0.77; I, J, 0.7) and strength (magnitude disparity index: B, C, -13%; E, F, 178%; G, H, 74%; I, J, 4%; rate disparity index: B, C, -6%; E, F, 35%; G, H, 24%; I, J, -53%). To facilitate comparisons, STRFs are shown on identical color scales for RN and DMR. STRFs for each neuron are drawn on individually chosen spectral and temporal scales. Significant patterns of the STRF are denoted by red contours (p < 0.002 contour).

Neurons in our sample showed a variety of preferences for stimulus patterns in the DMR and RN, including suppressive side bands, obliquely oriented excitatory or inhibitory regions, and distinct temporal response profiles (e.g., on-off, off-on, and off-on-off). Typically, excitatory and inhibitory STRF features were consistent between DMR and RN, although in some cases, inhibitory features were less pronounced for the RN (Fig. 4E,F). DMR and RN firing rates were generally high (mean spike rate, 11.2 spikes/sec for DMR and 11.8 spikes/sec for RN) and significantly correlated [correlation coefficient, 0.85 ± 0.08 (mean ± SE)] for this subset of neurons. Likewise, all neurons had comparable STRF energies. The neuron of Figure 4B,C, for instance, had a spike rate of 34.0 spikes/sec for the DMR and 36.2 spikes/sec for the RN (difference, 6%) and comparable STRF energies (EDMR, 2.6 spikes/sec; ERN, 3.0 spikes/sec; difference, 13%). The presence of well defined, statistically significant STRFs (p < 0.002) for both DMR and RN indicates that neurons efficiently phase locked to the stimulus spectrotemporal envelope. To distinguish these functional properties from those of other neurons in our sample, we refer to these as type I responses.

Frequency domain RF analysis

Complementary to the STRF, we also evaluated neuronal data in the frequency domain to extract physiologically meaningful parameters from the STRF and to describe neuronal preferences in terms of low-pass and bandpass filtering (Depireux et al., 2001; Klein et al., 2000).

First, we converted the STRF to an RTF (Fig. 5A,B). The RTF maps the preferences of a neuron as a function of the temporal (modulation rate) and spectral (ripple density) stimulus parameters (see Materials and Methods). Whether a neuron integrates spectral or temporal information in a low-pass or bandpass manner depends strongly on the spectrotemporal relationship between neural excitation and inhibition in its STRF. For instance, the neuron of Figure 4B,C has an on-off temporal response pattern; therefore, its RTF resembles a bandpass filter along the temporal modulation axis (Fig. 6A) that is centered at a best temporal modulation rate (bTM) of 45 Hz. Likewise, along the spectral axis, this neuron has a weak but significant inhibitory region alongside an excitatory region. Therefore its response as a function of ripple density also has a bandpass response profile with the dominant response peak centered at a best ripple density (bRD) of 0.6 cycles per octave. Neurons that lack interleaved patterns of excitatory (on) and inhibitory (off) subfields in their STRFs generally have low-pass response characteristics (Fig. 4I,J) along the spectral and temporal dimensions. The STRF of this example is marked by an off-on-off temporal response pattern, but its spectral STRF patterns lack interleaved excitatory and inhibitory subfields. Accordingly, its RTF (Fig. 6B) shows a bandpass response pattern in time (bTM, 200 Hz) and a low-pass response pattern along the spectral axis (bRD, 0 cycles per octave).



Figure 5.   Frequency domain response analysis. The auditory STRF (A; shown for RN) is used to compute the RTF (B; shown for RN) by applying a two-dimensional Fourier transform. The RTF depicts time-locked energy in the neural response as a function of temporal modulation rate, Fm, and ripple density, Ω. Red indicates parameter combinations that evoked a strong time-locked response, and blue indicates a weak response. The CRH (D) characterizes nonlinear neuronal responses that do not show up in the STRF. For each neural event, the spectral and temporal DMR parameters, Ω(tk) and Fm(tk), are determined at the time instance of the neural spike, tk. The values of Ω and Fm are then used to increment the corresponding bin in the histogram by +1 (D).



Figure 6.   Neuronal preferences determined with the RTF (left column) and CRH (right column) shown for the neurons of Figures 4B,I and 7J,G (A-D, respectively). The RTF and CRH depict the spectrotemporal frequency combinations (modulation frequency and ripple density) that preferentially activate a neuron. These can show either a low-pass or bandpass tuning profile along the temporal modulation or ripple density axis. Generally, neuronal tuning is similar for the RTF and CRH.

In a second related approach, a CRH was used to evaluate neuronal selectivity by tabulating the number of action potentials as a function of ripple parameters (see Materials and Methods). Unlike the STRF and RTF, this method accumulates the stimulus parameters, as opposed to averaging the stimulus waveforms, and is therefore insensitive to spike timing jitter. Figure 5C,D illustrates this approach. Generally, we find that RTF and CRH are in close agreement (Fig. 6). However, the CRH also reflects nonspecific activity, that is, action potentials that fall outside the dominant RTF boundaries and presumably do not contribute to the construction of the STRF (Fig. 6A,B).

Nonlinear spectrotemporal receptive fields for DMR and RN

One question addressed in this study is whether ICC neurons require specific acoustic features to be efficiently activated and whether these features can be identified using the STRF method. One reason why it may be difficult to identify the preferred acoustic features of a neuron using a direct approach is because conventional reverse correlation stimuli (such as the RN or spectrotemporal tone pips) seldom contain isolated sound patterns during a typical recording period. As an example, the DMR stimulus has pronounced energy peaks and FM sweeps that appear in isolation in its spectrotemporal envelope (Fig. 1C). These same features are much more subtle in the RN (Fig. 1D), because they are superimposed with other components. How do such stimulus characteristics affect the ability of a neuron to respond, and which of these stimuli is better suited for identifying neuronal preferences in central auditory stations? Presumably, if a neuron exhibits substantial nonlinearities, significant differences could be expected between DMR and RN.

Not all studied neurons responded equally well to the DMR and RN. A small but significant (14%; n = 11) subset of neurons responded selectively to the DMR stimulus (Fig. 7; type II neurons). In general, type II neurons had low firing rates to the DMR and little or no response to the RN. Average firing rates for either stimulus were significantly lower than for type I responders (mean DMR, 0.61 spikes/sec; t test, p < 0.003; mean RN, 0.13 spikes/sec; t test, p < 0.0025). Surprisingly, despite the low spike rates, STRFs derived with the DMR were highly significant (p ≤ 0.002) and exceptionally clean.



Figure 7.   Spectrotemporal receptive fields of neurons that responded specifically to the DMR sound (B, D, G, J, middle column) but responded weakly or had no response to the RN (C, E, H, K, right column). Frequency-tuning curves derived with pure tones are shown for reference (A, F, I, left column). Red lines designate the mean stimulus level (per one-third octave) used for DMR and RN. Significant STRF patterns are denoted by red contours. All neurons are shown at distinct spectral and temporal scales.

Figure 7 depicts typical responses for these neurons. Some neurons (Fig. 7B-E) responded to both the DMR (0.24 and 1.4 spikes/sec) and the RN (0.14 and 0.2 spikes/sec) sounds, although their DMR firing rate was significantly stronger. DMR STRFs were highly significant (p < 0.002), with well defined excitatory and inhibitory subfields. However, the RN STRFs of these type II neurons were weak, with no distinguishable boundaries or excitatory and inhibitory subregions. Furthermore, the DMR STRF energy was 725% (Fig. 7B,C; EDMR, 0.100 spikes/sec; ERN, 0.012 spikes/sec) and 1280% (Fig. 7D,E; EDMR, 0.276 spikes/sec; ERN, 0.020 spikes/sec) stronger, respectively, than for RN. Although these neurons did respond weakly to RN, other neurons responded exclusively to the DMR (Fig. 7G,H,J,K). Again, these neurons had extremely low spike rates (0.45 and 0.11 spikes/sec, respectively) to the DMR and no response to the RN (0 spikes). These STRFs were constructed using 276 (Fig. 7G) and 139 (Fig. 7J) spikes for the DMR over a 10 and 20 min recording period, respectively. Nevertheless, their STRFs are as noise-free as those of type I responders that typically had thousands to tens of thousands of action potentials.

Interestingly, response characteristics for type II neurons are consistent with the idea that they are highly selective for some of the DMR stimulus features. The fact that we can compute highly significant DMR STRFs from very few spikes further suggests that the acoustic features leading to spike initiation must be precisely aligned in time and frequency; otherwise, STRFs would not accurately build up. To determine whether this is so, we recomputed all DMR STRFs using a subset of 100 randomly chosen action potentials for each neuron and determined the mean and maximum SNRs of those pixel values that exceeded a significance criterion of p < 0.002 (see Materials and Methods). The SNR of these conditioned DMR STRFs was approximately twice as strong for type II neurons (average maximum SNR, 8.7 for type II vs 4.0 for type I; paired t test, p < 3.5 × 10⁻⁵; average mean SNR, 4.7 for type II vs 2.8 for type I; paired t test, p < 0.003). This suggests that the spectrotemporal waveforms added to compute the STRF are more consistent from spike to spike for type II neurons compared with type I neurons. Consequently, type II neurons are highly sensitive for particular stimulus features in the DMR stimulus, resulting in exceptionally clean STRFs that can be obtained with very few action potentials. Response specificity is also reflected in the CRH for these neurons. Compared with type I responses, CRHs for type II responses show highly localized peaks (Fig. 6C,D, far right) and lack nonspecific activity. Together, the low firing rates, high response specificity to the DMR, and unresponsiveness to RN demonstrate that these neurons are extremely nonlinear and highly selective for isolated spectrotemporal sound patterns.

It may be argued that the seemingly low spike rates and sparse responses of these neurons are simply attributed to stimulus levels near or below the response threshold of the neuron. We tested for this possibility in 6 of the 11 neurons by computing FTCs with pure tones (Fig. 7A,F,I). The FTCs are shown alongside