The Journal of Neuroscience, May 15, 2002, 22(10):4114-4131
Nonlinear Spectrotemporal Sound Analysis by Neurons in the
Auditory Midbrain
Monty A. Escabí1,2 and Christoph E. Schreiner1
1 W. M. Keck Center for Integrative Neuroscience
and University of California San Francisco/University of California
Berkeley Joint Bioengineering Graduate Group, University of California,
San Francisco, California 94143, and 2 Department of
Electrical and Computer Engineering, Biomedical Engineering Program,
University of Connecticut, Storrs, Connecticut 06269
ABSTRACT
The auditory system of humans and animals must process information
from sounds that dynamically vary along multiple stimulus dimensions,
including time, frequency, and intensity. Therefore, to understand
neuronal mechanisms underlying acoustic processing in the central
auditory pathway, it is essential to characterize how spectral and
temporal acoustic dimensions are jointly processed by the brain. We use
acoustic signals with a structurally rich time-varying spectrum to
study linear and nonlinear spectrotemporal interactions in the central
nucleus of the inferior colliculus (ICC). Our stimuli, the dynamic
moving ripple (DMR) and ripple noise (RN), allow us to systematically
characterize response attributes with the spectrotemporal receptive
field (STRF) methods to a rich and dynamic stimulus ensemble.
Theoretically, we expect that STRFs derived with DMR and RN would be
identical for a linear integrating neuron, and we find that ~60% of
ICC neurons meet this basic requirement. We find that the remaining
neurons are distinctly nonlinear; these could either respond
selectively to DMR or produce no STRFs despite selective activation to
spectrotemporal acoustic attributes. Our findings delineate rules for
spectrotemporal integration in the ICC that cannot be accounted for by
conventional linear-energy integration models.
Key words:
inferior colliculus; spectrotemporal; receptive field; nonlinear; ripple; naturalistic; reverse correlation
INTRODUCTION
The central nucleus of the inferior
colliculus (ICC) is an obligatory station in the lemniscal auditory
system that receives convergent inputs from numerous brainstem
structures and sends its highly processed outputs to the auditory
thalamus and, subsequently, to the primary auditory cortex. Neurons in
the ICC are sensitive to systematic manipulations of temporal,
spectral, binaural, and intensity stimulus attributes (Rees and
Møller, 1983, 1987; Schreiner et al., 1983; Langner and Schreiner,
1988; Schreiner and Langner, 1988; Irvine and Gao, 1990; Kuwada et al.,
1997; Ramachandran et al., 1999; Krishna and Semple, 2000). These
properties have been studied extensively with pure tones, modulated
tones, and noise stimuli; however, the overall capabilities of the ICC
for processing dynamic, spectrally complex acoustic stimuli remain unknown. Clearly, because natural sounds have structurally rich acoustic spectra and can simultaneously vary along spectral, temporal, intensity, and aural acoustic dimensions, it is essential to understand how these are jointly processed and represented within the ICC.
The concept of a stimulus-response function or receptive field (RF) is
a mathematical construct that describes the stimulus features that are
encoded by a sensory neuron. A widely used RF description that measures
the response of a neuron to pure tones of varying frequency and sound
pressure level (SPL) is the frequency-tuning curve (FTC; Schreiner and
Langner, 1988; Nelken et al., 1997; Ramachandran et al., 1999).
Although this descriptor continues to be important, it cannot
characterize the dynamic behavior of a neuron in response to an
arbitrary, spectrally complex, time-varying stimulus. Consequently,
secondary analyses are often used that measure the ability of a neuron
to respond to other stimulus aspects, such as the ability to
follow successively presented stimuli of different rates (Rees and
Møller, 1983, 1987; Schreiner et al., 1983; Møller and Rees, 1986;
Langner and Schreiner, 1988; Eggermont, 1999; Krishna and Semple,
2000).
Recently, the use of reverse correlation techniques to estimate the
spectrotemporal receptive field (STRF) in the auditory system (Aertsen
et al., 1980; Yeshurun et al., 1985; Eggermont, 1993; Nelken et
al., 1997; de Charms et al., 1998; Klein et al., 2000; Theunissen et
al., 2000; Depireux et al., 2001; Miller et al., 2001, 2002) has
allowed scientists to overcome some of the practical limitations posed
by conventional auditory RFs and the stimuli used to derive them (e.g.,
pure tones and modulated tones). The STRF describes the
stimulus-response function of an auditory neuron along both the
spectral and temporal acoustic dimensions, to a rich stimulus ensemble,
and makes no assumptions about independence of spectral and temporal
response attributes.
Most RF methods, including the STRF procedure, operate under the
assumption that the system under investigation integrates information,
be it acoustic or visual, in an approximately linear manner. This
requires that the spiking output of a sensory neuron be described as a
linear or quasilinear function of its inputs. Although this is often a
reasonable assumption, it may not always hold. For instance, direct
STRF (referred to as spatiotemporal receptive field for visual neurons)
approaches are readily applicable for simple cells in the primary
visual cortex (Jones and Palmer, 1987; DeAngelis et al., 1993,
1999; Victor and Purpura, 1998; Anzai et al., 1999; Reich et al.,
2000) but fail for visual complex cells and neurons outside of
V1 (Emerson et al., 1987; Szulborski and Palmer, 1990; Livingstone et
al., 2001). Other stimulus-dependent limitations are observed for
sensory neurons in acoustically specialized animals, where central
sensory neurons are often highly nonlinear and specifically tuned to
behaviorally relevant vocalizations (Suga and Jen, 1976; Suga et al.,
1978; Margoliash, 1983; Doupe, 1997; Portfors and Wenstrup, 1999;
Theunissen et al., 2000).
Theoretically, the STRF procedure requires the use of white noise as a
probing stimulus. Practically, however, because sensory neurons in
central stations respond to a limited range of spectrotemporal (spatiotemporal) modulations and are often inhibited by white noise, it
is necessary to synthesize acoustic or visual sequences that are
optimized for any particular station (de Charms et al., 1998; Klein et
al., 2000). Often this is achieved with randomly arranged
spectrotemporal tone pips in the auditory system (de Charms et al.,
1998; Theunissen et al., 2000) and spatiotemporally interleaved bars or
spots of light in the visual system (Emerson et al., 1987; DeAngelis et
al., 1993, 1999; Anzai et al., 1999; Reich et al., 2000).
Recently, some of the stimulus-dependent limitations associated with
such stimuli have been overcome with the use of natural sounds
(Theunissen et al., 2000) in the avian auditory cortex homolog.
In this study, we recorded single-unit activity from neurons in the ICC
of cats in response to dynamic spectrotemporally complex stimulus
sequences. Our synthetic stimuli, the dynamic moving ripple (DMR) and
ripple noise (RN), are designed to stringently satisfy a number of
theoretical requirements for use with the reverse correlation STRF
methods. Furthermore, these sounds share various properties with
natural sounds that allow us to overcome some of the practical
limitations of white noise, randomly interleaved tone pips, and other
synthetic reverse correlation stimuli. Compared with natural signals,
these stimuli offer the advantage that they can be parametrically
manipulated, allowing for a systematic assessment of nonlinear response
characteristics within the ICC. Our findings demonstrate the presence
of distinct spectrotemporal nonlinearities in the ICC and identify
possible mechanisms used for complex sound analysis, source
segregation, and signal detection.
MATERIALS AND METHODS
Surgical preparation
Cats were initially anesthetized with a mixture of ketamine HCl
(10 mg/kg) and acepromazine (0.28 mg/kg, i.m.). After an intravenous infusion line was inserted, a surgical state of anesthesia was induced
with ~30 mg/kg Nembutal and maintained throughout the surgery with
supplements. Body temperature was measured with a rectal probe and
maintained with a heating pad at ~37.5°C. An incision was made in
the intercartilaginous area of the trachea, and a tracheotomy tube was
inserted. After performing a craniotomy, the ICC was exposed by
removing the overlying cerebrum and part of the bony tentorium using a
dorsal approach. On completion of the surgery, the animal was
maintained in an areflexive state of anesthesia via continuous infusion
of ketamine (2-4 mg · kg⁻¹ · h⁻¹) and diazepam (0.4-1 mg · kg⁻¹ · h⁻¹)
in lactated Ringer's solution (1-4 mg · kg⁻¹ · h⁻¹).
The state of the animal was monitored (heart rate, breathing rate,
temperature, and periodically checked reflexes) throughout the
experiment, and the infusion rate was adjusted according to physiological criteria. Every 12 hr, the cat received an injection of
dexamethasone (0.14 mg/kg, s.c.) to prevent brain edema and atropine to
reduce salivation (0.04 mg · kg⁻¹ · d⁻¹,
s.c.). All surgical methods and experiment procedures followed National
Institutes of Health and US Department of Agriculture guidelines and
were approved by the committee on animal research of the University of
California, San Francisco.
Neuronal recording
Data were obtained from n = 81 single units in
the central nucleus of the inferior colliculus of three anesthetized
cats. One or two closely spaced parylene-coated tungsten
microelectrodes (Microprobe Inc., Potomac, MD; 1-3 MΩ at 1 kHz) were
advanced with a hydraulic microdrive (David Kopf Instruments, Tujunga, CA). Action potential traces were recorded onto a digital audiotape (CDAT16; Cygnus Technologies, Delaware Water Gap, PA) at a sampling rate of 24.0 kHz (41.7 µsec resolution) for off-line analysis. Off-line analysis consisted of digital bandpass filtering (0.3-10 kHz)
and individually spike sorting the action potential traces using a
Bayesian spike-sorting algorithm (Lewicki, 1994).
Electrode penetration trajectories were at ~20-30° relative to the
sagittal plane. Electrodes were initially advanced through the external
nucleus and onto the central nucleus while audiovisually determining
single neuron and multiunit characteristic frequencies (CFs).
The boundary between the external and central nucleus of the inferior
colliculus (IC) was confirmed physiologically (Merzenich and Reid,
1974) by a reversal or discontinuity in the CF trend and by
monotonically increasing CFs as a function of depth (over a range of
~1-20 kHz and ~1.5-5.0 mm relative to the surface of the IC),
consistent with the central nucleus. All electrode
recordings throughout the remainder of the experiment were taken from
this physiologically defined region. Except for the depth and CF
constraints, recording locations were randomly distributed within the ICC.
Acoustic stimuli
RN and DMR stimulus waveforms were designed on a digital
computer using the MATLAB (Mathworks) programming environment. The spectrotemporal envelopes shown in Figure 1C,D define the
energy modulations, in time and frequency, that are used to modulate a
bank of sinusoidal carriers of frequencies
f_k. As with natural signals, the
envelope of these sounds is time-varying and probes spectral and
temporal neuronal response preferences. Furthermore, analogous to
various classes of natural signals (Fig. 1A,B), these sounds have unique short-term statistics (Fig. 2D,E)
and yet their long-term statistics are identical (Fig.
2D,E, far right; see Stimulus correlation
statistics). Therefore, both sounds satisfy the necessary requirement
for use with the reverse correlation procedure that we use to estimate
auditory spectrotemporal receptive fields (see Spectrotemporal
receptive field).
Dynamic moving ripple envelope. The DMR envelope is designed
as a dynamic sinusoidal grating on an octave frequency and decibel amplitude axis. Two parameters define the DMR envelope: the
instantaneous ripple density, Ω(t), defines the number of
spectral peaks per octave at a given time instant; and
F_m(t) defines the
instantaneous modulation rate. The DMR spectrotemporal envelope is
expressed as:
$$S_{DMR}(t, X_k) = \frac{M}{2}\,\sin\left(2\pi\,\Omega(t)\,X_k + \Phi(t)\right) \tag{1}$$
where M = 30 or 45 is the modulation depth of
the envelope in decibels, X_k = log2(f_k/f_1)
is the octave frequency axis relative to the lowest stimulus frequency
(f_1 = 500 Hz), and
Φ(t) = ∫ F_m(τ)dτ controls the time-varying temporal modulation rate,
F_m(t). Spectral [Ω(t)] and temporal
[F_m(t)] parameters are
independent and slowly time-varying random processes (maximum rates of
change, 1.5 Hz for F_m and 3.0 Hz for
Ω). The time rate of change of both parameters was heuristically
chosen so that they coincide with the observed range of values for
similar acoustic features in speech and vocalizations (Greenberg,
1998). To guarantee that the stimulus space was covered in a
statistically unbiased manner, both parameters were designed with
uniformly (flat) distributed amplitudes in the intervals 0-4 cycles
per octave for Ω and −350 to +350 Hz for
F_m.
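As a minimal sketch of the envelope described above, the following Python functions evaluate a sinusoidal grating in decibels for fixed ripple parameters. The function names, the use of constant Ω and F_m (so that the phase integral reduces to F_m · t), and the explicit 2π factor that converts accumulated modulation phase from cycles to radians are assumptions made for illustration; they are not taken from the authors' MATLAB code.

```python
import math

def octave_position(f_hz, f1_hz=500.0):
    """Octave distance X_k of a carrier from the lowest frequency f_1."""
    return math.log2(f_hz / f1_hz)

def dmr_envelope_db(t, x_oct, ripple_density, mod_rate_hz, depth_db=30.0):
    """Envelope value in dB for constant ripple density and modulation rate."""
    # Constant F_m: the integral of F_m over time is F_m * t (in cycles).
    # The 2*pi factor (an assumption of this sketch) converts to radians.
    phase = 2.0 * math.pi * mod_rate_hz * t
    spectral = 2.0 * math.pi * ripple_density * x_oct
    return (depth_db / 2.0) * math.sin(spectral + phase)
```

For example, `dmr_envelope_db(0.0, octave_position(1000.0), 0.25, 0.0)` evaluates the grating one octave above 500 Hz with a ripple density of 0.25 cycles per octave, where the sinusoid is at its maximum of M/2.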
The time-varying stimulus parameters were generated in the MATLAB
programming environment. First, the parameters were generated as a
random sequence of normally distributed samples (randn function in
MATLAB) using a sampling rate of 3 Hz for
F_m(t) and 6 Hz for Ω(t). These sequences had maximum frequency contents of
1.5 and 3 Hz, respectively (because the maximum signal frequency is
half of the sampling frequency). To generate the acoustic sound
waveforms at a sampling rate of 44.1 kHz (Eq. 4) it was necessary to
resample both of the parameter signals to an equivalent sampling rate. Therefore, we upsampled both signals to 44.1 kHz using a cubic interpolation procedure (interp1 function in MATLAB; "cubic"
option; upsampling factor, 14,700 for modulation rate and 7350 for
ripple density). Next we needed to convert the parameter amplitudes
from a normal to a uniform distribution so that the probability of occurrence of each parameter is statistically unbiased within the
selected intervals. This normalization was performed with the error
function, erf(·).
This function converts normally distributed amplitudes to
uniformly distributed amplitudes over the interval −1 to +1; a
subsequent linear rescaling maps the amplitudes to the selected interval.
This operation had only a subtle effect on the spectrum of these
signals and is necessary to guarantee that the signal parameters are
statistically unbiased (flat distribution) within their predefined range.
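The normal-to-uniform conversion described above can be sketched in a few lines of Python. This is an illustration only: it assumes unit-variance normal samples and applies the error function to x/√2 (which equals 2Φ(x) − 1 and is therefore uniform on −1 to +1); the exact scaling used in the original MATLAB code is not stated in the text, and the low-rate sampling and cubic upsampling steps are omitted. The helper name `normal_to_uniform` is hypothetical.

```python
import math
import random

def normal_to_uniform(x, lo, hi):
    """Map a standard-normal sample x to a uniform value in [lo, hi]."""
    # erf(x / sqrt(2)) is uniform on (-1, 1) when x is standard normal
    # (assumed scaling; the source only says "the error function").
    u = math.erf(x / math.sqrt(2.0))
    # Linear rescaling from (-1, 1) to the selected interval.
    return lo + (u + 1.0) * (hi - lo) / 2.0

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000
# Independent draws for the two parameters.
ripple_density = [normal_to_uniform(rng.gauss(0.0, 1.0), 0.0, 4.0) for _ in range(n)]
modulation_rate = [normal_to_uniform(rng.gauss(0.0, 1.0), -350.0, 350.0) for _ in range(n)]

# Fraction of ripple-density values in the lower half of the interval;
# this is close to 0.5 when the distribution is flat.
frac_low = sum(1 for v in ripple_density if v < 2.0) / n
```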
Ripple noise envelope. The RN envelope is first
generated as a linear superposition of L = 16 independently chosen DMR envelopes, S_DMR,l(t,
X_k):
$$S_{RN}(t, X_k) = \frac{1}{\sqrt{L}} \sum_{l=1}^{L} S_{DMR,l}(t, X_k) \tag{2}$$
where the sum is normalized so that the SDs of the RN and DMR
are identical. Although this guaranteed that the average contrast of
the DMR and RN envelopes be the same (i.e., identical SD), the RN
amplitude distribution had long tails and resembled a Gaussian distribution, whereas the DMR envelope is approximately uniform and
confined to the interval [−M/2, M/2].
Instances at the high- and low-intensity tails of the distribution of
the RN envelope can therefore potentially activate undesirable
intensity- and contrast-dependent nonlinearities. We overcame this
possibility by compressing the RN envelope so that its amplitude
statistics resemble those of the DMR. The compressed RN envelope is
given by:
$$S_{RN,c}(t, X_k) = f\left(S_{RN}(t, X_k)\right) \tag{3}$$
where f(x) = M/2 · erf(x/σ_DMR) and
erf(·) is the error function. This envelope covers a
relative intensity range of [−M/2, M/2] dB
as for the DMR envelope. This procedure allows us to isolate
spectrotemporal nonlinearities from intensity- or contrast-dependent ones. A second concern was that the erf(·) function
significantly distorts the RN envelope by introducing high-frequency
envelope modulation components, and this in turn could compromise
experimental results. We found, both analytically (data not shown) and
through simulation, that the ripple spectrum and spectrotemporal
autocorrelation (see Fig. 2D,E, far right;
shown for compressed RN) of the uncompressed and compressed RN were in
close agreement (2.1% rms error for both the ripple spectrum and autocorrelation).
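The superposition and compression steps can be illustrated with a small simulation. The sketch below is hedged in several ways: each DMR envelope value is modeled simply as a sinusoid sampled at a random phase (a stand-in for independent envelopes, not the authors' stimulus generator), the sum is normalized by √L on the assumption that this is how the SDs were equated, and all names are hypothetical.

```python
import math
import random
import statistics

M_DB = 30.0                        # modulation depth in dB (a value from the text)
N_ENVELOPES = 16                   # number of DMR envelopes summed
SIGMA_DMR = M_DB / math.sqrt(8.0)  # SD of a sinusoid spanning [-M/2, M/2]

def compress(x):
    """Error-function compression f(x) = M/2 * erf(x / sigma_DMR)."""
    return (M_DB / 2.0) * math.erf(x / SIGMA_DMR)

def raw_rn_value(rng):
    """One uncompressed RN envelope value: a normalized sum of DMR values."""
    total = sum((M_DB / 2.0) * math.sin(rng.uniform(0.0, 2.0 * math.pi))
                for _ in range(N_ENVELOPES))
    # Dividing by sqrt(L) keeps the SD equal to that of a single envelope.
    return total / math.sqrt(N_ENVELOPES)

rng = random.Random(1)  # fixed seed for reproducibility
raw = [raw_rn_value(rng) for _ in range(5000)]
compressed = [compress(x) for x in raw]
raw_sd = statistics.pstdev(raw)
```

In this toy setting the uncompressed values have long tails that exceed ±M/2, whereas the compressed values stay within that range.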
Acoustic waveform. From the DMR and RN spectrotemporal
envelopes, the acoustic sound pressure waveforms,
s(t), are constructed by modulating
L = 230 sinusoidal carriers that are added
together:
$$s(t) = \sum_{k=1}^{L} S_{Lin}(t, X_k)\,\sin\left(2\pi f_k t + \phi_k\right) \tag{4}$$
where φ_k is a randomly chosen phase
(0-2π), which gives s(t) a noise-like
character, and S_Lin(t,
X_k) is a transformed version of the
DMR or RN envelopes that describes the amplitude modulations in linear
amplitude units. The linear envelope is bounded between 10^(−M/20) and 1. It is related
to the decibel envelopes by (here we use S_dB in place of
S_DMR and
S_RN):
$$S_{Lin}(t, X_k) = 10^{\left(S_{dB}(t, X_k) - M/2\right)/20} \tag{5}$$
Frequency carriers are geometrically spaced at a resolution of
43 carriers per octave: f_k = 1.01617 · f_(k−1), over a
range of 5.32 octaves (500-20,000 Hz). Although the resultant power
spectrum is not flat, this guarantees that the primary sensory
epithelium is uniformly excited and equal energy is provided per unit octave.
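The carrier layout and the decibel-to-linear conversion can be checked numerically. The sketch below uses the carrier ratio, lowest frequency, and carrier count quoted in the text; the function `db_to_linear` is an assumed form chosen only because it reproduces the stated bounds of 10^(−M/20) and 1, and the variable names are hypothetical. Because the quoted ratio is rounded, the computed density and span are expected to approximate, not exactly match, the quoted 43 carriers per octave and 5.32 octaves.

```python
import math

F1_HZ = 500.0       # lowest carrier frequency
RATIO = 1.01617     # carrier-to-carrier frequency ratio quoted in the text
N_CARRIERS = 230

# Geometrically spaced carrier frequencies f_k = RATIO * f_(k-1).
carriers = [F1_HZ * RATIO ** k for k in range(N_CARRIERS)]
span_octaves = math.log2(carriers[-1] / carriers[0])
carriers_per_octave = 1.0 / math.log2(RATIO)

def db_to_linear(s_db, depth_db):
    """Map a dB envelope in [-M/2, M/2] to linear amplitude in [10**(-M/20), 1]."""
    return 10.0 ** ((s_db - depth_db / 2.0) / 20.0)
```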
Sound presentation. All recordings were made with the animal
in a sound-shielded chamber (IAC, Bronx, NY), with stimuli delivered via a closed, binaural speaker system (electrostatic diaphragms from
Stax). Single neurons or clusters of neurons were initially isolated
audiovisually by presenting pure tones, white noise, or both. FTCs were
derived in two of the three experiments with a pseudorandom sequence of
pure tones presented at 15 intensities and 45 geometrically spaced
frequencies. In one experiment, rate-level functions were measured
with the RN stimulus as a function of SPL and contrast. After these
initial tests, DMR and RN stimuli were presented binaurally with an
independent sound sequence for each ear. The DMR stimulus was presented
for 10-20 min, followed by 10-18 min (full length presented for
~95% of the recording sites; identical stimuli for all experiments)
of the RN at ~30-70 dB/carrier greater than the neuron response
threshold (as determined by the FTC or rate-level functions). Because
the RN and DMR stimuli are each composed of 230 sinusoid carriers, the
effective SPLs were 10 · log10(230) = 23.6 dB greater than these values (i.e., ~53-93 dB greater than
threshold; SPL range, 75 ± 19 dB SPL, 64 ± 19 dB/one-third
octave, or 51 ± 19 dB/carrier). Both RN and DMR were presented at
identical intensities and contrast so that they covered an identical
range of amplitudes and fall well within the intensity response area of
the neuron. Sixteen neurons were also tested with a short 5 sec segment
of the DMR and RN that was presented 40 consecutive times. This was
used to construct response rastergrams for each stimulus (see Fig. 10).
Finally, for six neurons that did not respond to the RN, the DMR
stimulus was again presented at the end of the recording session to
verify that the given neurons were still responsive and to verify the stability of the electrode placement.
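The level arithmetic in this section follows from summing the power of 230 carriers. A short check, assuming equal-power carriers whose powers add (the names below are illustrative):

```python
import math

N_CARRIERS = 230

def overall_gain_db(n_carriers):
    """Level increase when n equal-power carriers are summed."""
    return 10.0 * math.log10(n_carriers)

gain_db = overall_gain_db(N_CARRIERS)
per_carrier_range = (30.0, 70.0)  # dB per carrier above threshold, from the text
overall_range = tuple(level + gain_db for level in per_carrier_range)
```

The gain evaluates to roughly 23.6 dB, which shifts the 30-70 dB per-carrier range to approximately 53-93 dB above threshold.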
Stimulus correlation statistics
The long-term and instantaneous spectrotemporal correlation
statistics of the RN and DMR stimulus constitute an essential aspect of
the stimulus design and the experimental approach. These were evaluated
in closed form and rigorously tested via simulation. Only a brief
account is provided.
A spectrotemporal Gaussian window,
w_i(t, X),
of SD σ_X = 0.5 octaves and
σ_t = 5, 10, or 20 msec and centroid about
t = t_i was used to
localize the RN or DMR spectrotemporal envelope,
S(t, X). The instantaneous
spectrotemporal autocorrelation function was obtained by evaluating the
localized autocorrelation:
[Equation 6]
where the expectation operator, E[·], is taken
with respect to time, t, and the spectral distance variable,
X [Eqs. 1, 3 are substituted for S(t,
X)]. The variable
t_i corresponds to the time instant
when the autocorrelation is evaluated, and τ and χ correspond to
the temporal lag and spectral displacement, respectively.
In closed form the solutions for the RN and DMR are given by:
[Equation 7]
[Equation 8]
where σ²_DMR = M²/8 and σ²_RN = M²/12 are the variances of the
DMR and RN, respectively, and R_ww(τ, χ) is the autocorrelation function of the Gaussian window (which is
itself a Gaussian window of SD √2 · σ_X and
√2 · σ_t). The parameters Ω_i = Ω(t_i)
and F_m,i = F_m(t_i) are the
instantaneous DMR parameters evaluated at
t_i. Because the stimulus parameters
dynamically vary with time at a nominal rate of 3 and 1.5 Hz (Fig.
2A,B), the DMR instantaneous spectrotemporal
autocorrelation likewise varies with time (Fig. 2D).
Accordingly, its spectrotemporal envelope is nonstationary at these
time scales. The term e(τ, χ) is a spectrotemporal noise
term, and the parameters Ω_Max = 4 cycles per
octave and F_Max = 350 Hz are the
maximum ripple parameters.
The long-term autocorrelation for both sounds was obtained by
performing a time average of the instantaneous autocorrelation: R_SS(τ, χ) = E[R_SS(τ,
χ | t_i)] (E[·] is
now evaluated with respect to t_i). The
autocorrelation is identical in form for both sounds:
[Equation 9]
The autocorrelations differ only in the SD, by a multiplicative
factor of ~20% (RN, σ_S = σ_RN = M/√12 dB; DMR,
σ_S = σ_DMR = M/√8 dB).
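The size of this factor follows directly from the two SDs. As a worked check, using only the values M/√8 and M/√12 stated above:

```latex
\frac{\sigma_{DMR}}{\sigma_{RN}}
  = \frac{M/\sqrt{8}}{M/\sqrt{12}}
  = \sqrt{\frac{12}{8}}
  = \sqrt{1.5}
  \approx 1.22
```

The DMR envelope SD is therefore about 22% larger than the RN envelope SD, independent of the modulation depth M.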
Spectrotemporal receptive field
STRFs are computed by averaging the pre-event spectrotemporal
envelope. For a sequence of N neural events at times
t_n (sampled at 41.7 µsec
resolution), contralateral and ipsilateral STRFs are obtained as [here
we use S_dB(t, X_k) in place of
S_DMR(t, X_k) or
S_RN(t, X_k)]:
$$STRF(\tau, X_k) = \frac{1}{\sigma_s^2\,T} \sum_{n=1}^{N} S_{dB}(t_n - \tau, X_k) \tag{10}$$
where T is the experimental recording time in
seconds, τ is the temporal delay of the stimulus relative to the
neural event time (0-100 msec), and σ_s² is the
variance of the decibel spectrotemporal envelope for the DMR or RN.
During the DMR and RN stimulus presentation, independent sound
sequences were binaurally presented to each animal. This allowed us to
independently estimate the contralateral and ipsilateral STRFs by
substituting the contralateral and ipsilateral spectrotemporal envelopes
into Equation 10 (Marmarelis and Naka, 1974).
Stimulus envelopes were sampled at 4.0 kilosamples/sec (temporal) and
43 samples per octave (spectral). The STRF is formally given in units
of spikes per second per decibel. We use a rate-normalized version of
the STRF, STRF_r(τ,
X_k) = σ_s
· STRF(τ, X_k), which corresponds to the average driven output produced at time 0, in units of spikes per
second, for the average differential stimulus (decibels) presented within the receptive field of the neuron.
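The pre-event averaging can be sketched as a spike-triggered average over a discretized envelope. The code below is a toy illustration under stated assumptions: the normalization by envelope variance and recording time is inferred from the units quoted above (spikes per second per decibel), the envelope is a random ±1 dB sequence instead of a DMR or RN stimulus, and the model neuron, function name, and parameters are all hypothetical.

```python
import random

def strf_from_spikes(envelope_db, spike_bins, n_lags, dt, variance):
    """Spike-triggered average of the pre-spike envelope.

    envelope_db[k][i]: envelope (dB) of frequency channel k at time bin i.
    spike_bins: indices of the time bins that contain a spike.
    Returns strf[k][j] for delay j * dt, in spikes/s per dB.
    """
    total_time = len(envelope_db[0]) * dt
    strf = [[0.0] * n_lags for _ in envelope_db]
    for k, channel in enumerate(envelope_db):
        for j in range(n_lags):
            # Sum the envelope value j bins before each spike.
            acc = sum(channel[n - j] for n in spike_bins if n - j >= 0)
            strf[k][j] = acc / (variance * total_time)
    return strf

# Toy data: two channels of random +/-1 dB envelope (variance 1).
rng = random.Random(2)
N_BINS, DT = 2000, 0.001
envelope = [[rng.choice((-1.0, 1.0)) for _ in range(N_BINS)] for _ in range(2)]
# Model neuron: fires whenever channel 0 was positive two bins earlier.
spikes = [n for n in range(2, N_BINS) if envelope[0][n - 2] > 0]
strf = strf_from_spikes(envelope, spikes, n_lags=5, dt=DT, variance=1.0)
mean_rate = len(spikes) / (N_BINS * DT)
```

In this toy case the largest STRF value appears in channel 0 at a delay of two bins, which is the feature the model neuron was built to prefer.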
Statistically significant STRF
We devised a procedure for measuring the statistically
significant STRF by considering a null condition in which N
randomly chosen spikes are put through Equation 10. This procedure
consists of adding random sound waveforms to construct a control STRF
from which statistical significance can be determined. Solutions for this procedure were derived analytically in closed form (data not
shown). The distribution of amplitudes for the control STRF quickly
approached a normal distribution (with as few as N = 50 spikes). Therefore, a simplification was made in which we determined
the two-tailed probability of exceeding a threshold relative to the
control STRF under the assumption of a normal distribution. The
statistically significant portion of the STRF (p < 0.002) is obtained by keeping all values of the STRF that exceed
3.09 SD of the control noise STRF and setting all other values to 0. Analytically this is expressed as |σ_s² · T · STRF(τ, χ)/√N| > 3.09 · σ_s. No smoothing was performed before or after
thresholding. This procedure was tested against the analytically derived solutions, and we found that actual significance values were
always slightly smaller (e.g., actual significance value of
p < 0.0019 for N = 50).
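The correspondence between a 3.09 SD criterion and a two-tailed p of about 0.002 under a normal distribution can be verified with the complementary error function. The thresholding helper below is a hypothetical sketch that takes the noise SD as a given input; it does not reproduce the authors' analytical derivation of the control STRF.

```python
import math

def two_tailed_p(z):
    """Two-tailed probability of exceeding z SDs under a normal distribution."""
    return math.erfc(z / math.sqrt(2.0))

def threshold_strf(strf, noise_sd, z=3.09):
    """Keep values whose magnitude exceeds z * noise_sd; set the rest to 0."""
    return [[v if abs(v) > z * noise_sd else 0.0 for v in row] for row in strf]

p_value = two_tailed_p(3.09)
```

For instance, `threshold_strf([[0.5, -4.0], [3.2, 3.0]], noise_sd=1.0)` keeps only the entries −4.0 and 3.2.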
To determine relative significance of STRFs, on an equal spike basis,
we further evaluated significance by recomputing all STRFs using 100 action potentials and determining all pixel values that exceeded the
p < 0.002 confidence intervals. For these pixel values, the average and peak signal-to-noise ratios (SNRs) were computed as the mean and the maximum, respectively, of the pixel magnitudes divided by σ_100,
where σ_100 is the SD of the noise
control STRF derived for 100 random spikes. Thus, for any given pixel,
the SNR determines the number of SDs by which STRF pixels stand out
above the noise.
Null hypothesis
Response nonlinearities are tested against the expected results
for an ideal linear model neuron. Given that the long-term spectrotemporal autocorrelation functions for the DMR and RN are identical, it follows that for a purely linear neuron
STRF_DMR = STRF_RN (for
proof, see Appendix A). Significant differences between the RN and DMR
STRFs can be attributed to response nonlinearities. To quantify
response differences, we use the statistically significant portion of
the STRFs and use this to compute a number of response metrics for the
DMR and RN: similarity index, rate and magnitude disparity index, and
the phase-locking index (see below).
Quantifying DMR and RN response differences
Neural responses for DMR and RN were compared in three
complementary ways. First, the STRF similarity index (SI; DeAngelis et
al., 1999; Reich et al., 2000) was used to quantify shape
differences between STRF_DMR and
STRF_RN. Using the STRF pixel values that exceeded the statistical significance threshold of p < 0.002 for either condition, we treated the STRFs as vectors (including
significant contralateral and ipsilateral pixels). The vectorized RFs
were then used to evaluate the similarity index:
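$$SI = \frac{\langle RF_{DMR}, RF_{RN} \rangle}{\lVert RF_{DMR} \rVert \cdot \lVert RF_{RN} \rVert} \tag{11}$$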
where RF_DMR and
RF_RN are the significant STRFs,
⟨·, ·⟩ is the vector inner product, and ‖·‖ designates
the vector norm operator. The SI is numerically identical to the
Pearson correlation coefficient.
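A normalized inner product of this kind is straightforward to compute once the two STRFs are flattened into vectors. The sketch below assumes the vectors have already been restricted to the significant pixels and have equal length; the function name is illustrative.

```python
import math

def similarity_index(rf_a, rf_b):
    """Inner product of two vectorized STRFs divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(rf_a, rf_b))
    norm_a = math.sqrt(sum(a * a for a in rf_a))
    norm_b = math.sqrt(sum(b * b for b in rf_b))
    return dot / (norm_a * norm_b)
```

Two STRFs with the same shape but different gain give a value of 1, a sign-inverted copy gives −1, and non-overlapping STRFs give 0. For zero-mean vectors the value coincides with the Pearson correlation coefficient.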
We devised two metrics to evaluate differences in firing rate and
driven activity independently of STRF shape. First we computed the rate
disparity index (RDI):
[Equation 12]
where s = sign(r_DMR − r_RN), and the mean spike rates for
each condition are r_DMR and
r_RN. The magnitude of the RDI is
numerically equivalent to the percent change in firing rate between DMR
and RN. Its sign tells us which condition, DMR or RN, had a higher firing rate (+, DMR; −, RN). To quantify differences in driven activity, we used a third metric, the magnitude disparity index (MDI).
The MDI is identical in form to the RDI, where the mean firing rates,
r_DMR and
r_RN, are replaced by the
rate-normalized STRF energies, E_DMR
and E_RN, for the corresponding
conditions. Here the STRF energy is computed as:
[Equation 13]
Because the response of the neuron could be fractionally
distributed between the contralateral and ipsilateral ears, the energy
of the contra- and ipsi-STRFs was measured independently, and the
cumulative sum of the two energies, E_c + E_i, was taken,
where E_c and
E_i are the contra- and ipsi-STRF
energies. The STRF energy measures phase-locked activity (units of
spikes per second) and is equivalent to the average phase-locked output
for a linear integrating neuron (for proof, see Appendix B).
Phase-locking index
The phase-locking index (PLI) quantifies the ability of a neuron
to phase lock to the spectrotemporal envelope. This metric is obtained
by dividing the peak-to-peak STRF amplitude (in spikes per second) by
the mean spike rate, r:
[Equation 14]
and normalizing this quantity by a theoretically derived factor
that corresponds to the theoretical maximum peak-to-peak rate-normalized STRF amplitude (confining this index to the range of
0-1). For the DMR, this factor is √8, and for the RN,
it is √12 (for proof, see Appendix C).
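The verbal description above (peak-to-peak rate-normalized STRF amplitude, divided by the mean rate and by a stimulus-specific maximum factor) can be sketched as follows. The exact published form of the index is not reproduced here; this function is an assumed reading of that description, with the factors taken as √8 for the DMR and √12 for the RN, and all names are hypothetical.

```python
import math

# Assumed normalization factors (ratio of envelope peak-to-peak range M to its SD).
MAX_FACTOR = {"DMR": math.sqrt(8.0), "RN": math.sqrt(12.0)}

def phase_locking_index(strf_r, mean_rate, stimulus):
    """Peak-to-peak rate-normalized STRF amplitude, scaled to the range 0-1."""
    values = [v for row in strf_r for v in row]
    peak_to_peak = max(values) - min(values)
    return peak_to_peak / (mean_rate * MAX_FACTOR[stimulus])
```

Under this reading, an STRF whose peak-to-peak amplitude equals the mean rate times √8 yields an index of 1 for the DMR, and a flat STRF yields 0.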
Frequency domain analysis: ripple transfer function and conditioned
response histogram
As an alternative to the STRF, we further evaluated neuronal
response preferences to DMR and RN in the frequency domain. These approaches are useful, because they can be used to quantify neural responses as a function of ripple frequency and temporal modulation rate parameters.
The ripple transfer function (RTF) is one such descriptor. It is
obtained directly from the STRF by performing a two-dimensional Fourier
transform on the statistically significant STRF
(p < 0.002), discarding the phase, and keeping
the magnitude (see Fig. 5A,B). From the RTF, the best ripple
density and best modulation rate parameters were determined for all
phase-locking neurons. These are chosen by the location in the
magnitude response with the peak amplitude. In instances in which two
responses are observed (for negative and positive modulation rates),
the secondary response was selected only if its response magnitude
exceeded 50% of the maximum response magnitude. Positive (negative)
modulation rates designate downward (upward)-going stimulus features;
however, because the STRF is a time-reversed version of best stimulus
of the neuron, this convention is flipped for the neuron and its RTF
(positive, upward sweep; negative, downward sweep).
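The magnitude-only two-dimensional Fourier transform can be illustrated with a naive discrete transform on a small grid. This sketch is for illustration: it operates on DFT bin indices and does not convert them to cycles per octave or hertz (that conversion depends on the spectral and temporal sampling of the STRF), and the toy STRF is a pure grating chosen so the peak location is known in advance. Function names are hypothetical.

```python
import cmath
import math

def ripple_transfer_function(strf):
    """Magnitude of the 2-D discrete Fourier transform of an STRF (phase discarded)."""
    n_freq, n_time = len(strf), len(strf[0])
    rtf = [[0.0] * n_time for _ in range(n_freq)]
    for u in range(n_freq):
        for v in range(n_time):
            acc = 0j
            for k in range(n_freq):
                for j in range(n_time):
                    angle = -2.0 * math.pi * (u * k / n_freq + v * j / n_time)
                    acc += strf[k][j] * cmath.exp(1j * angle)
            rtf[u][v] = abs(acc)
    return rtf

def peak_location(rtf):
    """Bin indices (spectral, temporal) of the largest magnitude."""
    bins = ((u, v) for u in range(len(rtf)) for v in range(len(rtf[0])))
    return max(bins, key=lambda uv: rtf[uv[0]][uv[1]])

# Toy STRF: a grating with 1 cycle across 4 spectral bins and 2 cycles across 8 time bins.
toy_strf = [[math.cos(2.0 * math.pi * (k / 4 + 2 * j / 8)) for j in range(8)]
            for k in range(4)]
toy_rtf = ripple_transfer_function(toy_strf)
```

Because the toy STRF is real valued, its magnitude spectrum has two mirror-image peaks of equal height, which is why the text distinguishes responses at positive and negative modulation rates.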
Although this approach was successfully applied for many neurons, other
neurons did not show statistically significant STRFs; therefore, it was
impossible to estimate their RTFs directly. We therefore approximate
the probability distribution function of observing a given set of
parameters given a spike at time tn, P(Fm,
|tn), by performing a
spike-triggered average with respect to the time-varying DMR
parameters, (t) and
Fm(t):
[Equation 15]
where P_kl is the discrete
version of P(F_m,
Ω | t_n), and I[·] is the
indicator function. The indicator function takes a value of unity
whenever the condition inside its argument is satisfied. Otherwise, it
assumes a value of 0. Thus for any given bin of P_kl, this conditioned response
histogram (CRH) is incremented by +1 if and only if the instantaneous
parameters,
F_m(t_n)
and Ω(t_n), fall within the required
intervals, k · ΔF_m ≤ F_m(t_n) < (k + 1) · ΔF_m and
l · ΔΩ ≤ Ω(t_n) < (l + 1) · ΔΩ, at the time of the neuronal spike,
t_n (see Fig. 5C,D). Bin
width resolutions of ΔF_m = 15-35 Hz
and ΔΩ = 0.2-0.4 cycles per octave were used. The exact
position used to estimate the parameters relative to the neuronal spike time, t_n, did not alter the resulting
histogram (tested for a time lag of 0-50 msec), because the
parameters vary at a slow rate (1.5 and 3 Hz) compared with the
integration time of ICC neurons (usually tens of milliseconds).
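The binning procedure amounts to a two-dimensional histogram of the instantaneous stimulus parameters sampled at each spike time. The sketch below assumes the parameter values at the spike times have already been looked up, uses bin widths within the quoted ranges, and indexes modulation-rate bins from the lowest rate (−350 Hz) instead of from zero; these choices and the function name are illustrative assumptions.

```python
def conditioned_response_histogram(fm_at_spikes, omega_at_spikes,
                                   d_fm=35.0, d_omega=0.25,
                                   fm_max=350.0, omega_max=4.0):
    """Count spikes per (modulation rate, ripple density) bin.

    fm_at_spikes: modulation rate F_m(t_n) in Hz at each spike time.
    omega_at_spikes: ripple density in cycles/octave at each spike time.
    Returns hist[k][l], with k indexing rate bins starting at -fm_max.
    """
    n_fm = int(round(2.0 * fm_max / d_fm))
    n_omega = int(round(omega_max / d_omega))
    hist = [[0] * n_omega for _ in range(n_fm)]
    for fm, omega in zip(fm_at_spikes, omega_at_spikes):
        # Clamp the upper edge of the range into the last bin.
        k = min(int((fm + fm_max) // d_fm), n_fm - 1)
        l = min(int(omega // d_omega), n_omega - 1)
        hist[k][l] += 1  # the indicator is 1 only for the bin containing the sample
    return hist
```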
As for single units, it was also useful to characterize population
responses in the frequency domain, and we therefore extended these
methods to include population statistics. By averaging the RTFs of
individual neurons, we estimated the population ripple transfer
function (pRTF) for those neurons with significant STRFs. To avoid
biasing the pRTF because of systematic differences in firing strength,
the RTFs of individual neurons were equally weighted so that the
cumulative area of each was exactly 1.
For neurons that did not produce statistically significant STRFs, a
modified approach was applied. We normalized the CRH of each neuron so
that its cumulative sum was exactly 1. An average was then taken over
the entire population, thereby producing the "population" CRH
(pCRH). To facilitate comparisons, the pCRH was interpolated using the
interp2 function (spline option) in MATLAB to identical resolution as
for pRTF.
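The equal-weighting step can be sketched as a normalization of each neuron's histogram (or RTF) to unit sum, followed by an element-wise average across neurons. The interpolation step is omitted, and the function names are hypothetical.

```python
def normalize_to_unit_sum(hist):
    """Scale a 2-D histogram so that its entries sum to 1."""
    total = sum(sum(row) for row in hist)
    return [[v / total for v in row] for row in hist]

def population_average(hists):
    """Average equally weighted (unit-sum) histograms across neurons."""
    normalized = [normalize_to_unit_sum(h) for h in hists]
    n = len(normalized)
    rows, cols = len(hists[0]), len(hists[0][0])
    return [[sum(h[i][j] for h in normalized) / n for j in range(cols)]
            for i in range(rows)]
```

With this weighting, a neuron that fired many spikes contributes no more to the population estimate than one that fired few.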
RESULTS
We studied 81 single neurons with the intent of understanding how
dynamic spectrotemporal signals are processed within the central
nucleus of the inferior colliculus. Specifically, we address whether
single neurons integrate spectrotemporal information according to a
linear integration model and whether dynamic stimulus aspects significantly affect neuronal encoding. Our complex stimuli constitute an integral part of the experimental protocol, and we fully
characterize several pertinent properties of the stimulus ensembles. By
design, both test sounds have identical average statistics and,
therefore, equally sample the relevant spectrotemporal stimulus
dimensions for this study. As a first-order test for
spectrotemporal response nonlinearities, we compute and compare the
spectrotemporal receptive field for each sound type. We also
characterize higher-order response attributes that are not directly
accessible with the STRF descriptor.
Stimulus statistics: average versus dynamic spectrotemporal
characteristics of the dynamic moving ripple and ripple noise
To test the possibility that individual auditory neurons in the
ICC are selective for structural features prevalent in natural sounds
(Fig. 1A,B), complex
broadband stimuli (Fig. 1C,D) were designed that allow us to
systematically identify nonlinear processing capabilities of auditory
neurons. These stimuli fulfill a number of theoretical and ecological
constraints: first, both sounds were designed to stringently meet a
number of necessary requirements for use with the STRF. Second, both
sounds incorporate a number of pertinent acoustic stimulus attributes
that are prevalent in various natural signals [e.g., spectral energy
peaks, frequency modulation (FM) sweeps, and temporal modulations] and
that determine important perceptual qualities (Plomp, 1970, 1983; Van
Veen and Houtgast, 1983).

Figure 1.
Synthetic sound sequence used for reverse
correlation analysis (C, D) and some corresponding
natural sound counterparts (A, kitten vocalizations;
B, babbling brook). The DMR (C) is
designed to mimic spectral profiles created by formants (spectral
energy peaks) and temporal modulations in speech production and animal
vocalizations. The ripple density parameter,
Ω(t), corresponds to the number of energy peaks
(cycles per octave) along the spectral axis at time t.
The temporal modulation rate,
Fm(t), describes the
repetition rate of the envelope in hertz. The second stimulus, the RN
(D), has noise-like properties that uniformly
cover the ripple dimensions. The DMR and RN are shown for a maximum
temporal modulation rate of 70 Hz, although a value of 350 Hz was used
for the experiments.
The DMR stimulus (Fig. 1C) is an extension of the rippled
spectrum noise used to characterize spectral and temporal response properties in the ferret and cat auditory cortex (Schreiner and Calhoun, 1994; Kowalski et al., 1996; Klein et al., 2000). This sound
is constructed so that its spectrotemporal envelope is dynamic and
coherently modulated ("structured") in time and frequency. As for
speech and animal vocalizations (Fig. 1A), the DMR
has strong short-time spectrotemporal correlations. These are
determined by two independent parameters that vary randomly in time:
the temporal modulation rate,
Fm(t), and the ripple density,
Ω(t) (see Materials and Methods; Figs. 1C,
2). The temporal modulation parameter
determines the number of onsets and offsets per unit time (units of
hertz) (Fig. 1C, top right). At any given time, the DMR sound produces a sinusoidal energy excitation pattern along the
sensory epithelium, where the number of peaks per octave frequency is
determined by the ripple density at that instant (Fig. 1C,
top right). To efficiently excite neurons in the range characteristic of vocalizations, these parameters continuously vary at
a nominal rate of 3 Hz (ripple density) (Fig. 2A) and 1.5 Hz (temporal modulation rate) (Fig. 2B) (in
speech, for instance, similar features change at a rate of ~2-8 Hz;
Greenberg, 1998).

Figure 2.
Stimulus dynamics and spectrotemporal correlation
statistics of the DMR and RN. The DMR parameter trajectories
Ω(t) (A; ripple density, 0-4 cycles per octave) and
Fm(t) (B; modulation rate, -350 to 350 Hz) are shown for a
short 15 sec segment. The spectrotemporal parameters efficiently cover
the ripple space (C; shown for the 15 sec segments of
A, B). The instantaneous correlation functions of the DMR
(D) and RN (E) are shown
for three distinct time instants, t1-t3
[D; left to right,
Ω(t1) = 1 cycle per octave, Fm(t1) = 0 Hz;
Ω(t2) = 2 cycles per octave, Fm(t2) = 150 Hz;
Ω(t3) = 0.15 cycles per octave, Fm(t3) = 60 Hz].
The RN instantaneous correlation function consists of a
narrow central peak and a noisy surround (E). The
global autocorrelation is identical for both sounds, consisting of an
impulse-like central peak of width 3 msec and one-fourth octave
(D, E, far right).
By averaging 16 independently chosen DMR envelopes, we designed a
second stimulus, the RN. This sound is locally weakly correlated ("unstructured"), resembling background and environmental noises such as wind and rain (Fig. 1B). Visually, its
spectrotemporal envelope (Fig. 1D) has a noisy
profile along both the time and spectral axes and lacks the coherent
modulations present in the DMR and many vocalization sounds (Voss
and Clarke, 1975; Attias and Schreiner, 1998; Nelken et al., 1999;
Theunissen et al., 2000).
To characterize and compare the instantaneous versus the average
behavior of these stimuli and their suitability for the reverse correlation method, the spectrotemporal autocorrelation function was
evaluated for each stimulus. Dynamic properties were evaluated over
short intervals of 10, 20, and 40 msec, which are comparable with
integration times for ICC neurons. Global correlation statistics were
evaluated for the ensemble as a whole (consisting of a 20 min
continuous sound segment; see Materials and Methods). Both the local
(shown for a 10 msec analysis interval) and global spectrotemporal autocorrelations are depicted in Figure 2D,E.
The local autocorrelation depicts the spectrotemporal modulations that
are present at a given time instant over a 10 msec segment. For the DMR
stimulus, these take the form of tapered oscillations at a
characteristic ripple density, modulation rate, and frequency sweep
direction (Fig. 2D). A comparison of the DMR and RN makes
clear that the local stimulus statistics are markedly different.
Although the DMR has strong local correlations over the defined 10 msec
intervals, the RN lacks any definitive spectral and temporal
oscillations (Fig. 2E). Accordingly, its local
autocorrelation is qualitatively similar at all time instants,
consisting of a narrow central peak with a noisy surround. Therefore,
the RN appears to be stationary or locally time-invariant. By
comparison, the DMR has local envelope statistics that are dynamic;
that is, they continuously vary with time.
By averaging the instantaneous autocorrelation function over all 10 msec time instants, it is possible to characterize the average
statistics for the DMR and RN stimulus ensembles, which are identical
(Fig. 2D,E, far right). In both cases, the
average spectrotemporal autocorrelation assumes a narrow impulse-like character, which is the essential requirement for deriving receptive fields with the reverse correlation method (Eggermont, 1993; Klein et
al., 2000).
Linear spectrotemporal receptive fields for DMR and RN
Neuronal data were evaluated by computing the STRF for neurons in
the ICC and comparing neuronal responses to the spectrotemporally structured (DMR) and unstructured (RN) sounds. The STRF is a
mathematical construct that describes the integrating area of the
neuron along time and along the sensory epithelium (i.e., the frequency
axis) and that depicts the spectrotemporal arrangement of neuronal
excitation (red domains) and inhibition (blue domains). Figure
3 illustrates the spike-triggered average
procedure we use to derive STRFs in response to DMR and RN. The STRF
procedure requires that the probing stimulus have an unbiased
modulation spectrum (both in time and along the sensory epithelium) or,
equivalently, an impulsive spectrotemporal autocorrelation function
that fully covers the physiologically relevant limits. Both the RN and
DMR were designed with this constraint in mind; by limiting the
temporal modulation rate to 350 Hz and the ripple density to 4 cycles
per octave, we should be able to characterize 90-95% of the neurons
in the ICC (Langner and Schreiner, 1988; Krishna and Semple, 2000)
without biasing their STRFs.

Figure 3.
Spike-triggered average and the STRF. At each
instant of an action potential, the pre-event sound segment (up to 100 msec before spiking) is extracted and averaged for the entire stimulus
ensemble. Red regions indicate stimulus patterns that
were likely to be present whenever a neural response occurred at a delay
of 0. Blue indicates stimulus patterns that tended to be
off at a moment before spike initiation. Functionally, these are
interpreted as excitation (red) and inhibition
(blue).
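
The spike-triggered averaging described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the stimulus envelope has already been sampled into discrete time frames with its mean removed, and that spikes are given as frame indices. The function and argument names are hypothetical.

```python
def spike_triggered_average(envelope, spike_bins, n_lags):
    """Average the pre-spike stimulus segments over all spikes.

    envelope   : list of time frames; each frame is a list with one
                 (zero-mean) envelope value per frequency channel.
    spike_bins : frame indices at which a spike occurred.
    n_lags     : number of frames kept per segment (lag 0 is the frame at
                 the spike time, lag k is k frames before the spike).

    Returns sta[lag][channel]. Spikes without a full pre-spike window
    are skipped.
    """
    n_channels = len(envelope[0])
    sta = [[0.0] * n_channels for _ in range(n_lags)]
    n_used = 0
    for t in spike_bins:
        if t - (n_lags - 1) < 0 or t >= len(envelope):
            continue  # incomplete pre-event segment
        n_used += 1
        for lag in range(n_lags):
            frame = envelope[t - lag]
            for c in range(n_channels):
                sta[lag][c] += frame[c]
    if n_used == 0:
        return sta
    return [[value / n_used for value in row] for row in sta]
```

Positive entries of the result correspond to the excitatory (red) domains and negative entries to the inhibitory (blue) domains, under the zero-mean assumption stated above.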
This essential property, which makes the RN and DMR stimuli suitable
for reverse correlation, also permits the identification of
spectrotemporal response nonlinearities. Given that both stimuli have
identical low-order statistics (matched in intensity, contrast, and
average envelope modulations), it is expected that a linear integrating
neuron would have an average neural response that is similar for the RN
and DMR conditions. That is, because both the RN and DMR stringently
satisfy the necessary requirements for reverse correlation, we expect
that STRF_DMR = STRF_RN if
the neuron behaves as a linear integrator (see Materials and Methods and Appendix B for proof). By comparing DMR and RN responses, we find
that 60% (n = 49) of the neurons in our ICC sample met this requirement (Fig. 4). For reference,
pure tone FTCs are shown alongside the RN and DMR STRFs when available
(Fig. 4A,D). A red bar designates the mean
sound pressure level (per one-third octave) for DMR and RN.

Figure 4.
Spectrotemporal receptive fields of neurons that
responded to DMR and RN. Neurons were tested with pure tones (A,
D, left column), DMR (B, E, G, I,
middle column), and the RN (C, F, H, J,
right column) stimuli (individual neurons are shown by
row). Frequency-tuning curves depict the frequency
versus intensity response area of a neuron (A, D). The
red horizontal line designates the mean stimulus level
(per one-third octave) used for RN and DMR. STRFs have similar shapes
(similarity index: B, C, 0.94; E, F,
0.76; G, H, 0.77; I, J, 0.7) and strength
(magnitude disparity index: B, C, 13%; E,
F, 178%; G, H, 74%; I, J, 4%;
rate disparity index: B, C, 6%; E, F,
35%; G, H, 24%; I, J, 53%). To
facilitate comparisons, STRFs are shown on identical color scales for
RN and DMR. STRFs for each neuron are drawn on individually chosen
spectral and temporal scales. Significant patterns of the STRF are
denoted by red contours (p < 0.002 contour).
Neurons in our sample showed a variety of preferences for stimulus
patterns in the DMR and RN, including suppressive side bands, obliquely
oriented excitatory or inhibitory regions, and distinct temporal
response profiles (e.g., on-off, off-on, and off-on-off). Typically, excitatory and inhibitory STRF features were consistent between DMR and RN, although in some cases, inhibitory features were
less pronounced for the RN (Fig. 4E,F). DMR
and RN firing rates were generally high (mean spike rate, 11.2 spikes/sec for DMR and 11.8 spikes/sec for RN) and significantly
correlated [correlation coefficient, 0.85 ± 0.08 (mean ± SE)] for this subset of neurons. Likewise, all neurons had
comparable STRF energies. The neuron of Figure 4B,C,
for instance, had a spike rate of 34.0 spikes/sec for the DMR and 36.2 spikes/sec for the RN (difference, 6%) and comparable STRF energies
(E_DMR, 2.6 spikes/sec; E_RN,
3.0 spikes/sec; difference, 13%). The presence of well defined,
statistically significant STRFs (p < 0.002) for
both DMR and RN indicates that neurons efficiently phase locked to the
stimulus spectrotemporal envelope. To distinguish these functional
properties from those of other neurons in our sample, we refer to these
as type I responses.
Frequency domain RF analysis
Complementary to the STRF, we also evaluated neuronal data in the
frequency domain to extract physiologically meaningful parameters from
the STRF and to describe neuronal preferences in terms of low-pass and
bandpass filtering (Depireux et al., 2001; Klein et al., 2000).
First, we converted the STRF to an RTF (Fig.
5A,B). The RTF maps the
preferences of a neuron as a function of the temporal (modulation rate)
and spectral (ripple density) stimulus parameters (see Materials and
Methods). Whether a neuron integrates spectral or temporal information
in a low-pass or bandpass manner depends strongly on the
spectrotemporal relationship between neural excitation and inhibition
in its STRF. For instance, the neuron of Figure 4B,C
has an on-off temporal response pattern; therefore, its RTF resembles
a bandpass filter along the temporal modulation axis (Fig.
6A) that is centered at
a best temporal modulation rate (bTM) of 45 Hz. Likewise, along the
spectral axis, this neuron has a weak but significant inhibitory region
alongside an excitatory region. Therefore its response as a function of
ripple density also has a bandpass response profile with the dominant
response peak centered at a best ripple density (bRD) of 0.6 cycles per octave. Neurons that lack interleaved patterns of excitatory (on) and
inhibitory (off) subfields in their STRFs generally have low-pass response characteristics (Fig. 4I,J)
along the spectral and temporal dimensions. The STRF of this example is
marked by an off-on-off temporal response pattern, but its spectral
STRF patterns lack interleaved excitatory and inhibitory subfields.
Accordingly, its RTF (Fig. 6B) shows a bandpass
response pattern in time (bTM, 200 Hz) and a low-pass response pattern
along the spectral axis (bRD, 0 cycles per octave).
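The link between the on-off structure of an STRF and bandpass tuning in the RTF can be illustrated with a small sketch. It assumes the RTF is taken as the magnitude of the two-dimensional discrete Fourier transform of the STRF; the use of the magnitude, the toy grid size, and the function name are assumptions made for illustration only. The index along the time axis maps to temporal modulation rate, and the index along the frequency axis maps to ripple density.

```python
import cmath
import math


def ripple_transfer_function(strf):
    """Magnitude of the 2-D DFT of strf[freq_channel][time_bin].

    Returns rtf[k_f][k_t], where k_t indexes temporal modulation rate and
    k_f indexes ripple density (index 0 is the low-pass, zero-modulation bin).
    """
    n_f = len(strf)
    n_t = len(strf[0])
    rtf = []
    for k_f in range(n_f):
        row = []
        for k_t in range(n_t):
            acc = 0j
            for f in range(n_f):
                for t in range(n_t):
                    phase = -2.0 * math.pi * (k_f * f / n_f + k_t * t / n_t)
                    acc += strf[f][t] * cmath.exp(1j * phase)
            row.append(abs(acc))
        rtf.append(row)
    return rtf
```

In this toy setting, an STRF that alternates between excitation and inhibition in time, such as `[[1, -1, 1, -1], [1, -1, 1, -1]]`, places all of its energy at a nonzero temporal modulation index (bandpass in time) and at ripple index 0 (low-pass in frequency), whereas a purely excitatory STRF places its energy at index 0 on both axes.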

Figure 5.
Frequency domain response analysis. The auditory
STRF (A; shown for RN) is used to compute the RTF
(B; shown for RN) by applying a two-dimensional Fourier
transform. The RTF depicts time-locked energy in the neural response as
a function of temporal modulation rate,
Fm, and ripple density, Ω.
Red indicates parameter combinations that evoked a
strong time-locked response, and blue indicates a weak
response. The CRH (D) characterizes nonlinear
neuronal responses that do not show up in the STRF. For each neural
event, the spectral and temporal DMR parameters,
Ω(tk) and
Fm(tk),
are determined at the time instant of the neural spike,
tk. The values of Ω and
Fm are then used to increment the
corresponding bin in the histogram by +1
(D).
Figure 6.
Neuronal preferences determined with the RTF
(left column) and CRH (right column)
shown for the neurons of Figures 4B,I and
7J,G (A-D, respectively). The RTF and
CRH depict the spectrotemporal frequency combinations (modulation
frequency and ripple density) that preferentially activate a neuron.
These can show either a low-pass or bandpass tuning profile along the
temporal modulation or ripple density axis. Generally, neuronal tuning
is similar for the RTF and CRH.
In a second related approach, a CRH was used to evaluate neuronal
selectivity by tabulating the number of action potentials as a function
of ripple parameters (see Materials and Methods). Unlike the STRF and
RTF, this method accumulates the stimulus parameters, as opposed to
averaging the stimulus waveforms, and is therefore insensitive to spike
timing jitter. Figure 5C,D illustrates this approach.
Generally, we find that RTF and CRH are in close agreement (Fig. 6).
However, the CRH also reflects nonspecific activity, that is, action
potentials that fall outside the dominant RTF boundaries and presumably
do not contribute to the construction of the STRF (Fig.
6A,B).
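The tabulation behind the CRH can be sketched as a two-dimensional histogram of the DMR parameters read out at each spike. This is a minimal illustration under assumed inputs: the parameter trajectories are sampled once per time bin, spikes are given as time-bin indices, and the bin edges are chosen by the user. All names are hypothetical.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) holding value, or None."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def conditioned_response_histogram(spike_bins, fm, ripple_density,
                                   fm_edges, rd_edges):
    """Count spikes as a function of the DMR parameters at the spike time.

    fm, ripple_density : parameter trajectories, one value per time bin.
    Returns hist[ripple_density_bin][fm_bin].
    """
    hist = [[0] * (len(fm_edges) - 1) for _ in range(len(rd_edges) - 1)]
    for t in spike_bins:
        i = bin_index(ripple_density[t], rd_edges)
        j = bin_index(fm[t], fm_edges)
        if i is not None and j is not None:
            hist[i][j] += 1  # one count per action potential
    return hist
```

Because each spike only selects a bin from slowly varying parameters, shifting a spike by a few milliseconds rarely changes the bin it falls in, which is why this tabulation tolerates spike timing jitter.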
Nonlinear spectrotemporal receptive fields for DMR and RN
One question addressed in this study is whether ICC neurons
require specific acoustic features to be efficiently activated and
whether these features can be identified using the STRF method. One
reason it may be difficult to identify the preferred acoustic features of a neuron using a direct approach is that conventional reverse correlation stimuli (such as the RN or spectrotemporal tone
pips) seldom contain isolated sound patterns during a typical recording
period. As an example, the DMR stimulus has pronounced energy peaks and
FM sweeps that appear in isolation in its spectrotemporal envelope
(Fig. 1C). These same features are much more subtle in the
RN (Fig. 1D), because they are superimposed with
other components. How do such stimulus characteristics affect the
ability of a neuron to respond, and which of these stimuli is better
suited for identifying neuronal preferences in central auditory
stations? Presumably, if a neuron exhibits substantial nonlinearities,
significant differences could be expected between DMR and RN.
Not all studied neurons responded equally well to the DMR and RN. A
small but significant (14%; n = 11) subset of neurons responded selectively to the DMR stimulus (Fig.
7; type II neurons). In
general, type II neurons had low firing rates to the DMR and little or
no response to the RN. Average firing rates for either stimulus were
significantly lower than for type I responders (mean DMR, 0.61 spikes/sec; t test, p < 0.003; mean RN,
0.13 spikes/sec; t test, p < 0.0025).
Surprisingly, despite the low spike rates, STRFs derived with the DMR
were highly significant (p < 0.002) and
exceptionally clean.

Figure 7.
Spectrotemporal receptive fields of
neurons that responded specifically to the DMR sound (B, D, G,
J, middle column) but responded weakly or had no
response to the RN (C, E, H, K, right
column). Frequency-tuning curves derived with pure tones are
shown for reference (A, F, I, left
column). Red lines designate the mean stimulus
level (per one-third octave) used for DMR and RN. Significant STRF
patterns are denoted by red contours. All neurons are
shown at distinct spectral and temporal scales.
Figure 7 depicts typical responses for these neurons. Some neurons
(Fig. 7B-E) responded to both the DMR (0.24 and 1.4 spikes/sec) and the RN (0.14 and 0.2 spikes/sec) sounds, although their
DMR firing rate was significantly stronger. DMR STRFs were highly significant (p < 0.002), with well defined
excitatory and inhibitory subfields. However, the RN STRFs of these
type II neurons were weak, with no distinguishable boundaries or
excitatory and inhibitory subregions. Furthermore, the DMR STRF energy
was 725% (Fig. 7B,C; E_DMR, 0.100 spikes/sec; E_RN, 0.012 spikes/sec) and 1280%
(Fig. 7D,E; E_DMR, 0.276 spikes/sec;
E_RN, 0.020 spikes/sec) stronger, respectively, than for RN.
Although these neurons did respond weakly to RN, other neurons
responded exclusively to the DMR (Fig. 7G,H,J,K). Again, these neurons had extremely low spike rates (0.45 and 0.11 spikes/sec, respectively) to the DMR and no response to the RN (0 spikes). These STRFs were constructed using 276 (Fig. 7G)
and 139 (Fig. 7J) spikes for the DMR over a 10 and 20 min recording period, respectively. Nevertheless, their STRFs are as
noise-free as those of type I responders that typically had thousands
to tens of thousands of action potentials.
Interestingly, response characteristics for type II neurons are
consistent with the idea that they are highly selective for some of the
DMR stimulus features. The fact that we can compute highly significant
DMR STRFs from very few spikes further suggests that the acoustic
features leading to spike initiation must be precisely aligned in time
and frequency; otherwise, STRFs would not accurately build up. To
determine whether this is so, we recomputed all DMR STRFs using a
subset of 100 randomly chosen action potentials for each neuron and
determined the mean and maximum SNRs of those pixel values that
exceeded a significance criterion of p < 0.002 (see
Materials and Methods). The SNR of these conditioned DMR STRFs was
approximately twice as strong for type II neurons (average maximum SNR,
8.7 for type II vs 4.0 for type I; paired t test, p < 3.5 × 10^-5;
average mean SNR, 4.7 for type II vs 2.8 for type I; paired t test, p < 0.003). This suggests that the
spectrotemporal waveforms added to compute the STRF are more consistent
from spike to spike for type II neurons compared with type I neurons.
Consequently, type II neurons are highly sensitive for particular
stimulus features in the DMR stimulus, resulting in exceptionally clean
STRFs that can be obtained with very few action potentials. Response
specificity is also reflected in the CRH for these neurons. Compared
with type I responses, CRHs for type II responses show highly localized peaks (Fig. 6C,D, far right) and lack nonspecific
activity. Together, the low firing rates, high response specificity to
the DMR, and unresponsiveness to RN demonstrate that these neurons are
extremely nonlinear and highly selective for isolated spectrotemporal
sound patterns.
It may be argued that the seemingly low spike rates and sparse
responses of these neurons are simply attributable to stimulus levels
near or below the response threshold of the neuron. We tested for this
possibility in 6 of the 11 neurons by computing FTCs with pure tones
(Fig. 7A,F,I). The FTCs are shown alongside |