 |
Previous Article | Next Article 
The Journal of Neuroscience, March 15, 1998, 18(6):2283-2300
Feature Extraction by Burst-Like Spike Patterns in Multiple
Sensory Maps
W.
Metzner1,
C.
Koch2,
R.
Wessel3, and
F.
Gabbiani2
1 Department of Biology, University of California at
Riverside, Riverside, California 92521-0427, 2 Computation
and Neural Systems Program, Division of Biology, 139-74, California
Institute of Technology, Pasadena, California 91125, and
3 Department of Physics, University of California at San
Diego, La Jolla, California 92093-0319
 |
ABSTRACT |
In most sensory systems, higher order central neurons extract those
stimulus features from the sensory periphery that are behaviorally
relevant (e.g., Marr, 1982 ; Heiligenberg, 1991 ). Recent studies have
quantified the time-varying information carried by spike trains of
sensory neurons in various systems using stimulus estimation methods
(Bialek et al., 1991 ; Wessel et al., 1996 ). Here, we address the
question of how this information is transferred from the sensory neuron
level to higher order neurons across multiple sensory maps by using the
electrosensory system in weakly electric fish as a model. To determine
how electric field amplitude modulations are temporally encoded and
processed at two subsequent stages of the amplitude coding pathway, we
recorded the responses of P-type afferents and E- and I-type pyramidal
cells in the electrosensory lateral line lobe (ELL) to random
distortions of a mimic of the fish's own electric field. Cells in two
of the three somatotopically organized ELL maps were studied
(centromedial and lateral) (Maler, 1979 ; Carr and Maler, 1986 ). Linear
and second order nonlinear stimulus estimation methods indicated that
in contrast to P-receptor afferents, pyramidal cells did not reliably
encode time-varying information about any function of the stimulus
obtained by linear filtering and half-wave rectification. Two pattern
classifiers were applied to discriminate stimulus waveforms preceding
the occurrence or nonoccurrence of pyramidal cell spikes in response to
the stimulus. These signal-detection methods revealed that pyramidal
cells reliably encoded the presence of upstrokes and downstrokes in
random amplitude modulations by short bursts of spikes. Furthermore,
among the different cell types in the ELL, I-type pyramidal cells in
the centromedial map performed a better pattern-recognition task than
those in the lateral map and than E-type pyramidal cells in either
map.
Key words:
stimulus estimation; signal detection; statistical
pattern recognition; temporal coding; electric fish; Eigenmannia
 |
INTRODUCTION |
Any comprehensive characterization
of sensory information processing by neuronal networks has to rely on a
wide range of experimental and theoretical approaches (Reichardt and
Poggio, 1976 ; Marr, 1982 ; Koch, 1998 ). Knowledge of the anatomy of
sensory pathways and of the microstructure of neuronal circuits as well
as of electrophysiological response properties of single neurons and
their involvement in behavior are required (Heiligenberg, 1991 ). In
addition, a quantitative understanding of the encoding and processing
of sensory information in single and multiple neuronal spike trains is
also needed. Information theoretical approaches and methods drawn from
statistical signal processing have long been applied to examine the
latter aspects (Marmarelis and Naka, 1972 ; Marmarelis and McCann,
1973 ). However, in many cases, the functional and behavioral
significance of such studies remained unclear. Because the
electrosensory system is relatively simply structured and the role of
its circuitry in processing behaviorally relevant signals across
multiple parallel sensory pathways is well known, it represents an
ideal model to investigate, in a quantitative manner, the computational
mechanisms underlying the encoding and processing of sensory
information at the single neuron and network level (Konishi, 1991 ).
Weakly electric fish generate electric fields for the active detection
of objects and for communication (for review, see Heiligenberg, 1991 ;
Bastian, 1994 ; Metzner and Viete, 1996a ,b ). In Eigenmannia, electric signals that follow an almost sinusoidal time course are
produced by continuous discharges of an electric organ at a rate
between 150 and 600 Hz. Two types of electroreceptors exist to monitor
electric fields: low-frequency ampullary and high-frequency tuberous
(Zakon, 1986 ; Heiligenberg, 1993 ). Tuberous electroreceptors and their
associated primary afferents are tuned to the dominant spectral
frequency range of the animal's own electric organ discharge (EOD) and
consist of two subclasses. T-type afferents fire phase-locked to the
zero-crossings of the EOD waveform. P-type afferents, on the other
hand, fire intermittently, with highly fluctuating response latencies,
and they encode changes in the electric field amplitude (Scheich et
al., 1973 ; Bastian and Heiligenberg, 1980 ; Bastian, 1981a , 1986b ;
Wessel et al., 1996 ).
All electroreceptor afferents terminate in the electrosensory lateral
line lobe (ELL) of the hindbrain in a somatotopic manner. The ELL
consists of four mediolaterally adjacent segments or "maps" (Fig.
1). The medial segment receives input
from ampullary afferents, whereas the three remaining segments
(centromedial, centrolateral, and lateral) are each innervated by one
collateral of tuberous afferents (Heiligenberg and Dye, 1982 ; Carr and
Maler, 1986 ). Tuberous afferents excite basilar pyramidal cells
directly and inhibit nonbasilar pyramidal cells indirectly via granule
cells. Thus, an increased firing rate in a P-type afferent, reflecting a rise in stimulus amplitude, will excite a basilar (E-type) pyramidal cell and inhibit a nonbasilar (I-type) pyramidal cell (Bastian and
Heiligenberg, 1980 ; Saunders and Bastian, 1984 ).

View larger version (53K):
[in this window]
[in a new window]
|
Figure 1.
A, Frontal section through the hindbrain
(right half) of Eigenmannia showing the four segments, or
maps, of the electrosensory lateral line lobe (ELL). MS,
Medial (ampullary) segment; three tuberous segments: CMS,
centromedial, CLS, centrolateral, LS, lateral;
Cer, cerebellum, VIII, octavolateral nerve
(containing the electrosensory afferents); layers of the tuberous ELL
segments: dnl, deep neuropil layer (contains collaterals of
electrosensory afferents), sl, spherical cell layer
(contains phase coding cells; serves as landmark), gl,
granule cell layer (contains inhibitory interneurons), pl,
pyramidal cell layer (contains E- and I-units shown in B),
ml, molecular layer (contains apical dendrites of pyramidal
cells). B, Camera lucida drawings of an E-type
(top) and I-type (bottom) pyramidal cells that
were labeled intracellularly with neurobiotin.
|
|
Several computational mechanisms for the transfer of electric field
amplitude information were suggested from the mean response characteristics of pyramidal cells. For instance, pyramidal cells may
combine half-wave rectification with transmission of time-varying information on temporal changes in the stimulus waveform (such as the
first derivative of the electric field amplitude modulation) (Enger and
Szabo, 1965 ; Bastian 1986b ). Alternatively, they might convey
time-varying information on specific frequency ranges contained in the
amplitude modulations of the stimulus (Maler, 1989 ; Rose and Call,
1992 , 1993 ; Fortune and Rose, 1997 ). The aim of the present study was
to test these hypotheses and determine how amplitude modulations are
temporally encoded and processed between the first two stages in the
segregated pathways (Metzner and Juranek, 1997 ) of the electrosensory
system.
A short report of parts of our results has been published previously
(Gabbiani et al., 1996 ).
 |
MATERIALS AND METHODS |
Preparation. Thirty-five adult specimens of
Eigenmannia species from 12 to 20 cm body length were used
in this study. The fish had either been bred and raised in the
laboratory or were purchased from a tropical fish wholesaler
(Bailey's, San Diego, CA). They were immobilized by intramuscular
injection of Flaxedil (<5 µg/gm body weight) (gallamine
triethiodide; Sigma, St. Louis, MO), gently suspended in the center of
the experimental aquarium (water conductivity, 90-110 µS/cm, pH 7;
temperature, 26-28°C) by a foam-lined forceps with only the dorsal
surface of their head protruding above the water surface, and
respirated with a stream of aquarium water via a silicone-coated glass
tube inserted in their mouth. A small plexiglass rod was glued to the
parietal bone under local anesthesia (2% lidocaine; Western Medical
Supplies, Arcadia, CA) to further stabilize the fish. The experimental
tank was situated on a vibration isolation table (Newport, Fountain Valley, CA). Although Flaxedil strongly attenuated the fish's EOD,
residual signals (amplitude, 50 µV to 1 mV) locked to the spinal
command neurons could still be monitored with a suction electrode
fitted over the tip of the tail. Curarization reduced the EOD amplitude
below the threshold level of electroreceptors. The electrosensory
system was stimulated using an EOD mimic consisting of a sinusoidal
stimulus (S1) that was applied between an electrode in the mouth and an
electrode near the tip of the tail and is described in more detail
below.
Electrophysiology. For recordings of receptor afferents, the
posterior branch of the anterior lateral line nerve just rostral to the
operculum was exposed. This allowed us to record extracellularly the
activity of single P-type electroreceptor afferents. For intracellular recordings from pyramidal cells, the ELL was reached by removing part
of the occipital bone overlying the caudal cerebellum (~3 mm2) under local anesthesia (lidocaine).
Intracellular signals were recorded with glass micropipettes filled
with 3 M KCl (resistance, 40-60 M for extracellular and
70-130 M for intracellular recordings; borosilicate glass was
pulled on a Sutter P-87 (Flaming-Brown, Novato, CA). Penetrations were
obtained by applying brief overcompensation of capacitance
neutralization or slight mechanical tapping of the headstage and of the
microdrive or both. Cell membrane potentials were sometimes stabilized
during data acquisition by passage of a weak, hyperpolarizing current
(corresponding to a hyperpolarization of the membrane potential of ~5
mV). Recording signals were amplified with an intracellular
electrometer (World Precision Instruments, WPI 767, Sarasota, FL) and
stored on a video tape using a PCM recording adapter (sample rate, 40 kHz; Vetter 3000A, Rebersburg, PA). They were subsequently converted
from analog to digital using a commercial data analysis system (sample
rate, 2 or 4 kHz/channel; Datawave, Denver, CO).
Anatomy. The ELL is highly laminated (Fig. 1), and the
somata of large pyramidal cells are situated in a central layer that extends dorsoventrally over a distance of 200 µm. GABAergic
polymorphic cells and few small pyramidal cells are found in the
ventral region of this pyramidal cell layer and appear to make only
intrinsic connections within the ELL (Bastian et al., 1993 ; Maler and
Mugnaini, 1994 ). We recorded only from the large pyramidal cells of the centromedial (CMS) and lateral (LS) segments that are situated in the
central pyramidal cell layer and project to higher order structures.
This layer can be identified easily by anatomical and physiological
criteria. For instance, the center of the pyramidal cell layer is
located ~200 µm dorsal to the spherical cell layer, which is only
~100 µm thick and physiologically very distinct. Spherical cells
are innervated by T-receptor afferents and fire, in contrast to
pyramidal cells, strictly phase-locked to the EOD mimic even at the low
stimulus amplitudes used in this study. This very reliable landmark
allowed us to limit data collection to the pyramidal cell layer and in
some cases to the molecular layer that contains the large apical
dendritic trees of pyramidal cells (Carr and Maler, 1986 ). Pyramidal
cell activity was recorded with electrodes filled with neurobiotin (2%
in 3 M KCl) (Vector Laboratories, Eugene, OR) for
intracellular labeling (Metzner and Heiligenberg, 1991 ; Heiligenberg et
al., 1996 ). In all cases, the subsequent histological analysis revealed
label in pyramidal cells only (Fig. 1). If no intracellular labeling
could be performed, the recording site was histologically verified by
setting electrolytic lesions at the conclusion of the experiment using
a high-frequency current (Hyfrecator 733; Bircher Medical Systems,
Irvine, CA) (bipolar setting at 15 W for 10 sec) (Metzner, 1993 ). The
current was applied through a low-impedance recording electrode (<20
M ) positioned at the most lateral (in the case of the centromedial ELL segment) and most medial recordings (in the case of the lateral ELL
segment) at the depth of the pyramidal cell layer. The fish was
euthanized with MS-222 (tricaine-methane sulfonate; Sigma) and perfused
with 4% paraformaldehyde in 0.1 M phosphate buffer. Brains
were post-fixed overnight and then cut on a vibratome in sections of 50 µm thickness. Sections were mounted, dehydrated, counterstained with
neutral red or cresyl violet (Nissl stain), and coverslipped. The
nomenclature of brain structures used for the light-microscopic
analysis follows Maler et al. (1991) .
Stimulation protocols. The stimulus presentation followed
the convention described in Wessel et al. (1996) . Briefly, the voltage V(t) of the electric field mimic had a mean amplitude
A0 and a carrier frequency
fcarrier and was modulated randomly (random amplitude modulations, RAMs) according to V(t) = A0 [1 + s(t)] cos (2
fcarrier t). The carrier frequency
was set at the fish's electric organ frequency before immobilization
and was above 350 Hz in all fish used for our experiments. The mean
amplitude A0 took values between 1 and 5 mV/cm
near and perpendicular to the head of the fish. The stimulus
s(t) had a flat power spectrum up to a cut-off frequency,
fc, which in the present study was varied
between 2 and 40 Hz. This white noise was generated by playing a blank
tape on a tape recorder [bandwidth, DC to 10 kHz, signal-to-noise
ratio (SNR) = 50 dB; Racal Instruments, UK], which was subsequently
filtered by a flat-amplitude, low-pass filter (two four-pole
Butterworth filters in series; Wavetek Rockland 452, San Diego, CA).
The SD, , of the stimulus was varied between 0.1 and 0.45 V,
corresponding to variations between 10 and 45% of the mean electric
field amplitude. This range of values was slightly higher than the one
used in Wessel et al. (1996) and led to more reliable responses in
pyramidal cells. For the lower range of cut-off frequencies,
fc, used in this study as compared with
Wessel et al. (1996) , A(t) = A0
[1 + s(t)] remained positive up to = 0.45 V,
i.e., no phase changes were introduced in the voltage
V(t) relative to the carrier signal cos(2
fcarrier t).
At the beginning of each recording, the response of the cell to
sinusoidal amplitude modulations between 2 and 10 Hz of a stimulus with
a carrier frequency similar to the animal's own EOD frequency was
determined. This allowed us to easily classify the cell as E- or I-unit
(Metzner and Heiligenberg, 1991 ). Subsequently, the response of the
cell to RAMs was tested. Each stimulus configuration was presented for
2.5 min. Various stimulus configurations were tested by changing the
parameters fc,
A0, and pseudorandomly between
stimulus presentations, as time permitted. At least three configurations were required to include a cell recording into our data
base. The maximum time we recorded intracellularly from a pyramidal
cell was 101 min.
Spike train analysis. To study the spontaneous activity of
pyramidal cells, the spike peak occurrence times were selected and
resampled at 2 kHz. Interspike interval (ISI) distributions, including
means and coefficients of variation (CVs), autocorrelation functions,
power spectra, and variance versus mean spike count curves were
computed using standard methods, as described in Gabbiani and Koch
(1998) . The analysis of stimulus-driven activity was also performed on
spike occurrence times and stimuli initially digitized at 2 kHz. For
very low stimulus cut-off frequencies ( 5 Hz), the sampling rate was
divided by a factor of 4, and the recording time was multiplied by the
same factor, to improve temporal averaging at the expense of spectral
frequency resolution.
Linear stimulus estimation. Linear estimation of the
stimulus from the spike trains of pyramidal cells and P-receptor
afferents was performed and quantified as described in detail in Wessel et al. (1996) and Gabbiani and Koch (1998 , Sec 9.7). Briefly, the
autocorrelation function of the spike train and the cross-correlation with the stimulus were computed and used to obtain a Wiener-Kolmogorov filter that was then convolved with the spike train, yielding an
optimal linear estimate of the stimulus in the mean square sense
(Gabbiani and Koch, 1998 ). The accuracy of this stimulus estimation
method and of the one described in the following paragraph were
assessed by computing the root mean square error, , between the
estimate and the stimulus. The coding fraction was then defined from
the root mean square error, , and the SD, , of the stimulus as
= 1 / . The coding fraction takes the maximum value of 1 when the stimulus is perfectly estimated ( = 0) and the minimum value of 0 when estimation is at chance level ( = ) (Gabbiani and
Koch, 1996 ). Thus, the coding fraction provides an objective estimate of the time-varying information encoded in a neuronal spike
train, as assessed by an ideal observer. The accuracy of stimulus
estimations was further characterized in the frequency domain by
computing signal-to-noise ratios (SNRs) as a function of stimulus
frequency (see Fig. 6). A value of 1 for
SNR(f) at a given frequency, f,
means that estimation of this particular frequency is at chance level,
whereas perfect estimation corresponds to an infinite SNR (Gabbiani and
Koch, 1996 ). In addition to the stimulus itself, three linear and
non-linear functions of the stimulus were estimated by the same method.
These included, after subtraction of the mean stimulus, first,
positively and negatively half-wave rectified stimuli; second, the
temporal derivative of the stimulus; and third, positively and
negatively half-wave rectified temporal derivatives. Temporal
derivatives were computed by linear convolution of the stimuli with a
digital differentiation filter. To suppress the amplification of
high-frequency noise inherent to such a numerical computation, the
differentiation filter was convolved with a carefully selected Kaiser
window, as explained in Hamming (1989 , Sec 9.7) and Oppenheim and
Schaffer (1989) .
Nonlinear stimulus estimation. The (half-wave-rectified)
stimulus and its (half-wave-rectified) temporal derivative were also estimated from the spike trains of pyramidal cells by a quadratic algorithm, which took possible nonlinear interactions in the encoding of detailed time-course information into account that could have taken
place between two spike occurrence times. The implementation used a
straightforward modification of the fast orthogonal method described in
Korenberg (1988) . Because of the computational burden of such general
quadratic algorithms (Koh and Powers, 1985 ), the stimuli and spike
trains were first resampled with a resolution of 20 msec, corresponding
to a sample rate of 50 Hz. This sampling rate was sufficient to resolve
the time-course of stimuli with a cut-off frequency below 25 Hz (see
Fig. 6B, C). During the resampling of pyramidal cell
spike trains, two spikes occurring in the same 20 msec bin were
replaced by a single event of doubled amplitude {i.e.,
(t t1) + (t t2) 2 (t t3) for t1,
t2 in [t3 10 msec;
t3 + 10 msec], where (t ti) represents the occurrence of a spike at
ti, i = 1, 2, 3}. The
down-sampling allowed us to use linear and quadratic filters of
manageable size (31 and 31 × 31 elements, respectively) to cover
the time windows of interest (±300 msec around each spike). Linear
estimation of the stimulus from resampled pyramidal spike trains were
first compared with those obtained at a sampling rate of 2 kHz. The
estimation filters and the fraction of the stimulus encoded that were
obtained with these two methods were identical, indicating that
temporal modulations in the instantaneous firing rate of pyramidal
cells below 20 msec did not carry substantial information about the
stimulus time-course. In contrast, stimulus estimations from 50 Hz
down-sampled P-receptor afferent spike trains were substantially worse
than those obtained at a 2 kHz sampling rate, indicating that
modulations in the instantaneous firing rate of P-receptor afferents at
time scales shorter than 20 msec carried substantial information.
Therefore, quadratic estimation methods were not pursued further with
P-receptor afferent spike trains.
Feature extraction. We assessed the ability of pyramidal
cell and P-receptor spikes to convey information about the presence of
temporal features, such as up- and downstrokes in random modulations of
the electric field amplitude, by discriminating stimulus waveforms preceding the occurrence or nonoccurrence of spikes in response to the
stimulus by using two pattern classifiers (Fisher and Euclidian, respectively). In the following, we will first explain how the stimulus
waveforms were obtained and then describe how the two classifiers were
defined and how the classification error characterizing their
performance was computed.
Each spike train and the corresponding stimulus s(t) were
binned using three bin sizes between t = 0.5 msec (corresponding to the sampling rate of 2 kHz for the spike
occurrence times) and a maximal bin size
tmax. The maximal bin size was
determined from the requirement that no more than one spike should fall
in a given bin. In general, this requirement was slightly alleviated by
the fact that in <1.8% of all spike occurrences, two spikes were
allowed to fall in the same bin. This accounted for the rare occurrence
of exceptionally close spikes and for an unfavorable placement of the
bins with respect to the spike train. For spikes of pyramidal cells,
t ranged between 3 and 15 msec, whereas for P-receptor afferents t ranged from 0.5 to 4.5 msec
because of their higher firing rates (see Fig. 7A). The
wave-form of the RAM that preceded the bin [t t;t] was defined as the 101 dimensional stimulus vector st = (s(t 100 t), ...,
s(t)), and the variable t took the value 1 or
0 depending on whether a spike occurred in the bin [t t;t]. Let P
(s| = 1) and
P(s| = 0) be the two distributions of
stimulus vectors conditioned on the occurrence or nonoccurrence of a
spike in a bin of size t. The collection of
stimulus vectors belonging to these distributions was determined from
the experimental data by considering successively each time point
t = n t for n ranging from
101 to the largest integer smaller than
T/ t, with T being the
duration over which one particular stimulus configuration was presented
(see above; usually, T = 140 sec). Each vector
st was assigned to
P(s| = 1) or to
P(s| = 0) according to whether
( t = 1) or not ( t = 0) a spike occurred
in the bin [t t;t] (Fig.
2).

View larger version (37K):
[in this window]
[in a new window]
|
Figure 2.
Schematization of the data analysis performed for
the feature extraction method. Center panel, Each stimulus
and spike train was subdivided into short bins (labeled
a-r) containing at most one spike. The collection of
stimuli preceding each bin was separated into two distributions,
P(s| = 1) and
P(s| = 0), according to whether a spike
occurred in the corresponding bin (c, d, g, j, k, m, q, r)
or not (a, b, e, f, h, i, l, n-p). In the example depicted
here, spikes occur preferentially after a RAM upstroke (as for E-type
pyramidal cells). The separation of the two distributions and, thus,
the ability of spikes to reliably convey the presence of an upstroke
was then assessed using a linear classifier. Side boxes,
Means m0 (left, top graph) and m1 (right, top graph) of the stimulus
preceding no spike occurrence (left box) and the occurrence
of a spike (right box) for an E-unit in the CMS as well as
the covariance matrices, 0 (left, bottom
graph) and 1 (right, bottom graph)
characterizing the second order variations of
P(s| = 0) and
P(s| = 1) around
m0 and m1 (stimulus
parameters: A0 = 3.0 mV/cm,
fc = 44 Hz, = 0.32 V; bin size
t = 3.5 msec). Note the difference in scale
between the two top panels. Because our stimuli are
stationary and zero mean, the means m0 and m1 are related according to
p0m0 + p1m1 = 0 (with
p1 = probability of a spike in bin
t, and p0 = probability of
no spike occurrence in bin t).
|
|
The separation of the distributions
P(s| = 0) and
P(s| = 1) in stimulus space was
assessed by the ability of a statistical pattern classifier to
discriminate among them. Consider a linear classifier of the form:
|
(1)
|
where the dot denotes matrix multiplication and the notation
xT for a vector x denotes
the transposed vector, obtained by exchanging the rows and columns of
x. According to Equation 1, for a fixed feature vector
f and threshold , a stimulus s is classified
as belonging to class 1 (i.e., the class of stimuli eliciting a spike)
or class 0 (i.e., the class of stimuli eliciting no spike) by
projecting s onto f and comparing the value of
the projection to the threshold . If
fT·s is larger
than threshold [corresponding to
hf, (s) > 0], then s
is assigned to class 1; otherwise s is assigned to class
0.
Fisher classifier. The performance of the classifier of
Equation 1 relies on an appropriate choice of the feature vector
f and the threshold . The optimal feature vector
f was determined by maximizing Fisher's linear discriminant
function. Let m0 and m1
be the mean values of the conditional distributions P(s| = 0) and
P(s| = 1), respectively, and denote by
0 and 1 the corresponding
covariances:
|
(2)
|
where the average · i is over the
distribution P(s| = i),
i = 0, 1 (Figs. 2, 4A-D). The
covariance matrices defined in Equation 2 characterize to a first
approximation the variances and the correlations among the components
of the stimulus vector s for the two classes
i = 0, 1. Estimates of
m0, m1,
0, and 1 were obtained using
maximum likelihood estimators. The feature vector f used to
separate the distributions P(s| = i), i = 0, 1 was obtained by maximizing the
signal-to-noise ratio:
|
(3)
|
This function constrains only the direction of f in
stimulus space because it is independent of the magnitude of
f: SNR( f) = SNR(f) for 0. Therefore, to obtain the optimal direction for f it is sufficient to maximize SNR(f) over a subset of vectors having
constant norm, as explained in the next paragraph. To clarify the
significance of maximizing the signal-to-noise ratio of Equation 3, let
us denote by
P(fT·s| = 1) and
P(fT·s| = 0) the two one-dimensional distributions of stimuli projected onto
f (Fig. 3). Their means,
µi, and variances, i2, are
given by:
|
(4)
|
|
(5)
|
for i = 0, 1. Therefore, the numerator of
Equation 3 is the squared distance between these means,
(µ1 µ0)2, whereas the
denominator is equal to 1/2( 02 + 12). Thus, Equation 3 selects an optimal direction
in stimulus space by attempting to maximize the distance between the
means of the projected distributions while minimizing the sum of their
variances. Both of these criteria in general will contribute to the
discrimination performance, as illustrated in a two-dimensional example
in Figure 3 (Jolliffe, 1986 , Sec 9.1; Bishop, 1995 , Sec 3.6.1).

View larger version (17K):
[in this window]
[in a new window]
|
Figure 3.
Graphic illustration of the principle underlying
the selection of the optimal feature vector f using the
Fisher discriminant function (see Eq. 3). In this two-dimensional
example, the circles and squares are sample
points drawn from two Gaussian distributions with different mean
vectors, mi, and identical covariance matrices, i (i = 0, 1), representing
P(s| = 0) and
P(s| = 1), respectively. For each
direction f in stimulus space one computes the
means, µi, and the variances,
i2, of the two distributions
P(fT·s| = 1) and
P(fT·s| = 0) projected onto f. The optimal direction selected by Eq. 3 is the one that maximizes the squared distance between these means,
divided by the sum of their variances. In this particular example, the
squared distance between the means, µi, is
maximized, and the sum of the variances,
i2, is minimized for the direction
shown.
|
|
The solution vector f to Equation 3 can be obtained by the
method of Lagrange multipliers, i.e., by maximizing the numerator of
Equation 3 and keeping the denominator constant (Anderson, 1984 , Sec
6.4; Bishop, 1995 , Sec 3.6.1 and appendix C). The resulting condition
for f is
|
(6)
|
This equation can be solved immediately if the covariance
matrices i (i = 0, 1) are invertible:
f = 2( 0 + 1) 1 (m1 m0). If the covariance matrices
i are not invertible, then the two
distributions of stimuli are concentrated on a linear subspace of the
original stimulus space, and the discrimination problem has to be
considered and solved on the common subspace on which 0
and 1 are invertible (apart from trivial cases, see Anderson and Bahadur, 1962 , footnote 3). Because in the present case
the matrices i were not always invertible,
this latter requirement was implemented as follows: the matrix
1/2( 0 + 1) was diagonalized numerically. The eigenvalues i and the
corresponding eigenvectors ei were
arranged in decreasing order of magnitude: 1 2 ... 101 0. The first
n largest eigenvalues accounting for 99% of the variance
were retained, i.e., n was the smallest integer such
that:
|
(7)
|
The projection of m1 m0 onto the first n eigenvectors of
1/2( 0 + 1) was computed:
vi = (m1 m0)T·ei,
for i = 1, ...,n. The optimal feature vector f was then obtained from
|
(8)
|
(Fig. 4E,
F) (Press et al., 1992 , Sec
2.6; Bishop, 1995 , Sec 3.4.3 and 3.6.2). The number of eigenvalues
retained ranged from 3 to 101 and depended on the cut-off frequency of
the stimulus as well as on the sampling step
t. This relationship can be understood by
considering the matrix p0 0 + p1 1, where
p1 is the probability of spike occurrence in a
bin t and p0 the
probability of no spike occurring in t. This
matrix is a continuous deformation of 1/2 0 + 1/2 1 and represents the covariance of the stochastic
process s(t). A classical result states that for the
stationary stimuli s(t) used in these experiments, the
eigenvalues of p0 0 + p1 1 are asymptotically related to the
power spectrum of s(t) (Greenander and Szegö, 1958 ,
Chap 5). The number of non-zero eigenvalues is therefore determined by
the cut-off frequency of s(t) and the sampling step
t. Furthermore, the eigenvectors of
p0 0 + p1 1 are expected to be
oscillating functions of time, which is a characteristic that is also
observed for the eigenvectors of 1/2 0 + 1/2 1. The oscillatory behavior of the eigenvectors of
1/2 0 + 1/2 1 translated into an
oscillatory behavior of the feature vector f, which was more
or less pronounced depending on the value of the projections of
m1 m0 onto these
eigenvectors (Fig. 4F, H).

View larger version (35K):
[in this window]
[in a new window]
|
Figure 4.
Computation of the optimal feature vector
f exemplified for an I-type pyramidal cell in LS (stimulus
parameters: A0 = 1.25 mV/cm,
fc = 12 Hz, = 0.29 V;
t = 7 msec). A, B, Mean stimuli m0 and m1
preceding no spike occurrence and a spike, respectively. In these two
panels and the following ones, error bars always represent SD over 10 repetitions of the same experiment. C, D, Covariance
matrices 0 and 1 of the distributions P(s| = 0) and
P(s| = 1). Insets, Mean
value and SD of the estimated covariances along the main diagonals and
the t = 0 axis. E, Eigenvalues of
1/2 0 + 1/2 1 sorted in decreasing order of
magnitude. In E-G, arrows indicate the last eigenvalue taken into account for the computation of f (eigenvalue number = 17). F, Normalized value of the projection of
m1 m0 onto the
eigenvectors of 1/2 0 + 1/2 1. The sum of the first 17 eigenvectors weighted by the corresponding normalized projection yields f. G, Value of the
signal-to-noise ratio, SNR, as a function of the number of eigenvalues
considered for the computation of f. SNR saturates when 17 eigenvectors are taken into account. Thus, eigenvectors with eigenvalue
numbers larger than 17 do not contribute significantly to the
discrimination performance. H, Feature vector obtained by
the Fisher method (solid line) and Euclidian feature vector
f = m1 m0 obtained directly from A and
B (dashed line).
|
|
Quantification of the classifier performance. To quantify
the classifier performance, we computed the projection of each
st onto f. This was used to determine
the two conditional distributions
P(fT·s| = 1) and
P(fT·s| = 0) (Fig. 5A). The
probability of correct detection (PD),
i.e., the probability of correctly identifying a stimulus vector
s as eliciting a spike and the probability of false-alarm (PFA), i.e., the probability of
incorrectly classifying a stimulus vector s as eliciting a
spike were obtained for successive values of the threshold by
numerically integrating the tails of the two distributions
P(fT·s > | = 1) and
P(fT·s > | = 0) using a trapezoidal rule.
PD was then plotted as a function of
PFA (Fig. 5B). This plot is called a
receiver operating characteristic (ROC) curve (Green and Swets,
1966 ).

View larger version (30K):
[in this window]
[in a new window]
|
Figure 5.
Quantification of the feature extraction
performance (same example as in Fig. 4). A, Distributions
P(fT·s| = 0)
(left curve) and
P(fT·s| = 1) (right curve) of the stimulus projected onto the
feature vector f. The distributions were computed using 241 bins centered at the mean of each distribution and covering ±3 SD. The
last bin on each side represents the tail of the distribution. Note the
large tail for negative values of f. Error bars represent SD
over 10 repetitions of the same experiment. B, The probability of correctly identifying a stimulus vector s as
eliciting a spike plotted as a function of the probability of
incorrectly classifying a stimulus vector s as eliciting a
spike (= false alarm). This plot, which is called an ROC curve, corresponds to the performance of the linear classifier
hf, (s) for different
values of the threshold . Decreasing the threshold increases the
probability of false alarm. Dashed line, Chance level.
C, Probability of misclassification
1/2PFA + 1/2(1 PD) plotted as a function of the
probability of false alarm, PFA. The minimum,
minPFA[1/2PFA + 1/2(1 PD)] yields the
value of used to characterize the performance of single spikes to
convey information on the presence of temporal features in the
stimulus. Dotted line, Performance of the Euclidian classifier (see Fig. 4H). D, Minimum
probability of misclassification minPFA[(1 p1)PFA + p1(1 PD)] as a function of
the prior probability of a spike in a bin, p1.
The choice p1 = 1/2 used to compute (see
C) corresponds to the least favorable prior (i.e., the
highest possible value for the probability of misclassification). Inset, Dependence of on the bin size
t. Although decreases with bin size in
this example, increases and minima for intermediate bin sizes were also
observed.
|
|
The overall probability of misclassifying a stimulus as eliciting a
spike or not, , was obtained as the minimum of:
|
(9)
|
over the whole range of values determined by
PFA [0;1] and the function
PD(PFA) (Fig.
5C). In this equation, PFA represents the probability of false positives (= false alarm), whereas (1 PD) is the probability of false
negatives. A value of = 1/2 corresponds to a discrimination
performance at chance level. Conversely, = 0 indicates that it is
possible to perfectly predict the occurrence or nonoccurrence of a
spike by projecting the stimulus vector s onto f.
Thus, is a measure of how well the projection of s onto
the feature vector f predicts the occurrence or
nonoccurrence of a spike. This in turn can be interpreted as a measure
of how accurately spikes of P-receptor afferents and pyramidal cells
convey information on the presence or absence of temporal features,
such as upstrokes or downstrokes in the RAM wave-form. Note that the
error rate does not correspond to the minimum error achieved in
predicting the occurrence of a spike by an ideal observer who has
complete knowledge of the statistical properties of the stimulus and
the spike occurrence probability (Bayes rule) (Poor, 1994 ) because the
prior probability p1 of a spike in a bin
t was not taken into account. Instead, the
error rate corresponds to the minimum error achieved by an ideal
observer having no access to p1, i.e.,
the minimum error rate under the least favorable priors, or minimax
rule (Fig. 5D) (Poor, 1994 ). The error rate was computed
by using a resubstitution method (i.e., the same data set was used to
compute f and ). In a series of test cases, we verified
that the downward bias of was negligible by comparing to the
error rate obtained by a cross-validation method (Fukunaga, 1990 ). This
result was in accordance with theoretical analyses on the dependence of
the bias of Fisher discriminants with sample size (Raudys and Jain, 1991 ). Ultimately, the bin size t that
yielded the lowest value of was retained. An example of the
dependence of on t is illustrated in
Fig. 5D (inset).
Euclidian classifier. The performance of the Fisher
classifier was compared with the performance of the Euclidian
classifier, which was obtained by using the feature vector
f = m1 m0 without taking into account the covariances
of P(s| = 0) and
P(s| = 1) (Figs. 4H,
5C). This feature vector is considerably easier to compute
and coincides with the Fisher feature vector when the two covariance
matrices 0 and 1 are proportional to the
identity matrix, that is, when no correlations exist among different
components of s (see Eq. 2). This follows from Equation 1,
because in this case the matrix multiplication on the left side reduces
to multiplication by a scalar. Although our covariance matrices were
usually not proportional to the identity matrix (Figs. 2,
4D,E), comparison of the performance achieved by
these two methods served as a measure of the influence of correlations between components of s on the classification performance. More general classification schemes such as linear logistic
discrimination (Efron, 1975 ) or nonlinear classifiers (Fukunaga, 1990 )
were not considered.
Information conveyed by bursts of spikes. Pyramidal cells
exhibited a marked tendency to fire short bursts of spikes in their spontaneous activity as well as in response to electric field RAMs.
Therefore, we separated their spikes into two subclasses consisting of
isolated spikes (denoted by = 1isol) and of
spikes belonging to bursts (denoted by = 1burst)
based on the shape of the interspike interval distribution (see
Results). An additional subclass consisting of spikes belonging to
bursts of three or more spikes was also considered ( = 1burst3). To investigate how the occurrence of
upstrokes and downstrokes in the RAM waveform was signaled by isolated
spikes versus spikes belonging to bursts, we applied the techniques
described above to study the separation between
P(fT·s| = 0) and
P(fT·s| = 1isol) as well as the separation between
P(fT·s| = 0) and
P(fT·s| = 1burst) or the separation between
P(fT·s| = 0) and
P(fT·s| = 1burst3).
Comparison across cell types and maps. We compared the
performance of the different pyramidal cell types, i.e., E-units versus I-units as well as units from CMS versus LS, by computing the median
probabilities of misclassification for each class and testing them for
statistically significant differences using nonparametric methods
(Wilcoxon rank sum test) (Lehmann, 1975 ).
Information conveyed by periods of silence in P-receptor afferent
spike trains. The encoding of RAM downstrokes by periods of
silence in P-receptor afferent spike trains was studied using similar
feature extraction methods. Briefly, temporal stimulus waveforms (300 msec long; sampling step t = 5 msec)
were separated into two classes according to whether a 50 msec period
of nonspiking occurred in P-receptor afferent spike trains. This time
window reached from 250 to 300 msec, corresponding to the most recent 50 msec of the 300-msec-long stimulus waveform. The two classes were
subsequently classified by projection onto a Euclidian feature vector.
 |
RESULTS |
We recorded the activity of a total of 236 pyramidal cells by
using both intracellular and extracellular recording techniques and of
20 P-receptor afferents extracellularly. Recordings of 61 pyramidal
cells and 18 P-receptor afferents were suited for data analysis.
Spontaneous activity of pyramidal cells
Pyramidal cell recordings in vitro often reveal
spontaneous slow rhythmic discharges (Mathieson and Maler, 1988 ; Turner
et al., 1996 ). We recorded the spontaneous activity of pyramidal cells
in vivo and analyzed a total of 36 cells (17 E-units from CMS, 9 I-units from CMS, 6 E-units from LS, and 4 I-units from LS). The
spontaneous activity was measured in the complete absence of an
electric field, i.e., no carrier signal was presented. Mean firing
rates ranged between 7 and 43 Hz, and the CVs of the ISI distributions
reached values between 0.4 and 2.2 (Fig.
6A, inset). The
variance of the spike count was a linear function of the mean when
plotted in double-log coordinates, with slopes ranging from 0.8 to 1.6 (counting intervals: 10-5010 msec, corresponding to mean spike counts
between 1 and 200; Pearson correlation coefficients: 0.96-0.99).
Although these values differ from the in vitro results (Turner et al., 1996 ) (see Discussion), they are consistent with values
obtained in in vivo recordings in various other sensory systems (Teich et al., 1996 ). In the majority of cases analyzed (n = 32), pyramidal cells tended to fire short bursts
of spikes separated by longer intervals of silence. This manifested
itself in the interspike interval distributions by one or two prominent peaks, usually well separated from a tail of longer intervals (Fig.
6A). The remaining four cases resembled the regularly
spiking (n = 2) and the irregularly spiking pyramidal
cells (n = 2) described in Turner et al. (1996) . The
tendency of pyramidal cells to fire in bursts was also obvious in the
autocorrelation function of the spike trains: it exhibited a large peak
at a delay corresponding to the preferred intraburst interspike
interval (Fig. 6B).

View larger version (28K):
[in this window]
[in a new window]
|
Figure 6.
Spontaneous activity of an E-type pyramidal cell
in CMS (no external stimulus present). A, ISI distribution,
with a mean interspike interval of 60 msec and a CV equal to 1.56. The
arrow indicates the maximal interspike interval
(tmax = 19 msec) between two spikes assigned to
the same burst event. The range of values for the mean and CV observed
in 36 cells is given in the inset. B, The autocorrelation function of the spike train (thick line)
showed a peak at the preferred intraburst interspike interval (15 msec). This peak disappears in the autocorrelation function of the
events (thin line), which consist of the original isolated
spikes and one spike for each burst in the spike train (Bair et al.,
1994 ). The function singularity of both autocorrelation functions
at t = 0 has been subtracted. C, The
probability distribution of the number of spikes per event was always
well fitted by a straight line in logarithmic coordinates (Pearson
correlation coefficient r = 0.998; observed range,
from 0.957 to 0.999). The arrow indicates a single burst
event containing 16 spikes that was not included in the fit and was
classified as an outlier (total number of events, 1012). D,
Plot of the slope, a, versus the intercept parameter,
b, describing the statistics of spikes per event
(dashed line, best linear fit). Different symbols
indicate responses obtained from different pyramidal cell types and ELL
maps (n = 32 pyramidal cells).
ECMS, E-units in centromedial
map; ELS, E-units in lateral
map; ICMS, I-units in
centromedial map; ILS, I-units
in lateral map.
|
|
On the basis of these observations, we defined bursts as events
consisting of two or more spikes that were separated by a time interval
shorter than a value tmax. This value,
tmax, was determined for each spike train
from its ISI distribution by selecting the first trough immediately
following the large peak(s) described above (Fig. 6A,
arrow) [note that this approach is similar to the one chosen by
Turner et al. (1996) ]. The distribution of the number, n,
of spikes per event was then plotted for each spike train (Fig.
6C) (n = 1 corresponds to isolated spikes
and n 2 to bursts). The resulting distributions
could be fitted well by an exponential function of the number of spikes
per events, pn, with
pn = ean+b, except
for one or two occasional outliers (Fig. 6C, arrow). The
parameters a and b represent the slope and the
intercept, respectively. For our sample, the distribution of values for
these two parameters a and b is given in Fig.
6D. The values for the slope a and the
intercept b were well correlated (Pearson coefficient r = 0.97) following the relation a = b + , with = 0.46 and = 0.81 (Fig.
6D, dashed line). Thus, points in the lower right of
Fig. 6D represent pyramidal cells with a spontaneous activity that exhibited more isolated spikes, p1 = ea+b, and relatively fewer bursts,
pn/p1 = ea(n 1). Conversely, points in the upper
left of the graph represent cells that showed more bursts in their
spontaneous firing patterns. The graph suggests a slight tendency of
cells in LS to fire more bursts during their spontaneous activity than
did most cells in CMS.
Encoding of the time course of RAMs by P-receptor afferents and
pyramidal cells
During stimulus presentation, the activity of pyramidal cells was
modulated by the RAMs of the EOD mimic (Fig. 7B,
C; see Fig. 9A, B).
Nevertheless, the statistical properties of spike trains obtained
during stimulation of pyramidal cells differed little from those
recorded during spontaneous activity. Values of mean ISIs and CVs were
similar to those observed during spontaneous firing, and pyramidal
cells retained their characteristic bursting patterns (Gabbiani et al.,
1996 , their Fig. 1A). Furthermore, the pyramidal cell
activity was more stable than the response of P-receptor afferents when
stimulus parameters were varied between repetitive stimulations.
Whereas changes in the mean stimulus amplitude, for instance, elicited
large sustained changes in the mean firing rate of P-receptor afferents
(Wessel et al., 1996 , their Fig. 5), similar changes did not
substantially alter the sustained responses of pyramidal cells (data
not shown). These observations are consistent with the presence of
adaptive mechanisms at the level of the ELL, such as gain control,
which normalizes the response of pyramidal cells (Bastian, 1986a ).

View larger version (30K):
[in this window]
[in a new window]
|
Figure 7.
Examples of linear and quadratic stimulus
estimations for responses of P-receptor afferents
(A), E-type (B), and I-type
pyramidal cells (C). For all three examples
(A-C), spike trains are symbolized in each
bottom row, the corresponding RAMs (= stimuli) are indicated in
the center and superimposed with their linear, and in
B and C only, quadratic estimates obtained from
the spike trains. Each top row contains two graphs showing
the power spectral density of the stimulus as a function of stimulus
frequency (left graphs) and SNR in the frequency domain for
linear estimation (right graphs). A, P-receptor
afferents encoded the detailed time-course of the stimulus by
modulating their instantaneous firing frequency. Note the much higher
sustained firing rate than that observed in pyramidal cells (see
B and C) (mean firing rate: 221 Hz; coding
fraction = 0.76; stimulus parameters: A0 = 1.2 mV, = 0.26 V, fc = 9 Hz).
B, Linear and quadratic estimation for an E-type pyramidal cell from CMS (Ecms; mean firing rate: 17 Hz; lin = 0.09; quadr = 0.13; stimulus
parameters: A0 = 3.0 mV, = 0.32 V,
fc = 18 Hz). The stimulus was resampled at a 50 Hz sampling rate to compute the quadratic estimate (see Materials and
Methods). The stimulus as well as the linear and quadratic estimates
are therefore illustrated at this sampling rate. C, Same as
in B for an I-type pyramidal cell from CMS
(Icms; mean firing rate: 13 Hz;
lin = 0.11; quadr = 0.15; stimulus
parameters: A0 = 5.0 mV, = 0.34 V,
fc = 9 Hz).
|
|
The ability of P-receptor afferents and E- and I-type pyramidal cells
to convey information about the time-course of the stimulus was
initially assessed by a simple linear estimation of the stimulus from
the spike trains, as illustrated in Fig. 7A-C. P-receptor afferents exhibited higher firing rates than pyramidal cells, and a
large fraction of the stimulus was recovered from trains of single
spikes. The stimulus estimation results were similar to those described
in Wessel et al. (1996) and are consistent with the encoding of
detailed stimulus time-course by modulations of the instantaneous
firing frequency of P-receptor afferents (Gabbiani and Koch, 1998 , Sec
9.6.3 and 9.7.3). At the lower cut-off frequencies used in the present
study, the fraction of the stimulus encoded reached values up to = 0.82, and occasionally signal-to-noise ratios as high as 100:1 were
observed. In contrast, all E- and I-type pyramidal cells analyzed
encoded the time-course of the stimulus only very poorly (Fig.
7B,C) (Gabbiani et al., 1996 , their Fig. 3C,D).
The signal-to-noise ratios obtained by estimating the stimulus from
pyramidal cell spike trains were always considerably smaller than those
observed by estimation from P-receptor afferent spike trains. They
typically peaked at a frequency that depended on the cell recorded from
and on the cut-off frequency of the stimulus. Two such examples are
shown in Figure 7B,C. When peak SNRs of pyramidal cells
located in CMS were compared with those from LS, no statistically
significant differences were found.
An alternative possibility to characterize pyramidal cell activity is
to determine the information conveyed by their spike trains on the
detailed time-course of positively half-wave rectified (E-type
pyramidal cells) or negatively half-wave rectified RAMs (I-type
pyramidal cells). Encoding of half-wave rectified stimuli by pyramidal
cells would be consistent with their response properties to sinusoidal
or step-wise amplitude modulations (Bastian and Heiligenberg, 1980 ;
Saunders and Bastian, 1984 ). To investigate this idea, we estimated
positively and negatively half-wave-rectified RAMs from pyramidal cell
spike trains and computed the corresponding coding fractions,
+ and  |