Abstract
Many fish rely on sounds for communication, yet the peripheral structures containing the hair cells are simple, without the morphological specializations that facilitate frequency analysis in the mammalian cochlea. Despite this, neurons in the midbrain of sound-producing fish (Pollimyrus) have complex receptive fields, extracting features from courtship sounds. Here we present an analysis of the initial encoding of sounds by the primary afferents and demonstrate that the representation of sound undergoes a substantial transformation as it ascends to the midbrain. Afferents were isolated as they coursed from the sacculus through the medulla. Tones (100 Hz–1.2 kHz) elicited synchronized spikes [vector strength (VS) >0.9] on each stimulus cycle [coefficient of variation (CV) <0.5], with little spike rate adaptation. Most afferents (67%) were spontaneously active and began synchronizing 10 dB below rate threshold. Rate thresholds for the most sensitive afferents (65 dB) were close to behavioral thresholds. The distribution of characteristic frequencies and best sensitivities was matched to the spectrum of sounds of this species and to its audiogram. Three clusters of afferents were identified, one including afferents that generated spike bursts and had V-shaped response areas (bursters), and two others that included entrained afferents with broad response areas (entrained types I and II). All afferents encoded the timing of clicks within click trains with time-locked spikes, and none showed selectivity for interclick intervals. Understanding the computations that yield complex receptive fields is an essential goal for auditory neuroscience, and these data on primary encoding advance this goal, allowing a comparison of inputs with feature-extracting midbrain neurons.
- auditory communication
- primary afferent
- computation
- electric fish
- hearing
- temporal processing
- neural transformation
- Mormyridae
In mammals, sound is place coded at the cochlea and carried into the brain through labeled lines. Within these channels, information about the temporal structure of sounds is encoded by synchronized spikes. Temporal analysis is particularly important at low frequencies (<3 kHz), where spike synchronization is strongest and where speech and many other animal communication sounds are produced. Because temporal analysis predominates in fish, they serve as valuable model systems for time-domain processing in vertebrate hearing (Fay, 1978, 1982; Fay and Passow, 1982; Bodnar and Bass, 1997, 2001; McKibben and Bass, 1999).
The mormyrid fish Pollimyrus produces low-frequency communication sounds (Crawford et al., 1986, 1997a,b; Bratton and Kramer, 1989). The courtship displays of males are composed of alternating grunts [250 msec click trains, interclick interval (ICI) of 18 msec] and moans (800 msec tones, 250 Hz fundamental) (Crawford et al., 1997a). The sounds of closely related species are distinct, and the sounds of males, within a species, are also individually specific. Behaviorally, the fish are sensitive to small differences in click trains and tones, indicating that even minute individual differences are readily detectable (Marvit and Crawford, 2000a).
The mormyrid fish ear consists of the sacculus and a gas-filled tympanic bladder (Stipetic, 1939; Fletcher and Crawford, 2001). The spherical bladder translates underwater sound pressure into the displacement that activates the hair cells (HCs) within the sacculus. The bladder is larger than the sacculus and is uniformly coupled to the sacculus over its entire extent. Thus, the auditory apparatus is much simpler than a cochlea, with a pulsating sphere driving the motion of the sacculus. This ear lacks the elongated sensory surface and basilar membrane that are important for peripheral frequency analysis in other vertebrates.
Despite the simplicity of the mormyrid ear, there are neurons in the CNS (midbrain) that are highly selective for click trains and tones of particular periods or frequencies. These neurons are suited to detecting features of grunts and moans (Crawford, 1993, 1997b). One physiological class is sensitive to click trains and is selective for particular interclick intervals common in grunts, whereas another is sharply selective for frequencies found in moans. It is likely that these represent the output of neural computations performed in the time domain (Crawford, 1993, 1997b; Kozloski and Crawford, 2000). These forms of feature selectivity could be produced by computational mechanisms that use synchronized spike trains as input, but to date, a detailed analysis of the primary afferent input to this system has not been available.
Here we present the first physiological analysis of the primary input to the mormyrid auditory system and show that complex receptive fields are an emergent property of the midbrain that does not exist in the primary afferent inputs to the brain. The primary afferents generate a faithful temporal representation of sounds through their synchronized spikes. This representation is relayed through the medulla and into the midbrain (Kozloski and Crawford, 1998, 2000), where it becomes transformed.
MATERIALS AND METHODS
Many of the methods used have been detailed previously (Crawford, 1993, 1997b; Kozloski and Crawford, 1998). The animal protocols were approved by the Institutional Animal Care and Use Committee of the University of Pennsylvania and comply with The Principles of Animal Care published by the National Institutes of Health.
This study is based on the physiological characterization of 116 primary afferent auditory neurons. These neurons were recorded within the medulla because there is no peripheral portion of the nerve that can be exposed for recording; the sacculus is close to the brain surface, with auditory axons rapidly entering the medulla and coursing toward their midline targets (Kozloski and Crawford, 1998).
Single auditory axons were isolated with dye-filled electrodes on the basis of standard electrophysiological criteria, in particular spike waveform characteristics. Electrodes were advanced into the axon bundle in the lateral part of the medulla, distant from the medially positioned medullary nuclei. One to four neurons were characterized per fish, all within 200 μm of each other in one electrode track, and only neurons within 200 μm of the injection site were included. We then confirmed that the extracellular recordings had been made within the axon bundle by iontophoresis of neurobiotin (Kawasaki and Guo, 1996; Kozloski and Crawford, 1998). We used a low injection current (1 μA), short injection time (2.5 min), and short survival time (10 min) to prevent spread of neurobiotin beyond the axons local to the recording site. In most cases, only one to several axons were stained, confirming that the recordings were made in the nerve (Fig. 1). It was not necessary for our analyses to identify the particular axon from which each set of physiological data was obtained. If any label was detected within medullary auditory neurons, the corresponding physiology was eliminated from our sample, because the primary afferents might then have been secondarily labeled after uptake by first-order medullary neurons. An analysis of response latencies provided additional confirmation that our recordings were from primary afferents. Latency was measured as the delay between the peak of the click stimulus and the onset of the first evoked spike in the click peristimulus time histograms (PSTHs). The mean spike latency to a click stimulus was only 1.2 msec (±0.6 SD), shorter than expected for higher-order medullary neurons.
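A per-trial variant of the latency measure can be sketched as follows. It is a simplification, not the PSTH-onset measure described above: it assumes spike times (msec) are referenced to the click peak at time 0, takes the first spike within a hypothetical 10 msec window in each trial, and summarizes the resulting latencies as mean ± SD. The function name, window, and spike times are illustrative assumptions.

```python
from statistics import mean, stdev


def first_spike_latencies(trials_ms, window_ms=10.0):
    """Latency (msec) of the first spike after the click peak in each trial.

    trials_ms: one list of spike times per click presentation, in msec
    relative to the click peak (time 0). Spikes at or before the peak, or
    later than window_ms (an assumed value), are ignored. Trials with no
    spike in the window contribute nothing.
    """
    latencies = []
    for spikes in trials_ms:
        evoked = [t for t in spikes if 0.0 < t <= window_ms]
        if evoked:
            latencies.append(min(evoked))
    return latencies


# Hypothetical spike times for four click presentations.
trials = [[-3.0, 1.1, 4.0], [0.9], [1.6, 2.5], [1.2, 12.0]]
lat = first_spike_latencies(trials)
print(f"latency = {mean(lat):.2f} msec (SD {stdev(lat):.2f})")
```

Taking the first spike per trial is less sensitive to bin width than reading an onset from a pooled histogram, but it can be biased early by spontaneous spikes in active afferents.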
Animals and preparation. Two sound-producing mormyrid species, Pollimyrus adspersus and Gnathonemus petersii (Crawford, 1997a), were imported from Nigeria by commercial dealers. More is known about the acoustic repertoire of P. adspersus, and the social settings in which the adult breeding fish use sounds, than about G. petersii. To date, studies of G. petersii have been based on juvenile fish and have revealed only simple clicking sounds used during agonistic behavior (Rigley and Marshall, 1973; Fletcher and Crawford, 2001). The acoustic repertoire of juvenile P. adspersus is similarly impoverished. It seems likely that adult G. petersii also use complex courtship sounds, but this matter awaits behavioral studies of breeding animals.
The ears of the two species appear nearly identical, both having a tympanic bladder coupled to the sacculus. The physiology of the primary afferents was also indistinguishable, on the basis of the distribution of characteristic frequencies (CFs) and synchronization coefficients (p > 0.05; Mann–Whitney U test). The data were combined for analysis and presentation. Any differences in the communication signals of these two species are more likely to be reflected in the physiology of higher-order neurons that extract features from the temporally encoded inputs. A total of 76 neurons from 25 Pollimyrus and 40 neurons from 16 Gnathonemus were used for this study. These fish were immobilized with an intramuscular injection of Flaxedil (gallamine triethiodide: 0.4 μg/g body weight) and placed on a Plexiglas platform. The fish were respirated with fresh, oxygenated water pumped through the gills. The CNS electric organ discharge motor command was used to monitor the animal's condition during the experiment. The command was recorded with a differential silver wire electrode placed beside the electric organ in the caudal peduncle. Experiments were terminated if the normal continuous volley of command signals (∼10/sec) became abnormal.
After a submucosal injection of the local anesthetic lidocaine, the skin above the dorsal region of the skull was removed. A brass retaining rod was cemented onto the anterior portion of the exposed skull and clamped to the platform to immobilize the head. A 3 mm access hole was drilled in the dorsal–caudal region of the skull, exposing a small region of the brain surface for microelectrodes. A 5 mm plastic well was cemented to the skull and filled with Fluorinert (3M FC-77; 3M, St. Paul, MN), an inert liquid that protected the brain by preventing water from entering the brain case. Finally, the platform was centered in a calibrated acoustic tank (Crawford, 1997b) and lowered until the fish's ears were submerged 25 mm below the surface of the water. Sounds were produced by an underwater speaker (University UW 30; University Sound, Buchanan, MI) placed at the bottom of the tank, which rested on a Technical Manufacturing Corporation (Peabody, MA) vibration isolation table. This entire apparatus was enclosed in an Industrial Acoustics Corporation (New York, NY) sound-attenuating booth (400 series). The stimulus generation and data collection computer (Gateway 486 microcomputer) and associated hardware were housed outside of the booth.
Signal generation. Tone bursts and click trains were generated by a Gateway 486 microcomputer and digital-to-analog (D/A) hardware from Tucker Davis Technologies. Tones were produced with cosine on/off ramps of 30 msec. The stimulus signal was processed through a D/A converter, a Crown power amplifier, and a programmable attenuator. Acoustic stimuli generated by the speaker were recorded with a hydrophone (Bruel & Kjaer 8103) at the location of the fish, for calibration and monitoring. Sound pressures are reported as dB rms re: 1 μPa for tones and dB peak re: 1 μPa for clicks. Sounds were presented up to 135 dB because this is the upper end of the natural range used by these animals (Crawford et al., 1986).
Single neuron extracellular physiology. Microelectrodes were made from glass pipettes pulled on a Flaming-Brown horizontal glass micropipette puller (P-97). These electrodes were backfilled with 2% neurobiotin (Sigma, St. Louis, MO) in 0.5 M KCl. A silver chloride wire was inserted into the electrode, and the tip was beveled (micropipette beveler, BV-10; Sutter Instrument Co.) until the impedance of the electrode was 15–30 MΩ. The electrode was mounted on a Burleigh Inchworm microdrive (IW-711–01) and micromanipulator. The electrode was manually positioned onto the dorsal surface of the brain in reference to brain surface landmarks. The electrode was then advanced from near the midline, along a track 30° off vertical, to intersect the auditory nerve close to the lateral border of the brain.
The physiological signal was preamplified (gain = 1000) by a battery-powered amplifier (DAM 50; World Precision Instruments, Sarasota, FL) located inside the booth. The amplified recording was filtered (Krohn-Hite filter; Model 3100A) and displayed on the oscilloscope (Tektronix, 2221A). This signal was sent to a hoop discriminator (Tucker Davis Technologies) to isolate and time stamp the action potentials with a resolution of 1 μsec. A custom computer program coordinated the stimulus presentation and data collection.
Auditory neurons were isolated by searching for responses while presenting acoustic stimuli that were composed of a series of tone bursts and click trains. Response areas for these neurons were mapped by recording responses (spikes) to tones for a range of frequencies (100–2500 Hz; 0.08 log10 Hz steps) and at different levels (50–135 dB re: 1 μPa; 3–5 dB steps). The threshold for an excitatory response was defined as the lowest level that evoked a spike rate at least 2 SDs above the spontaneous spike rate (SR) of the neuron (if SR = 0, the response criterion was ≥1 spike per second). The CF was determined as the frequency that had the lowest threshold in the response area; the threshold at the CF is called the best sensitivity (BS). Frequency tuning curves (FTCs) were plotted by connecting the thresholds at each frequency. The degree of tuning (Q10 dB) for each neuron was estimated by dividing the CF by the frequency bandwidth of the FTC at 10 dB above BS. Additionally, the bandwidth (BW125 dB) of the FTC was used to characterize each neuron during suprathreshold stimulation (measured in log units). Iso-level functions were also graphed, in which the effects of frequency were observed at a constant, suprathreshold level (125 dB). The frequency that evoked the maximal firing rate (MR) at 125 dB was called the best frequency (BF). The relationship between intensity and spike rate, the rate-level function (RLF), was obtained by plotting responses to a tone of a particular frequency (e.g., CF) over an increasing series of levels (5 dB steps).
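The threshold criterion and the derived CF, BS, and Q10 dB measures can be sketched as follows. This is an illustration under stated assumptions, not the analysis program used in the study: the response area is a hypothetical dictionary mapping each tone frequency (Hz) to (level, rate) pairs, the FTC bandwidth is read directly from the sampled frequencies without interpolation, and the bandwidth is taken as the contiguous span around the CF whose thresholds lie within 10 dB of BS.

```python
def rate_threshold(level_rates, spont_mean, spont_sd):
    """Lowest level (dB) whose rate meets the excitatory criterion, else None.

    Criterion: rate >= spontaneous mean + 2 SD, or >= 1 spike/s when the
    neuron is silent (spontaneous rate of 0).
    """
    criterion = 1.0 if spont_mean == 0 else spont_mean + 2 * spont_sd
    for level, rate in sorted(level_rates):
        if rate >= criterion:
            return level
    return None


def tuning_summary(response_area, spont_mean, spont_sd):
    """Return (FTC, CF, BS, Q10dB) from {freq_hz: [(level_db, rate), ...]}."""
    ftc = {}
    for freq, level_rates in response_area.items():
        th = rate_threshold(level_rates, spont_mean, spont_sd)
        if th is not None:
            ftc[freq] = th
    # CF: frequency with the lowest threshold (ties go to the lower frequency).
    cf = min(ftc, key=lambda f: (ftc[f], f))
    bs = ftc[cf]
    # Walk outward from the CF while thresholds stay within BS + 10 dB.
    freqs = sorted(response_area)
    lo = hi = freqs.index(cf)
    limit = bs + 10
    while lo > 0 and ftc.get(freqs[lo - 1], float("inf")) <= limit:
        lo -= 1
    while hi < len(freqs) - 1 and ftc.get(freqs[hi + 1], float("inf")) <= limit:
        hi += 1
    bandwidth = freqs[hi] - freqs[lo]
    # If only the CF is within 10 dB, the sampling is too coarse for Q10 dB.
    q10 = cf / bandwidth if bandwidth > 0 else None
    return ftc, cf, bs, q10


# Hypothetical response area; spontaneous rate 5 +/- 2 spikes/s.
area = {
    200: [(90, 3), (95, 10), (100, 40)],
    250: [(85, 12), (90, 50)],
    300: [(85, 6), (90, 8), (95, 20)],
    400: [(100, 5), (110, 30)],
}
ftc, cf, bs, q10 = tuning_summary(area, spont_mean=5.0, spont_sd=2.0)
print(cf, bs, q10)  # 250 85 2.5
```

With the 9 spikes per second criterion, the thresholds are 95, 85, 95, and 110 dB; the 10 dB bandwidth spans 200–300 Hz, giving Q10 dB = 250/100 = 2.5. On the study's logarithmic frequency grid, interpolating between sampled frequencies would give a finer bandwidth estimate.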
To visualize temporal responses to tones, PSTHs were constructed by presenting 50 tone bursts at a particular frequency, level, and starting phase. The PSTH displays the probability of a spike occurring within a given time bin during the course of a stimulus presentation. To measure how well action potentials synchronized to the tone frequency (i.e., the degree of phase locking), the VS was calculated from a PSTH by converting spike times to unit vectors (spike phases relative to the stimulus period) and computing the average vector length (Goldberg and Brown, 1969). VS values range from 1.0 (perfect synchrony) to 0.0 (no synchrony). The Rayleigh test was used to determine whether the angular distribution of spike phases differed significantly from a uniform (random) distribution (Batschelet, 1981). Interspike interval histograms (ISIHs) were also examined, and interval dispersion was quantified with the CV (CV = SD/mean interval).
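The synchrony and interval measures can be sketched as follows, assuming spike times in seconds referenced to tone onset, with every tone burst starting at the same phase. The Rayleigh p value uses the common large-sample approximation p ≈ exp(−Z), with Z = n·VS²; this is a textbook approximation and not necessarily the exact computation used in the study. Function names are illustrative.

```python
import math
from statistics import mean, stdev


def vector_strength(spike_times_s, freq_hz):
    """Length of the mean unit vector of spike phases (Goldberg and Brown, 1969)."""
    n = len(spike_times_s)
    c = sum(math.cos(2 * math.pi * freq_hz * t) for t in spike_times_s)
    s = sum(math.sin(2 * math.pi * freq_hz * t) for t in spike_times_s)
    return math.hypot(c, s) / n


def rayleigh_p(vs, n):
    """Approximate Rayleigh test p value for n phases with vector strength vs."""
    z = n * vs ** 2
    return math.exp(-z)


def isi_cv(trials_s):
    """CV = SD / mean of first-order interspike intervals pooled over trials."""
    isis = [b - a for spikes in trials_s for a, b in zip(spikes, spikes[1:])]
    return stdev(isis) / mean(isis)


# One spike per cycle of a 250 Hz tone, always at the same phase.
locked = [(1 + 4 * k) / 1000 for k in range(100)]
vs = vector_strength(locked, 250.0)
print(round(vs, 3), rayleigh_p(vs, len(locked)) < 0.001, round(isi_cv([locked]), 3))
```

A perfectly entrained train gives VS near 1 and CV near 0; spikes spread evenly over the stimulus cycle give VS near 0 even when the spike rate is unchanged, which is why VS and CV are reported alongside rate.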
Responses to 400 msec click trains were also analyzed. Click trains were composed of broadband pulses (160–3000 Hz) with constant ICIs that ranged from 6 to 100 msec. The dependence of spike rate on ICI was examined by plotting the number of spikes evoked as a function of ICI. PSTHs and ISIHs were also used to characterize temporal responses to click trains. All-order ISIHs, formally equivalent to autocorrelation functions, were constructed to reveal any correspondence between the predominant interval in the spike train and the stimulus ICI (Licklider, 1951; Perkel et al., 1967; Moller, 1970). These were plotted by computing the intervals between a given spike and all other spikes within the spike train.
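An all-order interval histogram can be sketched as follows. The sketch assumes spike times in msec and keeps only forward (positive) intervals up to a maximum lag, since the negative half of an autocorrelation is a mirror image of the positive half; the bin width, lag limit, and spike times are arbitrary illustrative choices.

```python
import math


def all_order_intervals(spike_times_ms, max_lag_ms):
    """Intervals from each spike to every later spike, up to max_lag_ms."""
    spikes = sorted(spike_times_ms)
    intervals = []
    for i, t in enumerate(spikes):
        for later in spikes[i + 1:]:
            lag = later - t
            if lag > max_lag_ms:
                break  # spikes are sorted, so all further lags are longer
            intervals.append(lag)
    return intervals


def interval_histogram(intervals, bin_ms, max_lag_ms):
    """Counts per bin [k*bin_ms, (k+1)*bin_ms)."""
    counts = [0] * math.ceil(max_lag_ms / bin_ms)
    for x in intervals:
        k = int(x // bin_ms)
        if k < len(counts):
            counts[k] += 1
    return counts


# A silent afferent firing one spike per click in an 18 msec ICI train.
spikes = [2 + 18 * k for k in range(22)]
hist = interval_histogram(all_order_intervals(spikes, 60), bin_ms=2, max_lag_ms=60)
peaks = [k * 2 for k, n in enumerate(hist) if n > 0]
print(peaks)  # [18, 36, 54]
```

The peaks fall at multiples of the ICI. For a spontaneously active afferent, the same multiples emerge above a flatter background of spontaneous intervals, which is what makes this analysis useful when first-order ISIHs are ambiguous.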
Tissue processing. After physiology, each fish was anesthetized with MS-222 and perfused with heparinized physiological saline (PBS) and phosphate-buffered (1.25%) paraformaldehyde/(1.25%) glutaraldehyde fixative. The brain was removed, postfixed, embedded in gelatin, and prepared for frozen sectioning.
Transverse sections (60 μm) were washed in PBS, bathed in 0.5% H2O2, and then incubated overnight in a solution containing avidin-conjugated horseradish peroxidase (0.3% Triton X-100, PBS; Vector Laboratories, Burlingame, CA). Sections were then soaked in phosphate-buffered diaminobenzidine (Sigma) containing H2O2 and 0.04% ammonium sulfate to visualize the neurobiotin–chromogen conjugate (Kawasaki and Guo, 1996; Kozloski and Crawford, 1998). The mounted sections were counterstained with cresyl violet.
Statistical analyses. Principal component analysis (see Table 1), descriptive statistics (see Table 2), nonparametric statistics (Mann–Whitney U and Wilcoxon signed rank tests), and cluster analysis (CA) were calculated with Statistica v 2.0 (StatSoft). The multidimensional data set (nine variables) was explored using principal components analysis (PCA) to examine the contributions of each physiological variable to the PCA factors capturing most of the variance in the data (factors 1–4). A CA was then performed to identify natural clustering (i.e., classes) among the neurons sampled. After standardizing the data, CA was performed using squared Euclidean distances and Ward's method linkage rules. These analyses (PCA and CA) were based on SR, MR, BF, VS, CV, CF, BS, BW125 dB, and RLF slope. Q10 dB was excluded from these analyses because it is so closely related to two of the variables that were included (i.e., BW125 dB and CF). In cases in which multiple comparisons were made with the Mann–Whitney U and Wilcoxon tests, the p values were adjusted using the Bonferroni correction (Howell, 1992).
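The standardization and Ward clustering steps can be sketched in plain Python. This is a stand-in for the Statistica procedure, not a reproduction of it: it z-scores each variable, starts from squared Euclidean distances, merges clusters with the Lance–Williams update for Ward's method, and stops when a chosen number of clusters remains. A Bonferroni helper is included for the multiple comparisons. The toy rows are hypothetical; in the study, each row would hold one neuron's nine physiological variables.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each variable (column) to mean 0 and SD 1."""
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in rows]


def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clusters(rows, k):
    """Agglomerate standardized rows with Ward linkage until k clusters remain.

    Returns the clusters as sorted lists of row indices (k must be >= 1).
    """
    data = zscore_columns(rows)
    clusters = {i: [i] for i in range(len(data))}
    # Pairwise dissimilarities keyed by (smaller id, larger id).
    d = {(i, j): sq_euclidean(data[i], data[j])
         for i in clusters for j in clusters if i < j}
    next_id = len(data)
    while len(clusters) > k:
        i, j = min(d, key=d.get)  # closest pair under the Ward criterion
        ni, nj = len(clusters[i]), len(clusters[j])
        d_ij = d.pop((i, j))
        merged = clusters.pop(i) + clusters.pop(j)
        for m, members in clusters.items():
            nm = len(members)
            d_mi = d.pop((min(i, m), max(i, m)))
            d_mj = d.pop((min(j, m), max(j, m)))
            # Lance-Williams update for Ward's method.
            d[(m, next_id)] = ((ni + nm) * d_mi + (nj + nm) * d_mj
                               - nm * d_ij) / (ni + nj + nm)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_comparisons)


rows = [[0, 0], [0.1, 0.2], [0.2, 0.1],
        [5, 5], [5.1, 5.2], [5.2, 4.9],
        [0, 5], [0.1, 5.1], [0.2, 4.8]]
print(ward_clusters(rows, 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Standardizing first matters because the variables have very different units (Hz, dB, spikes per second, dimensionless VS and CV); without it, the variables with the largest numeric ranges would dominate the distances.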
RESULTS
The responses of mormyrid primary auditory afferents were markedly different from those recorded more centrally in this auditory system. The primary afferents produced sustained responses to tones, with spikes strongly entrained to the temporal structure of the acoustic stimulus (Fig. 2). Cycle-by-cycle stimulus following was observed at rates as high as 1.12 kHz in some afferents, with essentially no spike rate adaptation (Fig. 2C,D). In contrast, tones elicited highly phasic responses in midbrain neurons, with poorer synchrony (i.e., lower VS and higher CV). Sharp onset responses were often followed by suppression of spike rate below the spontaneous rate and by rebound excitation at stimulus offset. Unlike those of the afferents, the rate-level functions of midbrain neurons were frequently nonmonotonic (Crawford, 1993, 1997b).
The majority of primary afferents produced a single synchronized spike on almost every stimulus cycle, thus entraining to the stimulus [i.e., 80% had VS >0.9 and 77% had CV <0.5; criteria for entrainment adopted from Joris et al. (1994)]. The median VS was high (0.96 at 125 dB) (Fig. 3A), and the median CV was low (0.28 at 125 dB) (Fig. 3B), corresponding to narrow, unimodal ISI distributions. It is possible to obtain moderately low CVs without entrainment [e.g., in chopper neurons (Kozloski and Crawford, 2000)], but this was never observed in primary afferents.
The spontaneous activity of afferents ranged widely, from 0 to 400 spikes per second, but most (67%) had some spontaneous activity. Afferents with spontaneous activity (SR >1 spike per second) were ∼10 dB more sensitive (i.e., lower thresholds) than those without (median threshold: 100.0 vs 109.1 dB; U = 852; n = 51,57; p < 0.001).
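The Mann–Whitney U statistic used for comparisons such as this one can be sketched as follows. The sketch ranks the pooled sample with mid-ranks for ties, reports U as the smaller of the two rank-sum statistics, and computes a two-sided p value from the normal approximation without tie or continuity correction. Statistics packages differ in these details, so this is not claimed to reproduce the values reported here.

```python
import math


def mann_whitney_u(a, b):
    """Return (U, two-sided p) using mid-ranks and a normal approximation."""
    pooled = sorted((x, g) for g, sample in ((0, a), (1, b)) for x in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # mid-rank for a run of tied values
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Hypothetical thresholds (dB) for spontaneously active vs silent afferents.
print(mann_whitney_u([100, 98, 104], [110, 112, 108, 109]))
```

Because U depends only on ranks, the test makes no assumption that thresholds are normally distributed, which suits small, skewed samples like these.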
In addition to entrained afferents, we also observed a smaller number of afferents that produced bursts of spikes on each stimulus cycle. This bursting degraded entrainment by increasing the CV of the ISI distribution and by reducing VS. The bursting afferents also had low-frequency CFs near 200 Hz and relatively symmetrical response areas, compared with the entrained afferents. These observations suggested that there might be two or more physiological classes within our afferent sample.
Characteristics of afferent response clusters
We examined the afferent sample using our physiological variables to determine whether there were distinct physiological classes. Principal components analysis of the multidimensional data set showed that 80% of the variance was captured by the first four factors, derived from nine physiological variables (Table 1). We used the loadings of the physiological variables to identify those contributing most heavily to these four factors. Among the important variables were the maximum firing rate, characteristic frequency, best excitatory frequency, ISI dispersion, and vector strength. However, two-dimensional analyses of these variables failed to reveal distinct classes (Figs. 3–5), so we pursued this analysis further with a multidimensional cluster analysis.
The CA, based on nine physiological variables (Table 1), indicated that the sample consisted of three major clusters, one corresponding to bursting afferents and two others corresponding to entrained afferents (Fig. 6). The largest cluster (entrained type I: 35%) was composed of afferents that strongly entrained to a wide range of stimulus frequencies (Figs. 2D, 7A). A synchronized spike was generated for each stimulus cycle, except at the lowest frequencies (<200 Hz), where synchronized bursts were sometimes produced (Fig. 7A2). In an iso-level frequency function, the response (spikes per second) was essentially identical to a plot of the number of stimulus cycles as a function of tone frequency. Thus, these afferents exhibited high-fidelity frequency following (Fig. 7A2, dashed line). As frequency was increased, an upper limit for entrainment, the failure frequency, was reached, and the response rate dropped precipitously beyond this point (Fig. 7A2). Failure frequencies were as high as 890 Hz for type I afferents. Consequently, the frequency band of entrainment encompassed nearly the entire audibility range, as measured behaviorally (Marvit and Crawford, 2000a). Near threshold, these neurons responded best at lower frequencies, with CFs concentrated at ∼265 Hz (CF dispersion: 25–75% quartile = 217–308 Hz).
The second cluster of entrained afferents (type II: 15.9%) was quite similar to type I with respect to degree of tuning and entrainment. However, this second cluster was distinguished by higher CFs and BFs (median CF, 900 vs 265 Hz; median BF, 900 vs 400 Hz). These type II neurons also had lower spontaneous rates (median SR = 0 vs 5.2 spikes per second) and steeper rate-level functions (median slope = 31.1 vs 11.6 spikes per second per decibel). The high BFs in this cluster meant that these afferents entrained at even higher frequencies, some >1.0 kHz (Fig. 2C, Table 2).
The third cluster revealed by the CA corresponded to the bursting afferents mentioned above. The afferents in this cluster fired bursts of spikes on every stimulus cycle for stimuli within the response area (Fig. 7B1). The frequency range of stimulus following was also comparatively restricted (Fig. 7B2), and the response areas were more V-shaped than those of other afferents (Fig. 8B1). The bursting resulted in broader period histograms, with a second mode in the ISIH corresponding to the intervals between the spikes within each burst (Fig. 9B). The CFs were also restricted to a narrow, low-frequency range (CF dispersion: 25–75% quartile range = 135–220 Hz) similar to that of the entrained type I afferents.
Intensity coding and thresholds
Although all primary afferents had monotonic RLFs, there were clear differences in RLFs between the afferent clusters identified above. The entrained afferents (types I and II) had steep, rapidly saturating rate-level functions, with near-perfect synchrony achieved within ∼10 dB of rate threshold (Fig. 8A2,A3). In contrast, the RLFs of bursting afferents were shallow and only began to saturate at the highest levels (Fig. 8B2).
These contrasts in rate-level functions yielded differences in the RLF slopes within the dynamic range (DR) (dB range from 20 to 80% of maximum response) (Fig. 8B2). The bursting afferents had the shallowest slopes (median = 9.8 spikes per second per decibel) as compared with the steeper slopes of the entrained type I (median = 11.6 spikes per second per decibel) and type II afferents (median = 31.1 spikes per second per decibel). The slopes of type II afferents were significantly steeper than both bursting (U = 76; p < 0.003) and the type I afferents (U = 22; p < 0.008).
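One way to compute the slope within the dynamic range is sketched below, assuming a monotonic RLF sampled at discrete levels. The 20% and 80% points are located by linear interpolation between sampled levels, and the slope is the rate change between them divided by their level difference. Whether rates were first corrected for spontaneous activity is not specified here, so the sketch uses raw rates; the example RLF is hypothetical.

```python
def level_at_fraction(levels_db, rates, fraction):
    """Level at which a monotonic RLF first reaches fraction * max rate."""
    target = fraction * max(rates)
    if rates[0] >= target:
        return levels_db[0]
    points = list(zip(levels_db, rates))
    for (l0, r0), (l1, r1) in zip(points, points[1:]):
        if r0 < target <= r1:
            # Linear interpolation between neighboring sampled levels.
            return l0 + (target - r0) * (l1 - l0) / (r1 - r0)
    return None


def dynamic_range_slope(levels_db, rates):
    """Slope (spikes/s per dB) between the 20% and 80% points of the RLF."""
    l20 = level_at_fraction(levels_db, rates, 0.2)
    l80 = level_at_fraction(levels_db, rates, 0.8)
    if l20 is None or l80 is None or l80 <= l20:
        return None
    return 0.6 * max(rates) / (l80 - l20)


levels = [80, 85, 90, 95, 100, 105]
rates = [0, 50, 150, 250, 300, 300]
print(dynamic_range_slope(levels, rates))
```

Here the 20% point (60 spikes per second) falls at 85.5 dB and the 80% point (240 spikes per second) at 94.5 dB, so the slope is 180/9 = 20 spikes per second per decibel. A steep, rapidly saturating RLF like those of type II afferents yields a large slope over a narrow dynamic range.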
Synchronization increased as stimulus intensity increased, with entrained (type I) and bursting afferents showing significant synchronization at levels ∼10 dB below the rate-based threshold (Fig. 8A3,B3). Because type II neurons usually had little spontaneous activity, synchronization could not be measured below rate threshold.
For the entrained type I neurons, the median difference between synchronization and rate thresholds was 10.5 dB (p < 0.05; Wilcoxon signed rank test). These afferents showed strong synchronization to tones at rate threshold (median VS at rate threshold = 0.75), and synchrony increased modestly as level was increased further (median VS at 125 dB = 0.93). Although the median level at which synchrony became saturated (median = 95.7 dB; 90% of maximal synchrony) was lower than the level at which rate became saturated (median = 103.5 dB; 90% of maximal firing rate), the difference was not significant for entrained type I afferents (p < 0.13; Wilcoxon signed rank test) (Fig. 8A3).
For the bursting afferents, the median difference between synchronization and rate thresholds was 12.0 dB (p = 0.07; Wilcoxon signed rank test). The bursters were more weakly synchronized at rate threshold (median VS at rate threshold = 0.5; median threshold = 97 dB) and improved their synchrony more slowly with increasing level (median VS at 125 dB = 0.8). The bursting afferents reached synchrony saturation before rate saturation (median level at synchrony saturation = 98.8 dB and median level at rate saturation = 119.7 dB; p = 0.07; Wilcoxon signed rank test) (Fig. 8B3).
Maximum spike rates, synchronization, and ISI dispersion
The data on maximum driven rates (MR), BF, ISI dispersion (CV), and synchronization (VS) have been plotted with separate symbols (Figs. 4, 5) for each of the clusters identified in the CA (Fig. 6). The ratio of MR to BF was near unity for the entrained afferents because of their one-spike-per-cycle firing behavior (Fig. 4). The MR/BF ratio was closer to 2 for the bursting afferents (Fig. 4A), and their BFs were clustered near 200 Hz (Fig. 4B). Note that the MR/BF ratio for many of the entrained neurons was slightly <1.0 because of the on and off ramps of the tone bursts; the ratio was very close to 1 during the steady-state part of the tone. Similarly, the MR/BF ratio was usually just under 2 for the bursting afferents. It should also be pointed out that there was a very small (12%), but statistically significant, difference in the MR/BF ratio between the two species used in our studies (G. petersii and P. adspersus), with the ratio being smaller for P. adspersus.
The synchronization of bursting afferents to BF tones (125 dB) was significantly less (median VS = 0.89) than that of the entrained afferents (type I: median VS = 0.97, U = 85, p < 0.001; type II: median VS = 0.95, U = 16, p < 0.003) (Fig. 5A). The ISI dispersion was significantly less among type I entrained afferents (median CV = 0.18) than among the bursters (median CV = 0.61; U = 85; p < 0.0003). Type II afferents had a median CV of 0.35 and were not significantly different from type I afferents (U = 143; p = 0.17) or bursters (U = 43; p = 0.17) (Fig. 5, Table 2). There was a significant negative correlation between CV and VS for the afferent sample as a whole (r = −0.67; p < 0.05), reflecting the decline in interval dispersion as synchrony increased, and there was extensive overlap between the clusters in the CV versus VS plot (Fig. 5C).
Tone sensitivity and the audiogram
The response areas of the afferent population spanned both the amplitude spectrum of the vocalizations made by Pollimyrus and the audiogram (Fig. 10). Although CFs were widely distributed between 100 Hz and 1.0 kHz (Table 2), half of the CFs were clustered between 200 and 300 Hz, where the fundamental frequency of one of the key components of the courtship display lies (moan F0 = 250 Hz) (Crawford et al., 1997a).
To compare our neurophysiological data with the behavior, we constructed FTCs for the afferents and extracted a neural threshold sensitivity curve from these tuning curves. At each frequency, a number of FTCs overlapped, each contributing one threshold to our estimate of the neural threshold sensitivity for that frequency. For each frequency, we then found the threshold at the 90th percentile of sensitivity (i.e., the 10th percentile of the threshold distribution), so that 10% of the thresholds fell below this value (i.e., were more sensitive) and 90% fell above. These percentile thresholds (decibels) were used to draw the neural threshold curve (Fig. 10). The neural threshold curve showed a sensitivity maximum near the 250 Hz fundamental of the courtship sounds. This curve closely matched the Pollimyrus audiogram measured behaviorally (McCormick and Popper, 1984; Marvit and Crawford, 2000b; Fletcher and Crawford, 2001).
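The percentile step can be sketched as follows. The sketch assumes each afferent's FTC is stored as a hypothetical dictionary from frequency (Hz) to threshold (dB) and that percentiles are interpolated linearly between order statistics, which is one of several common conventions. At each frequency, the curve takes the threshold that only 10% of the contributing afferents fall below.

```python
def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100
    lo = int(pos)
    if lo + 1 >= len(xs):
        return xs[lo]
    return xs[lo] + (pos - lo) * (xs[lo + 1] - xs[lo])


def neural_threshold_curve(ftcs, p=10):
    """For each frequency, the p-th percentile of the afferent thresholds there.

    ftcs: one dict per afferent mapping frequency (Hz) to threshold (dB);
    an afferent contributes only at frequencies inside its tuning curve.
    """
    by_freq = {}
    for ftc in ftcs:
        for freq, th in ftc.items():
            by_freq.setdefault(freq, []).append(th)
    return {f: percentile(ths, p) for f, ths in sorted(by_freq.items())}


# Hypothetical thresholds: eleven afferents at 250 Hz, one at 500 Hz.
ftcs = [{250: th} for th in (65, 70, 80, 90, 95, 100, 100, 105, 110, 120, 125)]
ftcs.append({500: 100})
print(neural_threshold_curve(ftcs))  # {250: 70.0, 500: 100}
```

Using a low percentile rather than the single most sensitive afferent makes the curve less dependent on one outlying recording, while still tracking the sensitive edge of the population that behavioral thresholds are expected to reflect.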
Encoding single clicks and click trains
Clicks were encoded with either a single short-latency spike or a burst of spikes. Afferents without spontaneous activity (<1 spike per second) responded with one or two spikes per click (Fig. 11A), whereas spontaneously active afferents produced bursts of spikes (Fig. 11B,C). The influence of clicks on neurons with the highest spontaneous rates was a temporal reorganization of the spikes rather than an increase in spike rate.
When peristimulus histograms were examined for single click stimulation, we noted that the bursting afferents seemed to ring, producing peaks at regular intervals that corresponded roughly to the period of the BF (Fig. 11C). In contrast, many entrained afferents produced a more chaotic pattern of PSTH peaks (Fig. 11B). However, there was considerable variability in these single click responses, and click-response type did not map reliably onto the clusters discussed above on the basis of tone responses.
All the afferents provided a faithful temporal representation of click trains and thus should also encode the grunts of the courtship display well. Neurons lacking spontaneous activity (silent) typically fired a single spike per click throughout the train (Fig. 12A1), and consequently, the interspike intervals fell in a tight distribution around the train period (ICI) (Fig. 12A2). Additionally, the distribution of all intervals, measured not just between adjacent spikes but between all peristimulus spikes (all-order interspike intervals), revealed intervals that were multiples of the click train period (Fig. 12A3). This all-order interval analysis was particularly useful for examining the temporal structure of the spike trains produced by spontaneously active afferents.
The entrainment of spontaneously active afferents during click trains was not as apparent in raster and PST plots (Fig. 12B1) as it was for the silent neurons. Nevertheless, the temporal structure of the spike train was clearly modulated, shifting the ISI distribution from a unimodal spontaneous distribution to a bimodal driven distribution (Fig. 12B2). The emergence of intervals corresponding to the ICI of the stimulus was revealed by the all-order ISI histogram, in which there were distinct peaks separated by the ICI of the click train (Fig. 12B3).
Afferents encoded a broad range of click train periods and were never selective for particular ICIs (range = 10–80 msec). Because the number of clicks in a train of constant duration decreased as ICI increased, the spike rate of afferents without spontaneous activity decreased monotonically with increasing ICI (Fig. 12A4, left), whereas the number of spikes per click was independent of ICI (Fig. 12A4, right). For spontaneously active neurons, the evoked spike rate remained fairly constant for all ICIs, whereas the spikes per click increased as the ICI increased (Fig. 12B4). In contrast, about one-third of midbrain neurons exhibit interval selectivity, showing a highly facilitated response for a narrow range of interclick intervals (Crawford, 1997b).
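The arithmetic behind these two patterns can be sketched for a 400 msec train, the train duration used for the click-train stimuli. The click timing (first click at train onset) and the example spike rate are assumptions for illustration. A fixed number of spikes per click yields a rate that falls with ICI, whereas a fixed spike rate yields spikes per click that rise with ICI.

```python
import math


def clicks_in_train(duration_ms, ici_ms):
    """Number of clicks, assuming clicks at 0, ICI, 2*ICI, ... before duration_ms."""
    return math.ceil(duration_ms / ici_ms)


def rate_fixed_spikes_per_click(duration_ms, ici_ms, spikes_per_click=1):
    """Mean rate (spikes/s) of an afferent firing a fixed number of spikes per click."""
    return clicks_in_train(duration_ms, ici_ms) * spikes_per_click * 1000 / duration_ms


def spikes_per_click_at_fixed_rate(rate_sps, duration_ms, ici_ms):
    """Spikes per click if the train only reorganizes a fixed spike rate."""
    return rate_sps * duration_ms / 1000 / clicks_in_train(duration_ms, ici_ms)


for ici in (10, 20, 40, 80):
    print(ici, rate_fixed_spikes_per_click(400, ici),
          spikes_per_click_at_fixed_rate(100, 400, ici))
```

Neither pattern implies any interval selectivity: both follow from counting clicks, which is why a flat spikes-per-click function (silent afferents) or a flat rate function (spontaneously active afferents) is the expected signature of a nonselective input.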
DISCUSSION
This analysis of primary afferents has provided a view of the initial neural representation of sounds generated by a simple vertebrate ear and delivered to central computational circuits. The morphology of the ear and the physiology of the midbrain had led us to suspect that the feature-selective responses of the midbrain were computed from an initial temporal representation created at the ear (Crawford, 1997b). The present data show that afferents provide an excellent temporal representation of acoustic stimuli but are not feature selective. The characteristic frequencies of these neurons were distributed within the measured audiogram for the animal, and tuning curves were generally quite broad.
Previous studies of the primary auditory afferents of other fishes, primarily goldfish, have focused on characteristics of the FTC, in particular the position of the CF, as a basis for defining physiological types of afferents (Furukawa and Ishii, 1967; Fay and Ream, 1986). However, we included temporal properties of spike trains in our classification of afferents because these properties are likely to be particularly important for central processing in the fish auditory system (Fay, 1978, 1982; Crawford, 1997b; McKibben and Bass, 1999; Bodnar et al., 2001).
Our entrained afferents (especially type II) were most like the goldfish high-frequency follower afferents (S1), and the bursters were most like the low-frequency afferents (S2) of Furukawa and Ishii (1967). Fay and Ream (1986) identified four classes of goldfish afferents on the basis of the FTC, three of which were classified as tuned types. These tuned fibers had broad FTCs, with 95% having Q10dB ≤ 1.2; goldfish afferents are thus more weakly tuned than those of mormyrids but, like them, show a preponderance of CFs near 200 Hz (Fay, 1978; Fay and Ream, 1986). On the basis of a comparison of CF distributions [BF in Fay and Ream (1986)], our bursting afferents were comparable to the low-frequency type in goldfish, the entrained type I afferents to the medium-frequency type, and the entrained type II afferents to the high-frequency type. The relationship between CF and RLF slope that we observed here was also similar to that observed in goldfish by Fay and Ream (1986), with RLF slope increasing with CF (Fig. 12). Because goldfish do not make communication sounds, their CFs cannot easily be related to acoustic behavior.
In midshipman fish, the BFs of auditory afferents matched the relatively narrow vocalization spectrum of this species (McKibben and Bass, 1999). The BF distribution of midshipman afferents (60–300 Hz) was narrower than that of mormyrids or goldfish. Midshipman afferents entrained well to low-frequency tones (<200 Hz) and to amplitude-modulated signals (modulation rates <10 Hz) (McKibben and Bass, 2001).
Entrained afferents
Entrainment of afferent spikes in mormyrids appears to be exceptional in two ways. First, the degree of synchrony is very high (VS usually >0.9), whereas synchrony in other vertebrate auditory systems is usually lower (Palmer and Russell, 1986; Hill et al., 1989; Joris et al., 1994; Koppl, 1997). Second, many of these afferents entrain essentially perfectly up to 1.0 kHz, whereas mammalian and avian afferents skip cycles, failing to entrain when stimulus frequencies exceed 300 Hz (Kiang, 1965; Joris et al., 1994). Sustained firing rates of >1000 spikes per second (Fig. 2) are unusual; they are comparable to those reported for intralaminar thalamocortical cells (1000 spikes per second) (Steriade et al., 1993) and spinal Renshaw cells (1500 spikes per second) (Walmsley and Tracey, 1981).
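For readers less familiar with the measure, the sketch below computes vector strength in its conventional form. Each spike is mapped to a unit vector at its phase within the stimulus cycle, and VS is the length of the mean vector. We assume this conventional definition matches the one used here. The 1.0-kHz tone, the 0.25-msec spike latency, and the 20-µsec timing jitter are hypothetical values chosen for illustration.

```python
import numpy as np


def vector_strength(spike_times_s, freq_hz):
    """Length of the mean unit vector of spike phases relative to the stimulus cycle.

    1.0 means every spike falls at the same phase; values near 0 mean no phase locking.
    """
    phases = 2.0 * np.pi * freq_hz * np.asarray(spike_times_s, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))


rng = np.random.default_rng(0)
freq = 1000.0                        # 1.0-kHz tone
cycles = np.arange(1000)             # one spike on each of 1000 cycles
jitter_s = 20e-6                     # hypothetical timing jitter (SD), 20 microseconds
locked = cycles / freq + 0.25e-3 + rng.normal(0.0, jitter_s, cycles.size)
unlocked = rng.uniform(0.0, 1.0, 1000)   # spikes at random times over 1 s

print(vector_strength(locked, freq))    # close to 1 (about 0.99 for this jitter)
print(vector_strength(unlocked, freq))  # close to 0
```

Because the phase is measured in radians of the stimulus cycle, a fixed amount of timing jitter lowers VS more at higher frequencies, so maintaining VS >0.9 near 1 kHz demands spike timing precise to a small fraction of a millisecond.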
The entrained afferents of mormyrids could represent a specialization for temporal computation. Their interval distributions are unimodal and thus lack the ambiguity of neurons that have excellent synchrony (high VS) but skip cycles (high CV). If faithfully relayed to the midbrain, this temporal code could be used in the computations producing interval selectivity for click trains (Crawford, 1997b) or in the generation of frequency-selective responses to tonal signals (Licklider, 1951; Simmons et al., 1996).
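The distinction between synchrony and cycle skipping can be illustrated with two idealized spike trains that are both perfectly phase locked. One fires on every cycle, and the other fires on a random subset of cycles. The sketch assumes the conventional CV (SD divided by the mean of the first-order ISIs), which may differ from the normalization used in this study. The 500-Hz tone and the 0.4 firing probability per cycle are hypothetical.

```python
import numpy as np


def isi_cv(spike_times_s):
    """Coefficient of variation (SD / mean) of first-order interspike intervals."""
    isi = np.diff(np.sort(np.asarray(spike_times_s, dtype=float)))
    return float(np.std(isi) / np.mean(isi))


def vector_strength(spike_times_s, freq_hz):
    """Phase-locking measure: 1.0 when all spikes share one stimulus phase."""
    phases = 2.0 * np.pi * freq_hz * np.asarray(spike_times_s, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))


rng = np.random.default_rng(1)
freq = 500.0
cycles = np.arange(2000)

# Fully entrained: one spike on every cycle, always at the same phase.
entrained = cycles / freq

# Cycle skipping: same phase, but each cycle fires only with probability 0.4 (hypothetical).
skipping = cycles[rng.random(cycles.size) < 0.4] / freq

for name, spikes in (("entrained", entrained), ("skipping", skipping)):
    print(name, round(vector_strength(spikes, freq), 3), round(isi_cv(spikes), 3))
```

Both trains have a VS of essentially 1, but only the skipping train has ISIs at several multiples of the period, which gives it a multimodal interval distribution and a CV near 0.77 (for independent skipping with firing probability p, CV = √(1 − p)). The entrained train's ISIs all equal one period, so its CV is essentially 0.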
The ability to entrain at high frequencies (>300 Hz) could reflect specializations at the synapses formed between HCs and afferents. For example, the neurotransmitter pool might be relatively large, increasing EPSP size and thus the probability that a spike is generated on each stimulus cycle (Trussell, 1997). A larger pool would also reduce the likelihood of neurotransmitter depletion, allowing the afferent to follow stimuli at higher frequencies without skipping cycles. Convergence of multiple HCs on a single afferent could also increase the probability that at least one synapse contributes to spike initiation on every cycle (Furukawa and Ishii, 1967). The primary afferents of fish are branched within the saccular epithelium, providing a morphological basis for convergent input from multiple HCs (Sento and Furukawa, 1987; Kozloski and Crawford, 1998; Edds-Walton et al., 1999; Edds-Walton and Popper, 2000). Bushy cells in the mammalian anterior ventral cochlear nucleus also entrain very precisely, and this precision has been hypothesized to arise from the convergence of multiple afferents onto these cells (Rothman et al., 1993; Joris et al., 1994).
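The convergence argument can be quantified under a deliberately simple model. Assume that each of n converging HC synapses independently succeeds in triggering a spike on a given cycle with probability p, and that any single success suffices. The probability of a spike on that cycle is then 1 − (1 − p)^n. The value p = 0.5 below is hypothetical; the real synaptic parameters are not known from these data.

```python
def p_spike_per_cycle(p_single, n_synapses):
    """Probability that at least one of n independent synapses triggers a spike on a cycle.

    Assumes (hypothetically) that any single successful synapse is sufficient and that
    synapses succeed independently with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if n_synapses < 1:
        raise ValueError("need at least one synapse")
    return 1.0 - (1.0 - p_single) ** n_synapses


for n in (1, 2, 4, 8):
    print(n, p_spike_per_cycle(0.5, n))
# 1 0.5
# 2 0.75
# 4 0.9375
# 8 0.99609375
```

Under these assumptions, even unreliable individual synapses could produce near-perfect one-spike-per-cycle firing once a handful converge on one afferent. Correlated release or refractoriness would weaken this effect.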
Bursting afferents
Because of their relatively shallow, nonsaturating rate-level functions and wide dynamic ranges, the bursting afferents seem better suited than the other afferent types for encoding intensity information. Their tuning sharpness (Q10dB = 1.57; interquartile range, 1.23–2.34) is about the same as that of low-CF (100–1000 Hz) afferent fibers of other vertebrates [mammals (Kiang, 1965) and turtles (Crawford and Fettiplace, 1980)].
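As a reminder of how Q10dB summarizes an FTC, the sketch below takes CF as the frequency of lowest threshold and divides it by the bandwidth measured 10 dB above that threshold. The band edges are located by linear interpolation on a linear frequency axis; this is an assumption, and interpolating on a log-frequency axis would give somewhat different edges. The tuning curve is hypothetical and is not data from this study.

```python
import numpy as np


def q10db(freqs_hz, thresholds_db):
    """Return (CF, Q10dB) from a sampled frequency-threshold curve.

    CF is the frequency of lowest threshold; Q10dB = CF / bandwidth measured
    10 dB above the threshold at CF. Band edges are found by linear
    interpolation on a linear frequency axis (an assumption).
    """
    f = np.asarray(freqs_hz, dtype=float)
    th = np.asarray(thresholds_db, dtype=float)
    i_cf = int(np.argmin(th))
    level = th[i_cf] + 10.0

    def edge(indices):
        # Walk away from CF until the threshold reaches CF threshold + 10 dB.
        prev = i_cf
        for i in indices:
            if th[i] >= level:
                frac = (level - th[prev]) / (th[i] - th[prev])
                return f[prev] + frac * (f[i] - f[prev])
            prev = i
        raise ValueError("curve does not rise 10 dB above CF threshold on both sides")

    low = edge(range(i_cf - 1, -1, -1))
    high = edge(range(i_cf + 1, f.size))
    return float(f[i_cf]), float(f[i_cf] / (high - low))


# Hypothetical broad tuning curve (thresholds in dB), not data from this study.
freqs = [100, 200, 300, 400, 600, 800, 1000]
thresholds = [95, 80, 70, 75, 85, 95, 105]
print(q10db(freqs, thresholds))  # (300.0, 1.0): band edges at 200 and 500 Hz
```

A Q10dB near 1 therefore means that the 10-dB bandwidth is roughly as wide as the CF itself, which is why values of 1.2–1.6 describe broadly tuned fibers.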
We suspect that the bursting physiology of these afferents in response to tones near 200 Hz reflects intrinsic properties of the HCs, as in turtles (Art et al., 1986; Eatock et al., 1993; Fettiplace and Fuchs, 1999). Tuning in turtle auditory afferents is a direct result of the electrically tuned HCs onto which the afferents form synapses (Crawford and Fettiplace, 1980). The frequency sensitivity of isolated turtle HCs (Fettiplace and Crawford, 1978, 1980; Fettiplace and Fuchs, 1999) is similar to that of the tuned mormyrid afferents. The differences between mormyrid afferent types (i.e., bursting vs entrained) may correspond to differences in the characteristics of the HCs that they innervate (Popper et al., 1993; Lanford et al., 2000). The different patterns of single-click responses, including ringing, chaotic bursts, and single spikes, are consistent with the idea that different HC types may be associated with the different afferent clusters.
The observation that there are both highly entrained neurons with steep RLFs and bursting neurons with shallow, nonsaturating RLFs suggests that there may be a segregation of time and intensity information that begins in the auditory nerve of these fish. Similar parallel processing of time and intensity has been suggested in other auditory systems but is thought to begin within the brainstem nuclei of birds and mammals (Koppl et al., 2000). However, differentiation of physiological response types in the auditory nerve, suited to processing distinct components of communication sounds, has been suggested previously in frogs (Capranica and Moffat, 1975; Narins and Capranica, 1980; Rose and Brenowitz, 1997).
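One way to make the contrast between steep and shallow RLFs concrete is to compute a dynamic range. The sketch below assumes a common but not universal convention: the span of stimulus levels over which the rate rises from 10% to 90% of its driven range, with linear interpolation between sampled levels. It requires a strictly increasing RLF, consistent with the monotonic afferent RLFs reported here. Both example curves are hypothetical and are not data from this study.

```python
import numpy as np


def dynamic_range_db(levels_db, rates_sps, lo_frac=0.1, hi_frac=0.9):
    """Level span over which the rate rises from lo_frac to hi_frac of its driven range.

    Assumes a strictly increasing rate-level function; levels are interpolated linearly.
    """
    levels = np.asarray(levels_db, dtype=float)
    rates = np.asarray(rates_sps, dtype=float)
    if np.any(np.diff(rates) <= 0):
        raise ValueError("rate-level function must be strictly increasing")
    span = rates[-1] - rates[0]
    lo_level = np.interp(rates[0] + lo_frac * span, rates, levels)
    hi_level = np.interp(rates[0] + hi_frac * span, rates, levels)
    return float(hi_level - lo_level)


levels = np.arange(60.0, 101.0, 5.0)                  # 60, 65, ..., 100 dB
steep = [0, 5, 100, 300, 395, 397, 398, 399, 400]     # hypothetical entrained-like RLF
shallow = 10.0 * (levels - 60.0)                      # hypothetical burster-like RLF

print(dynamic_range_db(levels, steep))    # about 11.3 dB
print(dynamic_range_db(levels, shallow))  # about 32 dB
```

With the same maximum rate, the shallow, nonsaturating curve spreads its rate changes over roughly three times as many decibels. It can therefore signal level differences over a wider range, which is the intuition behind assigning intensity coding to the bursters.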
Transformations of primary afferent encoding
The initial afferent representation of sound is markedly transformed, as revealed by comparisons with the physiology of the second-order medullary nucleus (Kozloski and Crawford, 2000) and the auditory midbrain (Crawford, 1993, 1997b; Kozloski and Crawford, 1998). One of the most striking transformations is the emergence of selectivity for interclick intervals in the midbrain. Afferents produced single spikes, or bursts, that were synchronized to each click, and none of the afferents was interval selective. In contrast, approximately one-third of midbrain neurons exhibit interval selectivity, showing a highly facilitated response for a narrow range of interclick intervals (Crawford, 1997b).
A second transformation was revealed by the emergence of tone frequency selectivity in the midbrain. The response areas of primary afferents were typically very broad: afferents produced one spike on each cycle of a tone over a range of periods extending from ∼10 msec to just under 1.0 msec (100 Hz to just over 1.0 kHz). This type of response area was not observed in the midbrain. Instead, midbrain neurons often had narrow-band excitatory response areas and flanking regions of inhibition [see also Lu and Fay (1993)]. Their tuning curves were spindle shaped, and the neurons were relatively insensitive to tone intensity. These level-tolerant neurons resemble some of the neurons in the midbrain of cats, frogs, and bats (Katsuki et al., 1958; Fuzessery and Feng, 1982; Suga, 1995).
All of the responses in the auditory nerve were sustained for the duration of a tone stimulus, but midbrain neurons have either onset or phasic responses, and some have delayed inhibition followed by off responses. Midbrain neurons also show weaker entrainment, attributable to both poor synchrony and cycle skipping. Thus, the precise temporal representation of sound is diminished in the midbrain. Afferent spike rates always increased as a monotonic function of stimulus intensity, whereas nonmonotonic rate-level functions were common among midbrain neurons.
Footnotes
This research was supported by National Institutes of Health Grant R01 DC01252 (J.D.C.), National Institute of Mental Health (NIMH) Grant PBN F31 MH11270 (J.K.), and NIMH Grant 5 F31 MH12510-02 (A.S.). A. P. Cook, L. A. Palmer, V. Richards, J. Saunders, D. Sparks, and P. Sterling provided valuable input during the research, and P. Marvit assisted with programming.
Correspondence should be addressed to Aae Suzuki, Department of Psychology and Neuroscience Graduate Group, University of Pennsylvania, 3815 Walnut Street, Philadelphia, PA 19104. E-mail: suzuki@mail.med.upenn.edu.
J. Kozloski's current address: Biometaphorical Computing Group, IBM T. J. Watson Research Center, P.O. Box 218, Route 134, Yorktown Heights, NY 10598.