Auditory experience is critical for vocal learning in songbirds as in humans. Therefore, in a search for neural mechanisms for song learning and recognition, the auditory response properties of neurons in the anterior forebrain (AF) pathway of the songbird brain were investigated. This pathway plays an essential but poorly understood role during the period of song development when auditory feedback is most crucial.
Single-unit recordings demonstrated that both the lateral magnocellular nucleus of the anterior neostriatum (LMAN) and Area X (X) contain auditory neurons in adult male finches. These neurons are strongly selective for both spectral and temporal properties of song; they respond more robustly to the bird’s own song (BOS) than to songs of conspecific individuals, and they respond less well to the BOS if it is played in reverse. In addition, X neurons are more broadly responsive than LMAN neurons, suggesting that responses to song become progressively more refined along this pathway.
Both X and LMAN of young male finches early in the process of song learning (30–45 d old) also contain song-responsive auditory neurons, but these juvenile neurons lack the song and order selectivity present in adult birds. The spectral and temporal selectivity of the adult AF auditory neurons therefore arises during development in neurons that are initially broadly song-responsive. These neurons provide one of the clearest examples of experience-dependent acquisition of complex stimulus selectivity. Moreover, the auditory properties of the AF circuit suggest that one of its functions may be to mediate the auditory learning and feedback so essential to song development.
- auditory response properties
- song learning
- experience-dependent plasticity
- temporal processing
- complex selectivity
- Area X
- zebra finch
- song system
Birdsong is a complex learned vocal behavior, with similarities to human speech. Like speech learning, song acquisition occurs early in a songbird’s life and is critically dependent on auditory experience and feedback (Konishi, 1965; Marler, 1970). Moreover, songbirds possess brain areas specialized for vocal learning and production (Nottebohm et al., 1976). This combination of a learned behavior and a discrete neural substrate involved in its control provides an ideal opportunity to study the neural basis of learning. Furthermore, because song is both spectrally and temporally complex, the birdsong system is well suited for examining how the brain learns to process complex time-varying information.
Song learning consists of two characteristic phases (Fig. 1 a). First, during a period of sensory learning, young birds hear and memorize a parent tutor song or “template” (Marler, 1970; Slater et al., 1988). Later, during sensorimotor learning, birds sing and gradually refine their song until it approximates the memorized song template. Young songbirds no longer need to hear the tutor during this period of vocal practice but do remain dependent on hearing to match their vocalizations to the template (Konishi, 1965). The strong dependence of song learning on auditory experience and feedback demonstrates that in the songbird brain, there must exist mechanisms for auditory learning and recognition and for auditory feedback-guided modification of the vocal motor pathway.
One likely location for the neural mechanisms underlying learning is the song system, a set of brain nuclei found only in birds that learn to sing using auditory feedback (Fig. 1b) (Nottebohm et al., 1976; Kroodsma and Konishi, 1991). This system is often divided into two interconnected circuits. The first of these forms the “motor” pathway for song and consists of a chain of nuclei including HVc (the acronym is used here as the proper name, as proposed by Fortune and Margoliash, 1995) and the robust nucleus of the archistriatum (RA) (Nottebohm et al., 1976; McCasland, 1987). The motor pathway must be intact at all ages for song to be produced normally (Nottebohm et al., 1976).
In contrast, a second circuit of brain nuclei is essential for normal song production only during song learning and modification. This pathway indirectly connects HVc to RA via the anterior forebrain (AF) and consists of Area X (X), the medial portion of the dorsolateral nucleus of the thalamus (DLM), and the lateral portion of the magnocellular nucleus of the anterior neostriatum (LMAN) (Okuhata and Saito, 1987; Bottjer et al., 1989) (Fig. 1 b). Unlike disruptions of the motor pathway, interruptions of this AF circuit do not affect stable adult song production (Nottebohm et al., 1976; Morrison and Nottebohm, 1993). Lesions of the AF pathway in animals learning their vocalizations, however, cause highly abnormal song (Bottjer et al., 1984; Nottebohm et al., 1990; Sohrabji et al., 1990; Scharff and Nottebohm, 1991).
One possible function of the AF circuit is to provide the auditory input essential to normal song development. By this hypothesis, this circuit should contain sensory neurons responsive to sounds. In adult songbirds, this pathway does contain auditory neurons, which are selective for the bird’s own song (BOS) (Doupe and Konishi, 1991), like the neurons found in the song sensorimotor nucleus HVc (Margoliash, 1983, 1986; Margoliash and Fortune, 1992; Margoliash et al., 1994; Sutter and Margoliash, 1994; Lewicki and Konishi, 1995; Lewicki and Arthur, 1996; Volman, 1996). The present work examines the auditory selectivity of neurons in two of the AF nuclei, LMAN and X, in anesthetized adult finches. These temporally and spectrally selective neurons are among the most complex sensory neurons known. The same nuclei were then investigated in juvenile birds, early in the process of song learning, at a time when the AF circuit is essential for song acquisition. The results demonstrate that this circuit is auditory in young birds, consistent with a sensory role of this pathway in learning. The properties of the juvenile neurons are dramatically different from those in adult birds, however, in that they are not song- or order-selective. The complex selectivity of adult AF neurons must therefore emerge during development, in parallel with vocal learning.
MATERIALS AND METHODS
Experiments with adult birds (>100 d old) were conducted primarily with male zebra finches (Taeniopygia guttata) obtained from local breeders or raised in our colony. A small number of recordings in adult X were also obtained from male Bengalese finches (Lonchura striata); these were included in the overall analysis, because a statistical comparison of selectivity indices (SI) (see below) showed no differences between the two species. Juvenile male zebra finches were bred and raised in our colony in sound attenuation chambers (IAC), where they were exposed to a single tutor, the male parent. Juvenile finches ranged in age from 29 to 45 d posthatch (in X, 13 birds ages 29–32 d; 14 ages 33–36 d; 17 ages 37–40 d; 16 ages 41–45 d; in LMAN, 7 birds ages 37–40 d; 10 ages 41–45 d). Young zebra finches normally begin singing at approximately day 25–40 (Immelmann, 1969; Arnold, 1975). Whether any of the juvenile birds studied here were actually already singing was not systematically determined; given the ages of the oldest birds, it was clearly possible.
Electrophysiological recordings and auditory stimuli. Before each experiment, the BOS or the parent tutor song (TUT) was recorded on analog tape and digitized at 20 kHz with 12 bit resolution with the aid of either a PDP-11/40 (Digital, Boston, MA), a Masscomp 5600 (Concurrent, Westford, MA), or a Sparc IPX (Sun Microsystems, Mountain View, CA) computer (with software written by Daniel Margoliash, Larry Proctor, and Michael Lewicki, California Institute of Technology). The song was then stored on computer disk, along with a library of songs of other zebra finch individuals, to be used for playback during the physiology experiments. Songs were also reversed and edited on the computer.
At least 1 d before the experiment, birds were anesthetized with Equithesin (2–4 ml/kg, i.m.; 0.85 gm of chloral hydrate, 0.21 gm of pentobarbital, 0.42 gm of MgSO4, 2.2 ml of 100% ethanol, 8.6 ml of propylene glycol to total volume of 20 ml with H2O; chemicals from Sigma, St. Louis, MO) and placed in a stereotaxic head holder. A stainless steel post was then cemented to the skull in a fixed location centered on the midsagittal sinus. This stereotaxic post served to immobilize the head during the recording sessions and to provide a fixed point from which to measure the location of various song nuclei. On the day of the experiment, birds were anesthetized with 20% urethane (Sigma, 65–90 μl, i.m., delivered as 3–4 injections of 10–25 μl each, at 30 min intervals). Unlike barbiturates and ketamine (Vicario and Yohay, 1993), urethane does not appear to affect the stimulus selectivity of high-order neurons. Glass-coated platinum–iridium microelectrodes were used to make stable single-unit extracellular recordings of neuronal responses to a variety of acoustic stimuli played back from the computer. These stimuli were presented by a small calibrated speaker (JBL, Northridge, CA) 1.7 m in front of the bird inside a sound attenuation chamber lined with foam; the spectrum of the sound presentation system, measured with a calibrated microphone placed at the location of the bird, was flat ± 6 dB from 500 Hz to 10 kHz. The sound stimuli, the peak amplitude of which was 70 dB sound pressure level (SPL), included broad-band noise bursts, tone bursts from 500 Hz to 6 kHz, usually presented in 1 kHz increments, the BOS and/or TUT (including in reversed and edited versions), and the songs of other zebra or Bengalese finches [conspecific (CON) songs].
Songs were approximately matched for overall intensity as well as peak amplitude; in a small number of cases, neurons were tested for sensitivity to sound intensity by varying the intensity of the BOS over a range of 5–10 dB SPL and showed no strong intensity dependence in that range. Estrildid finch song usually contains 4 to 10 syllables, defined as continuous acoustical signals separated from surrounding syllables by a fall in the amplitude to near zero and often by a silent interval; syllables are composed of one or more notes (continuous signal without abrupt frequency transitions). Strings of syllables are delivered in a fixed sequence known as a “motif” or “phrase”; a “strophe” or “bout” of adult finch song consists of a series of introductory notes followed by one or more repeats of the motif (Sossinka and Boehner, 1980). In some experiments, individual syllables or series of syllables were also played to the bird to investigate the basis of a song response. Search stimuli always included the BOS or TUT and, in most cases, a simple stimulus such as a broad-band noise burst. Stimuli were played with an interstimulus interval of 8–10 sec to prevent habituation, and collections consisted of 10–25 trials; in the later experiments (including all the recordings from juvenile birds and 30% of the adult units), a minimum of three different stimuli were delivered interleaved. In these experiments, the interstimulus interval was also randomly varied from 8–10 sec on each trial to prevent any possible entrainment of neural activity.
Neural activity was amplified and filtered from 300 Hz to 10 kHz, and single units were isolated by passage through either a level detector or a window discriminator (World Precision Instruments, Sarasota, FL); spike event times were stored in the computer, and single-unit activity was displayed by the PDP-11/40, Masscomp 5600, or Sparc IPX computers both as a raster pattern and as a summed peristimulus time histogram of 10 to 25 stimulus presentations.
The percentage of neurons in LMAN and X that were responsive to sound stimuli was not systematically quantified, although the majority of neurons that could be isolated appeared to be auditory. The auditory responsiveness of X and especially LMAN neurons was very sensitive to the depth of anesthesia and the state of arousal of the animal, however, so that in some birds (not included here), no auditory responses of any kind could be observed throughout the experiment, and in others, auditory responses would disappear after a period of recording. This was true both when spontaneous rates were unusually high and when they were very low. This property of AF neurons, together with the difficulty in isolating units and the long time required to characterize each unit, accounts for the small numbers of single units (1–11) analyzed per bird.
Electrolytic lesions were placed at the sites of selected units. At the end of an experiment, animals were given a lethal dose of Equithesin and fixed with 4% formaldehyde in 0.025 M PBS via intracardial perfusion. Electrode tracks and electrolytic lesions were located on 30 μm frozen sections stained with cresyl violet. The borders of X and of magnocellular LMAN (the core) were clearly identifiable in these sections. Neuronal data were only included in the data analysis if the recording site could be unambiguously localized histologically.
Data analysis. Data were analyzed off-line using software written by Michael Lewicki and Larry Proctor, California Institute of Technology, and by Frederic Theunissen and Jim Wright, UCSF. The firing rate to a given song stimulus was quantified from recorded data as the average spike rate during the song (spikes/sec) during a window that was of equal duration to the stimulus but that was delayed relative to stimulus onset by an amount equal to the unit’s response latency. The minimum response latency (accurate at best to within 5 msec) was determined by visual inspection of the spike rasters and histograms when a response to a simple stimulus made it possible. If no latency could be determined because the unit did not respond to any simple stimuli, the latency of other LMAN or X units from the same bird was used. If no units in a bird allowed latency determinations, default latencies characteristic of that nucleus were used (in adult birds, 35 msec for X and 50 msec for LMAN; in juvenile birds, 70 msec for X and 100 msec for LMAN). Neural responses to song generally began long after song onset and ended before song offset, so that the specific response latency chosen had little effect on the calculated firing rate. Units were only considered auditory and included in the data analysis if a paired t test showed that the spike rate during a stimulus was significantly different (p < 0.05) from the baseline firing rate, which was determined from 2–4 sec of spontaneous firing before and after each stimulus. The firing during the time period (1 sec) immediately after the stimulus was always excluded from the calculation of the spontaneous rate, because many stimuli elicited significant poststimulus inhibition of the baseline firing.
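The rate measurements described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis software used in the study: the function names, the representation of spikes as a list of times in seconds, and the choice of 2 sec baseline windows on each side of the stimulus are all hypothetical.

```python
def evoked_rate(spike_times, stim_onset, stim_duration, latency):
    """Mean spike rate (spikes/sec) in a window as long as the stimulus,
    delayed from stimulus onset by the unit's response latency."""
    start = stim_onset + latency
    end = start + stim_duration
    n_spikes = sum(1 for t in spike_times if start <= t < end)
    return n_spikes / stim_duration


def baseline_rate(spike_times, stim_onset, stim_duration,
                  pre=2.0, post=2.0, excluded_post=1.0):
    """Spontaneous rate from windows before and after the stimulus.

    The first `excluded_post` seconds after stimulus offset are skipped
    because of poststimulus inhibition. The window lengths `pre` and
    `post` are assumptions chosen within the range given in the text.
    """
    pre_start, pre_end = stim_onset - pre, stim_onset
    post_start = stim_onset + stim_duration + excluded_post
    post_end = post_start + post
    n_spikes = sum(1 for t in spike_times
                   if pre_start <= t < pre_end or post_start <= t < post_end)
    return n_spikes / (pre + post)


# Hypothetical single trial: a 1 sec stimulus starting at t = 2 sec,
# analyzed with a 50 msec latency.
spikes = [0.5, 1.5, 2.06, 2.10, 2.30, 2.90, 3.2, 4.6]
trial_evoked = evoked_rate(spikes, stim_onset=2.0, stim_duration=1.0, latency=0.05)
trial_baseline = baseline_rate(spikes, stim_onset=2.0, stim_duration=1.0)
```

The per-trial evoked and baseline rates obtained this way are the paired values that would enter the paired t test for auditory responsiveness.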
To compare the evoked responses with different stimuli, a unit’s response strength (RS) to a song stimulus was calculated as the average spike rate during the song minus the average baseline firing rate for the same trials (both measures as described above). This gives a measure of the total amount of firing (above spontaneous rate) elicited during a song. Although this measure is less sensitive for neurons that have very brief phasic responses, LMAN and X neuronal responses tend to be sustained over several syllables and, therefore, this measure accurately reflects the relative strength of neuronal responses to different song stimuli. It is an underestimate of the peak firing rate of these neurons, however, because neuronal firing, although sustained, does not occur throughout the entire song, but the number of spikes is still normalized to the entire song. All trials in which a particular stimulus type was presented to a unit were used to calculate the mean RS for that unit. For CON songs, the RS values to different individual songs were first calculated and used to determine the number of neurons significantly responsive to any conspecific stimulus (using a paired t test, p < 0.05) (Table 1); data from responses of a single unit to all CON songs were then used to calculate the mean RS to CON for that unit. The mean RS for a particular stimulus for all neurons in a nucleus was calculated as the mean of the mean RS to that stimulus of all units analyzed. Mean RS values for different stimuli were compared using a one-way ANOVA and Scheffe tests for correction for multiple post hoc comparisons. The significance of the difference in RS to different stimulus classes (i.e., the selectivity) was assessed for each unit with an unpaired t test (p < 0.05).
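As a minimal sketch of the RS calculation just described, the following Python computes a unit's mean RS from per-trial rates and then the population mean as the mean of unit means. The function names and the example numbers are hypothetical and are not taken from the recorded data.

```python
from statistics import mean


def response_strength(evoked_rates, baseline_rates):
    """Mean RS for one unit and one stimulus type.

    Each trial contributes its evoked rate minus the baseline rate
    measured on that same trial; the differences are then averaged.
    """
    if len(evoked_rates) != len(baseline_rates):
        raise ValueError("evoked and baseline rates must be paired by trial")
    return mean([e - b for e, b in zip(evoked_rates, baseline_rates)])


def population_mean_rs(unit_mean_rs):
    """Mean RS for a nucleus: the mean of the per-unit mean RS values."""
    return mean(unit_mean_rs)


# Hypothetical unit with three trials (rates in spikes/sec).
unit_rs = response_strength([6.0, 4.0, 5.0], [2.0, 1.0, 3.0])

# Hypothetical nucleus with three units; negative RS indicates inhibition.
nucleus_rs = population_mean_rs([unit_rs, 1.0, -1.0])
```

Averaging unit means, rather than pooling all trials, keeps units with many trials from dominating the population value.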
Another measure used to quantify the responsiveness of neurons for one stimulus relative to another was the SI. The SI of a neuron for stimulus A relative to stimulus B was defined as SI(A vs B) = mean RS(A) / [mean RS(A) + mean RS(B)]. SI will approach 1.0 if stimulus A is much preferred and zero if stimulus B is preferred, and will be close to 0.5 if the responses to both stimuli are similar. Because this index becomes highly nonlinear when RS values are negative (i.e., inhibitory), all negative RS values were set to zero for the calculation of SI, which then has a minimum of zero and a maximum of 1.0. This adjustment ignores the selectivity difference between stimuli that evoke no response and those that actually inhibit neural firing; the numbers of neurons showing significant response inhibition to any stimulus (p < 0.05 compared with spontaneous rate) are therefore shown separately in Table 1. All statistics were calculated with the aid of the software package StatView 4.5 (Abacus Concepts, Calabasas, CA).
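The selectivity index, including the clipping of negative RS values to zero, can be illustrated with a short Python function. The function name is hypothetical, and returning `None` when both clipped values are zero is an assumption: the text does not say how that case was handled.

```python
def selectivity_index(mean_rs_a, mean_rs_b):
    """SI of stimulus A relative to stimulus B.

    Negative (inhibitory) RS values are set to zero first, so the
    result lies between 0.0 and 1.0.
    """
    a = max(mean_rs_a, 0.0)
    b = max(mean_rs_b, 0.0)
    if a + b == 0.0:
        # Assumption: the index is undefined when neither stimulus
        # evokes a response above the spontaneous rate.
        return None
    return a / (a + b)


# Hypothetical units (RS in spikes/sec).
equal_response = selectivity_index(3.0, 3.0)
inhibited_by_b = selectivity_index(2.0, -1.0)
prefers_b = selectivity_index(-1.0, 2.0)
```

Because of the clipping, a unit that is inhibited by stimulus B receives the same SI as one that simply does not respond to B, which is why inhibited units are tallied separately.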
Selective auditory neurons in adult LMAN
I recorded a total of 64 single auditory units in the LMAN of adult male finches (n = 18). Small clusters of units as well as multiunit recordings showed qualitatively similar auditory responses but were not included in the quantification. The most striking feature of LMAN auditory neurons was their complex stimulus selectivity. These cells were much more responsive to complex acoustic stimuli than to simple ones such as tone or broad-band noise bursts. In particular, each BOS was a very effective stimulus for these neurons; 62/64 LMAN neurons responded significantly to the BOS (see Materials and Methods for definition of significant response). Moreover, 52/53 neurons responded more strongly to the BOS than to songs of conspecifics (other individuals of the same species, CON).
Representative examples of such song-selective units are shown in Figures 2 and 3. The neuron in Figure 2 demonstrates the strong but irregular bursting that is characteristic of responses to the BOS (Fig. 2 a). The same neuron responded much more weakly to a conspecific song, despite the general acoustic similarity of the two finch songs (Fig. 2 b), and was actually inhibited below its spontaneous rate by another conspecific song (Fig. 2 c). The response of the neuron in Figure 3 was sustained over several syllables of the BOS, a property typical of many adult LMAN neurons (Fig. 3 a). Thus, although this neuron also showed a phasic response to some features of a conspecific song (Fig. 3 c), its overall response to the BOS was greater than that to any of the other stimuli.
To quantify the responses of neurons to complex song stimuli, I calculated their RS values (stimulus-evoked rate − spontaneous firing; see Materials and Methods) for different stimuli. The mean RS for the whole population of LMAN neurons studied was much higher for the BOS than for CON [3.53 spikes/sec ± 0.38 (SEM) vs 0.13 ± 0.12; n = 64 for the BOS, n = 53 for CON; first two bars in Fig. 4 a]. A one-way ANOVA revealed a significant difference in RS values among all stimuli tested (F(3,167) = 39.88, p < 0.0001; stimuli were the BOS, the CON, and the two types of reversed song, see below); a subsequent Scheffe test (to correct for multiple comparisons) confirmed that the RS to the BOS was significantly greater than the RS to CON (p < 0.0001). This song selectivity is a feature of individual neurons and not simply a property emerging from the responses of the entire population of neurons considered as a whole; a plot of RS to the BOS versus RS to CON for all single neurons for which both stimuli were tested shows that 52/53 neurons lie to the right of the line that indicates equal response to both stimuli (Fig. 4 b, open squares); 46/50 individual units analyzed also had a significantly greater response to the BOS than to CON (p < 0.05, unpaired t test). The data in Figure 4 b also show that song selectivity is neither found only in neurons with high (or low) RS nor a property found only in a subset of birds. More than 20% of LMAN neurons not only did not respond strongly to CON but were also significantly inhibited by at least one particular conspecific song (Table 1).
To quantify the relative selectivity of a cell for two different stimuli, an SI [SI(BOS vs CON)] was also calculated for each neuron analyzed above (see Materials and Methods); this index is 1.0 when the BOS is strongly preferred and 0.5 when responses to the BOS and CON are equal. Table 1 shows that the mean SI(BOS vs CON) for all LMAN neurons is close to 1.0.
In many cases, because of the complexity of zebra finch song, it is not straightforward to assess exactly which features of song account for the selective responses seen. Occasionally, however, as in Figure 3, a–d, some indication of this is given by a neuron’s response to simpler acoustic stimuli. Figure 3 d shows the phasic response of this neuron to a tone burst of 5 kHz. Close examination of the conspecific song of Figure 3 c reveals that the two phasic responses occurred after a note in which most of the energy was in the same 5 kHz frequency range (circled in each of the two motifs). The BOS also contains a single note with energy in that range (circled in Fig. 3 a), and part of the BOS response occurred at the same time after that note as for the conspecific and tone stimuli. It is clear, however, that the neuron was responding to a number of features of the BOS, including some that occur before that tonal note, overall giving a more sustained response to the BOS than to any of the other stimuli.
Unlike song selectivity, however, responsiveness to simple acoustic stimuli was not a universal feature of LMAN neurons. Of LMAN neurons fully tested for sensitivity to frequencies in the 1–5 kHz range (n = 27), only two-thirds responded to a tone burst of at least one frequency (Table 1), and no units responded to all frequencies. Although zebra finch songs are most notable for their many harmonic stacks (combinations of harmonically related frequencies) and complex noisy syllables, some songs also have syllables containing predominantly one frequency (see, for instance, the songs in Figs. 3, 5). A pure tone response in LMAN might be explained by the presence of such syllables in the BOS; in fact, in 89% of the cases in which fully characterized neurons were found to respond to tone bursts, the BOS contained a tonal syllable in the same frequency range, and in half of those cases, the neurons responded to tone bursts only in that frequency range. Despite the noisy quality of many zebra finch syllables, broad-band noise bursts were very rarely effective stimuli for adult LMAN auditory neurons (Table 1).
Temporal order is an important feature of many acoustic stimuli, and birdsong, like speech, has particularly complex temporal structure. Therefore, I tested the sensitivity of LMAN neurons to temporal features of song. I did this in several ways, as follows. (1) I played the BOS entirely reversed (REV); this completely alters the temporal structure of the song (both the sequence of syllables and the temporal order within syllables) while maintaining the overall power spectrum calculated over the whole song. (2) In some cases, I reversed the order of the syllables while preserving the normal temporal order within each syllable (e.g., dcba vs abcd); this reversed order (RO) song disrupts the global order or sequence of syllables while preserving the local order. In 41/43 single units tested, fully reversing the song (REV) dramatically reduced the effectiveness of the BOS as an acoustic stimulus for LMAN neurons.
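The difference between the two manipulations can be made concrete with a small Python sketch. It assumes, purely for illustration, that a song is stored as a list of syllables and that each syllable is a list of waveform samples; silent intervals between syllables are ignored, and the function names are hypothetical rather than those of the song-editing software used in the study.

```python
def full_reverse(syllables):
    """REV: reverse the entire song, so that both the syllable sequence
    and the sample order within every syllable are reversed."""
    return [syllable[::-1] for syllable in reversed(syllables)]


def reverse_order(syllables):
    """RO: reverse only the syllable sequence; each syllable keeps its
    normal internal (local) temporal order."""
    return [syllable[:] for syllable in reversed(syllables)]


# Toy "song" of three syllables a, b, c with made-up sample values.
song = [[1, 2], [3, 4, 5], [6]]
rev_song = full_reverse(song)
ro_song = reverse_order(song)
```

Both manipulations leave the set of samples unchanged, so neither alters the overall power spectrum computed over the whole song; only REV changes the time course within syllables.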
A typical example of order selectivity is shown in Figure 5. This neuron responded well to the BOS, with clusters of spikes before and especially after the circled syllables, which contain a loud note with much energy near 5.2 kHz (Fig. 5 a). In contrast, the identical song played in reverse (REV) elicited no response from this neuron and even slightly inhibited the neuron’s spontaneous firing rate (Fig. 5 b). Consistent with the frequency content of this bird’s song, this neuron also responded phasically to a 5 kHz tone burst (Fig. 5 c). Nonetheless, the neuron showed no response to the 5.2 kHz frequency in REV, although the syllable containing that frequency is little changed by the reversal. This demonstrates clearly the temporal context dependence of these LMAN neurons. Although part of the neuron’s response to song may be attributable to the 5.2 kHz tone, the response to the tone depends not simply on its presence but also on the temporal context in which it occurs; preceded by the wrong features in the reversed song, the neuronal response is inhibited. Another aspect of the context dependence of this neuron is shown in Figure 5 c,d; although the neuron responded robustly to the 5 kHz tone burst alone (Fig. 5c), the addition of a 2.5 kHz tone to the 5 kHz tone burst completely eliminated the response (Fig. 5 d). This neuron thus requires both a specific set of frequencies and a specific order.
The mean RS for the whole population of LMAN neurons studied was much higher for the BOS than for REV (3.53 spikes/sec ± 0.38 vs −0.26 ± 0.16; n = 64 for the BOS, n = 43 for REV; p < 0.0001, Scheffe test for multiple comparisons) (Fig. 4 a). As was true for song selectivity, order selectivity was neither simply a property of the population of neurons as a whole nor of individual birds. A plot of the RS to the BOS versus RS to REV for all single units tested for both stimuli shows that 41/43 neurons lie to the right of the line that indicates equal response to both stimuli (Fig. 4 b, solid diamonds); 35/42 individual units analyzed also met a statistical criterion for greater response to the BOS than to REV (p < 0.05). In addition, many neurons were significantly inhibited by presentation of REV (Table 1).
To assess the relative importance of local temporal structure (within note timing) versus global temporal structure (note sequence), I tested some LMAN neurons with reverse order (RO) song (n = 12). This manipulation markedly reduced the responses of LMAN neurons to the BOS, although, in many cases, not as completely as the full reversal of song. This is seen in Figure 6, a–d. This LMAN neuron responded well throughout much of the first repetition of the motif of the BOS (Fig. 6 a). Fully reversing the song completely eliminated the response (Fig. 6 b), whereas reversing the order of syllables left a phasic response in each motif (Fig. 6 c). Examination of this phasic response reveals that it occurred with ∼60 msec latency after each onset of the circled syllable “a,” which contains a harmonic stack with downsweeping frequencies. This syllable also elicited a response when played by itself (Fig. 6 d, left), consistent with the idea that the response to this portion of the song was less context-dependent than the rest of the neuron’s responses to the BOS. Songs in reverse order often revealed which syllables of song elicited context-independent responses. When syllable “a” was reversed (which changes the direction of the frequency-modulated sweep), it elicited no significant response (Fig. 6 d, right). Some of the response to the BOS corresponded to the location of the circled syllable “a,” but much of it was before or after (Fig. 6 a), showing (as in Fig. 5) that LMAN neurons respond to a number of features of song, which occur over hundreds of milliseconds of time and in a particular order.
As expected from the RS of individual neurons, the mean RS to RO for all LMAN neurons tested was intermediate between the BOS and REV (1.06 spikes/sec ± 0.32 for RO vs 3.53 ± 0.38 for the BOS and −0.26 ± 0.16 for REV; RO < BOS, p < 0.005, Scheffe test; Fig. 4 a). The partial response to features of reverse order songs is consistent with the idea that both local note structure and note sequence contribute to the temporal selectivity of adult LMAN neurons.
The properties of the single units in LMAN strongly suggested that they respond to combinations of features of the song. Combination sensitivity is a nonlinear response property of neurons in which the response to a combination of features (e.g., syllables abc) is greater than the simple sum of the responses to each of those features presented alone (a + b + c). Although this was not the main focus of these experiments, in a number of cases (n = 8 units in 4 different birds), I demonstrated combination sensitivity in LMAN neurons by measuring responses to subsets of syllables of the song as well as the response to the whole song. An example of this is shown in Figure 7. This single LMAN unit responded robustly to the BOS, with the bulk of the response centered over the middle portion of the song (Fig. 7 a, syllables e–h). Presentation of the first four syllables alone (Fig. 7 c, syllables a–d) elicited little response from the neuron (2.4 spikes/sec); the following two syllables in isolation (Fig. 7 d, syllables e–f) also elicited only a weak response (3.2 spikes/sec). In combination, however, these two stimuli (syllables a–f) elicited a strong response from the neuron (Fig. 7 e; 17.0 spikes/sec) that not only exceeded the sum of responses to stimuli a–d and e–f, but was as strong as the response to the whole song (Fig. 7 a; 12.0 spikes/sec; the RS to the whole song is less than the RS to a–f, because a very similar overall response is normalized to a longer total stimulus duration). The RS to each of the syllable combinations presented are shown in Figure 7 b; the dashed white lines represent the linear sum of the RS to the component syllables. Thus, this unit showed a strongly nonlinear combination sensitivity. Syllables a–f do not form the only combination to which the neuron responded nonlinearly, however. A later portion of the same motif (syllables g–i) elicited a small response primarily during syllable i (Fig. 7 f).
Presentation of syllables e–f (ineffective alone, Fig. 7 d) in combination with syllables g–j, however, elicited a much enhanced response, primarily during syllables g–h (Fig. 7 g). Thus, there is a nonlinear response not only to syllables e–f in combination with a–d, but also to the syllables e–f combined with syllables g–i. This dissection of song reveals that there are multiple combinations of sounds contributing to the overall song selectivity of these very complex neurons.
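The comparison that defines combination sensitivity (response to the combination versus the linear sum of the responses to its parts) can be written out as a short Python sketch using the RS values quoted above for the unit in Figure 7. The function names are hypothetical, and the simple greater-than test is an illustrative assumption that ignores trial-to-trial variability; it is not a statistical criterion from the study.

```python
def nonlinear_excess(rs_combined, rs_parts):
    """RS to the combined stimulus minus the linear sum of the RS
    values to each component presented alone (spikes/sec)."""
    return rs_combined - sum(rs_parts)


def is_combination_sensitive(rs_combined, rs_parts):
    """True when the combined response exceeds the linear prediction."""
    return nonlinear_excess(rs_combined, rs_parts) > 0.0


# Values quoted in the text for the unit in Figure 7 (spikes/sec).
rs_a_to_d = 2.4   # syllables a-d alone
rs_e_to_f = 3.2   # syllables e-f alone
rs_a_to_f = 17.0  # syllables a-f together

excess = nonlinear_excess(rs_a_to_f, [rs_a_to_d, rs_e_to_f])
sensitive = is_combination_sensitive(rs_a_to_f, [rs_a_to_d, rs_e_to_f])
```

For this unit, the linear prediction is 5.6 spikes/sec, so the combined response exceeds it by about 11.4 spikes/sec.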
The particular combinations of syllables presented to a neuron also influence the temporal pattern of the response. For instance, the combination g–i (Fig. 7 f) elicited a weak response with a latency of ∼300 msec after the onset of syllable g; inclusion of the preceding syllables e–f as well, however, in the combination e–i, revealed a response with a latency of 60 msec after the onset of syllable g (Fig. 7 g). Thus, a new response, with a shorter latency, emerged when e–f was added to g–i, and the subsequent response during syllable i was diminished. Another example of this shift in response is seen if one compares the response to syllables g–h within the whole song with the response to these syllables within the shorter combination e–j; the response to g–h in the combination e–j alone (Fig. 7 g) was much greater than the response to g–h when preceded by the entire a–f sequence, which elicited a burst of firing at syllables e–f (Fig. 7 a). Similarly, in Figure 6, the response elicited by syllable a in isolation or in the reverse order song (Fig. 6 d, c) is greater than the response elicited by a in the forward song, when this sound is preceded by other syllables that also elicit responses (Fig. 6 a). The tendency for LMAN neurons both to respond nonlinearly to multiple combinations of features in the BOS and to have decreased responsiveness after a burst of firing explains why the temporal pattern of response to a series of syllables is not always predictable from the responses to subsets of these syllables.
General properties of LMAN neurons
In addition to their prominent selectivity, adult LMAN neurons had several other characteristic features. (1) The mean spontaneous firing rates of these cells were very low (2–3 spikes/sec on average, but ranging from 0.2 to 10.0 spikes/sec; Table 1). (2) LMAN neurons showed irregular bursting to auditory stimuli (e.g., Figs. 2 a, 5 c). As seen in the spike raster in Figure 2 a, the unit’s firing, although clearly responsive to the BOS or parts thereof, was not tightly stimulus-locked and did not occur reproducibly on every trial. Similarly, the raster plot in Figure 5 c shows the usual irregular responses of LMAN neurons, and the scattered response latency (of ∼55 msec) to this tone burst. (3) LMAN neurons tended to show poststimulus inhibition of the spontaneous rate after an effective stimulus (e.g., Figs. 2 a, 3 a, 7 a). (4) The second repetition of a finch song phrase (motif) did not necessarily elicit as robust a response as the first appearance of the same syllables, suggesting habituation of the neuronal response (Figs. 2 a, 6 a). This tendency to habituate was evident with overall song repetition as well; stimuli were always presented at 8–10 sec intervals, because interstimulus intervals much shorter than that led to a decreased overall response.
A striking feature of LMAN was that auditory neuronal responses to the BOS were similar throughout the nucleus within each bird. A comparison of neurons from the individual birds revealed that the major response to the BOS tended to occur during the same syllables for each neuron (examples are shown in Fig. 8 a,b). Moreover, if a particular conspecific song or tone burst elicited a phasic response, it tended to do so for many of the auditory neurons encountered in that LMAN. For instance, if a single tone burst elicited a response from an LMAN neuron, the same tone burst was effective for 87.5% of all neurons tested within the nucleus (data from 20 tone-responsive neurons from 4 birds in which at least 4 neurons were tested for 4 or more frequencies from 1 to 5 kHz). The tendency for all LMAN neurons recorded in a single bird to respond to similar features of the BOS was particularly evident when combination sensitivity was analyzed. For instance, four other units in the LMAN from which the unit in Figure 7 was recorded showed essentially the same requirements for syllable combinations (Fig. 8 c). These five units spanned the entire dorso-ventral extent of anterior LMAN. Thus, although it was not systematically mapped, there was also no strong correlation of response type with location within the nucleus.
Song-selective neurons in adult X
Song selectivity and order selectivity
The song selectivity of adult LMAN neurons could emerge in LMAN or might simply be a reflection of song selectivity present in neurons earlier in the auditory pathway (for instance, the song-selective neurons known to exist in HVc) (McCasland and Konishi, 1981; Margoliash, 1983, 1986; Margoliash and Fortune, 1992; Lewicki and Konishi, 1995; Lewicki and Arthur, 1996; Volman, 1996). To address this question, I investigated the auditory response properties of the forebrain song nucleus X, which provides the major input to LMAN via the intervening thalamic nucleus DLM. Like the units in LMAN, the majority of X neurons responded strongly to the BOS and exhibited song and order selectivity (Table 1, Figs. 9, 10). This was evident from recordings of 37 single units in X of adult finches (from 11 birds, 1–8 units/bird), as well as from numerous small clusters of units, which had similar properties but were not included in the quantitative analysis.
Examples of two typical X units are shown in Figures 9 and 10. X units have higher baseline firing rates than LMAN neurons (Table 1). Figure 9 a–d shows a single X unit with a spontaneous rate of 6.9 ± 0.30 spikes/sec. Like LMAN units, this unit responded best to the BOS (Fig. 9 a) and much less to REV (Fig. 9 b) or CON (Fig. 9 c,d). Figure 10 shows another X unit, with a very high background firing rate (46.33 ± 0.58 spikes/sec). This high firing rate was not associated with a loss of selectivity; the neuron responded better to the BOS (Fig. 10 a) than to the same song either in reverse order (Fig. 10 b) or completely reversed (Fig. 10 c). Similarly, the unit showed very little response to conspecific song (Fig. 10 d).
As in LMAN, these song-selective properties were present for virtually all of the auditory X neurons that I recorded; 35/37 single X units responded significantly to the BOS, and a one-way ANOVA revealed significant differences between the mean RS of X neurons to the song stimuli tested (F(3,96) = 7.267, p < 0.0003). Subsequent Scheffé tests confirmed that the RS to the BOS was significantly greater than the RS to REV or CON (Fig. 11 a) (6.49 ± 0.91 spikes/sec for BOS vs 1.94 ± 0.49 for REV, p < 0.002, and vs 2.80 ± 0.53 for CON, p < 0.007; n = 38 for BOS, n = 22 for REV, n = 28 for CON). The mean RS of X neurons to the BOS was also greater than their RS to RO (6.49 ± 0.91 spikes/sec to BOS vs 4.77 ± 1.03 spikes/sec to RO; n = 12 for RO) (Fig. 11 a), but this difference did not reach statistical significance (although this lack of significance may be attributable to the much smaller number of units tested with RO). The song and order selectivity of X could be measured at the level of single cells and was not a result simply emerging from the properties of the entire population of X neurons considered in concert. A comparison of the RS of individual X neurons (Fig. 11 b) shows that a majority of X neurons were song- and order-selective (i.e., lie below the line that indicates equal response to forward and reversed or conspecific stimuli); 20/28 individual neurons responded significantly more (p < 0.05) to BOS than to CON and 12/21 more to the BOS than to REV. Like the selectivity of LMAN neurons, the selectivity of X neurons was neither a property found only in neurons with high (or low) mean response rates nor a characteristic of neurons in only a subset of birds.
There were no striking differences among the response properties of different X neurons within the same bird, suggesting a uniform population of neurons without clear topographic selectivity differences, as was true in LMAN. Moreover, although X neurons were less selective, the pattern of their responses to stimuli was similar to that of LMAN neurons in the same bird. For instance, if a particular conspecific song elicited a phasic response in LMAN, the same song tended to elicit a response in X.
Differences between adult X and LMAN
Although both X and LMAN neurons were song- and order-selective, there were several differences between them. (1) X neurons were more responsive to simple acoustic stimuli than were LMAN neurons (Table 1); 100% of X units tested for frequencies in the 1–5 kHz range showed onset responses to at least one tone burst (and 63% of these responded to frequencies covering a 3 kHz range). Only 66% of LMAN neurons tested for the same frequency range responded to at least one tone burst. Similarly, >40% of X neurons tested showed responses (usually onset) to broad-band noise bursts, versus 2% of LMAN units. Both of these differences between the two nuclei were significant (p < 0.006, χ² = 7.87, df = 1 for tone bursts; p < 0.0001, χ² = 29.62, df = 1 for noise bursts). (2) In X, the least effective stimuli still tended to evoke weak responses (e.g., Fig. 9 b,c) or no response (Figs. 9 d, 10 d) from neurons, in contrast to the inhibition of LMAN neurons frequently seen with non-BOS stimuli. Quantitative comparisons between these properties of X and LMAN neurons are shown in Table 1; a greater percentage of X neurons than of LMAN neurons was excited by reversed and especially conspecific stimuli, and a smaller percentage of X neurons was inhibited by these stimuli (p < 0.001, χ² = 18.20, df = 1 for LMAN vs X excitatory responses to CON; p < 0.0003, χ² = 14.99, df = 1 for LMAN vs X excitatory responses to REV; p < 0.05, χ² = 4.08, df = 1 for LMAN vs X inhibitory responses to CON). The greater response of X neurons to nonpreferred stimuli and their lack of inhibitory responses were also evident in the scatterplots of paired RS data from individual neurons (Fig. 11 b vs Fig. 4 b); many fewer X neurons had RS values below zero. (3) Finally, LMAN was more selective than X as measured by SI (Table 1).
A two-way ANOVA with SI as one factor and nucleus (X and LMAN) as the second factor revealed both a significant selectivity effect (F(2,164) = 6.92; p < 0.002) and a significant nucleus effect (F(1,164) = 33.92; p < 0.0001), but no significant interaction. The significant nucleus effect and the lack of significant interaction indicate that the SI values in X were significantly lower than those in LMAN for all song stimuli. Because X neurons had higher spontaneous rates than LMAN neurons, an apparent difference in selectivity could result simply from differences in spontaneous firing rate. For example, neurons with lower thresholds (i.e., higher spontaneous rates) might appear less selective, and then selectivity would apparently emerge as spontaneous rates dropped and thresholds increased. However, very few X neurons showed inhibitory responses to songs, whereas a significant fraction of LMAN neurons did. The emergence of inhibition in LMAN cannot be attributed solely to higher firing thresholds in LMAN and thus is likely to reflect additional (inhibitory) circuitry in or between these two nuclei. In addition, a regression analysis showed that there was no correlation between spontaneous rate and SI values within either nucleus (R² = 0.005–0.028 for X, 0.087–0.189 for LMAN; in LMAN, these values reflect a slight trend for lower spontaneous rate neurons to be less rather than more selective). Both the presence of inhibition and the lack of correlation with spontaneous rate suggest that the increased selectivity in LMAN is not simply a function of firing rate and therefore of higher thresholds.
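To illustrate how the two measures used in these comparisons relate to one another, the sketch below computes a response strength (RS) and a selectivity index (SI). It assumes that RS is the firing rate during the stimulus minus the spontaneous rate and that SI is RS to the BOS divided by the sum of RS to the BOS and RS to the comparison stimulus; these definitions are assumptions made for illustration (the second is consistent with nonselective neurons having SI values near 0.5), and the function names are hypothetical. The example applies the index to the population mean RS values reported above for X, which is not the same as the mean of the per-neuron SI values in Table 1.

```python
def response_strength(stimulus_rate, spontaneous_rate):
    """Assumed definition: evoked rate minus spontaneous rate (spikes/sec)."""
    return stimulus_rate - spontaneous_rate


def selectivity_index(rs_bos, rs_other):
    """Assumed definition: RS(BOS) / (RS(BOS) + RS(other)).

    Equal responses give 0.5; a stronger response to the BOS gives > 0.5.
    A negative RS (inhibition) can push the value outside the 0-1 range.
    """
    total = rs_bos + rs_other
    if total == 0:
        raise ValueError("SI is undefined when the two RS values sum to zero")
    return rs_bos / total


# Population mean RS values for adult X reported in the text (spikes/sec).
mean_rs = {"BOS": 6.49, "REV": 1.94, "CON": 2.80}

si_rev = selectivity_index(mean_rs["BOS"], mean_rs["REV"])
si_con = selectivity_index(mean_rs["BOS"], mean_rs["CON"])
print(f"SI (BOS vs REV) from mean RS: {si_rev:.2f}")
print(f"SI (BOS vs CON) from mean RS: {si_con:.2f}")
```

Under this assumed definition, a neuron inhibited by a nonpreferred stimulus yields an index above 1, so the index reflects inhibition as well as excitation.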
In seven birds, I recorded units from both X and LMAN (n = 19 and n = 26 for X and LMAN units, respectively), allowing a direct comparison of the response properties of these neurons. A comparison of SI values for X and LMAN units within birds confirmed that these were always lower for X than for LMAN neurons (p < 0.04 for CON, p < 0.03 for REV, paired t tests of mean SI values/nucleus for each bird). An example of two units from the same bird is shown in Figure 12. This demonstrates many of the differences between these neurons suggested by the analysis across birds; both types of neurons respond well to the BOS, but X neurons tend to fire in a more sustained and reproducible manner from trial to trial and are more likely to fire to a second repetition of a motif than LMAN neurons (Fig. 12 a,b). These units also demonstrate the frequent difference between X and LMAN responses to simple stimuli. The X unit responded both to a broad-band noise burst (with an onset and somewhat maintained response) (Fig. 12 e) and to a 3 kHz tone burst (with a short latency onset and an off response) (Fig. 12 f). In contrast, the LMAN unit from the same bird was not excited by either of these stimuli (Fig. 12 c,d). These cells, like the data from the nuclei considered as a whole, indicate that much song and order selectivity is already present at the first stage of the AF pathway, but that X neurons are in general more broadly responsive than LMAN neurons.
Auditory units in juvenile LMAN and X
Auditory units selective for song in a pathway required for song learning might play a role in the auditory feedback crucial to song learning. To assess this possibility, it is essential to examine these neurons in young birds to see whether they have auditory properties early in learning and if so, to determine whether and how these properties differ from those of adult neurons. In particular, to determine whether AF neurons are shaped by sensory experience of the tutor (and thus could represent the template), it would be ideal to examine the AF nuclei after song memorization but before sensorimotor learning. This cannot be done in a straightforward manner in zebra finches, however, because the rapid development (by 90–100 d) (Immelmann, 1969) is accompanied by an overlap between the phases of learning (Fig. 1 a). Thus, there is no normal stage in finches with a clear separation between sensory and sensorimotor learning. Therefore, I chose to record from young birds partway through the process of sensory learning and just beginning sensorimotor learning; that is, zebra finches 30–45 d old. Each of these birds was raised in a soundproof chamber containing only one adult male (the father), so that its TUT song experience was known. I recorded from the LMAN and X of these birds in the same manner as for adults, with the sole difference being that during the experiment, the birds were presented with TUT instead of the BOS, because they had not yet developed their own song.
In both juvenile LMAN and X, many units were auditory and responded well to song, consistent with an auditory role of these neurons during learning. Their selectivity differed markedly, however, from that seen in adult birds: they showed neither song nor order selectivity. This result was evident from 17 single units in juvenile LMAN and 61 single units in juvenile X, from a total of 14 birds (2–7 units/bird). As in adult birds, small clusters of units showed properties similar to those of the single units in both nuclei but were not used for data analysis.
A typical juvenile LMAN unit is shown in Figure 13 a–d and demonstrates that juvenile neurons were indeed responsive to auditory stimuli. The rasters of individual spikes in Figure 13 demonstrate the variable bursty firing of juvenile LMAN neurons, which was even more irregular and significantly lower in rate than that of adult LMAN neurons (Table 1). The tutor song (Fig. 13 a), with which this bird had been in contact for ∼1 month, elicited a significant response from this neuron (Fig. 13 a). In sharp contrast to neurons in adult birds, however, this neuron was not song- or order-selective; CON song and reversed tutor song elicited responses as strong as those to the TUT (Fig. 13 b,c).
Like juvenile LMAN neurons, juvenile X neurons were also auditory but nonselective. A representative example is shown in Figure 14 and demonstrates the response to TUT both forward and in reverse and to CON. As in adult birds, X neurons had higher baseline firing rates than LMAN neurons (Table 1); furthermore, juvenile X neurons had significantly higher firing rates than adult X neurons (Table 1) (p < 0.01, t(97) = 2.639, unpaired t test).
The data for the entire set of juvenile X and LMAN neurons are shown in Figure 15. A one-way ANOVA showed that there was no significant difference between the mean RS for all song stimuli played to the juvenile units within each of the nuclei (Fig. 15 a,d) (F(3,182) = 0.495, p > 0.68 for juvenile X; F(3,52) = 1.125, p > 0.34 for juvenile LMAN; n = 61 in X and 17 in LMAN for BOS, n = 52 in X and 15 in LMAN for REV, n = 47 in X and 14 in LMAN for CON, n = 26 in X and 10 in LMAN for RO). As in the adult birds, this was evident at the level of single neurons. A plot of the RS of individual neurons to different stimuli shows that they cluster around the line that represents equal responses to the two stimuli being compared (Fig. 15 b,e); 38/52 juvenile X neurons and 11/12 juvenile LMAN neurons showed no significant difference (p > 0.05) in response to TUT versus REV, and 36/47 juvenile X and 11/11 juvenile LMAN neurons showed no significant difference in response to TUT versus CON. Similarly, all of the SI values calculated for the same neurons also cluster around 0.5 (Table 1). A plot of the distribution of the individual neuron SI values for each nucleus in juvenile birds demonstrates further that the majority of single neurons in juvenile birds are not selective (Fig. 15 c,f) and illustrates how different the selectivity of these neurons is from that in adult birds. Analysis of the response selectivity with respect to the range of ages of birds examined (29–45 d posthatch) showed no correlation with the age of the juvenile birds (for X, R² = 0.00006–0.029, p > 0.22–0.96; for LMAN, R² = 0.003–0.107, p > 0.23–0.88).
Simple acoustic stimuli were also much more effective at driving juvenile LMAN and X neurons than was true in adult birds. More than 40% of juvenile LMAN neurons and 95% of juvenile X neurons responded to broad-band noise bursts, and these responses had long latencies and were more sustained than those in adult AF nuclei (Fig. 13 d, Table 1) (both nuclei in juveniles are significantly different from adult, p < 0.0001, χ² = 25.14, df = 1 for LMAN; p < 0.001, χ² = 22.05, df = 1 for X). The tone burst responses from juvenile LMAN and X neurons were also different from those in adults; these responses had much longer latencies (100–200 msec for LMAN and 50–70 msec for X) and were more sustained than the tone onset or on–off responses seen in adults (compare Fig. 14 d with Figs. 3 d, 5 c, 12 f). Unlike adult birds, none of the SI values were different between juvenile LMAN and X (Table 1); however, as in adult birds, significantly more neurons in juvenile X responded to noise and tone bursts than did neurons in juvenile LMAN (p < 0.009, χ² = 6.95, df = 1 for tone bursts; p < 0.0001, χ² = 16.60, df = 1 for noise bursts).
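The comparisons of proportions of responsive neurons reported above are χ² tests with one degree of freedom. As an illustration of the form of such a test, the sketch below computes the Pearson χ² statistic for a 2 × 2 table of responsive and unresponsive units in two nuclei. The counts are hypothetical and are not the counts underlying the reported statistics, and the sketch assumes no continuity correction, which may differ from the procedure used in this study.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; the statistic has 1 degree of freedom."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must contain at least one count")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts, for illustration only:
# rows = nucleus, columns = (responded to noise burst, did not respond).
nucleus_x = (19, 1)
nucleus_lman = (7, 10)

chi2 = chi_square_2x2(*nucleus_x, *nucleus_lman)
CRITICAL_VALUE_P05_DF1 = 3.841  # chi-square critical value for p = 0.05, df = 1
print(f"chi-square = {chi2:.2f}, significant at 0.05: {chi2 > CRITICAL_VALUE_P05_DF1}")
```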
This study demonstrates that in adult finches, nuclei necessary for song learning contain auditory neurons that are highly selective for the BOS and are sensitive to its temporal structure. Neurons with these properties are well suited for recognizing the spectrally and temporally rich information in complex vocalizations such as birdsong. The same nuclei also contain auditory neurons in young finches, early in the process of learning song. The auditory neurons in juvenile birds are strikingly different from those in adult birds, however; they show no song or order selectivity, demonstrating that the selective properties of these neurons must emerge during development, in parallel with vocal learning. These neurons represent one of the clearest examples of experience-dependent shaping of neuronal selectivity to complex stimuli. Moreover, the sensory properties of these AF nuclei in both young and adult birds are consistent with an auditory role of this circuit in the song-learning process.
Selectivity of adult LMAN neurons
Adult LMAN neurons are song-selective, responding better to the BOS than to any other stimuli, including songs of conspecifics. They are very similar to the song-selective neurons described in the sensorimotor song nucleus HVc (Margoliash, 1983, 1986; Margoliash and Fortune, 1992; Margoliash et al., 1994; Sutter and Margoliash, 1994; Lewicki and Konishi, 1995; Lewicki and Arthur, 1996; Volman, 1996; see also below). Like those neurons, the selectivity of LMAN neurons has several aspects. For one, spectral cues are important; consistent with this, single tone bursts or specific syllables of conspecific songs could elicit phasic responses. The same neuron’s response to the BOS was always of much longer duration, however, and was sustained over several syllables; these neurons thus respond to more than a single feature of the BOS.
A second important aspect of the selectivity of LMAN neurons, like those in HVc, is their sensitivity to temporal context. Even if all the spectral features of the BOS were unchanged, altering the order of these cues dramatically decreased the response of LMAN neurons. This was true not only when all the cues were reversed (by completely reversing the song), but also when the syllables of the song were played in reverse order. This manipulation preserves the local order of each syllable but alters the global context, thus demonstrating that the sensitivity to order is not just local but extends across syllable boundaries.
LMAN neurons also show temporal combination sensitivity, another level of auditory context sensitivity, in which neurons respond strongly to a combination of syllables of the BOS, in a particular order, even when they fail to respond or respond much less to the individual syllables or subsets of syllables in isolation. Similar properties have been described for LMAN neurons in experiments using a single call-like syllable and its components as stimuli (Saito and Maekawa, 1993). Temporal combination sensitivity is also seen in song-selective neurons in HVc (Margoliash, 1983; Margoliash and Fortune, 1992; Lewicki and Konishi, 1995) as well as in auditory neurons in bats (O’Neill and Suga, 1979; Suga, 1990). An additional feature of the temporal combination sensitivity noted here is that the same unit could show combination-sensitive responses to several different combinations within the BOS. Not all of these responses were evident when the entire song was played, but multiple different portions of the song played in isolation could evoke robust responses. This property might be useful for piece-wise recognition of song.
The fact that the firing of LMAN song- and order-selective neurons is sustained over multiple syllables has implications for the neural encoding of responsiveness to the BOS; peak firing rate on a short time scale (<100 msec) was not always greatest for the BOS, but the mean firing rate over a longer time window (>500 msec) was almost always greatest for the BOS. Responses of LMAN neurons, like those of some other high-level sensory neurons (Gross, 1972; Newman and Wollberg, 1973; Desimone et al., 1984; Richmond et al., 1987), also had long and very variable latencies, did not occur reliably on every trial, and were not tightly time-locked to the stimulus. Thus, precise spike timing over short time frames seems not to be a mechanism by which LMAN neurons encode song.
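The distinction between peak firing rate on a short time scale and mean firing rate over a longer window can be illustrated with a small sketch. It is not the analysis code used in this study: the function names, the 100 msec sliding window, the 1 sec analysis window, and the spike times are all hypothetical, chosen so that a brief phasic burst yields the higher peak rate while a sustained response yields the higher mean rate.

```python
def count_spikes(spike_times, start, end):
    """Number of spikes with start <= t < end (times in seconds)."""
    return sum(1 for t in spike_times if start <= t < end)


def mean_rate(spike_times, start, end):
    """Mean firing rate (spikes/sec) over the whole analysis window."""
    return count_spikes(spike_times, start, end) / (end - start)


def peak_rate(spike_times, start, end, window=0.1, step=0.01):
    """Highest rate (spikes/sec) found in any sliding window of length `window`."""
    n_steps = int(round((end - start - window) / step)) + 1
    best_count = 0
    for i in range(n_steps):
        w_start = start + i * step
        best_count = max(best_count, count_spikes(spike_times, w_start, w_start + window))
    return best_count / window


# Hypothetical spike times (sec) from a single trial, for illustration only.
phasic = [0.105, 0.115, 0.125, 0.135, 0.145]                       # brief burst
sustained = [0.055, 0.175, 0.295, 0.415, 0.535, 0.655, 0.775, 0.895]  # steady firing

for name, spikes in (("phasic", phasic), ("sustained", sustained)):
    print(name, "peak:", peak_rate(spikes, 0.0, 1.0), "mean:", mean_rate(spikes, 0.0, 1.0))
```

In this toy example the burst reaches the higher peak rate, whereas the sustained response produces more spikes over the full second, which parallels the observation that the longer-window mean rate, rather than the short-window peak, was almost always greatest for the BOS.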
Another feature of LMAN neurons is their similarity in response properties throughout the nucleus of an individual bird. The number of single LMAN neurons sampled from an individual bird was often small, however, and to be certain of this lack of topography, a more exhaustive mapping of LMAN responses will be necessary. Nonetheless, within the limits of these sample sizes, there is not an obvious “library” of different LMAN neurons, each tuned to particular features of a bird’s song. Instead, many neurons appear to be tuned to a similar set of features of the BOS. This is also true for neurons in HVc (Sutter and Margoliash, 1994). This result contrasts with a topography for auditory response properties that might have been expected in LMAN, given the recently described topography of LMAN projections to the RA (Johnson et al., 1995).
Comparison of LMAN and HVc neurons
The major source of auditory input to LMAN is from HVc (via X) (Katz and Gurney, 1981) (A. Doupe, unpublished observations), and thus the selectivity observed in adult LMAN likely reflects that of neurons in HVc. The song-selective neurons seen in adult LMAN are in fact strikingly similar to those in HVc in many respects, including their spectral and temporal selectivity for the BOS, their temporal and harmonic combination sensitivity, and their lack of topography (Margoliash, 1983, 1986; Margoliash and Fortune, 1992; Margoliash et al., 1994; Sutter and Margoliash, 1994; Lewicki and Konishi, 1995; Lewicki and Arthur, 1996; Volman, 1996). Numerous individual neurons in HVc show selectivity equal to that of the LMAN neurons presented here, including inhibition by non-BOS stimuli. Although it is difficult to compare data collected in different laboratories and in different nuclei, comparison of LMAN selectivity indices for the BOS versus REV or CON with indices calculated similarly for HVc suggests that these are similar for the two nuclei, although slightly higher for LMAN than for HVc (Margoliash and Fortune, 1992; Margoliash et al., 1994; Volman, 1996). HVc also seems to be more heterogeneous than LMAN; a number of studies (Margoliash, 1983; Saito and Maekawa, 1993; Lewicki and Arthur, 1996) have described neurons in HVc that respond well to simple stimuli or equally to forward and reversed song, whereas very few such neurons are found in LMAN. Lewicki and Arthur (1996) found that only 50% of song-responsive HVc neurons had a significantly greater response to forward than reversed song, versus 83% using the same criterion in the present study of LMAN. Anatomical and neurophysiological studies have shown that HVc contains two separate populations of auditory projection neurons, one to RA and the other to the AF (Katz and Gurney, 1981; Gahr, 1990; Doupe and Konishi, 1991; Kirn et al., 1991; Sohrabji et al., 1993; Vicario and Yohay, 1993).
Thus, the properties of LMAN actually reflect a subset of HVc neurons, but whether the song-selective properties of this HVc subpopulation differ from the properties of RA-projecting neurons or from HVc as a whole has never been systematically studied.
Selectivity of adult X neurons
X neurons are interposed between two song-selective nuclei, HVc and LMAN, and thus might be expected to be very similar to neurons in these areas. Neurons in X are indeed very song- and order-selective, but their properties differ to some extent from those of both HVc and LMAN neurons. X neurons tend to be more broadly responsive to a variety of stimuli, including simple tone bursts and conspecific songs, and they are less likely than HVc and especially LMAN neurons to be inhibited by nonpreferred stimuli. This apparent loss of selectivity of X neurons relative to their inputs may result primarily from their high spontaneous rates (and thus perhaps lower thresholds for responding); moreover, the effects of inhibitory responses to song in HVc cannot be fed forward. Because there are two classes of auditory projection neurons in HVc, however, it is also possible that larger numbers of HVc neurons with simpler auditory response properties project to the AF pathway. Recent results with intracellular recording and filling of HVc neurons, however, suggest that at least some of the X-projecting HVc neurons are very song-selective (Lewicki, 1996).
Because LMAN neurons are more narrowly responsive and more likely to show inhibitory responses than X neurons, and because X provides their only known auditory input, the circuit between X and LMAN may create these differences in response properties, perhaps via inhibitory processing in X, LMAN, or the intervening thalamic nucleus DLM. In many sensory systems, gradual increases in stimulus selectivity are the result of hierarchical circuits such as the AF pathway (DeYoe and Van Essen, 1988; Konishi et al., 1988; Livingstone and Hubel, 1988; Rose et al., 1988). Nonetheless, the differences in selectivity along the adult AF are slight, and the functional implications of a sequence of very song-selective nuclei remain unclear.
X has recently been shown to receive a second song system input from recurrent collaterals from LMAN neurons (Nixdorf-Bergweiler et al., 1995; Vates and Nottebohm, 1995) (Fig. 1 b). The functional consequences of these inputs are unknown, however, and are not clarified by the present study of auditory response properties. A study in which LMAN alone was selectively inactivated while the auditory properties of X neurons were recorded would be revealing in this regard.
The AF in developing birds
The complex selectivity for spectral and temporal features of the BOS seen in adult AF neurons raises the questions of when and how this selectivity develops. Auditory units specialized for song within a pathway required for song learning might function in the auditory feedback crucial to normal song development. AF neurons in birds 30–45 d old are indeed auditory, consistent with a sensory role of the AF pathway in learning. Their properties are strikingly different from those of neurons in adult birds, however. Although all birds had heard only one tutor song for 4 to 6 weeks, there was no significant preference for the tutor song over conspecific songs. Moreover, the tutor song was equally effective played forward, in reverse order, or fully reversed.
The complex song- and order-selective properties of adult AF neurons are thus clearly not present in young songbirds and must emerge during the course of vocal development. A multiunit study of HVc neurons suggests that this is the case in HVc as well (Volman, 1993). A summary comparison of the strength of responses to the test stimuli in juvenile and adult AF neurons (Fig. 16) illustrates that the developmental increase in selectivity results both from a decrease in responsiveness to nonpreferred stimuli and from an increase in response to the BOS. Moreover, the lack of any selectivity in individual juvenile auditory neurons argues against a purely selective model, in which a broad pool of neurons each tuned to different stimuli exists initially, and then is narrowed by selection of appropriately tuned neurons (see also Margoliash, 1983). Instead, AF neurons apparently acquire their auditory selectivity during development. Like neurons in the visual system (Hubel et al., 1977; Shatz, 1990; Chapman and Stryker, 1993), these auditory neurons provide a clear example of experience-dependent shaping of neuronal selectivity.
Numerous anatomical and neurophysiological mechanisms could underlie this change in AF neural response properties. NMDA receptor binding sites in LMAN (Aamodt et al., 1992), the density of DLM inputs to LMAN (Johnson and Bottjer, 1992), and the spine density of LMAN neurons (Wallhausser-Franke et al., 1995) all decline during the time when LMAN neurons narrow their song responsiveness, suggesting that experience of song and tuning of neurons is associated with pruning of neural connections. The results here, which show increased responsiveness to the BOS as well as increased selectivity, also raise the possibility that some synapses in or prior to LMAN and X are being strengthened or increased in number, whereas others are being pruned.
The clear selectivity difference between auditory AF neurons in adult and 30- to 45-d-old juvenile birds raises the issues of when the neural selectivity seen in adults develops and whether it is a reflection of sensory or motor learning or both (Fig. 1 a). Does the lack of auditory selectivity for tutor song in these birds partway through sensory learning imply that the neural selectivity seen in the adult AF is not associated with sensory exposure to the tutor and template formation? To answer this, it is crucial to know what the birds have memorized after 1 month of sensory experience. This question is not settled. A number of behavioral studies indicate that the bulk of tutor song memorization in zebra finches happens later, between days 35 and 60–65, after which the critical period appears to end (Slater et al., 1988). However, other work indicates that tutor learning can take place before day 35 (Eales, 1989; Boehner, 1990; Slater and Jones, 1995). To determine whether tutor song exposure affects AF neurons, it would be informative to investigate the properties of AF neurons at the close of the most active period for sensory learning, that is, at 60–65 d in zebra finches.
Possible functions of the AF song-selective circuit
The function of the auditory selectivity of the AF will not be clear until animals further along in sensory learning are studied, as discussed above. If AF neurons prove to be selective for the tutor, they would be well suited to act as a template; they could provide information, encoded in the strength of their firing rate, about how well certain vocalizations match the memorized song model. Furthermore, these neurons are found in a pathway that projects back into the vocal motor system at the RA and thus could provide an error signal to guide premotor neurons. The lack of selectivity in the juvenile neurons presented here certainly raises the possibility, however, that song selectivity in the AF will prove not to reflect sensory learning of the tutor.
The origin of selectivity has been addressed in a multiunit study of HVc neurons in developing white-crowned sparrows (Volman, 1993). That study showed that, as a population, HVc neurons in young sparrows that had completed sensory learning but were not yet singing showed neither song selectivity nor order selectivity for the tutor song. Instead, these neurons began to show song selectivity only during plastic song and then displayed a preference for the bird’s own developing vocalizations over the tutor song. Thus, in HVc, song selectivity appears to reflect the bird’s experience of its own vocalizations.
Although HVc provides the only known auditory input to the AF pathway (Katz and Gurney, 1981; Doupe, unpublished observations), the lack of selectivity in HVc after sensory learning in sparrows does not settle the question of when selectivity emerges in the AF. The results here suggest that the AF circuitry can sharpen response properties in adult birds. Moreover, a comparison of responses to simple acoustic stimuli suggested that, as in adults, juvenile X is more broadly responsive than juvenile LMAN. Thus, in young finches, auditory selectivity may also become progressively refined along the AF pathway. It is therefore possible that the circuitry of the juvenile AF could synthesize selective neurons, even if its inputs from HVc were nonselective. The questions of when, and to what stimuli, selectivity emerges in the AF will require direct recording from this pathway at later stages of development.
If auditory neurons prove to be tuned to the bird’s emerging song rather than to the tutor, they could be very useful in the vocal practice phase of song learning. They clearly have the potential to provide the developing bird with information about its own vocalizations, which must be a prerequisite for modifying those vocalizations to match the tutor song template. Perhaps the AF pathway and song-selective neurons in general are crucial to sensorimotor learning but have little to do with sensory learning, despite their intriguing auditory properties. It is difficult to assess the status of sensory learning after AF lesions, because such lesions disrupt song production (Bottjer et al., 1984; Sohrabji et al., 1990; Scharff and Nottebohm, 1991). The fact that the AF circuit and its synaptic connections develop long before singing, however (by posthatch day 12 in zebra finches) (Nordeen et al., 1992; Sohrabji et al., 1993; Johnson and Bottjer, 1994; Mooney and Rao, 1994), points to some role for this pathway early in development and sensory learning. This function might be limited to the normal survival and sexual differentiation of the motor pathway, for which LMAN is known to be essential (Akutagawa and Konishi, 1994; Johnson and Bottjer, 1994). On the other hand, recent preliminary experiments also support the idea that LMAN must be physiologically active during sensory learning for normal song learning to occur (Basham et al., 1996). It remains to be seen whether the crucial function of LMAN during sensory exposure to the tutor involves acquisition of neuronal selectivity.
Auditory neurons with strong selectivity for the BOS, such as those found in the AF of adult birds, have been suggested to be useful for recognition of conspecific songs, by providing a “reference” against which other songs can be matched (Margoliash, 1986). Consistent with such a role for the adult AF, preliminary results (Cynx et al., 1991) suggest that adult zebra finches have more difficulty acquiring a discrimination between two conspecific songs when X is lesioned.
The present study demonstrates the remarkable spectrally and temporally selective properties of auditory neurons, the tuning of which emerges during the development of a complex vocal behavior. Additional investigation of these neurons at stages of learning intermediate between those studied here should not only reveal the origin of their selectivity, but may also shed light on neural mechanisms that endow the cells with these properties and suggest possible functions of this circuit in learning.
This work was supported throughout by the Lucille P. Markey Charitable Trust and, in later phases, by the McKnight Foundation, the Searle Scholars Program, the Sloan Foundation, and the Klingenstein Fund. I gratefully acknowledge the invaluable intellectual, technical, and financial support of Mark Konishi, in whose laboratory this work was begun. In addition, these experiments would not have been possible without the initial training provided by Susan Volman and the technical help throughout of Gene Akutagawa, Christiane Draeger, Virginia Herrera, Michael Lewicki, Jamie Mazer, Larry Proctor, Frederic Theunissen, and Jim Wright. I also thank Michael Brainard, Dean Buonomano, Neal Hessler, Mark Konishi, Michael Lewicki, Michele Solis, and Frederic Theunissen for thoughtful comments on this manuscript.
Correspondence should be addressed to Dr. Allison J. Doupe, Department of Physiology, Box 0444, University of California, San Francisco, 513 Parnassus Avenue, San Francisco, CA 94143-0444.