Although vocal signals including human languages are composed of a finite number of acoustic elements, complex and diverse vocal patterns can be created from combinations of these elements, linked together by syntactic rules. To enable such syntactic vocal behaviors, neural systems must extract the sequence patterns from auditory information and establish syntactic rules to generate motor commands for vocal organs. However, the neural basis of syntactic processing of learned vocal signals remains largely unknown. Here we report that the basal ganglia projecting premotor neurons (HVCX neurons) in Bengalese finches represent syntactic rules that generate variable song sequences. When vocalizing an alternative transition segment between song elements called syllables, sparse burst spikes of HVCX neurons code the identity of a specific syllable type or a specific transition direction among the alternative trajectories. When vocalizing a variable repetition sequence of the same syllable, HVCX neurons not only signal the initiation and termination of the repetition sequence but also indicate the progress and state-of-completeness of the repetition. These different types of syntactic information are frequently integrated within the activity of single HVCX neurons, suggesting that syntactic attributes of the individual neurons are not programmed as a basic cellular subtype in advance but acquired in the course of vocal learning and maturation. Furthermore, some auditory–vocal mirroring type HVCX neurons display transition selectivity in the auditory phase, much as they do in the vocal phase, suggesting that these songbirds may extract syntactic rules from auditory experience and apply them to form their own vocal behaviors.
As a fundamental interface for communication, vocal signals convey information in their acoustic and sequence patterns. While vocal signals, including human languages, are composed of a finite set of acoustic elements, syntactic rules expand the diversity of sequence patterns (Chomsky, 1965; Bolhuis et al., 2010). To enable such syntactic vocal communication, neural systems must extract the sequence patterns from auditory information and establish syntactic rules to generate motor commands for vocal sequences (Chomsky, 1969; Santelmann and Jusczyk, 1998; Saffran, 2002; Kuhl, 2004). However, the neural processing of syntax for learned vocal signals remains largely unknown.
Bengalese finches assemble a fixed number of vocal elements called syllables into variable sequences (Woolley and Rubel, 1997) based on a finite-state type of syntax (Okanoya, 2004). These songbirds, like humans, establish and maintain variable songs under the guidance of auditory information (Konishi, 1965; Brainard and Doupe, 2000). Therefore, the brain circuits involved in both perception and expression of vocal signals should contain the neurons that transmit syntactic information. In the songbird brain, telencephalic nucleus HVC (used as a proper name) functions in both sensory and motor processing of songs (Nottebohm et al., 1976; Vu et al., 1994; Yu and Margoliash, 1996; Gentner et al., 2000; Hahnloser et al., 2002; Long and Fee, 2008). Hence, analysis of neural activity in Bengalese finch HVC is expected to provide fundamental insights into the neural basis of syntactic organization.
HVC directly and indirectly receives auditory input (Kelley and Nottebohm, 1979; Katz and Gurney, 1981; McCasland and Konishi, 1981; Margoliash, 1983). Information processed in HVC is transmitted by two segregated types of projection neurons (Dutar et al., 1998; Mooney, 2000): HVCRA neurons that project to the robust nucleus of the arcopallium (RA), which relay the information to motor neurons for vocal organs (Nottebohm et al., 1976); and HVCX neurons, which innervate basal ganglia area X (Nottebohm et al., 1982) and are involved in audition-dependent vocal plasticity (Scharff and Nottebohm, 1991; Brainard and Doupe, 2000). Both types of projection neurons generate sparse burst spikes during singing (Hahnloser et al., 2002; Kozhevnikov and Fee, 2007; Prather et al., 2008). In the awake nonsinging state, however, HVCRA neurons are completely inactive (Hahnloser et al., 2002; Kozhevnikov and Fee, 2007; Prather et al., 2008). In contrast, a subpopulation of HVCX neurons exhibits both motor-related activity and auditory response to the individual bird's own song (BOS) playback and similar songs of other birds (Prather et al., 2008). Furthermore, the firing patterns of the auditory response to specific song sequences closely recapitulate the firing patterns while vocalizing the same sequences (Prather et al., 2008), suggesting that HVCX neurons function as a primary sensorimotor integration site for vocal signals. To explore the neural representation of syntax, we focused on HVCX neurons, and we report that these neurons code hierarchical structures of syntax with syllable-selective and transition-selective burst spikes.
Materials and Methods
All procedures were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee of Kyoto University.
Adult male Bengalese finches and zebra finches (>130 d posthatch) were used for experiments. Bengalese finches were raised in our aviary (14:10 h light:dark cycle) and transferred to sound-attenuating chambers during the recording of their songs. Zebra finches were reared using a protocol previously described (Funabiki and Konishi, 2003; Funabiki and Funabiki, 2008). A 2.8 s Bengalese finch song sequence (a gift from Dr. Y. Funabiki, Kyoto University Faculty of Medicine, Kyoto, Japan) was used for the tutoring session for zebra finches.
Microdrive implantation surgery for electrophysiological studies.
All electrophysiological studies were performed using Bengalese finches. Animals were initially anesthetized with intramuscular injection of ketamine and xylazine. A motorized microdrive and stimulus electrodes were implanted into one hemisphere (five into the left and four into the right). The microdrive was made based on the original design as previously described (Fee and Leonardo, 2001; Hahnloser et al., 2002) with some modifications. In brief, the microdrive system included three motorized positioners and a headstage preamplifier. Each positioner was equipped with custom-made micromotors (Namiki Precision Jewel) and the motion of each electrode was controlled independently. After skin incision, small skull windows were made over area X, RA, and HVC using stereotaxic coordinates. A small hole (200–300 μm) was made in the dura over each area and bipolar stimulus electrodes were inserted to the proper depth in area X (all animals) and RA (eight of nine animals), and fixed with dental acrylic. Another small hole was made in the dura directly over HVC. The position and boundary of HVC were confirmed by the auditory response to the individual BOS. Subsequently, the exposed brain surface was protected with 2% low melting agarose in PBS and then the microdrive was implanted and secured to the skull using dental acrylic.
After complete recovery with sufficient vocalization of songs (>100 bouts/d), electrophysiological recordings were performed. During recording, microdrive electrodes were carefully advanced into HVC (5–15 μm/step). Antidromic stimulation (Swadlow, 1998; Hahnloser et al., 2002; Fee et al., 2004) was delivered to the stimulus electrodes in area X or RA (single monopolar pulses of 200 μs duration, 0.5 Hz, 30–150 μA). HVCX neurons were identified by their fixed-latency action potential responses to stimulation in area X. All the identified HVCX neurons had no response to stimulation in RA. When a single unit of neural activity had successfully been isolated and identified, we first conducted recordings of motor-related activity (n = 9 birds) during unprompted vocalization. After recording in the vocal phase, we next recorded sensory-related activity in the presence of BOS playback as an auditory stimulus (n = 4 birds). During the recording of the auditory response, behavioral states were carefully monitored using a video camera, and the arousal level of each bird was evaluated by its opened eyes and movements. When the animal was frequently vocalizing, flying, hopping, or eating, we suspended the recording in the auditory phase. The BOS (and conspecific songs and white noise if possible) were replayed with >10 s intervals at 70 dB through a speaker placed 5 cm away from the cage in the soundproof recording chamber. Electrophysiological data were amplified and bandpass-filtered between 0.3–3 kHz. Both electrophysiological and sound data were digitized (sampling rate, 50 kHz) and stored. All spike trains were confirmed to be single unit by complete suppression of the spike autocorrelation functions within ±1 ms. 
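The single-unit criterion above (complete suppression of the spike autocorrelation within ±1 ms) can be illustrated with a minimal sketch. The function names and the list-of-spike-times input format are assumptions made for illustration and are not taken from the original analysis code.

```python
def refractory_violations(spike_times_s, window_s=0.001):
    """Count pairs of spikes separated by less than window_s seconds.

    A nonzero count means the autocorrelation is not fully suppressed
    within +/- window_s, which argues against a single unit.
    """
    times = sorted(spike_times_s)
    violations = 0
    for i, t in enumerate(times):
        j = i + 1
        # Look ahead only while the next spikes fall inside the window.
        while j < len(times) and times[j] - t < window_s:
            violations += 1
            j += 1
    return violations


def is_single_unit(spike_times_s, window_s=0.001):
    """Hypothetical helper: True when no spike pair violates the window."""
    return refractory_violations(spike_times_s, window_s) == 0
```

For example, `is_single_unit([0.0, 0.003, 0.0065, 0.2])` is true, whereas a train containing two spikes 0.5 ms apart would be rejected.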
After recording sessions, electrode placement was verified via electrolytic lesions made by passing a DC current (10 μA for 20 s) through recording electrodes; birds were deeply anesthetized with 300 μl of 20% urethane and perfused transcardially with saline and then 4% paraformaldehyde. The electrode placements were verified in histological brain sections.
Acoustic features of vocal signals were discerned using spectrograms bandpass filtered between 0.5–8 kHz. Transition diagrams of individual Bengalese finch song sequences were constructed based on >100 song renditions.
Raster plots and histograms of action potential activity were constructed by aligning action potentials to the relevant syllable onset (5 ms bin size). A burst was defined as the spike activity whose instantaneous firing rate exceeded 100 Hz (Hahnloser et al., 2002). Among the total of 120 transition segments of successive syllable pairs where burst events were observed in 47 HVCX neurons, two segments (1.7%) contained two burst events during the silent period and the vocalization of the target syllable. In these two segments, the relative latency was measured from the onset of the first burst spikes and the firing rate was calculated from all spikes, including both of the two burst spikes.
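The burst criterion above (instantaneous firing rate exceeding 100 Hz) can be sketched as follows. This is an illustrative reading in which the instantaneous rate is the reciprocal of each interspike interval and consecutive qualifying spikes are grouped into one burst; the grouping rule and the function name are assumptions, not the original implementation.

```python
def detect_bursts(spike_times_s, min_rate_hz=100.0):
    """Group spikes into bursts where 1/ISI exceeds min_rate_hz.

    Returns a list of bursts; each burst is a list of spike times (s).
    """
    times = sorted(spike_times_s)
    bursts = []
    current = []
    for prev, nxt in zip(times, times[1:]):
        # 1/ISI > rate is equivalent to ISI * rate < 1 (avoids dividing by zero).
        if (nxt - prev) * min_rate_hz < 1.0:
            if not current:
                current = [prev]
            current.append(nxt)
        else:
            if current:
                bursts.append(current)
                current = []
    if current:
        bursts.append(current)
    return bursts
```

The onset of each burst, `burst[0]`, is the time that would be aligned to the relevant syllable onset when measuring relative latency.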
In the statistical analysis of the neural activity in the alternative transition segments, firing rates of the individual burst spikes were defined as the average of reciprocals of interspike intervals, calculated from spikes in a time window of 100 ms. For the analysis of the firing rate, the median of the time window was determined from the time of a peak in the histogram of action potential activity. The difference in firing rates among the observed alternative trajectories within six segments upstream or downstream was analyzed using the Kruskal–Wallis ANOVA test. Neuronal reliability of burst events was defined as the probability of activity per the same time window used in the analysis for the firing rate (Prather et al., 2008). When the neuronal reliability in a nonprimary transition segment was <0.3, the transition-selective activity was classified as an all-or-none type. Otherwise, the transition-selective activity was classified as an intermediate type. The ratio of firing rate or neuronal reliability between primary and nonprimary transitions was <0.1 in all the all-or-none type transition-selective activities. In all statistical analyses, the minimum significance level was set at p < 0.05.
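The firing rate, neuronal reliability, and classification rule described above can be sketched as follows. Several details are assumptions made for illustration: "activity" is taken to mean at least one spike inside the 100 ms window, a window with fewer than two spikes is assigned a rate of 0, and all function names are hypothetical.

```python
from statistics import mean

WINDOW_S = 0.100  # 100 ms analysis window, centered on the histogram peak


def spikes_in_window(spikes_s, center_s, width_s=WINDOW_S):
    """Return the sorted spikes that fall inside the window."""
    half = width_s / 2.0
    return sorted(t for t in spikes_s if center_s - half <= t <= center_s + half)


def burst_firing_rate(spikes_s, center_s):
    """Average of the reciprocals of interspike intervals in the window (Hz)."""
    s = spikes_in_window(spikes_s, center_s)
    isis = [b - a for a, b in zip(s, s[1:]) if b > a]
    # Assumption: fewer than two spikes gives a rate of 0.
    return mean([1.0 / isi for isi in isis]) if isis else 0.0


def reliability(renditions, center_s):
    """Fraction of renditions with at least one spike in the window."""
    if not renditions:
        return 0.0
    active = sum(1 for r in renditions if spikes_in_window(r, center_s))
    return active / len(renditions)


def classify_transition_selectivity(nonprimary_reliability, threshold=0.3):
    """Apply the <0.3 reliability criterion for nonprimary transitions."""
    return "all-or-none" if nonprimary_reliability < threshold else "intermediate"
```

Here each rendition is a list of spike times expressed relative to the onset of the associated syllable, so that the same window center applies to every rendition.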
To evaluate the phasic activity in consecutive repetitions of the same syllables, differences in firing rate among the cycles were examined using the Kruskal–Wallis ANOVA test. Since the total numbers of repetitions were variable, the cycle numbers were counted from both the beginning and the end. The first and last cycles were excluded in these statistical analyses because burst spikes in the first or last cycles were unstable in some cases. Phasic activity that showed no significant differences was regarded as phasic bursts with a constant firing rate during the repetition. When phasic bursts displayed significant differences in firing rate, we further examined the relationship between firing rate and cycle numbers. As a result, all of the phasic bursts displaying significant differences in firing rate showed a significant correlation between firing rate and cycle numbers. In all statistical analyses, the minimum significance level was set at p < 0.05.
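The grouping step that precedes the Kruskal–Wallis test can be sketched as follows: per-cycle firing rates are indexed either from the beginning or from the end of each repetition, and the first and last cycles are dropped. The input layout (one list of per-cycle rates per rendition) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def group_rates_by_cycle(repetitions, from_end=False):
    """Group firing rates by cycle number, excluding first and last cycles.

    repetitions: list of renditions, each a list of per-cycle firing rates.
    from_end=False numbers cycles 1, 2, 3, ... from the first cycle;
    from_end=True numbers them 1, 2, 3, ... backward from the last cycle.
    """
    groups = defaultdict(list)
    for rates in repetitions:
        n = len(rates)
        for i in range(1, n - 1):  # skip index 0 (first) and n - 1 (last)
            key = (n - i) if from_end else (i + 1)
            groups[key].append(rates[i])
    return dict(groups)
```

The resulting groups could then be compared with any Kruskal–Wallis implementation, for example `scipy.stats.kruskal(*groups.values())` if SciPy is available.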
Bengalese finches vocalize variable songs consisting of acoustic elements called syllables, which are punctuated by periods of silence and have distinct acoustic properties from each other (Woolley and Rubel, 1997). Variable song sequences of the individual birds, composed according to finite-state type syntax (Okanoya, 2004), can be described using transition diagrams (Okanoya and Yamaguchi, 1997; Sakata and Brainard, 2006; Wohlgemuth et al., 2010). As illustrated in Figure 1A, a transition diagram consists of a finite number of nodes (circles A–J) and node-connecting arrows that indicate the identities of syllables and the possible transition directions between syllables, respectively. Syntactic rules that generate diversity in sequences are described as “alternative transitions” to or from syllables: convergent transitions from multiple different syllables to a junction syllable (junction transition) and divergent transitions from a branching syllable to multiple different syllables (branching transition). For example, in Figure 1, syllable B forms a junction transition segment with syllables A and F (red arrows), and therefore can follow syllable A or F (Fig. 1B, top). In turn, because syllable B also forms a branching transition segment with syllables C, D, and J (Fig. 1A, blue arrows), syllable B can be followed by syllable C, D, or J (Fig. 1B, top). When a transition diagram contains no alternative transition and is composed of deterministic linear transitions, each song rendition becomes stereotypic. Thus, Bengalese finches generate complex and variable song sequences by recurrently taking different trajectories in junction and branching transition segments between syllables (Fig. 1B, bottom). 
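A transition diagram of the kind described above can be built from labeled song renditions by counting successive syllable pairs. The sketch below is illustrative only: songs are assumed to be strings of single-letter syllable labels, the toy sequences in the usage note are invented, and a syllable followed by itself (repetition) is counted as an ordinary successor.

```python
from collections import defaultdict


def build_transition_diagram(songs):
    """Count transitions between successive syllables.

    Returns {previous_syllable: {next_syllable: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for song in songs:
        for prev, nxt in zip(song, song[1:]):
            counts[prev][nxt] += 1
    return {p: dict(n) for p, n in counts.items()}


def branching_syllables(diagram):
    """Syllables followed by more than one syllable type (divergent)."""
    return sorted(s for s, nxt in diagram.items() if len(nxt) > 1)


def junction_syllables(diagram):
    """Syllables preceded by more than one syllable type (convergent)."""
    preds = defaultdict(set)
    for p, nxt in diagram.items():
        for n in nxt:
            preds[n].add(p)
    return sorted(s for s, ps in preds.items() if len(ps) > 1)
```

With the invented renditions `["ABCE", "FBDE", "ABJ"]`, syllable B is both a junction syllable (preceded by A or F) and a branching syllable (followed by C, D, or J), mirroring the role of syllable B in Figure 1.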
To explore the neural representation of syntax for generating variable sequences, we investigated the activity profiles of single HVCX neurons, projecting to the basal ganglia area X, which is critical for the development and maintenance of learned vocalization (Scharff and Nottebohm, 1991; Brainard and Doupe, 2000).
HVCX activity in variable vocal sequences
We identified HVCX neurons by antidromic activation from area X (Fig. 2A) and recorded their single-unit activity (n = 48 neurons, nine Bengalese finches). All HVCX neurons except one exhibited intermittent burst spikes during singing. Although syllable sequencing of Bengalese finch song is highly variable, spike trains of HVCX neurons displayed phase-locked patterns while vocalizing song sequences with the same syllable arrangement (Fig. 2B). To systematically evaluate the temporal relationship of the burst spikes to the associated sequences in variable songs, we divided song sequences into transition segments of successive syllable pairs and measured relative latencies (Fig. 2C). Because previous studies have demonstrated that singing-related activity in HVCX neurons transmits motor-related information (Hahnloser et al., 2002; Kozhevnikov and Fee, 2007; Prather et al., 2008) and precedes the vocal events by approximately several tens of milliseconds (Kozhevnikov and Fee, 2007), the relative latencies were defined as follows: when a burst was observed during a silent period between syllables, the relative latency was measured from the onset of the target syllable just preceded by the silent period to the onset of the burst (Fig. 2C, blue tick marks); if a burst was generated while vocalizing the target syllable, the relative latency was measured from the onset of the target syllable to the onset of the burst (Fig. 2C, red tick marks). Identified burst spikes were generated with a relative latency ranging from −100 to 100 ms. However, the relative latencies of burst events in the individual neurons grouped by their associated transition segments (n = 120 segments) were distributed in a more concentrated manner (average SD of the relative latency in each transition segment, 4.41 ms) (Fig. 2D), suggesting that each HVCX neuron transmits the information related to a particular transition segment in variable vocal sequences. 
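The relative latency measure and its per-segment spread can be sketched as follows. The sign convention (negative when the burst precedes the target syllable onset) is an assumption consistent with the reported range of −100 to 100 ms, and the function names are hypothetical.

```python
from statistics import mean, stdev


def relative_latency_ms(burst_onset_s, target_onset_s):
    """Burst onset relative to target syllable onset, in milliseconds."""
    return (burst_onset_s - target_onset_s) * 1000.0


def latency_stats(pairs):
    """Mean and sample SD of relative latency for one transition segment.

    pairs: list of (burst_onset_s, target_onset_s), one per rendition.
    Requires at least two renditions for the SD to be defined.
    """
    latencies = [relative_latency_ms(b, t) for b, t in pairs]
    return mean(latencies), stdev(latencies)
```

Grouping renditions by transition segment before calling `latency_stats` gives one SD per segment, which is the quantity averaged across the 120 segments in the text.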
Twenty-three percent of HVCX neurons (n = 11) exhibited burst spikes only in single transition segments (Fig. 2D, black dots, arrows), whereas the remaining 77% of the neurons (n = 36) displayed burst spikes in multiple different transition segments ranging from 2 to 6 (Fig. 2E). Notably, although the associated syllables of the multiple burst events did not have obvious similarities in phonological structure, these neurons generated burst spikes at similar onset timing for the different transition segments (Fig. 2E). Such preferred timing in HVCX activity suggests that the burst spikes may not directly participate in the transmission of the phonological information; rather, burst spikes of these neurons may be important for transmitting the temporal information of vocal signals. We thus focused our search on the relationship between the individual phase-locked burst spikes and syntactic information.
Syllable selectivity and transition selectivity
Assuming that the burst spikes of HVCX neurons code syntactic structures, the individual burst events may correspond to specific syllable types or specific transition directions, because the syntactic structures can be described with a transition diagram consisting of these two types of syntactic constituents (Fig. 1A). First, we tested the possibility that HVCX neurons can exhibit syllable-selective burst spikes in variable sequences. Figure 3A illustrates the activity profiles of neuron 11 in Figure 2B while vocalizing the junction and branching transition segment including syllable B, according to the syntactic rules represented in the transition diagram of Figure 1A. Neuron 11 preserved the identical spike pattern with the same relative latency to the onset of syllable B independently of the adjacent syllable types (Fig. 3A). Furthermore, we considered the effect of the upstream syllable arrangement in remote segments, because syllable sequencing in a branching transition segment is not a simple stochastic process in which the probability distribution of the next transition is determined by the syllable at the branching point. For example, when syllable B followed syllable A, the bird produced a primary transition from syllable B to syllable C in 52.1% of cases (Fig. 3B). In contrast, when syllable B followed syllable F, the transition probabilities were dramatically changed: the probability of the transition from syllable B to syllable C was reduced to 9.8% (Fig. 3B). Thus, HVCX neuronal activity may be affected by the upstream syllable arrangement in remote segments. We therefore analyzed the effect of the upstream syllable sequencing on the firing rate of the burst spikes associated with syllable B. Within six transition segments upstream, three different junction transitions (Fig. 3C, black lines) formed alternative trajectories toward syllable B. 
Statistical analysis for the effect of the difference in upstream trajectories on the firing rate revealed that the upstream syllable sequence had no effect on the activity profiles (p ≥ 0.05) (Fig. 3C, bottom). Together, the burst spikes in the alternative transition segment can represent the identity of a specific syllable and not the history of syllable transitions. Next, we examined whether the burst events can represent a specific transition direction among alternatives. If HVCX neurons display such transition selectivity, the same spike pattern should emerge only when the same transition is executed. Neuron 1, whose activity was recorded from the same animal as that of neuron 11, displayed burst spikes only while taking the trajectory from syllable B to D among the alternatives in the branching transition segment including syllable B and in the junction transition segment including syllable D (Fig. 3D). This indicates that HVCX neurons can also generate transition-selective burst spikes that signal a specific transition direction in alternative transition segments.
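The history dependence of branching noted for syllable B (a transition to syllable C in 52.1% of cases after syllable A but 9.8% after syllable F) corresponds to estimating transition probabilities conditioned on the preceding syllable. A minimal sketch, assuming songs are strings of single-letter syllable labels and using a hypothetical function name:

```python
from collections import Counter, defaultdict


def conditional_transition_probs(songs, branch):
    """Estimate P(next | previous, branch) at a branching syllable.

    Returns {previous_syllable: {next_syllable: probability}}.
    """
    counts = defaultdict(Counter)
    for song in songs:
        # Slide over triplets (previous, current, next).
        for prev, cur, nxt in zip(song, song[1:], song[2:]):
            if cur == branch:
                counts[prev][nxt] += 1
    return {
        prev: {n: c / sum(ctr.values()) for n, c in ctr.items()}
        for prev, ctr in counts.items()
    }
```

If the returned distributions differ markedly between predecessors, the branching choice is not determined by the branching syllable alone.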
To confirm whether the burst spikes in alternative transition segments consistently represent specific syllable identities or specific transition directions, we analyzed the relationship between the syllable arrangement and the burst profiles in 43 junction or branching transition segments (n = 27 neurons). In 15 segments, burst spikes were generated independently of syllable patterns within six transitions upstream (Fig. 4A,B, s1–s15) or downstream (data not shown), similarly to those shown in Figure 3, A and C. Their average relative latencies to the associated syllables were closely matched among alternative trajectories (Fig. 4C). Thus, HVCX neurons can consistently display syllable selectivity in variable sequences. In contrast, in 20 segments (Fig. 4A, a1–a20), firing rates changed in an all-or-none fashion (Fig. 4D), similarly to those shown in Figure 3D. In two cases where a junction or branching transition segment adjoined other alternative transition segments, the firing rate was affected by the syllable sequence in these neighboring segments (Fig. 4A, asterisks). However, alternative paths in remote segments had no effect on the firing properties, indicating that spike rate changes precisely reflect the particular trajectories in a single alternative transition segment, or two successive ones. Thus, such all-or-none type transition-selective burst spikes can explicitly signal a particular combination of two (or three) syllables in a specific order. In the remaining 18% of the alternative transition segments (i1–i8), burst spikes were observed in all transition directions and were reliable (probabilities of activity per transition segment in primary transition and secondary transitions: 0.99 ± 0.01 and 0.87 ± 0.13, respectively; n = 8 cells) (Fig. 4E,F), and there existed no difference or only a subtle one in spike onset timing to the relevant syllables (Fig. 4F). 
Nonetheless, significant differences in spike rates among alternative transitions distinguished this intermediate-type activity from syllable-selective activity (Fig. 4A,B). Thus, in the branching or junction transition segments, HVCX burst spikes are categorized into three types: syllable-selective activity, all-or-none type transition-selective activity, and intermediate type transition-selective activity.
Transition dynamics in syllable repetition
Syllable repetition in the Bengalese finch is not stereotypic. In some repetition segments, the number of repeated syllables is fixed to two. However, in most of the repetition segments, the number of repetitions varies in every rendition, and such a variable repetition also contributes to producing diverse vocal patterns. Because each cycle of variable repetition can be produced according to either context-dependent or context-independent operations, we first analyzed the transition dynamics of repetitive sequences. Assuming that syllable repetition is generated simply through a repeated random process, individual transition operation between the same syllables can be regarded as an equivalent process that occurs at a constant probability. Based upon this assumption, the expected frequency distribution of syllable repeats should exponentially decrease with an increase in the number of times repeated (Fig. 5A). To test whether every transition process in a given repetition segment is actually equivalent, we examined the frequency distributions of syllable repetitions in Bengalese finches (11 segments from eight birds). Only two repetition segments displayed a monotonic decrease of distribution (Fig. 5B). All the other repetition segments (82%) displayed unimodal normal-distribution curve-like profiles (Fig. 5B). To investigate whether the transition dynamics observed in Bengalese finch songs could be generalized to birdsongs of other species, we also analyzed repetitive sequences in zebra finches (Taeniopygia guttata). Zebra finches generally vocalize stereotyped linear song sequences. However, they also have the ability to learn songs, including variable syllable repetition from Bengalese finch tutors (Funabiki and Konishi, 2003; Funabiki and Funabiki, 2008). We examined the frequency distributions of variable repetitions in zebra finches that successfully learned repetitive phrases consisting of Bengalese finch syllables (n = 9 birds) (Fig. 5D). 
All of the repetition segments in zebra finches displayed single-peaked, normal-distribution curve-like profiles (Fig. 5C). Thus, the assumption that repetitive transition occurs according to equivalent Markov processes does not account for the actual distribution profiles, indicating that neural states representing repetitive transitions of the same syllable change in every repetition cycle. Consequently, in cases in which spike trains with the same properties were displayed reproducibly in every repetition cycle, such activity should code the identities of repeated syllables.
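The contrast between the constant-probability model and the observed distributions can be made concrete. If each additional repeat occurs with a fixed probability p, the number of repeats follows a geometric distribution, P(n) = p^(n−1)(1 − p), which decreases monotonically with n. The sketch below computes that expectation and checks whether a histogram is monotonically decreasing; the function names and the example counts in the usage note are invented for illustration.

```python
def geometric_repeat_distribution(p_repeat, max_repeats):
    """P(n repeats), n = 1..max_repeats, under a constant repeat probability."""
    return [
        (p_repeat ** (n - 1)) * (1.0 - p_repeat)
        for n in range(1, max_repeats + 1)
    ]


def is_monotonically_decreasing(counts):
    """True if no bin is larger than the bin before it."""
    return all(a >= b for a, b in zip(counts, counts[1:]))
```

For instance, `geometric_repeat_distribution(0.5, 4)` gives `[0.5, 0.25, 0.125, 0.0625]`, which passes the monotonicity check, whereas an invented single-peaked histogram such as `[2, 9, 15, 8, 1]` does not.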
Syntax-related activity in repetition
Next, we addressed whether the patterns of activity in HVCX neurons are constant or varied during consecutive repetition in Bengalese finches. Four neurons generated burst spikes only at the first (n = 2 neurons) or the last (n = 2 neurons) cycles of the repetition phrases in an all-or-none manner (Fig. 6A,B, respectively). These all-or-none type transition-selective activities are thought to signal the initiation or termination of the syllable repetition in variable vocal sequences. We also identified a total of 20 neurons (seven birds) that generated phasic activity while vocalizing repetition segments (n = 21 segments). All of them exhibited brief bursts in a precisely phase-locked manner to the onset of each repetition syllable. These phasic bursts during repetition were categorized into two types: one was characterized by a constant spike rate in every repetition cycle (n = 14 segments) (Fig. 6C,D) and the other by a monotonically increasing or decreasing spike rate as the same syllable was repeated (n = 7 segments) (Fig. 6E,F). The former type preserved spike rates independently of the total number of times repeated (Fig. 6C,D, bottom right). Therefore, it represented the identities of repeated syllables as a syllable-selective activity. In contrast, the latter type represented numerical information of repetitive cycles by increasing or decreasing the spike rate during repetition and was regarded as an intermediate-type transition-selective activity. It is notable that some HVCX neurons with an intermediate-type transition selectivity increased or decreased the firing rate adaptively to the total number of syllable repetitions, to exhibit burst spikes at a fixed firing rate every time that the first and last repetition cycle was executed (Fig. 6E,F, bottom right). Such a coding strategy is useful in indicating the progress and state-of-completeness of the variable repetition. 
More importantly, these findings support the idea that the total number of repetition times is not stochastically determined during the repetition cycle, but is approximately determined in advance; that is, birds can settle on an adequate repetition number before or during the performance.
Syntactic attributes of multiple bursts in the same neurons
HVCX bursts code hierarchical syntactic structures with syllable-selective and transition-selective activities; HVCX neurons represent a specific syllable type and a particular combination of syllable pairs using syllable-selective activity and transition-selective activity, respectively (Fig. 7A,B). Furthermore, they signal the temporal boundaries of repetitive segments and the numerical information associated with repetitive cycles using all-or-none type and intermediate type transition-selective activity, respectively (Fig. 7B). To compute such different syntactic properties may require different types of information processing at a cellular or circuit level, leading to the idea that the difference in selectivity to specific syntactic attributes may be due to differences in cell types. To address the question of whether HVCX neurons can be classified based on the relationship of their activity to the syntax, we examined the syntactic attributes of the individual burst events of neurons 1–36 in Figure 2E, which displayed activity in multiple different segments. Single HVCX neurons showed different syntactic properties in different segments in various combinations (Fig. 7C). The syllable- and transition-selective properties in repetition segments were also equally integrated with those in junction and/or branching transition segments within the burst spikes of single neurons. This indicates that syntactic selectivity of a burst event is established independently of the basic cellular profiles of the individual HVCX neurons.
Syntax-related auditory response to self-generated vocal sequences
As previously reported (Prather et al., 2008), selective auditory response to BOS or to similar song sequences in other birds' songs observed in HVCX neurons closely recapitulates the motor-related phasic patterns. This auditory–vocal mirroring property poses the question of whether these neurons can extract syntactic information directly from auditory inputs and represent the syntactic structures in the auditory phase in a manner correspondent with those in the vocal phase. To address this question, we compared the activity profiles of these mirroring-type HVCX neurons in alternative transition segments between the auditory and vocal phases. Because the level of wakefulness profoundly affects the auditory response in HVC (Schmidt and Konishi, 1998), the behavioral state was monitored by a video camera in a soundproof recording chamber, and auditory stimuli were carefully delivered while birds were active. When the BOS playback was presented as an auditory stimulus, 31% of HVCX neurons (nine of 29 neurons, four birds) preferentially exhibited phase-locked brief activity in some of the transition segments, where the same neurons generated burst spikes during singing, and spontaneous firing of the remaining neurons in the nonsinging state was unchanged or suppressed. Although spike rates of these nine HVCX neurons in the sensory phase were typically lower than in the motor phase, the onset timings of the auditory-related activity to the relevant segments were approximately similar to those of the vocal-related activity (Fig. 8A).
Among the identified nine auditory–vocal mirroring HVCX neurons, four neurons generated bursts in the alternative transition segments in response to the BOS playback. Unlike vocal-related activity, it is logically counterintuitive that auditory transition-selective responses in branching transition segments are generated before a target syllable is presented. We therefore considered the onset timing of the bursts relative to the syllable onset. Two of these four neurons generated motor-related transition-selective bursts after the onset of the target syllables in the associated alternative transition segments (Figs. 4A,B, a13 and i8; 8A, red square and red triangle). Both of them changed their burst profiles in the segment in response to the auditory presentation of the BOS, similar to the motor phase (Fig. 8B–D,G,H). Such sensorimotor correspondence in transition selectivity suggests that these auditory–vocal mirroring neurons extracted not only phonological but also syntactic information from auditory signals of naturally composed vocal sequences. The remaining two neurons exhibited transition-selective bursts before the onset of the target syllables in the vocal phase (Figs. 4A, a8 and a12; 8A, red inverted triangle and red circle). In the a12 segment, the burst spikes lost their transition selectivity in the auditory phase and were observed in all transition directions (Fig. 8E,I). In contrast, in the a8 segment, transition-selective changes in firing rate and neuronal reliability were preserved in the auditory phase (Fig. 8F,J). However, in the a8 segment, the HVCX neuron exhibited bursts with slightly greater latency in the auditory phase than in the vocal phase. As a result, the auditory response was generated after the onset of the target syllable (Fig. 8A). Such a temporal shift suggests that the information processing that generates the transition-selective activity may be different depending on the respective onset timing in the transition segments. 
Although auditory–vocal mirroring HVCX neurons cannot perfectly reproduce transition selectivity in the auditory phase as they do in the motor phase, our observations indicate that at least some auditory–vocal mirroring HVCX neurons can directly extract syntactic information from sensory inputs and represent it as they do in the motor phase.
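The onset-timing comparison described above, in which a burst is placed before or after the onset of the target syllable in each phase, can be illustrated with a minimal Python sketch. This is not the analysis used in this study: the burst criterion (at least two consecutive spikes with interspike intervals of 10 ms or less), the function names `burst_onset` and `onset_class`, and the spike times are all hypothetical and chosen only for illustration. Spike times are assumed to be expressed in milliseconds relative to target-syllable onset (time 0).

```python
from typing import Optional, Sequence


def burst_onset(spike_times_ms: Sequence[float],
                max_isi_ms: float = 10.0,
                min_spikes: int = 2) -> Optional[float]:
    """Return the time of the first spike of the first burst, or None.

    A burst is defined here (illustrative assumption, not the paper's
    criterion) as at least `min_spikes` consecutive spikes whose
    interspike intervals are all <= `max_isi_ms`.
    """
    spikes = sorted(spike_times_ms)
    run_start = 0  # index of the first spike in the current short-ISI run
    for i in range(1, len(spikes)):
        if spikes[i] - spikes[i - 1] > max_isi_ms:
            run_start = i  # interval too long: start a new run here
        elif i - run_start + 1 >= min_spikes:
            return spikes[run_start]
    return None


def onset_class(onset_ms: Optional[float]) -> str:
    """Label a burst onset relative to target-syllable onset (time 0)."""
    if onset_ms is None:
        return "no burst"
    if onset_ms < 0:
        return "before target-syllable onset"
    return "after target-syllable onset"


# Hypothetical spike trains for one neuron in one transition segment.
vocal_phase = [-40.0, -12.0, -9.0, -7.0, 30.0]
auditory_phase = [-35.0, 4.0, 7.0, 9.0]

for label, train in (("vocal", vocal_phase), ("auditory", auditory_phase)):
    onset = burst_onset(train)
    print(label, onset, onset_class(onset))
```

With these made-up spike times, the burst falls before syllable onset in the vocal phase and after it in the auditory phase, which is the kind of temporal shift described for the a8 segment.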
Our results demonstrate that burst spikes in HVCX neurons represent hierarchical syntactic structures in variable vocal sequences. HVCX bursts during the vocalization of nonstereotyped sequences display syllable selectivity or transition selectivity (Fig. 7). In alternative transition segments between different syllables, transition-selective HVCX bursts indicate a specific transition direction among alternatives. In consecutive repetition segments of the same syllables, all-or-none type transition-selective bursts signal entry into or exit from the repetitive sequence, and intermediate type transition-selective bursts indicate the progress and state-of-completeness of the repetition (Fig. 7B). The combination of different types of HVCX bursts represents unique internal state changes in the course of generating variable vocal sequences. Our results also suggest that syntactic attributes of the individual HVCX neurons are not determined by their basic cellular types but acquired during the maturation of HVC circuitry (Fig. 7C). Finally, we show that auditory–vocal mirroring type HVCX neurons can display transition selectivity in response to the auditory presentation of syntactic song sequences as they do in the vocal phase (Fig. 8).
Neural representation of learned behavioral sequences
Beyond vocal signaling analogous to that of avian species, mammalian subjects also learn to create complex and diverse behavioral sequences from combinations of elemental movements. In the primate supplementary motor area, some neurons exhibit activity shortly before and during the execution of a specific movement in the course of sequential tasks (Tanji and Shima, 1994), and others become active during the transition from one specific movement to another (Tanji and Shima, 1994), similar to avian HVCX neurons displaying syllable and transition selectivity, respectively. Thus, to represent the precise ordering of elemental actions, neural systems use transition-selective activity that codes a particular combination of movements in a specific order, in addition to movement-selective activity.
The organization of behavioral sequences also requires the proper initiation and termination of the sequence. Neurons in the primate prefrontal cortex, intimately connected with the striatum, exhibit enhanced firing at the start and end of a sequential saccade task (Fujii and Graybiel, 2003). Dopaminergic neurons in the rodent nigrostriatal pathway signal the initiation and termination of self-paced action sequences (Jin and Costa, 2010). Furthermore, our study demonstrates that basal ganglia projecting HVCX neurons explicitly indicate the initiation and termination of syllable repetition segments, even in continuous vocal behaviors (Fig. 6A,B). Together, the hierarchical organization of learned behaviors may require the cortico–basal ganglia circuits to signal the temporal boundaries of particular domains in the process of generating variable behavioral sequences.
To control behavioral sequences with repetitive elements, animals must settle on an adequate repetition number before or during the performance. Neurons in the primate parietal cortex display number-selective activity while performing numerically based behavioral tasks (Sawamura et al., 2002). This number-selective activity is classified as early-, late-, or middle-trial selective, according to its temporal preference. During the syllable repetition, HVCX neurons signal the progress and state-of-completeness of the repetition with intermediate-type transition-selective bursts similar to early- or late-trial selective activity in primates (Fig. 6E,F). Such numerical representation may be useful for monitoring execution of individual actions and keeping track of the number of actions performed, to select an appropriate forthcoming action (Sawamura et al., 2002).
Because the mammalian subjects described above were all trained with intensive operant conditioning, the neural mechanisms of their learning process may differ from those of a natural process of imitative learning, such as vocal learning. Nevertheless, despite these differences, motor-related activity in avian and mammalian subjects during the execution of complex behavioral sequences shares many common features. The neural coding strategy for representing syntactic rules observed in these animal subjects may have implications for advanced human skills, including language.
Redundant coding for syntactic information
HVCX neurons code syntactic structures in a redundant manner, with syllable-selective activity and two types of transition-selective activity (Fig. 7). Because syllable identities can be included in transition-selective activity, one might expect that syntactic information could be described by transition-selective activity alone. How, then, is such redundant coding used for the organization of syntactic vocal signals?
All-or-none type transition-selective activity constitutes nonoverlapping neural ensembles and can explicitly indicate a specific transition direction among the alternatives (Fig. 3D). However, if syntactic rules were described only with all-or-none type transition-selective activity, information about the same syllable in different syllable transitions would be represented by entirely different combinations of HVCX activities. As a result, neurons in area X would have to extract information for the same syllable in different trajectories from completely different spatiotemporal patterns of HVCX neuronal ensembles. To get around this problem, the intermediate type transition-selective activity, which is generated at the same timing at identical syllables, and syllable-selective activity, which displays exactly the same spatiotemporal patterns at identical syllables, may play a part to establish and maintain precise phonological structures by presenting the identities of the same syllables in variable sequences (Fig. 7A).
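The argument above can be made concrete with a toy sketch in Python. It is only an illustration under stated assumptions, not a model fitted to the recordings: the neuron labels (`n1` to `n5`), the two transitions into a hypothetical syllable `b`, and the helper names `active_ensemble` and `shared_cells` are all invented for this example. Each ensemble is represented simply as the set of neurons that burst in a given transition.

```python
# Hypothetical all-or-none transition-selective ensembles: each transition
# ("previous->current") recruits its own nonoverlapping set of neurons.
ALL_OR_NONE = {
    "a->b": {"n1", "n2"},
    "c->b": {"n3", "n4"},
}

# Hypothetical syllable-selective neurons: active whenever the current
# syllable is produced, regardless of the preceding syllable.
SYLLABLE_SELECTIVE = {
    "b": {"n5"},
}


def active_ensemble(transition: str, include_syllable_cells: bool) -> set:
    """Return the set of neurons bursting in the given transition."""
    current = transition.split("->")[1]
    ensemble = set(ALL_OR_NONE[transition])
    if include_syllable_cells:
        ensemble |= SYLLABLE_SELECTIVE.get(current, set())
    return ensemble


def shared_cells(t1: str, t2: str, include_syllable_cells: bool) -> set:
    """Neurons active in both transitions: a common cue for a downstream reader."""
    return (active_ensemble(t1, include_syllable_cells)
            & active_ensemble(t2, include_syllable_cells))


# Transition-only code: syllable b is signaled by disjoint ensembles.
print(shared_cells("a->b", "c->b", include_syllable_cells=False))  # set()
# Redundant code: the syllable-selective cell is common to both contexts.
print(shared_cells("a->b", "c->b", include_syllable_cells=True))   # {'n5'}
```

In this toy case, a downstream reader has no common signal for syllable `b` when only all-or-none transition-selective cells are available, whereas adding a syllable-selective cell provides one shared element across both trajectories.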
Development of syntactic selectivity
Computing different syntactic properties may require different types of information processing at a cellular or circuit level, leading to the idea that differences in syntactic attributes observed in HVCX neurons may be due to a difference in cell types. However, our results indicate that HVCX neurons cannot be classified from their functional relationship to syntax (Fig. 7). Both syllable and transition selectivities were randomly intermingled within single HVCX neurons (Fig. 7C). In contrast, each HVCX neuron generated burst spikes at a similar latency to syllables with different phonological profiles (Fig. 2E). Such temporal preference of the individual neurons despite the lack of their association with specific syntactic or phonological information may be a reflection of the developmental process of the HVC circuitry.
Song develops gradually through plastic vocalizations toward a crystallized form. In this process, syllables of different types need not each emerge from a primitive version of that type; they can instead originate from common prototypes (Tchernichovski et al., 2001). Therefore, HVCX neurons may first tune their approximate spike timing to the common syllable prototype before each syllable becomes differentiated in its acoustic and syntactic aspects. Subsequently, HVCX activity may be shaped to establish syllable or transition selectivity as the phonological and syntactic profiles mature.
Intracellular electrophysiological studies have so far demonstrated that the burst spikes in HVCX neurons arise through different subthreshold processes from those in HVCRA neurons (Lewicki, 1996; Mooney, 2000; Mooney and Prather, 2005; Long et al., 2010): the temporal precision of HVCRA bursts is mainly determined by selective excitatory synaptic inputs, whereas inhibitory synaptic inputs play primary roles in the establishment of HVCX bursts. Because inhibitory synaptic inputs to HVCX neurons presumably originate from interneurons within HVC, syntactic attributes in HVCX activity may be formed through the development of inhibitory interneurons after the establishment of the excitatory network. In support of this idea, functional maturation of the inhibitory interneuronal network in the mammalian cortex is a protracted process, extending into late development (Dorrn et al., 2010). However, the postnatal developmental process of both excitatory and inhibitory networks in HVC remains largely unknown. A full understanding of the development of syntactic processing will require further studies focused on functional maturation of the HVC circuitry.
Neural representation of syntax in auditory information
Imitative learning requires forming a memory of others' performance on the basis of sensory experience and adjusting motor commands using sensory feedback (Meltzoff, 1990). Therefore, the brain must establish a correspondence between sensory and motor representation of the performance. Such sensorimotor correspondence at a single neuron level was first identified in visual–motor mirroring neurons in primate premotor cortex (area F5) (Rizzolatti et al., 1996), regarded as a homolog of the human language area. In birds, auditory–vocal mirroring HVCX neurons in the premotor–basal ganglia pathway (Prather et al., 2008) constitute a part of the vocal control system. Because vocal behavior requires both phonological and syntactic control, auditory–vocal correspondence should be established not only in phonological but also syntactic aspects. Although auditory–vocal mirroring HVCX neurons were less active in the auditory phase, our results demonstrate that these mirroring HVCX neurons can display transition selectivity in response to the auditory presentation of syntactic song sequences, just as they do in the vocal phase (Fig. 8). Because HVC receives several different inputs (Nottebohm et al., 1982; Bottjer et al., 1989; Vates et al., 1996; Bauer et al., 2008), attenuation of the HVCX activity in the auditory phase may reflect a difference in active input between the auditory and vocal phases. In addition, such an attenuation in the auditory phase may be partly due to selective attention, which modulates neural responses to sensory stimuli (Maunsell, 1995; Desimone, 1996; Newsome, 1996). In this regard, selective attention may suppress auditory responses to the playback of the BOS sequences as unattended stimuli.
Although the physiological role and significance of HVCX activity during the auditory phase is still open to question, the fact that HVCX neurons can directly extract syntactic information from the playback of naturally composed variable vocal sequences suggests that these songbirds may have the ability to learn syntactic rules from auditory experience and apply them to form their own vocal signals. Because syntax of the birdsong is coded by neurons that are analogous to mammalian cortical neurons projecting to basal ganglia circuits (Jarvis et al., 2005; Bolhuis et al., 2010) that are involved in the sensorimotor and cognitive processes of language (Ullman, 2001; Frey and Gerry, 2006), the principles obtained from the studies of the avian vocal control system may contribute to the further understanding of the neural basis of syntax in human language.
This work was supported by research grants from the Takeda Science Foundation, the Uehara Memorial Foundation, and the Ministry of Education, Culture, Sports, Science, and Technology of Japan including the Strategic Research Program for Brain Sciences. We thank S. Nakanishi and M. Konishi for discussions, advice, and criticisms that greatly benefited this project. We thank K. Tashiro for experimental assistance; A. Takashima for developing custom-made microdrives; Y. Funabiki and T. Hayakawa for developing tutoring methods; and R. Yamada, D. Koketsu, and A. Nambu for valuable discussion. We thank J. Hejna for careful reading and criticisms.
Correspondence should be addressed to Dr. Dai Watanabe, Department of Biological Sciences, Faculty of Medicine, Department of Molecular and Systems Biology, Graduate School of Biostudies, Kyoto University, Yoshida, Sakyo-ku, Kyoto 606-8501, Japan.