Abstract
Birdsong is hierarchically organized in time, like speech and other communication behaviors. Syllables are produced in sequences to form song motifs and bouts. While syllables are copied from tutors, the factors that determine song temporal organization, including syllable sequencing (syntax), are unknown. Here, we tested the roles of learning and species genetics in song organization. We manipulated juvenile song experience and genetics in three species of estrildid finches (zebra finches, Taeniopygia guttata castanotis; long-tailed finches, Poephila acuticauda; Bengalese finches, Lonchura striata var. domestica). We analyzed the adult songs of male birds that were: (1) tutored by conspecifics; (2) untutored; (3) tutored by heterospecifics; and (4) genetic hybrids. Song macrostructure, syllable sequencing, and syllable timing were quantified and compared within and across species. Results showed that song organization was consistent within a species and differed across species, regardless of experience. Temporal features did not differ between tutored and untutored birds of the same species. The songs of birds tutored by other species were composed of heterospecific syllables produced in sequences typical of conspecific songs. The songs of genetic hybrids showed the organization of both parental species, despite the fact that only males sing. Results indicate that song organization is predicted by species rather than experience.
Significance Statement
Like speech, birdsong is a complex and learned behavior that is hierarchically organized in time. Previous work suggests that species identity influences song temporal organization. We tested the roles of genetics and learning in song organization in three songbird species and genetic hybrids. Birds were either tutored, untutored, or tutored by another species. Results showed that song organization was consistent within a species and differed across species, regardless of experience. Our findings suggest that the organization of behavioral sequences is shaped by both genes and experience, with the influence of experience acting at the level of units in a sequence and the influence of species genetics acting at the level of sequence organization.
Introduction
Many natural behaviors are organized as temporal sequences, which differ in flexibility across taxa. Fixed action patterns such as egg retrieval in ground-nesting birds are stereotyped sequences of distinct behaviors that progress to completion regardless of ongoing feedback (Lorenz and Tinbergen, 1938). These behaviors are species-specific, are directly related to fitness, and do not require learning (Tinbergen, 1951). Other sequential behaviors like speech are flexible, convey highly specific information, and depend on learning (Kuhl, 2004; Hickok and Poeppel, 2007). Learned behavioral sequences are constrained by species-specific sensory, motor, and cognitive capacities and their underlying neural structures (Podos, 1996; Saffran, 2003; Redford, 2008; Lipkind et al., 2017; Santolin and Saffran, 2018). An unresolved issue in the study of learned behaviors is how species-specific constraints and experience interact in the development of temporal sequences.
Vocal communication signals, such as speech and birdsong, are often composed of multiple acoustic units that are precisely arranged into temporal sequences (Doupe and Kuhl, 1999; Filippi et al., 2019; Yi et al., 2019). Birdsong is composed of learned sound units (syllables) arranged in vocal sequences that convey social information to others (Fig. 1). The temporal organization of birdsong is hierarchical (Todt and Hultsch, 1998; Berwick et al., 2011; Mol et al., 2017; James et al., 2021); levels of structure progress from the acoustic features of syllables (microstructure), to the temporal organization of syllable sequences and transitions between syllables (syntax), to the repeated sequences of syllables (motifs) that form singing bouts (macrostructure; Sossinka and Böhner, 1980; Clayton, 1990; Okanoya, 2004; James et al., 2020a).
Temporal organization of song. a, Spectrogram of zebra finch song, with lines indicating syllables, motifs, and a bout. Syllables are labeled with letters. b, Syllable transition diagram showing the sequencing of the song in (a). Each lettered node represents a syllable type. S/E represents the start and end of a song bout. The syllable order is clockwise. Line thickness scales with transition probability.
The acoustic features of syllables (Zann, 1976; Marler and Peters, 1977; Sossinka and Böhner, 1980; Catchpole and Slater, 2008; Rivera et al., 2023) and their temporal organization, or syntax (Beecher and Brenowitz, 2005; Berwick et al., 2011), are species-specific, suggesting that genetic traits shape song development. Juvenile songbirds can learn a wide range of syllables, including those of other species (Thorpe, 1958; Marler and Tamura, 1964; Eales, 1987; Clayton, 1989; Soha and Marler, 2000; Moore and Woolley, 2019) and synthetic syllables (Gardner et al., 2005). Individuals of the same species produce songs with similar temporal features (Immelmann, 1969; Eales, 1987; Clayton, 1989; Wang et al., 2019), and song tutoring manipulations suggest that species-level “universals” contribute to syllable sequence organization (Gardner et al., 2005; James and Sakata, 2017). Furthermore, manipulations of sensory and neural structures suggest that syllable acoustics and order are controlled by distinct neural circuits in the songbird brain. For example, in Bengalese finches, adult deafening results in rapid degradation of normal syllable sequencing, but slow and gradual degradation of syllable structure (Okanoya and Yamaguchi, 1997; Woolley and Rubel, 1997; Sakata and Brainard, 2006; Wittenbach et al., 2015). Neural damage also affects syllable structure and sequence differently, depending on which circuits are damaged (Basista et al., 2014), suggesting that syllable acoustics and order are independently controlled. Because complex behaviors such as birdsong differ by species and individual, detailed quantification of behavioral structure at multiple temporal scales allows the identification of genetic and experience-dependent mechanisms of behavior across species and individuals, including the neural control of complex behavioral sequences.
Here, we tested the hypothesis that song temporal organization is predicted by species rather than learning. We predicted that birds would learn their tutors' syllables and arrange those syllables with the macrostructure, sequencing, and timing of their own species' songs, regardless of experience. We studied three species of estrildid songbirds with known relatedness (Olsson and Alström, 2020) and acoustic differences in song (Woolley and Moore, 2011). We compared the songs of normal birds, birds that were untutored (reared without exposure to song), birds that were tutored by adults of another species, and genetic hybrids of two species that were tutored by adults of the same paternal species. For this, we quantified song macrostructure including the organization of motifs in bouts (Clayton, 1989, 1990; Glaze and Troyer, 2006; James et al., 2021), syllable sequencing including syllable transitions and repeat probabilities (Lipkind et al., 2013; James and Sakata, 2017; Veit et al., 2021), and syllable timing using standard measures such as syllable duration and intersyllable interval (Glaze and Troyer, 2006; Long and Fee, 2008; Thompson et al., 2011).
Materials and Methods
In this study, “temporal organization” is operationally defined as song macrostructure, syllable sequencing, and syllable timing. We studied the temporal organization of adult male song in zebra finches (Z; Taeniopygia guttata castanotis), long-tailed finches (L; Poephila acuticauda), and Bengalese finches (B; Lonchura striata var. domestica; Fig. 2). Songs were recorded from 105 birds divided into four groups with different rearing conditions: (1) birds that were reared and tutored by conspecific adults (normal); (2) birds that were reared without exposure to adult song (untutored); (3) birds that were reared and tutored by heterospecific adults (cross-species–tutored); and (4) birds that were genetic hybrids of two species (hybrids). All procedures were approved by the Columbia University Institutional Animal Care and Use Committee.
Song temporal organization in birds of three species. a, Spectrograms of zebra finch (Z), long-tailed finch (L), and Bengalese finch (B) song, with lines indicating motifs above and syllables labeled with letters below (left). Motif line shading indicates distinct motif “types.” Syllable transition diagrams for the same songs (right). b, Syllable transition matrices for the songs in a. c, Transition entropy for the songs in a. Red lines indicate average transition entropy across syllable types. d, Spectrograms of Z, L, and B songs, with lines indicating motifs above and syllables labeled with letters below (left). Scatter plots of principal component analysis of syllable acoustics (middle). Plots show all syllables recorded from the birds in d (left); each point represents one syllable. Syllable type clusters are indicated by letter labels and point colors. Syllable spectrogram cross-correlation heatmaps used for validation of syllable type labeling (right). Same/different type cut-off value of 0.45 is indicated by black bar on the color scale. e, Histograms showing distributions of similarity scores across all birds' syllables labeled as the same type (gray bars) and different types (white bars). Vertical black bars indicate mean similarities of the same and different types. Dashed blue bar indicates same/different cut-off values. f, Histograms showing distributions of intersyllable intervals (ISIs) within motifs (gray bars) and surrounding motifs (preceding the first motif syllable and following the last motif syllable; white bars) in Z (top) and B (bottom) songs. Dashed blue bar indicates cut-off value of 2 SD > mean ISI within motifs of B songs. g, Average motif duration, motif number per bout, syllable number per motif, and number of motif types in the songs of each species (left). Average syllable repeat probability, sequence linearity, and transition entropy in the songs of each species (middle). 
Average syllable durations, syllable duration variability, and intersyllable intervals in the songs of each species (right). Bars show means and error bars show standard errors. ***p < 0.001; **p < 0.01; *p < 0.05; n.s., not significant.
Song-learning experimental design
All birds were bred in single-family enclosures, housed in our finch colony at Columbia University. Breeding birds reared their young (genetic or fostered) and served as tutors (total number of different tutors: Z, n = 8; L, n = 10; B, n = 10; median age, 186 d; age range, 120–1,360 d). Tutors were normally reared. Where possible, tutors for normal and cross-species–tutored pupils were included. All analyzed songs were recorded from adults. Birds were ≥110 d old and <4 years old, with the exception of one hybrid that was 97 d old.
Normal birds of the three species were bred from parents of the same species and received typical tutoring. Song was recorded from two generations of normal birds so that the songs of tutors (Z, n = 8; L, n = 5; B, n = 6) and pupils (Z, n = 9; L, n = 10; B, n = 10; median age, 151 d; age range, 110–809 d) could be compared. Untutored birds were reared by nonsinging females until 30 d of age and then housed in isolation until reaching adulthood. We recorded and analyzed adult songs from untutored Z (Ziso; n = 10) and L birds (Liso; n = 2; median age, 261 d; age range, 132–434 d). Untutored B birds were not generated in this study because normal B birds were used to generate cross-species–tutored birds and genetic hybrids, in order to test the influences of species and tutoring in the two focal species' songs.
Cross-species–tutored birds were raised by heterospecific foster parents. Z and L eggs and nestlings (≤10 d after hatching) were moved to the nests of B adult pairs. The eggs of B pairs were removed, so that each clutch was a single species. Z and L juveniles were raised by B foster parents until maturity, and we recorded their songs as adults (ZB, n = 9; LB, n = 10; number of tutors: ZB, n = 5; LB, n = 7; median age, 144 d; age range, 116–418 d).
Hybrid birds were generated by breeding adult L males with adult Z or B females. Hybrid juveniles were reared normally by both parents and tutored by their L fathers (Z × L, n = 5; B × L, n = 10; number of tutors: Z × L, n = 3; B × L, n = 3; median age, 129 d; age range, 97–682 d). Mothers were genetically normal Z and B birds, reared by their parents. The fathers/tutors were L birds in both crosses so that differences in the songs of the two hybrid groups would reveal the influence of maternal genetics on song temporal organization, while controlling for other variables. Like other tutored birds, hybrids were raised in family cages and housed in colony rooms. Each cage contained a single breeding pair so that the identity of the parents was known. All offspring reared in the same cage at the same time were genetic hybrids. Each hybrid juvenile had unrestricted social access to both parents and hybrid siblings. Because the number of Z × L hybrids was limited, we included one bird for which song was recorded at 97 d of age. The acoustics and temporal organization of that bird's syllables were highly similar to those of its older brothers, all >115 d old.
Song recording and syllable extraction
To record singing behavior, we placed birds individually in sound-attenuating chambers equipped with microphones routed through an audio interface to a computer running Sound Analysis Pro (Tchernichovski et al., 2000). Recorded song bouts (epochs of continuous singing) were digitized at 44,100 Hz and stored as uncompressed wave files. Birds typically produced enough song for analysis over 2–4 d of recording. From each bird's recorded song bouts, we randomly selected 25–30 s of song for analysis. Recordings were bandpass filtered (250–8,000 Hz) and root-mean-square power matched (65 dB SPL).
To identify syllables and measure syllable onsets and offsets in song bouts, we used the amplitude of the sound pressure waveform (Moore and Woolley, 2019). Syllables were units of continuous energy that were at least 50% above the noise floor. For complex syllables that contained some amplitude fluctuations, we considered a syllable to be a single unit if any drops below threshold were shorter than 5 ms. We excluded introductory notes, which occur only at the beginning of a song bout (Zann, 1976; Woolley and Rubel, 1997; Zevin et al., 2004; Kao and Brainard, 2006; Wohlgemuth et al., 2010). This procedure resulted in a total of 18,894 syllables included for analysis.
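The thresholding and gap-bridging rule described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function name, the sampled amplitude envelope, and the way the threshold is passed in are all assumptions, and only the 5 ms bridging rule comes from the text.

```python
def segment_syllables(envelope, threshold, dt_ms, max_gap_ms=5.0):
    """Return (onset_ms, offset_ms) pairs for runs where envelope > threshold.

    Hypothetical helper: `envelope` is an amplitude envelope sampled every
    `dt_ms` milliseconds. Sub-threshold dips shorter than `max_gap_ms` are
    bridged so that a complex syllable stays a single unit.
    """
    runs = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i                      # envelope rises above threshold
        elif value <= threshold and start is not None:
            runs.append((start * dt_ms, i * dt_ms))
            start = None                   # envelope falls back below threshold
    if start is not None:                  # recording ended mid-syllable
        runs.append((start * dt_ms, len(envelope) * dt_ms))

    merged = []
    for onset, offset in runs:
        if merged and onset - merged[-1][1] < max_gap_ms:
            merged[-1] = (merged[-1][0], offset)   # bridge a brief dip
        else:
            merged.append((onset, offset))
    return merged


envelope = [0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0]
syllables = segment_syllables(envelope, threshold=0.5, dt_ms=1)
```

In this toy envelope the 1 ms dip is bridged and the 6 ms silence separates two syllables.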
Syllable type labeling
We labeled syllables using letters, according to their order in each bird's song recordings (A, B, C, D, etc.) from principal component analysis (PCA) of each syllable's acoustic features (Rivera et al., 2023). Briefly, we measured 26 acoustic features of each syllable using the warbleR package (Araya-Salas and Smith-Vidaurre, 2016) for R (v4.1.2), which in turn uses the seewave library (Sueur et al., 2008). Features measured for each syllable included mean frequency, frequency bandwidth, and entropy. These features capture a broad range of spectrotemporal acoustics and have been used to discriminate syllables of different species (Keen et al., 2021; Rivera et al., 2023). After performing PCA, syllables with similar acoustic features formed clusters. Syllable labeling was conducted as in Moore and Woolley (2019), using custom software in which song spectrograms and PCA clusters were analyzed simultaneously (Fig. 2d). To analyze PC clusters, the PC axes were rotated to visualize cluster separability. All syllables within a cluster were then visualized using spectrograms to ensure that each syllable in the cluster was of the same type. Syllables of the same type were then labeled with the same letter (Figs. 1, 2). Once every syllable in a bout was labeled, the string of letters representing the sequence of syllables was analyzed to produce syllable transition diagrams and matrices (Fig. 2a,b).
To assess pupils' learning of tutors' syllables, we used a set of shared labels for each tutor–pupil pair. We first assigned syllable type labels for a tutor bird as above. After labeling each tutor syllable by type, we projected the pupil's syllables onto the tutor's PC coordinates (Fehér et al., 2009) and assigned type labels to pupil syllables based on proximity to tutor clusters. For example, if a pupil produced a syllable cluster near or overlapping the tutor's Type A cluster, those syllables were labeled A.
The classification of individual syllables into syllable types (in a bird and between tutors and pupils) was verified by cross-correlation of syllable spectrograms (Moore and Woolley, 2019; Fig. 2d). For both self-comparisons and tutor–pupil comparisons, we randomly selected between 5 and 20 (mean = 16.8) individual syllables in each syllable type for analysis. We computed syllable spectrograms (log-transformed) with 2 ms temporal resolution and 100 Hz spectral resolution and rescaled them to range from 0 to 1, setting pixel values of <0.5 to 0. We then cross-correlated each pair of syllables: the shorter spectrogram was convolved with the longer in 2 ms time increments, and the peak of the correlation function was taken as the similarity measure. A cross-correlation coefficient (r) of 0.45 reliably discriminated between syllables of the same type (r > 0.45; mean r = 0.65) and different types (Fig. 2e; r < 0.45; mean r = 0.22).
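A minimal sketch of the peak cross-correlation measure is given below. Several details are assumptions rather than the published procedure: spectrograms are represented as lists of time frames (one frame per 2 ms step), the correlation at each lag is a Pearson coefficient over the fully overlapping window, and the function names are hypothetical.

```python
import math


def threshold_spectrogram(spec, floor=0.5):
    """Rescale a spectrogram to [0, 1], then set values below `floor` to 0."""
    flat = [v for frame in spec for v in frame]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    scaled = [[(v - lo) / span for v in frame] for frame in spec]
    return [[v if v >= floor else 0.0 for v in frame] for frame in scaled]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def peak_similarity(spec_a, spec_b):
    """Slide the shorter spectrogram along the longer one, one frame per step,
    and return the peak correlation across lags."""
    short, long_ = sorted((spec_a, spec_b), key=len)
    flat_short = [v for frame in short for v in frame]
    best = -1.0
    for lag in range(len(long_) - len(short) + 1):
        window = long_[lag:lag + len(short)]
        flat_window = [v for frame in window for v in frame]
        best = max(best, pearson(flat_short, flat_window))
    return best


a = [[0, 1], [1, 0]]
b = [[0, 0], [0, 1], [1, 0], [0, 0]]
r = peak_similarity(a, b)  # the pattern in `a` appears in `b` at lag 1
```

Under these assumptions, a pair of syllables would be scored as the same type when the returned value exceeds the 0.45 cut-off.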
Song macrostructure, syllable sequencing, and syllable timing measures
Bouts of singing were identified as sequences of syllables separated by intervals of silence that were longer than 300 ms for Z, L, and hybrid birds (Veit et al., 2011; Okubo et al., 2015; Norton and Scharff, 2016) and 2 s for B birds (Woolley and Rubel, 1997; Troyer et al., 2017; Veit et al., 2021). We analyzed syllable order in a bout to create syllable transition diagrams and matrices, with the beginning and end of each bout represented as “start/end” (S/E; Fig. 2a,b). For example, a bout ABCDABCD would include an additional transition from S/E to A, and one from D to S/E. All syllables in bouts were included in the syllable sequencing and timing analyses.
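The bout definition and the start/end convention can be illustrated with a short sketch. The function names and the tuple layout are hypothetical; the silence criterion (300 ms or 2 s, depending on species) is passed in as a parameter.

```python
def split_into_bouts(syllables, max_gap_ms):
    """Group labeled syllables into bouts.

    `syllables` is a list of (label, onset_ms, offset_ms) tuples sorted by
    onset. A silent interval longer than `max_gap_ms` starts a new bout.
    """
    bouts = []
    current = []
    prev_offset = None
    for label, onset, offset in syllables:
        if prev_offset is not None and onset - prev_offset > max_gap_ms:
            bouts.append(current)
            current = []
        current.append(label)
        prev_offset = offset
    if current:
        bouts.append(current)
    return bouts


def bout_transitions(bout, boundary="S/E"):
    """List the transitions in one bout, including the start and end of the bout."""
    padded = [boundary] + list(bout) + [boundary]
    return list(zip(padded, padded[1:]))


recording = [("A", 0, 50), ("B", 60, 110), ("A", 500, 550), ("B", 560, 610)]
bouts = split_into_bouts(recording, max_gap_ms=300)
transitions = bout_transitions("ABCDABCD")
```

For the bout ABCDABCD this yields nine transitions, the first from S/E to A and the last from D to S/E.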
Song macrostructure was defined as the temporal features of syllable sequences composing motifs and bouts. A motif was defined as a stereotyped sequence of syllables that was repeated in and across a bird's song bouts. Each adult male Z and L bird has one song motif, which is sung once or repeated multiple times to form a song bout. Songs of B birds have multiple short motifs that occur consistently but at different places in bouts and are interspersed with unpredictable sequences (Okanoya, 2004). These motifs are identified by their shorter intersyllable intervals (ISIs) within motifs than outside motifs (Fig. 2f; Takahasi et al., 2010; Tachibana et al., 2015; James et al., 2020a).
Song macrostructure was quantified as motif duration, the number of motifs per bout, the number of syllables per motif, and the number of motif types in a bird's song (Figs. 2–5; Clayton, 1989, 1990; Glaze and Troyer, 2006; James et al., 2021). Motif duration was measured as the time difference between the onset of the first syllable and the offset of the last syllable in a motif. To quantify motif duration in each bird's song, the durations of individual motifs were averaged. For birds with multiple motif types (B and B × L), motif durations were first averaged within each type and then averaged across types. The number of motifs per bout was quantified by counting motifs in each bout and then averaging counts across bouts for each bird. The number of syllables per motif was quantified by counting and averaging the number of syllables within each motif, for each bird. For birds with multiple motif types, syllable numbers were first averaged within type and then averaged across types. Distinct motif types were identified as motifs having <50% of the same syllable types and sequence overlap as other motifs in a bird's song (James et al., 2020a) and that were present in >50% of song bouts. All songs were analyzed for the presence of multiple motif types. For analyses, we included only complete motifs (Sturdy et al., 1999; Williams and Mehta, 1999; Menyhart et al., 2015; Norton and Scharff, 2016; Ward et al., 2023). For example, if a bird's motif was ABCD, then instances of incomplete motifs, such as ABC, were not included in the analysis of macrostructure.
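The two-stage averaging for birds with multiple motif types can be sketched as below. The function name and data layout are hypothetical; the point of the example is that averaging within type first keeps a frequently sung motif type from dominating the bird's mean.

```python
from statistics import mean


def mean_motif_duration(motifs):
    """Average motif duration within each motif type, then across types.

    `motifs` is a list of (motif_type, onset_ms, offset_ms) tuples for
    complete motifs only (hypothetical layout).
    """
    by_type = {}
    for motif_type, onset, offset in motifs:
        by_type.setdefault(motif_type, []).append(offset - onset)
    return mean(mean(durations) for durations in by_type.values())


motifs = [("m1", 0, 100), ("m1", 200, 320), ("m2", 400, 450)]
bird_mean = mean_motif_duration(motifs)       # (110 + 50) / 2
pooled_mean = mean(off - on for _, on, off in motifs)  # (100 + 120 + 50) / 3
```

Here the type-balanced mean is 80 ms, whereas pooling all motifs would give 90 ms.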
We defined syllable sequencing as the order of syllables in a bout (Scharff and Nottebohm, 1991; Woolley and Rubel, 1997; Takahasi et al., 2010; James et al., 2020a). Each bird's syllable sequence was quantified from that bird's syllable transition matrix, which included all syllables from all bouts for that bird (Fig. 2b). A syllable transition matrix shows the probability of a transition from each syllable type to every other syllable type, with transition probabilities organized into a square matrix where rows and columns represent the syllable types in a bird's song. To generate matrices, transition counts were converted to probabilities so that every row summed to 1 across all columns, representing the probability of a transition from one syllable type to another syllable type. Transition matrices were plotted in grayscale with black representing a probability of 1 and white representing a probability of 0 (Fig. 2b). Only transitions with a probability ≥0.05 were included.
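As an illustration of the row normalization described above, the following sketch builds transition probabilities from labeled bouts. It is not the analysis code used here: the function name and the nested-dictionary representation are assumptions, and the 0.05 cut-off is applied after normalization, which is one possible reading of the text.

```python
from collections import Counter


def transition_matrix(bouts, min_prob=0.05, boundary="S/E"):
    """Return {source: {destination: probability}} pooled over all bouts.

    Counts are normalized per source syllable type (row), and transitions
    with probability below `min_prob` are then dropped.
    """
    counts = Counter()
    for bout in bouts:
        padded = [boundary] + list(bout) + [boundary]
        counts.update(zip(padded, padded[1:]))

    row_totals = Counter()
    for (src, _dst), n in counts.items():
        row_totals[src] += n

    probs = {}
    for (src, dst), n in counts.items():
        p = n / row_totals[src]
        if p >= min_prob:
            probs.setdefault(src, {})[dst] = p
    return probs


matrix = transition_matrix(["ABAB", "ABAC"])
```

In this toy example, syllable A is followed by B with probability 0.75 and by C with probability 0.25.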
We quantified and compared syllable sequencing in experimental groups with three measures taken from syllable transition matrices: syllable repeat probability, sequence linearity, and transition entropy (Scharff and Nottebohm, 1991; Woolley and Rubel, 1997, 2002; Sakata and Brainard, 2006). For each syllable type in a bird's song, we computed repeat probability as the number of times the syllable type was repeated out of the total number of transitions for that syllable type. We then averaged, for each bird, all repeat probabilities for all syllable types in the bird's song.
Linearity is defined as the number of unique syllable types divided by the number of unique transition types (Scharff and Nottebohm, 1991; Woolley and Rubel, 1997, 2002; Zevin et al., 2004). Linearity of 1 represents a sequence that proceeds in the same order each time it is sung and when each syllable type transitions to a different type (syllables are not repeated). Linearity is <1 when syllable order differs in and across a bird's song bouts or when syllables are repeated.
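Repeat probability and linearity can be computed from labeled bouts as in the sketch below. The function name is hypothetical, and the sketch assumes that transitions to and from the bout boundary are excluded, so that a perfectly stereotyped repeated motif has a linearity of 1.

```python
from collections import Counter


def sequencing_measures(bouts):
    """Return (mean repeat probability, linearity) for one bird's bouts.

    Assumption: only within-bout transitions between syllables are counted.
    """
    counts = Counter()
    for bout in bouts:
        counts.update(zip(bout, bout[1:]))

    types = {s for bout in bouts for s in bout}
    outgoing = Counter()
    for (src, _dst), n in counts.items():
        outgoing[src] += n

    # Repeat probability per type: self-transitions / all outgoing transitions.
    repeat_probs = [counts[(s, s)] / outgoing[s] for s in types if outgoing[s]]
    repeat_probability = sum(repeat_probs) / len(repeat_probs)

    # Linearity: unique syllable types / unique transition types.
    linearity = len(types) / len(counts)
    return repeat_probability, linearity


stereotyped = sequencing_measures(["ABCDABCD"])
repeating = sequencing_measures(["AAABBB"])
```

The stereotyped sequence ABCDABCD gives a repeat probability of 0 and a linearity of 1, whereas AAABBB gives a mean repeat probability of 5/6 and a linearity of 2/3.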
Transition entropy is an information-theoretic quantity that represents uncertainty around which syllable type will follow from another syllable type (Sakata and Brainard, 2006; James et al., 2020a). Transition entropy for a syllable type i is defined as follows:
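A common form of this quantity, assuming Shannon entropy over the outgoing transition probabilities of syllable type i (the base-2 logarithm and the symbol names are assumptions), is:

```latex
H_i = -\sum_{j} p_{ij} \log_2 p_{ij}
```

Here p_{ij} is the probability that syllable type i is followed by syllable type j, the sum runs over all syllable types j, and terms with p_{ij} = 0 contribute zero. Under this form, H_i = 0 when syllable type i is always followed by the same type.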
Syllable timing was quantified with two measures: syllable duration and intersyllable interval. Syllable duration was defined as the time difference between syllable onset and syllable offset. Intersyllable intervals (ISIs) were defined as the duration of silence between the end of one syllable and the beginning of the next.
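A minimal sketch of the two timing measures, assuming each bout is available as a sorted list of syllable onset and offset times in milliseconds (the function name and data layout are hypothetical):

```python
def timing_measures(syllables):
    """Return (durations, intersyllable intervals) for one bout.

    `syllables` is a list of (onset_ms, offset_ms) pairs sorted by onset.
    """
    durations = [offset - onset for onset, offset in syllables]
    isis = [
        next_onset - offset
        for (_onset, offset), (next_onset, _next_offset)
        in zip(syllables, syllables[1:])
    ]
    return durations, isis


durations, isis = timing_measures([(0, 50), (60, 100), (130, 200)])
```

A bout of n syllables yields n durations and n − 1 intervals.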
Statistical analysis
We first assessed the within-group normality of song macrostructure, syllable sequencing, and syllable timing measures using Shapiro–Wilk tests. Because 72% of distributions were normal, we used parametric statistical tests, with α = 0.05 as the significance threshold for all tests. We tested for group differences using one-way ANOVAs with species/rearing group (e.g., Z vs L vs B) as the factor. We performed post hoc multiple comparisons using the Tukey–Kramer method. We used two-tailed independent samples t tests with equal variance to test for differences between untutored and tutored birds of the same species.
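For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be sketched in a few lines. This is an illustration with made-up numbers, not the statistical software used for the analyses; the function name is hypothetical, and the p value (which requires the F distribution) is not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)

    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up values for three groups of three birds each.
f_stat, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With three groups, the degrees of freedom are (2, N − 3), where N is the total number of birds in the comparison.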
To quantify the relative influences of species identity and experience on song structure in cross-species–tutored and hybrid birds, we computed a bias score which quantified the relative difference between the songs of an experimental group (e.g., ZB) and the songs of comparison groups (e.g., Z and B). For each temporal feature, we mean-centered and rescaled (z-scored) the data using the pooled mean and standard deviation across groups (e.g., across Z, ZB, and B) and then computed the bias as follows:
Results
Song temporal organization differs across species
Normal adult songs in the three species differed in song macrostructure, syllable sequencing, and syllable timing (Fig. 2). Figure 2a shows representative spectrograms and syllable transition diagrams of adult song in each species. Zebra finch (Z) songs are composed of broadband syllables, with prominent harmonics (Price, 1979; Zann, 1993). Syllables are produced in a highly stereotyped order, forming repeated motifs (ABCDABCDABCD), and rarely repeat in succession (AAA; Sossinka and Böhner, 1980; Scharff and Nottebohm, 1991; Zann, 1993; Fig. 2a, top). Long-tailed finch (L) songs begin with short-duration, broadband syllables, which are followed by more tonal mid-duration syllables and then long, frequency-modulated syllables, with sequences always including repeats of the same syllable (AAABBBCCDD; Zann, 1976; Woolley and Moore, 2011; Moore and Woolley, 2019; Fig. 2a, middle). Bengalese finch (B) songs are composed of short-duration, broadband, and harmonic syllables that occur at a high rate and in complex sequences with frequent repeats of the same syllable and probabilistic transitions from one syllable to another (ABCCDDEFBACCDEFD; Fig. 2a, bottom; Okanoya, 2004; Takahasi et al., 2010). B songs include short motifs, also known as chunks (Okanoya, 2004), in which a sequence of syllables is produced in the same order over bouts of singing. Unlike Z and L songs, a single B bird's song contains multiple motif types. Figure 2a shows the four motif types in one B song, indicated with shades of gray above the spectrogram of B song. B song motifs are interspersed with other syllables, occur in no fixed order, and have shorter intersyllable intervals (ISIs) than syllables outside of motifs (Fig. 2f; Okanoya, 2004; Tachibana et al., 2015; James et al., 2020a).
Song macrostructure differed across species, as shown by differences in song motifs (Fig. 2g). Motif duration was longer in L song than in Z and B song (F(2,47) = 77.52; p = 1 × 10⁻¹⁵). Motif duration did not differ between Z and B song (p = 0.15). The number of motifs per bout was higher in B song than in Z and L song (F(2,47) = 106.57; p = 3 × 10⁻¹⁸), and did not differ between Z and L song (p = 0.18). The number of syllables per motif was higher in L song than in Z or B song (F(2,47) = 16.90; p = 3 × 10⁻⁶), and did not differ between Z and B song (p = 0.87). The number of motif types was higher in B song than in Z and L song (F(2,47) = 107.16; p = 3 × 10⁻¹⁸). One motif type was found in each Z and L song, whereas B songs had an average of four motif types, consistent with previous reports (Zann, 1976; Sossinka and Böhner, 1980; James et al., 2020a).
Syllable sequencing differed significantly between species, as shown by differences in syllable repeat probability, sequence linearity, and transition entropy (Fig. 2g). Syllable repeat probability, the probability of the same syllable type occurring in succession, differed across species (F(2,55) = 59.57; p = 2 × 10⁻¹⁴). L and B birds were likely to repeat syllables of the same type (L–B, p = 0.080), and Z birds were unlikely to repeat syllables (Z–L, p = 9 × 10⁻¹⁰; Z–B, p = 1 × 10⁻⁹). Syllable sequence linearity, the number of unique syllable types divided by the number of unique transition types, differed across species (F(2,55) = 37.91; p = 4 × 10⁻¹¹; Fig. 2g); Z songs had the highest linearity, L songs were intermediate, and B songs had the lowest linearity (all p < 0.0015). We further assessed sequence regularity using transition entropy, which quantifies the probability distribution of transitions from one syllable type to other syllable types (Fig. 2b,c; Sakata and Brainard, 2006; James et al., 2020a). Entropy is high for syllables that are followed by multiple syllable types (A–A, A–B, A–C) and zero for syllables that are always followed by the same syllable type (A–B). Transition entropy differed among species (F(2,55) = 32.63; p = 5 × 10⁻¹⁰). Entropy was lower in Z song than in L song (p = 9 × 10⁻⁵) and B song (p = 2 × 10⁻⁹) and lower in L song than in B song (p = 0.0017; Fig. 2g). The low sequence linearity and high transition entropy found in B song are consistent with previous reports of the probabilistic structure of B syllable sequences (Okanoya, 2004; Sakata and Brainard, 2006; Jin, 2009). Results of syllable sequencing comparisons showed that each species differed in syllable sequencing and that sequencing was consistent across birds of the same species.
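The two entropy cases described above can be checked with a short worked example. The sketch assumes the Shannon form of transition entropy with a base-2 logarithm; the function name and the example probabilities are illustrative only.

```python
import math


def transition_entropy(probs):
    """Shannon entropy (bits) of one syllable type's outgoing transitions.

    `probs` are the transition probabilities from a single syllable type
    (assumed to sum to 1); zero-probability transitions contribute nothing.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)


# A is followed by A, B, or C with equal probability: high entropy.
branching = transition_entropy([1 / 3, 1 / 3, 1 / 3])
# A is always followed by B: zero entropy.
deterministic = transition_entropy([1.0])
```

With three equally likely following syllable types the entropy is log2(3), roughly 1.58 bits, and it falls to zero when the following syllable type is fully predictable.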
Syllable timing differed between species and was consistent within a species (Fig. 2g). Syllable duration differed across species (F(2,55) = 35.39; p = 1 × 10⁻¹⁰). Average syllable duration was longer in Z and L songs than in B songs (B–Z, p = 1 × 10⁻⁹; B–L, p = 2 × 10⁻⁷), but did not differ in Z and L songs (p = 0.13). The variability of syllable duration differed across species' songs (F(2,55) = 37.72; p = 4 × 10⁻¹¹); duration varied more across L syllables than across Z (p = 3 × 10⁻⁷) and B (p = 1 × 10⁻⁹) syllables, reflecting the larger range of syllable durations in L song. ISIs differed across species (F(2,55) = 28.44; p = 3 × 10⁻⁹). ISIs in L song were longer than those in Z (p = 1 × 10⁻⁸) and B (p = 6 × 10⁻⁷) song. ISIs in Z and B song did not differ (p = 0.49). Results of syllable timing comparisons showed that species differed in syllable duration and ISI, but both were consistent across birds of the same species.
Song temporal organization does not differ between tutored and untutored birds
Because song macrostructure, syllable sequencing, and syllable timing were consistent within species and different between species, we tested the hypothesis that song temporal organization is predicted by species rather than experience. This hypothesis predicts that song temporal organization will not differ in tutored and untutored birds. To test this, we raised Z and L birds with adult females only so that they were isolated (subscript “iso”) from exposure to adult song. Once untutored birds (Ziso and Liso) reached adulthood, we recorded and compared their songs to those of normal, tutored conspecifics. Tutored and untutored birds were from the same families. This controlled for the effects of familial genetics on song differences in tutored and untutored birds. Song analyses showed that Ziso and Liso birds produced abnormal syllables that were organized into species-typical sequences, like those of tutored birds (Fig. 3). Like normal Z songs, Ziso songs consisted of stereotyped sequences of syllables (motifs) that did not include successive repeats of the same syllable (Fig. 3a). Motifs were repeated multiple times in a bout, also like normal Z song. Like normal L songs, Liso songs were sequences of repeated syllables (motifs) that began with broadband, short-duration syllables and ended with tonal, long-duration syllables (Fig. 3b).
Adult song temporal organization in tutored and untutored conspecifics. a, Spectrograms and syllable transition diagrams of a zebra finch tutor's song (top), the pupil's adult song (middle), and the song of the pupil's untutored brother (Ziso; bottom). b, Spectrograms and transition diagrams of a long-tailed finch tutor's song (top), the pupil's adult song (L; middle), and the adult song of the pupil's untutored brother (Liso; bottom). Lines above the spectrograms indicate motifs, and syllables are labeled with letters. c, Average motif and syllable durations in the songs of tutored and untutored zebra finches and long-tailed finches (left); bars show means and error bars show standard errors. ***p < 0.001; n.s., not significant. Z-scored differences in song macrostructure, syllable sequencing, and syllable timing between tutored and untutored birds (right). Each point shows the mean difference for one feature of song temporal organization (orange, Z vs Ziso; gray, L vs Liso). Lines around each point show 95% confidence intervals.
Quantification of song macrostructure, syllable sequencing, and syllable timing showed no differences between tutored and untutored birds of the same species (Fig. 3c). Descriptive statistics for all measures and t test results are shown in Table 1. In no measure were the songs of tutored and untutored conspecifics different (all p > 0.12; Table 1). To visually compare measures of song temporal organization in tutored and untutored birds, we computed the mean difference in each measure (z-scored) and 95% confidence interval between tutored and untutored birds of a species (Fig. 3c). None of the nine features differed between tutored and untutored birds (all 95% confidence intervals contained zero; the number of motif types is not shown because all birds had one motif type), consistent with previous reports of species-typical macrostructure and sequencing in the songs of untutored and early-deafened Z birds (Price, 1979; Williams et al., 1993; James et al., 2020b). Results suggest that song learning is not required for normal temporal organization to develop in Z and L song.
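The z-scored comparison described above can be illustrated with a minimal sketch. It assumes that each feature is z-scored against the pooled values of both groups and that the 95% confidence interval uses a normal approximation with a Welch-style standard error; neither detail is specified in this section, and the function name and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev, variance


def zscored_mean_difference(tutored, untutored, level=0.95):
    """Return (difference, ci_low, ci_high) for one song feature in pooled z units.

    Assumptions (not taken from the paper): pooled z-scoring and a
    normal-approximation confidence interval. Each group needs >= 2 birds.
    """
    pooled = list(tutored) + list(untutored)
    mu, sd = mean(pooled), pstdev(pooled)
    if sd == 0:
        # Every bird has the same value (e.g., one motif type): nothing to compare.
        return 0.0, 0.0, 0.0
    z_t = [(x - mu) / sd for x in tutored]
    z_u = [(x - mu) / sd for x in untutored]
    diff = mean(z_t) - mean(z_u)
    # Welch-style standard error of the difference between two group means.
    se = sqrt(variance(z_t) / len(z_t) + variance(z_u) / len(z_u))
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - crit * se, diff + crit * se


# Hypothetical motif durations (s) for tutored and untutored birds.
diff, low, high = zscored_mean_difference(
    [0.61, 0.58, 0.66, 0.63, 0.60],
    [0.62, 0.59, 0.64, 0.65, 0.57],
)
print(f"difference = {diff:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Under these assumptions, an interval that contains zero corresponds to a feature with no detectable difference between tutored and untutored birds.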
Descriptive statistics and t test results for all measures of temporal organization in the songs of normal and untutored (isolate) birds
Song temporal organization is the same in normal and cross-species–tutored birds
Birds reared without tutoring produced songs with the same temporal organization as tutored birds (Fig. 3). Studies on the flexibility of song learning show that juveniles copy heterospecific song syllables if they are reared and tutored by adults of that species (Eales, 1987; Clayton, 1989; Woolley et al., 2010; Araki et al., 2016; Moore and Woolley, 2019). We tested the hypothesis that birds tutored by another species (cross-species–tutored) copy the syllables of heterospecific tutors but organize those syllables into sequences with the temporal organization of conspecific song. We raised zebra finches and long-tailed finches that were tutored by Bengalese finches (cross-species–tutored) and compared song temporal organization in those birds (ZB and LB) to temporal organization in conspecific song. We predicted that, if song temporal organization is predicted by species identity, then the temporal organization of ZB and LB songs would be similar to that of normal conspecifics.
Cross-species–tutored pupils copied B tutors' syllables, but did not copy B tutors' temporal organization (Fig. 4; Table 2). Figure 4a (left) shows representative spectrograms of a B tutor's song, a ZB pupil's adult song, and an LB pupil's adult song. The colored boxes demarcate four of the same syllable types in the tutor's and pupils' songs. Like normal Z and L birds, ZB and LB birds organized syllables into repeated motifs. Figure 4a (right) shows the syllable transition diagrams for the same songs. The ZB and LB pupils did not copy the variable syllable sequences of their B tutor's song. Figure 4b shows high-magnification spectrograms of the same syllable types in the tutor's and pupils' songs. ZB and Z pupils copied the same proportion of tutors' syllables (Fig. 4c, left; p = 0.99). LB and L pupils also copied the same proportion of tutors' syllables (p = 0.14). LB pupils copied fewer syllable types than did normal B pupils (p = 0.0088). We quantified acoustic similarity between the syllables of tutors and pupils using spectrogram cross-correlation. Quantification of acoustic similarity showed that LB syllables were as similar to tutors' syllables as were conspecific tutor and pupil syllables (Fig. 4c, right; p = 0.85). ZB syllables were less similar to tutors' syllables compared with conspecific tutor and pupil syllables (p = 0.019).
Song temporal organization in normal and cross-species–tutored birds. a, Spectrograms of songs recorded from a Bengalese finch tutor (top), a zebra finch pupil (middle), and a long-tailed finch pupil (bottom; left). Lines above the spectrograms indicate motifs; line shading indicates distinct motif “types.” Colored boxes outline the same syllable types in the tutor's and pupils' songs. Syllable transition diagrams for the same songs (right). b, High-magnification spectrograms of the same syllable types in the tutor's and pupils' songs. Colored boxes and lines correspond to colored boxes in a. c, Percentage of pupils' syllables learned from tutors (left). Acoustic similarity of syllables by type in one bird (same), between syllables of different types in a bird (other), and between syllables by type in tutor and pupil (copy; right). Center lines show medians, and boxes outline the upper and lower quartiles. Whiskers extend to ±1.5×IQR. Points show outliers. ***p < 0.001; **p < 0.01; *p < 0.05; n.s., not significant. d, Average motif number per bout in the songs of normal and cross-species–tutored birds (left). Bars show means and error bars show standard errors. Bias scores for each temporal feature in the songs of cross-species–tutored birds (right). Zebra finch pupils are on the top, and long-tailed finch pupils are on the bottom. Colored triangles indicate direction of bias, with positive (blue) indicating bias toward songs of tutors' species and negative (orange/gray) indicating bias toward songs of normal conspecifics. Points show mean bias scores, and surrounding lines show bootstrapped 95% confidence intervals. Confidence intervals excluding zero (significant bias) are colored black, and those including zero (no bias) are colored gray.
Descriptive statistics and ANOVA omnibus results for all measures of temporal organization in the songs of normal and cross-species–tutored birds
Descriptive statistics and omnibus ANOVA results for each temporal feature in the songs of normal and cross-species–tutored birds are shown in Table 2. We predicted that the songs of ZB and LB birds would have a similar macrostructure to the songs of normal conspecifics. Within-species post hoc comparisons of normal and cross-species–tutored birds showed no differences in any of the four measures of macrostructure (Table 2). Motif duration and the number of motifs per bout did not differ between normal and cross-species–tutored birds (Z and ZB, p = 0.94; L and LB, p = 0.59) but differed between pupils' and tutors' songs. Motifs were longer in LB song than in B song (p = 0.0024), and the number of motifs per bout was lower than in B song (p = 7 × 10⁻⁸). The number of syllables per motif did not differ between Z, ZB, and B song (p > 0.052), LB and L song (p = 0.99), or LB and B song (p = 0.054). The number of motif types did not differ between normal and cross-species–tutored birds of the same species (Z and ZB, p = 0.77; L and LB, p = 0.75) and was higher in B tutors' song than in their cross-species–tutored pupils (p < 7 × 10⁻⁵). Two of nine ZB pupils sang two motif types, and one of ten LB pupils sang three motif types. Results for all four macrostructure measures showed that the songs of cross-species–tutored birds were organized with the same macrostructure as the songs of conspecifics.
We predicted that syllable sequencing in ZB and LB songs would be similar to that in normal Z and L birds. One of three syllable sequencing measures differed between normal and cross-species–tutored birds in post hoc comparisons (Table 2). Syllable repeat probability did not differ between normal and cross-species–tutored birds (Z and ZB, p = 0.064; L, LB, and B, p > 0.27), although, in the songs we analyzed, the average syllable repeat probability was fourfold higher in ZB song than in Z song. Sequence linearity was lower in ZB song than in Z song (p = 0.040). We found no differences in sequence linearity in any other comparison: ZB and B songs (p = 0.26), B and LB songs (p = 0.62), or LB and L songs (p = 0.067). Transition entropy did not differ in any comparison: Z and ZB song (p = 0.061), ZB and LB song (p = 0.97), ZB and B song (p = 0.21), B and LB song (p = 0.50), or LB and L song (p = 0.53). Syllable sequencing in cross-species–tutored birds thus trended toward that of heterospecific tutors, but normal and cross-species–tutored birds differed only in sequence linearity, and only between ZB and Z birds.
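For readers unfamiliar with the three sequencing measures, the sketch below shows one way they could be computed from labeled syllable sequences. The definitions are assumptions based on common usage rather than this study's exact formulas: repeat probability as the fraction of transitions in which a syllable follows itself, sequence linearity as the number of syllable types divided by the number of distinct transition types, and transition entropy as the Shannon entropy of the next syllable, averaged over preceding syllables and weighted by their frequency. The function name and example sequences are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def sequencing_measures(bouts):
    """Return (repeat_probability, sequence_linearity, transition_entropy).

    bouts: list of strings, one per song bout, one letter per syllable.
    All three definitions are illustrative assumptions, not the paper's code.
    """
    transitions = Counter()
    for bout in bouts:
        transitions.update(zip(bout, bout[1:]))  # count adjacent syllable pairs
    total = sum(transitions.values())
    if total == 0:
        raise ValueError("need at least one syllable transition")

    syllable_types = {s for bout in bouts for s in bout}
    # Fraction of transitions in which a syllable is followed by itself.
    repeat_probability = sum(n for (a, b), n in transitions.items() if a == b) / total
    # Equals 1.0 when every syllable has exactly one possible successor.
    sequence_linearity = len(syllable_types) / len(transitions)

    # Group transition counts by the preceding syllable.
    outgoing = defaultdict(Counter)
    for (a, b), n in transitions.items():
        outgoing[a][b] += n
    entropy = 0.0
    for nexts in outgoing.values():
        n_a = sum(nexts.values())
        h = -sum((n / n_a) * log2(n / n_a) for n in nexts.values())
        entropy += (n_a / total) * h  # weight by how often this syllable precedes another
    return repeat_probability, sequence_linearity, entropy


# A stereotyped, repeat-free sequence versus one with repeated syllables.
print(sequencing_measures(["abcdabcdabcd"]))
print(sequencing_measures(["aabbbcaabcc"]))
```

Under these definitions, a stereotyped motif repeated without variation gives a linearity of 1 and an entropy of 0, whereas variable sequences give lower linearity and higher entropy.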
Because cross-species–tutored pupils copied B tutor syllables and B song contains only short-duration syllables, we predicted that syllable durations would be shorter in ZB and LB song than in Z and L song. We also predicted that LB songs would have the long ISIs of L song rather than the short ISIs of B song. Quantification of syllable timing in normal and cross-species–tutored birds confirmed both predictions (Table 2). Syllable durations were shorter in the songs of cross-species–tutored birds than in the songs of conspecifics (Z and ZB, p = 3 × 10⁻⁴; L and LB, p = 0.0092), reflecting the short-duration syllables copied from B tutors. Syllable durations in ZB and LB song did not differ from those in B song (B and ZB, p = 0.55; B and LB, p = 0.62). ISIs did not differ between Z, ZB, and B song (p = 0.13). ISIs did not differ between LB and L song (p = 0.93) and were longer in LB song than in B song (p = 0.033).
To assess the relative influences of species genetics and experience on song temporal organization, we computed a “bias” score for each of the nine temporal organization features in the songs of cross-species–tutored birds (Fig. 4d). Analysis of bias toward tutors' songs or conspecifics' songs in the songs of cross-species–tutored pupils showed that only one of nine song temporal features was significantly biased toward tutors' songs. Consistent with the finding that ZB and LB birds learned the short-duration syllables of B tutors' songs, syllable duration was the only feature in the songs of cross-species–tutored birds found to be significantly biased toward the tutor species. In contrast, the number of motifs per bout and the number of different motif types in the songs of both ZB and LB were significantly biased toward the songs of normal conspecifics. The number of syllables per motif in the songs of LB birds was significantly biased toward the songs of conspecifics. For each remaining temporal feature, we did not detect significant bias toward either tutors' songs or the songs of conspecifics.
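The logic of a bias score with a bootstrapped confidence interval can be sketched as follows. This section does not give the formula, so the definition here is an assumption chosen only to match the description above: each pupil's value is scored by its relative distance to the conspecific mean and the tutor-species mean, giving −1 at the conspecific mean, +1 at the tutor-species mean, and 0 when equidistant. The function names, resampling settings, and example values are hypothetical.

```python
import random
from statistics import mean


def bias_scores(pupils, conspecific_mean, tutor_mean):
    """Score each pupil from -1 (at conspecific mean) to +1 (at tutor-species mean).

    Assumed definition for illustration; not the formula used in the paper.
    """
    span = abs(tutor_mean - conspecific_mean)
    if span == 0:
        raise ValueError("species means are identical; bias is undefined")
    return [(abs(x - conspecific_mean) - abs(x - tutor_mean)) / span for x in pupils]


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    low_index = int((1 - level) / 2 * n_boot)
    high_index = int((1 + level) / 2 * n_boot) - 1
    return means[low_index], means[high_index]


# Hypothetical feature values for six cross-species-tutored pupils, with
# hypothetical group means for conspecifics (100) and the tutor species (50).
scores = bias_scores([98, 102, 95, 105, 99, 101], conspecific_mean=100, tutor_mean=50)
low, high = bootstrap_ci(scores)
print(f"mean bias = {mean(scores):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With this definition, a confidence interval that lies entirely below zero indicates bias toward conspecific song, one entirely above zero indicates bias toward the tutor species, and one spanning zero indicates no detectable bias.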
The songs of species hybrids have the temporal organization of both parent species
We next tested whether the songs of species hybrids, birds with mothers and fathers of different species, showed the temporal organization of the father's species or of both parental species. Maternal genes could shape song temporal organization even though females do not sing. We bred hybrids with L fathers and either Z or B mothers. Both parents reared hybrid offspring, and L fathers were hybrid juveniles' tutors. We then recorded the adult songs of hybrid pupils (Z × L and B × L) and compared song temporal organization in hybrids to song temporal organization of both parental species. Results showed that both parental species contributed to hybrid song temporal organization (Fig. 5; Table 3). Figure 5, a and b, shows representative spectrograms and syllable transition diagrams of father/tutor songs, hybrid pupil songs, and maternal grandfather songs for Z × L birds (Fig. 5a) and B × L birds (Fig. 5b). The Z × L song has stereotyped syllable transitions and motifs, like both Z and L song. In contrast, the B × L song has variable syllable transitions and multiple motif types, like B song. Figure 5, c and d, shows additional exemplar spectrograms of father/tutor songs and hybrid pupil songs for Z × L birds (Fig. 5c) and B × L birds (Fig. 5d) from different families than those shown in Figure 5, a and b.
Song temporal organization in normal birds and species hybrids. a, Song spectrograms and transition diagrams of a father/tutor long-tailed finch (top), an adult hybrid Z × L pupil (middle), and the hybrid's maternal grandfather (bottom). b, Song spectrograms and transition diagrams of a father/tutor long-tailed finch (top), an adult hybrid B × L pupil (middle), and the hybrid's maternal grandfather (bottom). Lines above spectrograms indicate motifs; line shading indicates distinct motif “types.” c, Additional exemplar spectrograms of L fathers'/tutors' songs and songs of hybrid Z × L sons/pupils. Individuals are labeled by number. d, Additional exemplar spectrograms of L fathers'/tutors' songs and songs of hybrid B × L sons/pupils. Birds labeled B × L 1 and B × L 2 are brothers. e, Percentage of pupils' syllables learned from tutors (left). Acoustic similarity of syllables by type in one bird (same), between syllables of different types in a bird (other), and between syllables by type in tutor and pupil (copy; right). Center lines show medians, and boxes outline the upper and lower quartiles. Whiskers extend to ±1.5×IQR. Points show outliers. ***p < 0.001; **p < 0.01; *p < 0.05; n.s., not significant. f, Bias scores for each temporal feature in the songs of hybrids. Hybrids with zebra finch mothers (Z × L) are on the top, and hybrids with Bengalese finch mothers (B × L) are on the bottom. Colored triangles indicate direction of bias, with positive (gray) indicating bias toward songs of the paternal species, and negative (orange/blue) indicating bias toward songs of the maternal species. Points show mean bias scores, and surrounding lines show bootstrapped 95% confidence intervals. Confidence intervals excluding zero (significant bias) are colored black, and those including zero (no bias) are colored gray.
Descriptive statistics and ANOVA omnibus results for all measures of temporal organization in the songs of normal and hybrid birds
Analysis of syllable acoustics showed that hybrid Z × L pupils copied the same proportion of tutors' syllables as did normal Z (p = 0.96) and L (p = 0.99) pupils (Fig. 5e, left). In contrast, hybrid B × L pupils copied a smaller proportion of tutors' syllables than did normal B (p = 2 × 10⁻⁵) and L (p = 0.0011) pupils. Z × L birds copied a larger proportion of tutors' syllables than did B × L birds (p = 0.012) despite being fathered and tutored by the same species. Quantification of syllable similarity showed that syllables copied by Z × L and B × L pupils were as similar to their tutors' syllables as the syllables of normal birds of each parental species were to their tutors' syllables (Fig. 5e, right; all p > 0.24).
Descriptive statistics and omnibus ANOVA results for each temporal feature in the songs of normal and hybrid birds are shown in Table 3. Because conspecifics that were tutored normally, untutored, and tutored by heterospecifics shared the same species-specific song macrostructure, we predicted that hybrid birds would show contributions of both parental species to macrostructure. Comparisons of motif measures in hybrids and parental species supported this prediction; hybrid song macrostructure showed the features of both the paternal species and the maternal species (Table 3). Motif duration was longer in Z × L song than in Z song (p = 0.0090) and did not differ from L song (p = 0.056), matching the father's species but not the mother's species. In contrast, motif duration in B × L song was shorter than in L song (p = 0.0035) and did not differ from B song (p = 0.13), matching the mother's species but not the father's species. The number of motifs per bout in hybrid song was the same as in the father's species song and differed from the mother's species song. The number of motifs per bout was lower in Z × L song than in Z song (p = 0.046) and did not differ from L song (p = 0.99). Similarly, the number of motifs per bout was lower in B × L song than in B song (p = 2 × 10⁻⁶) and did not differ from L song (p = 0.47). The number of syllables per motif did not differ between groups (Z × L and Z, p = 0.21; Z × L and L, p = 0.23; B × L and B, p = 0.95; B × L and L, p = 0.073). The number of motif types was higher in B song than in B × L song (p = 0.014) and higher in B × L song than in L song (p = 0.022), suggesting that both parents contributed to hybrids' song macrostructure. All Z × L, Z, and L birds produced one motif type. Results showed that hybrids' song macrostructure was a combination of that of the paternal and maternal species.
Because syllable sequencing in the songs of cross-species–tutored birds matched that of other conspecifics regardless of experience, we predicted that syllable sequencing in the songs of hybrids would have features of both parental species, reflecting the genetic contributions of both parents. Quantification of syllable sequencing in normal and hybrid birds showed contributions of both parental species to hybrid syllable sequencing (Table 3). Syllable repeat probabilities in hybrid Z × L song were intermediate to both parental species; syllable repeat probabilities were significantly higher in Z × L song than in Z song (p = 0.0082) and were lower in Z × L song than in L song (p = 6 × 10⁻⁴). Syllable repeat probabilities did not differ in B × L, L, and B song (p = 0.12). Sequence linearity did not differ in Z × L, Z, and L song (p = 0.080). Sequence linearity did not differ between B × L and L song (p = 0.087) or between B × L and B song (p = 0.50). Transition entropy did not differ in Z × L, Z, and L song (all p > 0.077), between B × L and L song (p = 0.22), or between B × L and B song (p = 0.71). Results indicate that syllable sequencing in the songs of species hybrids arises from contributions of both paternal and maternal influence, despite the absence of singing in females of these species.
We tested the prediction that parental species contribute to syllable timing by quantifying and comparing syllable durations and ISIs in the songs of normal and hybrid birds. Results showed that both parental species contributed to syllable timing in the songs of hybrids (Table 3). Syllable durations did not differ among Z × L, Z, and L song (p = 0.068). Syllable durations were significantly longer in B × L song than in B song (p = 0.0042), and did not differ between B × L song and L song (p = 0.33). ISIs were longer in the songs of Z × L hybrids than in Z songs (p = 0.013), and did not differ between Z × L and L song (p = 0.98). ISIs in B × L songs did not significantly differ from those in L (p = 0.58) or B (p = 0.11) songs, suggesting that B × L ISIs trended toward values intermediate to those of the two parental species.
Finally, to assess the relative contributions of paternal and maternal influence on song temporal organization, we computed a bias score similar to that shown in Figure 4 for each of the nine temporal features in the songs of hybrids (Fig. 5f). Analysis of bias toward paternal or maternal species' songs showed that Z × L hybrids' songs were biased toward paternal species' songs in three of nine measures, and not biased toward either species' songs in six of nine measures. Hybrid Z × L songs were significantly biased toward the fathers' species in the number of motifs per bout, sequence linearity, and intersyllable intervals. Hybrid B × L songs were biased toward the fathers' species in the number of motifs per bout and syllable duration. Hybrid B × L songs were biased toward the mothers' species in the number of syllables per motif, and were not biased toward either parental species' songs in the remaining six of nine measures.
Discussion
Birdsong is a hierarchically organized communication signal, with species differences in syllable acoustics (microstructure), the timing of syllables, transitions between syllables (sequencing or syntax), and the organization of syllable sequences into motifs and bouts (macrostructure; Berwick et al., 2011; Mol et al., 2017; James et al., 2021). While syllables are copied from adult tutors during development, we found that song macrostructure, syllable sequencing, and syllable timing differed across species and were shared across conspecifics that were tutored, untutored, and tutored by heterospecifics. This finding suggests that the temporal organization of song is predicted by species identity rather than learning. We conclude that song development is shaped by both genes and experience, with the influence of experience acting at the level of syllables and the influence of species genetics acting at the level of temporal organization. Our findings advance the study of genes and experience in sequenced behaviors by quantitatively demonstrating a separation of these two driving forces in the same behavior.
The seminal birdsong studies describe “inborn” temporal song traits (Thorpe, 1958, 1961; Hinde, 1969; Marler, 1970). For example, untutored chaffinches produce adult songs composed of three phrases and lasting 2.5 s, as do tutored chaffinches (Thorpe, 1958). This species-specific song structure is used despite the development of abnormal syllables. These and similar findings led to the longstanding model of song development, which begins with an innate “auditory template” that guides song development (Marler, 1970). Young birds attend to adult songs with species-typical structure, biasing social communication toward conspecifics and facilitating song copying (Marler, 1997; Adret, 2004b; Soha, 2017). Recent studies that tutored juveniles with songs composed of natural syllables with atypical temporal structure found that pupils' songs included copied syllables and sequences with some temporal features of normal conspecific song (Rose et al., 2004; Gardner et al., 2005; James and Sakata, 2017; Peters et al., 2022). Our finding that song temporal organization is predicted by species identity supports earlier findings and the existence of an auditory template that guides song development.
We found that species identity predicts song development most strongly at the higher levels of song organization: syllable sequences, motifs, and bouts. Regardless of tutoring, birds of the same species arranged syllables into motifs and bouts with the same organization; the number of syllables per motif, motif types per bird, and motifs per bout, as well as motif duration, were consistent. In the songs of hybrids, song macrostructure showed contributions of both parental species, despite the absence of singing in females. Hybrids with Bengalese finch mothers sang with variable sequences and multiple types of short motifs, like males of their maternal species. Motifs of cross-species–tutored birds were composed of more and shorter heterospecific syllables, keeping the species-typical motif duration. These findings are consistent with studies in which cross-species–tutored birds copy only enough syllables to form a motif (Eales, 1987; Clayton, 1989). Similarly, the songs of untutored and early-deafened birds show abnormal syllables arranged in repeated motifs, like conspecific songs (Price, 1979; Williams et al., 1993; James et al., 2020b). Together, these results suggest that innate biases in song macrostructure drive the organization of syllables into motifs.
Specific features of temporal organization have been examined in the same species we studied. Consistent with our findings, cross-species–tutored birds copy syllables but not intersyllable intervals (ISIs) from their tutors (Araki et al., 2016), and birds tutored with variable ISIs in songs produced adult songs with stereotyped ISIs, fitting conspecific song (James et al., 2023). Unlike syllable durations, ISIs do not vary across days and social contexts (Glaze and Troyer, 2006), suggesting that ISIs constrain song “tempo,” a feature of macrostructure (Clayton, 1990).
Our hybrid experiments suggested a genetic influence on syllable repeat probability, including an influence of maternal genes, even though females do not sing. In our two groups of hybrids with different maternal species and the same paternal species, syllable repeat probability was higher in birds with Bengalese finch mothers than in birds with zebra finch mothers. Canaries, which normally sing trills of a single repeated syllable type, developed adult songs organized into trills even when tutored with syllable sequences lacking repeats. Juveniles faithfully copied those sequences, but their adult songs were composed of few syllable types, each repeated many times to form a phrase (Gardner et al., 2005). Clayton (1989) proposed that syllable repetition was limited by species constraints; syllable repeats in cross-species–tutored song were higher than in conspecific song but were much lower than in heterospecific song. In contrast, Funabiki and Konishi (2003), with the same species and manipulation, found that cross-species–tutored birds produced songs with a heterospecific number of syllable repeats. Our results agree with Clayton's findings.
Previous studies of hybrid songbirds have found that hybrid song temporal organization is, on average, intermediate to both parental species (Güttinger et al., 1978; Clayton, 1990; Grant and Grant, 1997; Takahasi et al., 2006; Wang et al., 2019; Toji et al., 2024). The degree of similarity to either parental species is specific to each individual. For example, Wang et al. (2019) and Toji et al. (2024) found that hybrids produced motifs that were like the father's species, like the mother's species, or in between, depending on the individual. Individual variation despite the same tutoring is consistent with a genetic contribution to song development, because parental alleles governing temporal traits would assort independently during meiosis (Hirsch, 1963). Alternatively, hybrids' social interactions with mothers may influence song development (Adret, 2004a; Carouso-Peck and Goldstein, 2019; Bistere et al., 2024); however, our results are consistent with those of Toji et al. (2024), whose hybrids were socially isolated from fathers and learned male song from playbacks. Our study advances these findings by demonstrating a contribution of maternal genes to offspring song temporal organization, despite the absence of singing in females.
Studies using artificial song tutoring or auditory manipulations have found that syllable sequencing can be altered with experience. Normal zebra finches can learn to reorder syllables when provided a different tutor song before their own song crystallizes (Lipkind et al., 2013). This ability was found in ∼10% of individuals studied, suggesting that syllable transitions are difficult to update once song development has progressed. Bengalese finches can learn to shift transition probabilities following auditory perturbations (Sakata and Brainard, 2006; Veit et al., 2021) and learn some pairwise syllable transition probabilities from conspecifics (James et al., 2020a). Together, results suggest experience and genetics combine to drive syllable sequencing.
Syllable sequencing likely arises from both experience and genetics because birds of some species show an unlearned bias for placing specific syllables in specific sequence positions. For example, the phrased songs of white-crowned sparrows always start with a whistle, whether birds are tutored, untutored, tutored with phrases in isolation, or tutored with no whistles (Soha and Marler, 2001; Rose et al., 2004; Plamondon et al., 2010). Zebra finches tutored with songs composed of the same syllables arranged randomly into sequences developed songs in which those syllables are produced in nonrandom order, with particular syllables occurring in specific sequence positions (James and Sakata, 2017). While we found lower sequence linearity in cross-species–tutored zebra finch song than in conspecific song, this measure did not differ in tutored and untutored birds of the same species. These findings suggest that if regularities in tutor syllable transitions are absent or difficult to learn, an innate sequencing template may guide syllable sequence development. Our finding that cross-species–tutored zebra finches produced poorer copies of tutors' syllables suggests species genetics may also bias syllable acoustics.
Our findings support previous behavioral and neural results suggesting that syllables and sequence are controlled by distinct neural circuits. Deafening in adult Bengalese finches leads to a rapid and complete loss of normal syllable sequencing, but slow and gradual degradation of syllable acoustics (Okanoya and Yamaguchi, 1997; Woolley and Rubel, 1997; Sakata and Brainard, 2006; Wittenbach et al., 2015). Lesions and manipulations of neural structures differentially affect song syllables versus sequencing (Williams and Vicario, 1993; Hosino and Okanoya, 2000; Foster and Bottjer, 2001; Kobayashi et al., 2001; Hampton et al., 2009; Basista et al., 2014; Kubikova et al., 2014; Otchy et al., 2015; Koparkar et al., 2024). Spectrotemporal tuning of deep-layer auditory cortical neurons, in zebra and long-tailed finches, is shaped by experience with tutors and song learning (Moore and Woolley, 2019). In contrast, a subpopulation of auditory neurons in the same region is tuned to intersyllable intervals found in conspecific song, in both normal and cross-species–tutored birds (Araki et al., 2016). One hypothesis is that syllable acoustics and sequence are processed independently in the auditory stream, possibly by different cell types or circuits (Schneider and Woolley, 2013; Calabrese and Woolley, 2015; Araki et al., 2016; Spool et al., 2021). Future comparative work could test this hypothesis by analyzing differential tuning to syllable acoustics, timing, and sequence in closely related species that differ in these song parameters (Woolley and Moore, 2011). From the standpoint of social communication, the organization of copied syllables into songs with species-specific structure may adaptively balance the communication content of song to convey information on individuals' learning skills (syllables) and species identity (temporal organization). 
This work may also provide insight for human neuroimaging studies that investigate the separation of neural systems underlying production and perception of speech units versus speech sequences (Bohland and Guenther, 2006; Friederici et al., 2006, 2010).
Footnotes
This study was supported by a National Research Service Award (F31DC020904) to J.A.E., fellowships from the City University of New York to M.R., and grants from the US National Science Foundation (IOS-1656825) and the National Institutes of Health (DC009810) to S.M.N.W. We thank Rawan Zayter for assisting with data analysis.
The authors declare no competing financial interests.
Correspondence should be addressed to Sarah Margaret Nicolay Woolley at sw2277@columbia.edu.