Songbirds and humans both rely critically on hearing for learning and maintaining accurate vocalizations. Evidence strongly indicates that auditory feedback contributes in real time to human speech, but similar contributions of feedback to birdsong remain unclear. Here, we assessed real-time influences of auditory feedback on Bengalese finch song using a computerized system to detect targeted syllables as they were being sung and to disrupt feedback transiently at short and precisely controlled latencies. Altered feedback elicited changes within tens of milliseconds to both syllable sequencing and timing in ongoing song. These vocal disruptions were larger when feedback was altered at segments of song with variable sequence transitions than at stereotyped sequences. As in humans, these effects depended on the feedback delay relative to ongoing song, with the most disruptive delays approximating the average syllable duration. These results extend the parallels between speech and birdsong with respect to a moment-by-moment reliance on auditory feedback. Moreover, they demonstrate that song premotor circuitry is sensitive to auditory feedback during singing and suggest that feedback may contribute in real time to the control and calibration of song.
Song is a complex learned vocalization that has many similarities to human speech (Doupe and Kuhl, 1999). Both song and speech depend critically on hearing for two aspects of vocal learning: (1) to encode sensory models of the normal communication sounds produced by others; and (2) to monitor and refine the quality of the developing individual's own vocalizations (Konishi, 1965, 1985; Brainard and Doupe, 2000). Once learned, vocalizations generally remain quite stable in adults. However, a continued requirement for feedback is evidenced by the deterioration of song and speech that follows deafening in adulthood (Lane and Webster, 1991; Cowie and Douglas-Cowie, 1992; Nordeen and Nordeen, 1992; Okanoya and Yamaguchi, 1997; Woolley and Rubel, 1997). For humans, at least, there is additional evidence that there are real-time contributions of auditory feedback to the control of ongoing vocal production. This has been demonstrated by various perturbations of auditory feedback that elicit disruptions of ongoing speech (Lee, 1950; Elman, 1981; Howell and Archer, 1984; Houde and Jordan, 1998; Larson et al., 2000). For example, exposure to delayed auditory feedback results in speech abnormalities including missequencing of syllables and slowing of speech (Lee, 1950; Howell and Archer, 1984). Despite the great similarity in the ontogeny and phenomenology of vocal control in birds and humans, the degree to which song is similarly influenced by real-time feedback remains unclear. This question is of interest not only because it addresses the extent to which similar functional roles of auditory feedback are shared across widely divergent species, but also because it may shed light on the interaction between performance-based feedback and motor control as it relates to learning and production of complex patterned sequences.
For Bengalese finches, removal of auditory feedback by deafening leads within days to deterioration of the normal sequencing of song syllables (Okanoya and Yamaguchi, 1997; Woolley and Rubel, 1997) (Fig. 1). The progressive modification of Bengalese finch song after deafening indicates the importance of hearing for maintenance of normal song in this species, but it does not address whether and how auditory feedback contributes in real time to the control of ongoing vocalizations. Here we examined real-time contributions of auditory feedback to vocal control in Bengalese finches by using a computerized system to transiently alter feedback during precisely targeted portions of ongoing song (see Fig. 2). Songs for which feedback was perturbed (“feedback trials”) were randomly interleaved with songs produced under normal feedback conditions (“catch trials”). Because trials were randomly interleaved, any differences in song between these conditions must reflect a moment-by-moment influence of feedback on vocal control. This manipulation also enabled us to examine the consequences of perturbing feedback at a much more fine-grained temporal scale than was possible in previous studies that used continuously delayed feedback or irreversible surgical disruptions such as deafening (Okanoya and Yamaguchi, 1997; Woolley and Rubel, 1997; Cynx and von Rad, 2001). We demonstrate that for Bengalese finch song, as for human speech, alteration of feedback elicits perturbations within tens of milliseconds to both the sequencing and timing of vocalizations.
Materials and Methods
Forty-one adult male Bengalese finches (>86 d old) were used in this study. During experiments, birds were housed individually in a sound-attenuating chamber (Acoustic Systems, Austin, TX) and food and water were provided ad libitum. Photocycles were maintained at a 14/10 light/dark pattern during development and throughout all experiments. All procedures were performed in accordance with established animal care protocols approved by the University of California, San Francisco Institutional Animal Care and Use Committee.
All songs were recorded from birds that were isolated in sound-attenuating chambers and hence were “undirected.” A computerized, song-activated recording system was used to detect and digitize song for later off-line analysis using software written in the Matlab programming language (Mathworks, Natick, MA).
For purposes of description and analysis, we use the term “syllable” to refer to individual acoustic elements of Bengalese finch song that are separated from each other by ≥5 ms of silence (Okanoya and Yamaguchi, 1997). This term has also been applied to groups of such acoustic elements that occur in Bengalese finch song in stereotyped sequences (Woolley and Rubel, 1997). We refer here to such stereotyped sequences as “motifs.” There may be several motifs in any given Bengalese finch song, and these are typically separated by intervals of longer duration than the median of all intersyllable interval durations (Woolley and Rubel, 1997).
Altered auditory feedback
Birds were housed individually in sound boxes for at least 24 h, during which time baseline songs were recorded using a song-activated recording system [A. Leonardo (California Institute of Technology, Pasadena, CA) and C. Roddey (University of California, San Francisco)]. Thereafter, a target syllable was chosen and spectral templates to that syllable were made. Syllable identification and targeting were identical to those of Kao et al. (2005). Targeted syllables were detected based on the pattern of spectral features present in the syllable (Fig. 2). After detection, prerecorded sounds (syllables from the male's repertoire) were played back at a short and fixed latency via a free-field speaker (∼70–100 dB) so that the singing bird experienced a superposition of extraneous feedback introduced by the experimenter onto his own normal feedback. This range of intensities approximates the intensity of the bird's own vocalizations during song production measured within 10 cm of the bird (Cynx and von Rad, 2001). On randomly interleaved catch trials (equal probability of feedback or catch), targeted syllables were detected, but disruptive feedback was omitted. With this experimental design, we could directly assess the real-time consequences of altering feedback by comparing songs produced under interleaved normal and disrupted conditions.
In the experiments with a single feedback stimulus, the durations of targeted syllables ranged from ∼40 to 150 ms (mean, 67.3 ± 5.3 ms), and the durations of feedback stimuli ranged between ∼60 and 110 ms (mean, 78.9 ± 4.6 ms). For 20 experiments, the feedback stimulus was identical to the targeted syllable, and for 35 experiments, the feedback stimulus was another syllable from the bird's repertoire. Consistent with studies of delayed auditory feedback in humans (Howell and Archer, 1984; Howell and Powell, 1987), the spectral structure of the feedback stimulus relative to the target syllable did not have a significant effect on either the prevalence or magnitude of effects (see Results), and consequently these two sets of data were pooled for analysis of effects on tempo and sequence. The delays from the onset of the targeted syllable to the onset of the playback syllable (feedback delay) ranged from ∼30 to 80 ms (mean, 54.4 ± 3.7 ms). In some cases, the same syllable appeared in multiple sequences, and each sequence was treated as a separate data set. Some birds (n = 10) were exposed to altered auditory feedback (AAF) at systematically varied delays after syllable detection. In these experiments, birds received at least four, and at most seven, different delays. This translated into feedback onsets ranging from 1 to 165 ms after syllable onsets across birds and sequences. For these variable feedback delay experiments, catch trials and each of the different feedback delays occurred with equal probability. A subset of the birds (n = 13) used for studying effects of AAF was equipped with chronic neural recording implants for a separate experiment. The prevalence and magnitudes of effects elicited by AAF did not differ from those of nonimplanted birds. Data from all of the birds are pooled in Results.
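The equal-probability interleaving of catch trials and feedback delays described above can be illustrated with a minimal Python sketch. It is not the software used in these experiments: the function name `schedule_trial`, the fixed random seed, and the particular delay values are assumptions chosen for illustration only.

```python
import random


def schedule_trial(rng, feedback_delays_ms):
    """Pick 'catch' or one feedback delay, each option with equal probability."""
    options = ["catch"] + list(feedback_delays_ms)
    return rng.choice(options)


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
delays_ms = [5, 25, 45, 65, 85, 105]  # hypothetical delay set, not from the study
trials = [schedule_trial(rng, delays_ms) for _ in range(7000)]
```

With a single feedback delay, for example `schedule_trial(rng, [40])`, the same function yields feedback and catch trials with equal probability, as in the single-stimulus experiments.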
The interleaved nature of catch and feedback trials in our experimental design allowed us to document on-line effects of AAF. Because trials were randomly interleaved, differences between catch and feedback trials effectively controlled for any variation over time in tempo or sequencing that might occur because of factors outside of experimental control, such as the motivational state of the bird or the time of day (Sossinka and Böhner, 1980; Derégnaucourt et al., 2005; Kao and Brainard, 2006). Our presumption, therefore, was that the differences that we detected between catch and feedback trials would predominantly reflect the acute influence of disrupted feedback on targeted portions of song. An additional possibility is that the catch trials (produced during sessions in which feedback was altered) would differ systematically from baseline songs produced before alteration of feedback. To test this possibility, in a subset of birds (n = 27), we recorded baseline songs before AAF was initiated and compared the structure of those songs with the structure of songs produced on catch trials. There was indeed relatively little difference between songs produced in these two “normal feedback” contexts. However, there was, on average, a 1.1% increase in the tempo of targeted segments during catch trials relative to baseline songs. This change was small relative to the difference between feedback and catch trials (4.6% on average; see Results), but it was nevertheless significant. The increase in song tempo between baseline and catch trials suggests that the songs produced during sessions of AAF were slightly faster than songs produced under baseline conditions. 
The magnitude of this tempo change was comparable to that elicited by changes in social context in zebra finches (Sossinka and Böhner, 1980; Kao and Brainard, 2006) and suggests that one possible source of increased tempo might be an enhanced arousal or motivational state of birds in AAF sessions relative to baseline sessions. For transition probabilities at targeted portions of song, there was no significant difference in “transition entropy” (see below, Sequence analysis) between baseline and catch trials (n = 15 birds; paired t test, p = 0.69; entropy baseline, 0.913 ± 0.077; entropy catch, 0.894 ± 0.073), again indicating that differences reported in Results predominantly reflect changes to song immediately precipitated by altered feedback.
For measurement of tempo changes, syllable boundaries were determined using an automated, amplitude-based segmentation algorithm. The same criteria were used for segmentation of syllables on interleaved catch and feedback-altered trials. Segment durations were measured from the start of the targeted syllable to the start of the first syllable after the termination of introduced feedback. An average of 111 trials (range, 26–292) in the catch and feedback conditions were analyzed to determine the magnitude and significance of any changes to segment duration.
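As an illustration of amplitude-based segmentation and of the segment duration measurement, the following Python sketch may be helpful. It is not the algorithm used here: the threshold, the envelope sampling rate, and both function names are assumptions, and the 5 ms minimum gap simply follows the syllable definition given above.

```python
def segment_syllables(envelope, threshold, sample_rate_hz, min_gap_ms=5.0):
    """Return (onset_ms, offset_ms) pairs where the envelope exceeds threshold.

    Supra-threshold runs separated by less than min_gap_ms are merged.
    """
    ms_per_sample = 1000.0 / sample_rate_hz
    runs = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i  # a supra-threshold run begins
        elif value <= threshold and start is not None:
            runs.append((start * ms_per_sample, i * ms_per_sample))
            start = None
    if start is not None:  # run still open at the end of the recording
        runs.append((start * ms_per_sample, len(envelope) * ms_per_sample))

    merged = []
    for onset, offset in runs:
        if merged and onset - merged[-1][1] < min_gap_ms:
            merged[-1] = (merged[-1][0], offset)  # gap too short: same syllable
        else:
            merged.append((onset, offset))
    return merged


def target_segment_duration(syllables, target_index, feedback_end_ms):
    """Time from target onset to the onset of the first syllable that
    begins after the introduced feedback has ended (None if there is none)."""
    target_onset = syllables[target_index][0]
    for onset, _ in syllables[target_index + 1:]:
        if onset >= feedback_end_ms:
            return onset - target_onset
    return None
```

Applying identical threshold and gap settings to catch and feedback trials mirrors the requirement that the same segmentation criteria be used in both conditions.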
When comparing the effects of AAF on song tempo between stereotyped sequences and branch points (i.e., sequences with variable transitions), our dataset consisted of 11 males that experienced AAF only at stereotyped sequences, 15 males that experienced AAF only at branch points, and 7 males that experienced AAF both at stereotyped sequences and branch points. This means that seven males are represented twice in the analysis. The exclusion of these seven males did not affect the significance of reported effects. The duration of the feedback and target syllables and the feedback delay were not significantly different across males in the stereotyped versus branch point groups.
Syllable structure and identification
After amplitude-based syllable segmentation, we used custom Matlab software to facilitate propagation of user-defined syllable labels. Human observers initially labeled at least 30 exemplars of each syllable of the bird's repertoire. For each labeled syllable, and all remaining unlabeled syllables, a feature vector was calculated that contained the values for seven features described below. Each user-labeled syllable was defined in seven-dimensional feature space by the associated set of ≥30 feature vectors. Unlabeled syllables were assigned labels based on the shortest Mahalanobis distance from the vector representing the unlabeled syllable to each of the sets of vectors representing labeled syllables. All automated labeling was verified subsequently by human observers.
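The label propagation step can be sketched in Python as follows. This is an illustrative reconstruction rather than the custom Matlab software: it assumes that each labeled set contributes its own mean and sample covariance, and all function names are hypothetical. The sketch uses only the standard library and fails with a division by zero if a covariance matrix is singular.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mean):
    """Sample covariance (n - 1 denominator) of a list of feature vectors."""
    n, dim = len(rows), len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [list(matrix[i]) + [vector[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis_sq(point, rows):
    """Squared Mahalanobis distance from point to the set of vectors in rows."""
    mean = mean_vector(rows)
    diff = [p - m for p, m in zip(point, mean)]
    weighted = solve(covariance_matrix(rows, mean), diff)
    return sum(d * w for d, w in zip(diff, weighted))


def assign_label(point, labeled_sets):
    """labeled_sets maps each label to its list of exemplar feature vectors."""
    return min(labeled_sets, key=lambda label: mahalanobis_sq(point, labeled_sets[label]))
```

In practice each exemplar would be a seven-element feature vector and each labeled set would contain at least 30 exemplars; any automated assignment would still be checked by a human observer.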
At a qualitative level, there were no gross changes to syllable structure on feedback-disrupted trials. Hence, there was no ambiguity in ascribing syllable identity for subsequent sequence analysis. To verify this qualitative impression, we also quantitatively compared syllables immediately after disruptive feedback with the same syllables on catch trials. We measured and compared seven features (mean frequency, frequency slope, amplitude slope, duration, spectral entropy, amplitude entropy, spectrotemporal entropy; see supplemental material, available at www.jneurosci.org, for definitions). These features were chosen because of their similarity to syllable features that have proven useful for documenting changes to syllable structure over time (Tchernichovski et al., 2000), and also because they can be collectively used effectively to segregate and identify different syllables of an individual Bengalese finch's song (our unpublished observations). For each syllable, we compared distributions of all features between catch and feedback trials using t tests. Across all feature comparisons performed in this manner, only 12 of 392 (3%) were significantly altered (with α = 0.01 for multiple comparisons), and there was no consistency to the affected parameters. A similar lack of consistent, robust changes in spectral features of syllables after delayed auditory feedback was noted in zebra finches (Cynx and von Rad, 2001). Because these changes were small, occurred infrequently, and were not consistent across syllables, it is not clear that they reflect real feedback-driven differences, and we do not report on them further here.
Sequence analysis
To quantify changes to syllable sequencing, we focused specifically on the probability of different transitions immediately after the targeted syllable (first-order transitions) in the presence and absence of AAF. Typically, there were one to three first-order transitions present during catch trials with normal feedback. The transition that occurred with the highest probability under these conditions was considered the “primary transition.” We measured the probability of primary and other transitions after the targeted syllable across all renditions produced during interleaved catch and altered feedback trials (the average number of trials per targeted syllable was 264 with a range of 59–605). Song interruptions were categorized as instances in which singing did not resume within 300 ms after the onset of the playback of the feedback syllable. Often, these interruptions were followed by introductory notes, which were then followed by a “restarting” of song. For analysis of the effects of AAF on syllable sequencing, we excluded instances of song interruptions, to separate the effects of AAF on syllable sequencing within the context of song from the effects of AAF on song interruptions. The inclusion of song interruptions as sequence transitions did not change the qualitative nature of the results. For some experiments, the targeted syllable occurred at the end of one motif and was followed by two distinct alternative motifs. In a minority of cases (n = 2), these two motifs began with syllables that were spectrally very similar to each other and that might otherwise have been considered the same syllable. Because the ensuing motifs were distinct, the syllables in question were treated as distinct for purposes of analysis. This treatment did not affect the significance of any of the reported effects.
The entropy of syllable sequencing was quantified as the transition entropy by the following equation: transition entropy $= -\sum_i p_i \log_2(p_i)$, where the sum is over all transitions after the targeted syllable and $p_i$ is the probability of the $i$th transition. In the results presented, song interruptions were excluded as transitions unless otherwise noted. The inclusion of interruptions did not alter the significance of the reported changes in entropy.
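A minimal Python sketch of this calculation is given below. The function name `transition_entropy` and the use of a label-to-count mapping as input are assumptions made for illustration.

```python
import math


def transition_entropy(counts):
    """Entropy in bits of the transitions after the targeted syllable.

    counts maps each following syllable to the number of times it was observed.
    """
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        if count > 0:  # transitions never observed contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

Under this definition a fully stereotyped transition has an entropy of 0 bits, and two equally likely transitions have an entropy of 1 bit. Excluding or including song interruptions corresponds to omitting or adding an entry for them in `counts`.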
Changes in transition probabilities and probability of song interruptions for a given experimental session were analyzed using a likelihood ratio test (χ²). Changes in song tempo were analyzed using ANOVA or t tests (two tailed), and Tukey's HSD was used for post hoc comparisons. Unless otherwise indicated, the criterion for significance was set at α = 0.05 (two tailed). We found no correlation between the age of birds and the magnitude of AAF-induced behavioral effects. Hence, data from all of the birds were pooled for statistical analyses. Some birds were exposed to altered feedback at more than one sequence. In these cases, the magnitude of effects under consideration was quantified for that bird as the average effect across all sequences. Hence, each bird was represented only once in statistical comparisons, unless otherwise noted in Results.
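For readers unfamiliar with the likelihood ratio test, the following Python sketch computes the statistic for a table of transition counts. It is a generic illustration and not the analysis code used here: the function names are hypothetical, and the p value helper applies only to the case of 1 degree of freedom (two conditions by two transition types).

```python
import math


def g_statistic(table):
    """Likelihood ratio statistic and degrees of freedom for a contingency
    table (rows = feedback conditions, columns = transition types)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing
                expected = row_totals[i] * col_totals[j] / grand_total
                g += 2.0 * observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return g, dof


def p_value_one_dof(g):
    """Upper-tail chi-square probability; valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))
```

For example, `g_statistic([[92, 8], [72, 28]])` evaluates a hypothetical session with 100 catch and 100 feedback trials; these counts are invented for illustration and are not data from this study. Tables with more than two transition types require the chi-square distribution with the returned number of degrees of freedom.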
Results
The most consistent real-time effects elicited by altered feedback included changes to the normal sequencing of syllables, changes to song tempo, and song interruptions. The likelihood and magnitude of these effects depended on parameters of AAF including the timing of feedback perturbation relative to ongoing song. Consequently, we first document the nature of changes that can be elicited by altered feedback and then report on the stimulus dependencies of these changes.
One of the most consistent effects after AAF was interruption of ongoing song. On catch trials, the probability of spontaneous song terminations, after targeted syllables, was close to zero for all experiments. In contrast, on AAF trials, song interruptions frequently occurred immediately after the targeted syllable, or the completion of the subsequent syllable. Similar interruptions of ongoing zebra finch song have been reported previously in response to disruptions of auditory feedback and stroboscopic flashes (Cynx, 1990; Cynx and von Rad, 2001). In our experiments, interruptions were most prevalent during initial exposure to AAF and then rapidly habituated (see below, Persistence of effects), suggesting that they might reflect a startle response to unexpected stimuli. In contrast, the other effects that we report below did not habituate and correspondingly seem likely to reflect mechanisms that are tied more specifically to vocal control.
A striking and persistent effect elicited by AAF was disruption of the normal sequencing of syllables. Many songs of Bengalese finches contain “branch points” at which a given syllable can be followed by two or more alternative sequences. In the majority of instances, the probabilities of syllable transitions at branch points were significantly altered in response to AAF. Figure 3a illustrates an example in which AAF elicited such a change in transition probabilities. Here, syllable “e,” at a branch point of the bird's song, was targeted for detection. On catch trials, when feedback was unperturbed, the normal syllable patterns after “e” could be assessed (Fig. 3a, left column). Under these conditions, the bird produced a primary transition to syllable “f” in 92% of cases and only infrequently produced a secondary transition to syllable “k” (7% of cases). On randomly interleaved feedback trials, detection of syllable “e” was followed by playback of a duplicate version of the syllable 40 ms after syllable onset. On these altered feedback trials (Fig. 3a, right column), the probability of observing the two transitions was significantly changed: the probability of the primary transition (e → f) was reduced to 72%, whereas the secondary transition (e → k) became much more prevalent (21%; p < 0.01; interruptions occurred in the remaining 7% of cases). Hence, in this experiment, AAF changed the normal patterning of syllable sequences.
Similar significant effects on transition probabilities were frequently observed when AAF was delivered at branch points in song. Of 34 experiments conducted on 18 birds in which AAF was delivered at branch points, significant changes to transition probabilities were observed in 19 experiments (56%), and at least one significant disruption of transition probabilities occurred in 16 birds (89%). Interestingly, the transition that was primary under normal feedback conditions usually became less probable after feedback disruption (Fig. 3b, rectangles). Correspondingly, many of the transitions that were less probable under normal feedback conditions became more likely after feedback alteration (Fig. 3b, circles). This was the case even when song interruptions were included in the denominator in calculating the probability of transitions, as in Figure 3. The preferential disruption of the most probable (primary) transitions means that variability in syllable sequencing was generally increased after feedback perturbation. We quantified this increased variability by computing the transition entropy after targeted branch points in song. This measure quantifies the mean information (in bits) required to specify the next syllable of song given that the targeted syllable was just observed (see Materials and Methods). On average, when song interruptions were excluded as possible transitions, the transition entropy of feedback perturbed trials was 21% greater than the transition entropy of catch trials, indicating that there was indeed increased variability in syllable sequencing after feedback perturbation (mean entropy score ± SEM: catch trials, 0.886 ± 0.07; feedback trials, 1.068 ± 0.08; paired t test, p = 0.003; n = 18). If song interruptions were included as possible transitions, the increase in sequence variability after feedback perturbation was even greater.
In this case, transition entropy was 53% greater on feedback than on catch trials (mean entropy score ± SEM: catch trials, 0.968 ± 0.08; feedback trials, 1.480 ± 0.08; paired t test, p < 0.001; n = 18). Hence, acute alterations in auditory feedback consistently reduced stereotypy in transition probabilities similar to the reduction in stereotypy after the complete removal of auditory feedback by deafening (Okanoya and Yamaguchi, 1997; Woolley and Rubel, 1997).
In addition to changes in the relative probability of naturally occurring syllable transitions at branch points, feedback disruption could result in the production of completely novel sequences of syllables during what were otherwise completely stereotyped sequences. Such novel sequencing of syllables was elicited only rarely when we altered feedback by playing a single syllable. However, abnormal transitions were observed in experiments for three of four birds in which we used more complex feedback stimuli consisting of two to three syllables (Fig. 3c) (p < 0.01 in all cases). In each instance of sequence disruption, the phenomenology was similar in two respects: (1) birds “jumped” to patterns of syllables that were normally present elsewhere in their songs and (2) for a given perturbation of feedback, the change in sequencing was stereotyped; that is, the bird usually jumped to the same syllable. The former is similar to sequence changes after deafening (Okanoya and Yamaguchi, 1997; Woolley and Rubel, 1997), and the latter suggests that some internal states within the song generation premotor circuitry may be “closer” together than others.
Even when there were no apparent effects of feedback alteration on the sequencing of syllables, there were often effects on the timing of syllable production. Figure 4a illustrates an example in which alteration of feedback caused a localized slowing of song. The duration from the onset of the targeted syllable to the onset of the following syllable (the “target segment”) was measured for randomly interleaved trials of normal and altered feedback (Fig. 4a, top). Under normal feedback conditions, target segments had a mean duration of 156.4 ± 1.1 ms SEM (Fig. 4a, bottom left) (n = 81). Under randomly interleaved conditions of altered feedback, target segments were lengthened, on average, by ∼10 ms, reflecting a significant feedback-dependent slowing of song (Fig. 4a, bottom right) (mean target segment duration with feedback, 165.1 ± 2.0 ms SEM; n = 62; p < 0.01).
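The size of such a tempo change can be expressed as a percentage and compared against its sampling error, as in the following Python sketch. The use of an unequal-variance (Welch-style) t statistic computed from the reported means and SEMs is an assumption, because the specific form of t test is not stated here, and both function names are hypothetical.

```python
import math


def percent_change(catch_mean_ms, feedback_mean_ms):
    """Percentage increase in segment duration on feedback relative to catch trials."""
    return 100.0 * (feedback_mean_ms - catch_mean_ms) / catch_mean_ms


def welch_t_from_sem(mean_a, sem_a, mean_b, sem_b):
    """Unequal-variance t statistic from two means and their standard errors."""
    return (mean_b - mean_a) / math.sqrt(sem_a ** 2 + sem_b ** 2)


change = percent_change(156.4, 165.1)           # values from the example above
t_stat = welch_t_from_sem(156.4, 1.1, 165.1, 2.0)
```

For the example values, the lengthening is roughly 5.6% of the catch-trial duration, and the t statistic is about 3.8, which is in line with the reported significance.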
Such slowing of song was a consistent effect of feedback alteration. For cases in which feedback was perturbed during stereotyped sequences, there was a highly significant increase in target segment duration (Fig. 4b, stereotyped) (H0: percentage of increase in target segment duration, 0; p < 0.001; n = 22 birds). These changes in song tempo were localized to the region of feedback perturbation in the sense that, on average, the duration of the subsequent region of song (“target + 1 segment”) was not significantly altered (H0: percentage of increase in target + 1 segment duration, 0; p = 0.147). Hence, there was neither a protracted slowing effect of altered feedback on song tempo that extended over multiple syllables nor a shortening of subsequent intervals that might have reflected compensation for the localized change in tempo. Consequently, the entire sequence of syllables that included the targeted syllable was of longer duration on feedback trials than on catch trials.
Because sequencing of syllables was disrupted more consistently by altered feedback at branch points of song than during stereotyped syllable sequences, we similarly asked whether there was a difference in the efficacy with which altered feedback could disrupt song tempo at branch points versus stereotyped sequences. For cases in which feedback was disrupted at branch points, we measured changes to the tempo of target intervals associated with the primary transition. We again found that altered feedback consistently caused a slowing of song that was localized to the targeted segment (Fig. 4b, branch point) (for target segment: p < 0.001; for target + 1 segment: n = 18 birds, p = 0.571). Moreover, the magnitude of feedback-induced tempo changes at branch points was significantly greater than the magnitude of tempo changes that occurred when feedback was targeted at stereotyped portions of song (t test, p = 0.002). Hence, for both sequencing and timing, parts of song that were variable under normal conditions (i.e., branch points) were also more susceptible to disruption by altered feedback.
Dependence of song changes on feedback parameters
For humans, there is a well characterized relationship between the timing of delayed auditory feedback and the degree to which such feedback disrupts ongoing speech. In particular, auditory feedback is maximally disruptive when presented at delays of ∼150–200 ms (Lee, 1950; Howell and Archer, 1984), the approximate duration of a typical syllable of human speech. Here, we tested whether Bengalese finches similarly exhibit a systematic relationship between the timing of altered feedback and the magnitude of disruptive effects on vocalizations. We exposed birds to a fixed feedback element (a single syllable of the bird's song) and systematically varied the timing at which that element was played back in increments of 10–40 ms. Figure 5a illustrates data from an individual experiment. Here, the targeted syllable of the bird's song was played back on randomly interleaved trials at delays ranging from 5 to 105 ms after the onset of the syllable. There was a significant effect of altered feedback on the local song tempo that varied with the feedback delay (Fig. 5a). In this case, the effect was maximal for delays of 45–65 ms.
We assessed the degree to which feedback delay affected the magnitude of changes in song tempo for 14 sequences in 10 birds. Each sequence was disrupted with a single feedback element presented at four or more delays ranging between 1 and 165 ms. For nine of the 14 sequences, feedback presented at different delays had significantly different effects on tempo (ANOVA, p < 0.05). For these nine cases (in seven birds), the magnitude of tempo change was a nonmonotonic function of feedback delay, as illustrated in the example of Figure 5a. We determined for each case the delay that was most effective in eliciting changes to tempo (Fig. 5b). The most effective delays had a mean value of 56 ms (Fig. 5b, filled arrowhead) (range, 26–72 ms; n = 9). Interestingly, this delay corresponds approximately to the duration of individual syllables of Bengalese finch song (mean syllable duration was 64 ms for 104 distinct syllables from songs of 18 Bengalese finches) (Fig. 5b, open arrowhead). Hence, for both Bengalese finch song and human speech, there is a systematic relationship between the timing of ongoing vocalizations and the timing of feedback that is maximally disruptive.
For humans, modification of the spectral structure of delayed auditory feedback has relatively little influence on the degree to which speech is disrupted (Howell and Powell, 1987). We compared the magnitude of sequence and tempo changes between experiments in which the targeted and feedback syllables were identical (n = 20) versus experiments in which the feedback stimulus was a syllable different from the targeted syllable (n = 35). Just as with human speech, the magnitude of song effects (transition entropy, tempo change) was not significantly affected by the identity of the feedback syllable (t test, p > 0.05 for both).
Persistence of effects
The real-time effects of AAF on song tempo and syllable sequencing (differences between catch and feedback trials) persisted for as long as birds were followed. In contrast, there was a comparatively rapid attenuation of the tendency of AAF to elicit grosser disruptions of song such as song interruptions. To characterize the degree to which observed effects persisted over the course of a day, we analyzed changes in the likelihood and magnitude of effects for an extended exposure to AAF. We selected experiments in which birds were exposed to AAF for the first time, and for which there was continued exposure for at least 5.5 h (and/or a minimum of 25 targeted sequences produced in each condition; n = 25 birds). The targeted sequences were split chronologically into two equally sized data sets corresponding to “early” and “late” exposure groups. For interruptions, tempo changes, and sequence changes, we compared the magnitude of effects for early and late data sets. For interruptions, there was a significant and strong habituation over the course of continued exposure. The probability of interruptions after AAF, between early and late sessions, was reduced, on average, by 35.6 ± 5.0% (paired t test; n = 25 birds; p < 0.0001). For a small number of birds that were exposed to the same feedback disruption on subsequent days, there was an additional decline in song interruptions because of AAF. These data suggest that song interruptions may reflect, in part, a startle response to unexpected stimuli (Cynx, 1990; Cynx and von Rad, 2001).
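The early versus late comparison can be sketched in Python as follows. The function names and the representation of each trial as a Boolean value are assumptions for illustration; when the number of trials is odd, this sketch places the extra trial in the late group.

```python
def split_early_late(trials):
    """Split a chronologically ordered list of trials into early and late halves."""
    half = len(trials) // 2
    return trials[:half], trials[half:]


def interruption_probability(trials):
    """trials is a list of booleans, True where song was interrupted after AAF."""
    return sum(trials) / len(trials)
```

Computing `interruption_probability` separately for the two halves returned by `split_early_late` gives one early and one late value per bird, which can then be compared across birds with a paired test.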
In contrast, there was no significant habituation of the effects of AAF on syllable tempo or sequence over the same time period. For those cases in which AAF significantly slowed tempo, the magnitude of tempo differences between catch and interleaved feedback trials did not significantly change between the early and late sessions (n = 16 birds; p = 0.12). Likewise, for those cases in which AAF significantly changed sequence transitions at branch points, the magnitude of effects was not significantly different between the early and late sessions (n = 10 birds; p = 0.18). Whereas the rapid habituation of song interruptions indeed suggests that they may reflect a startle response, the persistence of sequence and tempo effects suggests that these changes are elicited by different mechanisms that are more specifically tied to vocal control.
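The early versus late comparison used for these habituation analyses can be illustrated with a minimal sketch. It assumes one chronologically ordered list of interruption outcomes per bird; the data values, the function names, and the choice to drop the middle trial when the count is odd are all hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def split_early_late(trials):
    """Split chronologically ordered trials into two equally sized halves.

    With an odd count the middle trial is dropped (an assumption of this sketch).
    """
    half = len(trials) // 2
    return trials[:half], trials[len(trials) - half:]


def interruption_probability(trials):
    """Fraction of trials (True = song interrupted) that were interrupted."""
    return sum(trials) / len(trials)


def paired_t_statistic(first, second):
    """Paired t statistic for matched samples (degrees of freedom = n - 1)."""
    diffs = [a - b for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical data: one list per bird, one boolean per AAF trial, in order.
birds = [
    [True, True, False, True, False, False, True, False],
    [True, False, True, False, False, True, False, False],
    [True, True, True, True, True, False, False, False],
]

early_p, late_p = [], []
for trials in birds:
    early, late = split_early_late(trials)
    early_p.append(interruption_probability(early))
    late_p.append(interruption_probability(late))

# Percent reduction per bird (requires a nonzero early probability).
percent_reduction = [100 * (e - l) / e for e, l in zip(early_p, late_p)]
t_stat = paired_t_statistic(early_p, late_p)
```

The same split can be applied to tempo differences or transition entropy by replacing the per-trial boolean with the relevant per-trial measure; converting the t statistic to a p value requires the t distribution with n - 1 degrees of freedom.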
An additional question of interest is whether exposure to AAF elicited any lasting changes to song structure. We optimized our experiment to identify on-line contributions of auditory feedback by randomly interleaving feedback and catch trials with equal probability and hence did not maximize the likelihood of eliciting lasting changes. Nevertheless, we examined the same set of birds analyzed above (which experienced perturbed feedback for the longest periods) for any evidence of lasting changes to song.
We first considered whether the structure of song on catch trials, in which feedback is normal, was different between early and late periods of exposure to AAF. For both transition entropy (n = 10 birds) and song tempo (n = 16 birds), there was no significant difference between early and late sessions in catch trials (paired t test, p > 0.05 for both). We additionally looked for systematic changes in the tempo of targeted segments by regressing duration of target segment against trial number for catch trials of individual experiments. A nonzero slope of regression would indicate systematic change over time in the tempo of the targeted portion of song. The mean slope across experiments was 0.03 ± 0.04 ms/trial and was not significantly different from zero (n = 18 birds randomly selected from entire population; H0: slope = 0; t test, p = 0.44). Similarly, for feedback trials there were no significant differences in the magnitude of tempo or sequence changes between early and late sessions (paired t tests, p > 0.05) or the slope of tempo changes across trials (H0: slope = 0; t test, p > 0.05). In a subset of experiments, we also collected data from control recordings before (pre) and after (post) the period of exposure to AAF. For these experiments, we compared song structure of targeted segments of song from prerecordings and postrecordings. Song tempo did not differ significantly between prerecordings and postrecordings (n = 11 birds; p = 0.69). Likewise, the entropy of transition probabilities at branch points did not change significantly between prerecordings and postrecordings (n = 4 birds; p = 0.46).
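The slope analysis described above can be sketched as follows: fit a least-squares line to segment duration versus trial number within each experiment, then test whether the mean slope across experiments differs from zero. The durations below are invented for illustration, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def one_sample_t(values, mu0=0.0):
    """t statistic for H0: mean(values) == mu0 (degrees of freedom = n - 1)."""
    return (mean(values) - mu0) / (stdev(values) / sqrt(len(values)))


# Hypothetical catch-trial durations (ms) of the targeted segment, in trial order.
experiments = [
    [300.0, 301.0, 299.0, 300.5, 300.0],
    [250.0, 249.5, 250.5, 250.0, 249.0],
    [410.0, 410.5, 409.5, 410.0, 411.0],
]

slopes = []
for durations in experiments:
    trial_numbers = list(range(1, len(durations) + 1))
    slopes.append(ols_slope(trial_numbers, durations))  # units: ms per trial

t_stat = one_sample_t(slopes)  # compare against the t distribution, df = n - 1
```

A mean slope near zero, with a t statistic well inside the critical value, corresponds to the absence of systematic drift in tempo across trials.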
Hence, by several measures, we found no evidence for lasting changes to song precipitated by exposure to AAF under the conditions of our experiment. However, a strong test of the ability of AAF to drive such changes will likely require experiments in which the following conditions are met: (1) catch trials are reduced in number or eliminated to prevent birds from experiencing large numbers of songs in which feedback is normal; (2) feedback is altered for more protracted periods of time; and (3) normative data are collected to document and compare the degree to which song structure spontaneously varies over the relevant time periods.
Latency of effects on vocal production
The effects of feedback on sequencing and tempo occurred within tens of milliseconds. Because catch trials and feedback perturbation trials were randomly interleaved, the birds could not predict on a given trial whether production of a targeted syllable would be followed by normal or altered feedback. Hence, the elicited changes to song must reflect an influence of auditory feedback on vocal production that arises between the time at which feedback is altered and the time at which song first deviates from normal. For the example in Figure 6 a, the average interval from the point at which feedback was first altered to the onset of the next syllable of song during catch trials was 80 ms, and this interval was significantly increased on AAF trials. In this instance, therefore, the latency from perturbation of feedback to the first detectable change to song was 80 ms. We might have detected an even shorter latency to alteration of song if we had been able to monitor any changes during the silent interval preceding the delayed syllable. Similarly, the time of occurrence of AAF relative to song constrained our ability to determine the minimum possible latency; in the illustrated example, if we had presented altered feedback slightly later, so that the onset of AAF preceded the next syllable by only 70 ms (instead of 80 ms), we might have been able to demonstrate an effect on song at this shorter latency. Hence, for this example, 80 ms represents an upper bound on the latency with which auditory feedback influenced motor production.
To assess how rapidly AAF could influence ongoing vocal production, we similarly calculated upper bounds for latencies of effects on song for all of the birds exposed to AAF. For each bird, we determined the earliest significant effect on song after the onset of feedback alteration. The data for the 21 experimental birds that exhibited the shortest latencies to effect are shown in Figure 6 b. The overall minimum latency was ∼70 ms, and for 12 birds, AAF influenced song in <90 ms. These results indicate that auditory feedback has rapid access to song premotor circuitry during ongoing song (Fig. 6 c).
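The logic of the upper-bound calculation can be sketched in a few lines. The sketch assumes that, for each bird, every song event after feedback onset has already been assigned a time relative to that onset (measured on catch trials) and a flag indicating whether it differed significantly between catch and AAF trials; the data structure and values are hypothetical.

```python
def latency_upper_bound(events):
    """Time of the earliest post-perturbation event with a significant effect.

    events: list of (ms_after_feedback_onset, is_significant) tuples.
    Returns the latency in ms, or None if no event was significantly altered.
    """
    significant = [t for t, sig in events if sig and t >= 0]
    return min(significant) if significant else None


# Hypothetical per-bird events: (time after AAF onset in ms, significant?).
birds = {
    "bird_a": [(80, True), (190, True)],
    "bird_b": [(45, False), (120, True)],
    "bird_c": [(60, False), (150, False)],
}

bounds = {name: latency_upper_bound(ev) for name, ev in birds.items()}
fast_birds = [name for name, b in bounds.items() if b is not None and b < 90]
```

Because an effect can only be detected at an observable event such as a syllable onset, the returned value is an upper bound on the true latency rather than an estimate of it.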
Previous studies have documented numerous similarities between birdsong and human speech in the requirement of auditory feedback for the normal learning and maintenance of vocalizations (for review, see Doupe and Kuhl, 1999; Brainard and Doupe, 2000). However, although the real-time influence of auditory feedback on speech is well documented (Lee, 1950; Elman, 1981; Howell and Archer, 1984; Houde and Jordan, 1998; Larson et al., 2000), evidence for a similar contribution of auditory feedback to ongoing songbird vocalizations is much more limited (Cynx and von Rad, 2001). Previous studies in zebra finches have shown that song can be interrupted abruptly by unexpected stimuli (Cynx, 1990; Cynx and von Rad, 2001). We similarly found that such interruptions could be elicited by initial exposure to feedback perturbation, but these interruptions, suggestive of a startle response, rapidly habituated. One study also found that zebra finch song was disrupted by continuously delayed auditory feedback (Cynx and von Rad, 2001). However, the predominant effect reported in that study also appears to be an increase in song interruptions and difficulty in song initiation. Perhaps because of the continuous superposition of delayed feedback on the sound of the bird's song, more specific changes to syllable sequencing were not identified, nor did that study find an influence of delayed feedback on the tempo of vocalizations, as is observed for human speech. Continuous feedback disruption also rendered it difficult to assess the latency with which feedback influenced song. Here we circumvented some of these experimental limitations by transiently perturbing feedback with single delayed syllables presented at precisely targeted times during ongoing song.
In contrast to previous studies, our results in Bengalese finches clearly demonstrate systematic effects of auditory feedback on song within tens of milliseconds. Alteration of feedback caused localized disruptions of the normal tempo and sequencing of syllables. Moreover, song was maximally disrupted at delays approximating the duration of individual syllables of Bengalese finch song (∼60 ms). Each of these effects parallels the influence of delayed auditory feedback on human speech. Hence, our results further strengthen the functional similarities between songbirds and humans in the sensorimotor control of vocalizations.
Real-time influences of auditory feedback have important implications for models of song production. Our data indicate that feedback from vocalizations can reach and influence song premotor circuitry during the act of singing. Moreover, the short latencies of effects suggest that song pattern generation by the nervous system is not simply a “feedforward” readout from a central pattern generator (Fig. 7 a). Rather, the progression of song motor circuitry from one state to the next may also rely on sensory feedback from previous syllables contributing to the premotor activity for subsequent syllables (Fig. 7 b).
For Bengalese finch song, both feedforward and reafferent signals seem likely to contribute to pattern generation (syllable sequencing and tempo). Disruption of sequencing by perturbation of feedback (this study) or deafening (Okanoya and Yamaguchi, 1997; Woolley and Rubel, 1997) (Fig. 1) indicates that normal sequencing may rely on auditory feedback. However, even in deafened Bengalese finches, there is some persistence of normal syllable sequencing. This persistence is even more striking in the zebra finch after deafening (Nordeen and Nordeen, 1992; Brainard and Doupe, 2001) and indicates that at least a portion of song patterning circuitry operates in a feedforward manner or relies on nonauditory reafference, such as might be provided by proprioception (Konishi, 1965, 1985; Bottjer and Arnold, 1984; Suthers et al., 2002). The differences we observed in the sensitivity of stereotyped sequences versus branch points to altered feedback suggest that production of stereotyped sequences operates in a more feedforward manner than the production of branch points.
One intriguing neurophysiological observation, which is consistent with reafferent contributions to song pattern generation, is that the sound of a zebra finch's own song, when played to passively listening zebra finches, elicits activity in song premotor circuitry that resembles the activity present during singing (Dave and Margoliash, 2000). Moreover, the sound of one syllable from the bird's song elicits the premotor pattern appropriate to the next syllable of the song. This demonstrates that auditory information is conveyed to premotor structures in a manner that is functionally appropriate to contribute to normal pattern generation even in adult zebra finches, in which auditory feedback is not required for the moment-to-moment sequencing of syllables (Nordeen and Nordeen, 1992; Brainard and Doupe, 2001).
A long-standing hypothesis is that learning and maintenance of song rely in part on a comparison of current performance (as provided by auditory feedback) with desired performance [as stored in a sensory “template” (Konishi, 1965, 1985)]. However, the idea of “real-time feedback monitoring” faces both conceptual and experimental challenges. Conceptually, the nervous system confronts a formidable problem during a complex ongoing behavior, such as song or speech, to keep track of which aspects of performance-based feedback should be associated with which aspects of premotor activity. This problem is also complicated by inherent delays between the timing of premotor activity and the timing at which sensory feedback can be conveyed to responsible premotor brain structures (Troyer and Doupe, 2000). Our data place a direct upper bound on this delay of 70–90 ms (Fig. 6 c). Experimentally, the idea of real-time feedback monitoring faces the challenge that neurons in the premotor structures for song production generally have not been found to respond to auditory stimuli during the appropriate behavioral states. In the zebra finch, auditory responses to playback of the bird's own song are attenuated in awake versus anesthetized or sleeping birds (Dave et al., 1998; Schmidt and Konishi, 1998; Mooney, 2000; Nick and Konishi, 2001; Cardin and Schmidt, 2003; Rauske et al., 2003). Moreover, influences of feedback perturbation during singing on activity of song system neurons have not been observed (McCasland and Konishi, 1981; Leonardo, 2004). This has led to the suggestion that some or all of song learning might rely on “off-line” mechanisms that operate outside the context of singing, such as during sleep (Dave and Margoliash, 2000).
Despite these observations, our data indicate that in the Bengalese finch, signals arising from auditory feedback must reach song premotor nuclei at short latencies during ongoing song (Fig. 6). Although these data do not address whether and how such feedback-dependent signals participate in evaluation and modification of song, they unambiguously indicate that information arising from audition is present in song circuitry during ongoing production at an appropriate time to contribute to such processes.
Based on the current functional understanding of song system circuitry, it seems likely that feedback signals reach the song premotor nucleus HVC. HVC receives projections from auditory areas and is implicated both in syllable sequencing and in the control of tempo (Nottebohm et al., 1976; McCasland and Konishi, 1981; Margoliash, 1983; McCasland, 1987; Vu et al., 1994; Fortune and Margoliash, 1995; Yu and Margoliash, 1996; Schmidt and Konishi, 1998; Dave and Margoliash, 2000; Mooney, 2000; Nick and Konishi, 2001; Hahnloser et al., 2002; Cardin and Schmidt, 2003; Rauske et al., 2003; Reiner et al., 2004). However, auditory feedback signals have not been found in HVC or other song nuclei of zebra finches (McCasland and Konishi, 1981; Leonardo, 2004). This may indicate that auditory feedback signals are present elsewhere. Alternatively, this may reflect a comparatively weak contribution of auditory feedback to song production in zebra finches. Although both species depend on hearing for maintenance of song, the consequences of deafening are more rapid and severe for Bengalese finches. Moreover, disruptive feedback similar to that used in the current study, although sufficient to drive gradual deterioration of zebra finch song, has not been reported to have real-time effects such as those described here (Leonardo and Konishi, 1999) (but see Cynx and von Rad, 2001). Consequently, auditory feedback signals in the Bengalese finch brain may be more salient than those present in the zebra finch. Additional characterization of these sensory feedback signals promises to inform our understanding of sensorimotor integration as it relates to vocal production as well as to identify neural signals that potentially contribute to learning and maintenance of song. More generally, such investigations may shed light on analogous systems, such as human speech, in which sensory feedback about performance contributes to learning and production of complex motor acts.
This work was supported by National Institutes of Health Grants F32 MH068055 and T32 NS07067 to J.T.S. and by the Klingenstein Fund, the McKnight Foundation, a Searle Scholars award, a Sloan Research Fellowship, and National Institutes of Health Grant R01 DC006636 to M.S.B. We thank A. Doupe, P. Sabes, S. Sober, T. Warren, and S. C. Woolley for their critical readings and J. Houde and C. Roddey for their programming contributions.
Correspondence should be addressed to Michael S. Brainard, Keck Center for Integrative Neuroscience, Department of Physiology, Box 0444, University of California, San Francisco, San Francisco, CA 94143-0444.