Abstract
The relationship between muscle activity and behavioral output determines how the brain controls and modifies complex skills. In vocal control, ensembles of muscles are used to precisely tune single acoustic parameters such as fundamental frequency and sound amplitude. If individual vocal muscles were dedicated to the control of single parameters, then the brain could control each parameter independently by modulating the appropriate muscle or muscles. Alternatively, if each muscle influenced multiple parameters, a more complex control strategy would be required to selectively modulate a single parameter. Additionally, it is unknown whether the function of single muscles is fixed or varies across different vocal gestures. A fixed relationship would allow the brain to use the same changes in muscle activation to, for example, increase the fundamental frequency of different vocal gestures, whereas a context-dependent scheme would require the brain to calculate different motor modifications in each case. We tested the hypothesis that single muscles control multiple acoustic parameters and that the function of single muscles varies across gestures using three complementary approaches. First, we recorded electromyographic data from vocal muscles in singing Bengalese finches. Second, we electrically perturbed the activity of single muscles during song. Third, we developed an ex vivo technique to analyze the biomechanical and acoustic consequences of single-muscle perturbations. We found that single muscles drive changes in multiple parameters and that the function of single muscles differs across vocal gestures, suggesting that the brain uses a complex, gesture-dependent control scheme to regulate vocal output.
Introduction
The transformation from neural signals to muscle performance to behavioral output defines how the brain controls behavior. During both speech and birdsong, the coordinated activity of vocal muscles regulates the tension of vibrating tissue and gates airflow to generate vocalizations (Stevens, 1994; Larsen and Goller, 2002). Behavioral studies have demonstrated that both humans and songbirds can selectively modulate single acoustic parameters such as fundamental frequency (Jones and Munhall, 2000; Sober and Brainard, 2009; Hoffmann and Sober, 2014), raising the question of how the vocal system might selectively alter a single parameter. Although single muscles have been shown to control multiple behavioral parameters in other systems (Carvell et al., 1991; David et al., 2000), the functional role of individual vocal muscles remains poorly understood. The difficulty in accessing human vocal muscles hinders our understanding of how individual muscles shape speech (Ludlow, 2005). Furthermore, although studies in both humans and songbirds have demonstrated correlations between EMG activity and various acoustic parameters (Sawashima et al., 1973; Erickson, 1993; Goller and Suthers, 1996b; Elemans et al., 2008a), it is difficult to isolate the independent contributions of individual vocal muscles because the activities of different muscles can be strongly correlated (Titze et al., 1989; Goller and Suthers, 1996a, 1996b). Here, we test the hypothesis that single vocal muscles control multiple acoustic parameters in songbirds by combining traditional EMG measurements with on-line stimulation techniques that perturb the activity of single muscles.
A distinct but equally important issue is whether the transformation from muscle activation to behavior is fixed or context dependent across different vocal gestures. If fixed, single muscles would have the same effect on acoustics regardless of the performed gesture. Alternatively, the transformation could be context dependent, where the effect of a change in a muscle's activity depends on muscle performance and the biomechanical context of the vocal organ (including the activity of other muscles). A fixed transformation would allow the brain to increase activation of a particular muscle to enact a specific change in fundamental frequency, for instance, regardless of the vocal gesture produced. However, a context-dependent transformation would force the brain to account for the current state of the vocal organ when implementing acoustic changes. Though the transformation from muscle activation to behavior is strongly context dependent in some systems (Li et al., 2001; Sponberg et al., 2011), this question has not been examined in vocal control. We hypothesize that individual muscles have context-dependent effects on acoustic parameters during song.
We used three approaches to test our hypotheses. First, we examined correlations between electromyographic (EMG) activity recorded from single muscles with the acoustic parameters of individual vocal gestures. Second, we developed a novel stimulation assay in which individual muscles are electrically stimulated during behavior to measure the marginal effect of their increased activity on acoustic output. Third, we devised an ex vivo assay in which the vocal organ produced sound in isolation from the nervous system, allowing us to visualize the dynamics of the vocal organ and the effects of stimulation.
Materials and Methods
Surgical procedures
We used EMG and in vivo electrical stimulation to determine the function of vocal muscles during birdsong. All procedures were approved by the Emory University Institutional Animal Care and Use Committee. Before surgery, adult (>90-d-old) male Bengalese finches (Lonchura striata domestica) were anesthetized using 40 mg/kg of ketamine and 3 mg/kg of midazolam injected intramuscularly. Proper levels of anesthesia were maintained using 0–3% isoflurane in oxygen gas. Across all experiments, musculus syringealis ventralis (VS), m. tracheobronchialis dorsalis (DTB), and the expiratory muscle group (EXP) were implanted, though not necessarily all at once. Note that EXP comprises three sheet-like overlapping muscles (m. obliquus externus abdominis, m. obliquus internus, and m. transversus abdominis). As in prior studies (Hartley, 1990; Wild, 1993; Reinke and Wild, 1998; Goller and Suthers, 1999), we did not attempt to distinguish signals arising from these three muscles. We focused our experiments on VS, DTB, and EXP because their surfaces are comparatively large (1 mm × 3 mm) and easy to access; we were unable to reliably target other smaller muscles of the syrinx. The implanted muscles reflected a range of biomechanical functions. VS activity is highly correlated with fundamental frequency in brown thrashers (Toxostoma rufum; Goller and Suthers, 1996a) and based on morphological position is suggested to modulate medial labial tension in zebra finches (Taeniopygia guttata; Düring et al., 2013). Direct observations during microstimulation in anesthetized northern cardinals (Cardinalis cardinalis) showed that DTB adducts the lateral labia (Larsen and Goller, 2002) and microstimulation of DTB in anesthetized starlings (Sturnus vulgaris) modulated and decreased air flow and sound amplitude (Elemans et al., 2008a). 
EXP drives expiration during breathing and song, with the majority of sound production taking place during the expiratory phase of the respiratory rhythm (Hartley, 1990). EXP drives expiration by regulating pressure in the system of air sacs that both supply and surround the vocal organ. The relationship between EXP activity and air sac pressure (a key parameter for vocal production) is complex and poorly understood, and appears to depend on syringeal gating of airflow and the volume of air in the air sac, which varies during expiration (Goller and Cooper, 2004).
The syrinx was accessed for electrode implantation via a midline incision between the furcula into the intraclavicular air sac. VS is located on the ventral portion of the syrinx near the midline (Fig. 1b). DTB is located just dorsally to a prominent blood vessel on the lateral and rostral portion of the syrinx. One pair of insulated, single-stranded stainless steel (25 μm diameter; California Fine Wire) or multistranded nickel-copper alloy (50 μm diameter; Phoenix Wire) wires (0.5 mm of insulation was stripped from the end) was inserted into VS and/or DTB muscles (Fig. 1). Wires were secured using either a small amount of tissue adhesive or 10-0 suture (Ethicon) and run from the syrinx to the scalp subcutaneously where the wires were soldered to a plug. The air sac was sealed using additional tissue adhesive. The skin was closed with 5-0 sutures (Ethicon). For surgeries to record from EXP, an incision was made dorsal to the femoral joint, rostral to the pubic bone. One pair of the same wires was inserted into EXP and secured using tissue adhesive or a suture and routed subcutaneously to a plug on the head as described above. Birds typically began singing within a week after the surgery. Voltage signals were amplified using a 10× headstage and a differential AC amplifier (A-M Systems Model 1700). EMG data were bandpass filtered between 300 Hz and 5 kHz, then digitized, rectified, and smoothed (5 ms box filter). Acoustic data were also recorded using a microphone (Countryman Associates Isomax 2 omnidirectional) and digitally bandpass filtered between 200 Hz and 10 kHz before analysis. The microphone was fixed 3 cm above the cage (28 × 28 × 28 cm), which sat in the middle of a semi-anechoic chamber (66 × 71 × 71 cm). Both acoustic and EMG signals were sampled at 32 kHz and digitized using an NI board (National Instruments; BNC-2090A).
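The rectification and smoothing of the EMG signal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the 300 Hz to 5 kHz bandpass step is omitted, and the 5 ms box filter is assumed to be a trailing (causal) window that shrinks at the start of the recording, which the text does not specify.

```python
def rectify_and_smooth(samples, fs_hz=32000, window_ms=5.0):
    """Full-wave rectify a signal, then smooth it with a box (moving-average) filter.

    Assumption: the box window is trailing (it ends at the current sample) and
    is shortened at the start of the recording where fewer samples exist.
    """
    n_win = max(1, int(round(fs_hz * window_ms / 1000.0)))  # 160 samples at 32 kHz
    rectified = [abs(s) for s in samples]

    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for v in rectified:
        prefix.append(prefix[-1] + v)

    smoothed = []
    for i in range(len(rectified)):
        start = max(0, i + 1 - n_win)
        smoothed.append((prefix[i + 1] - prefix[start]) / (i + 1 - start))
    return smoothed
```

For example, `rectify_and_smooth([1.0, -3.0, 5.0], fs_hz=1000, window_ms=2.0)` averages the rectified values over two-sample windows.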
Regression analysis
To relate muscle activity to acoustic variation, we measured the acoustic properties of each syllable rendition as well as the EMG activity associated with each syllable. For these experiments, 11 VS muscles (8 from the left side of the syrinx and 3 from the right side), 8 DTB muscles (all from the left side), and 9 EXP muscles (7 from the left side and 2 from the right side) across 20 adult male Bengalese finches were used. EMG activity was measured as the mean smoothed, rectified EMG in the 16 ms preceding the time of fundamental frequency measurement (Fig. 1c,d). The preceding time delay was determined by calculating the cross-correlation between EMG and sound, which typically reached a peak value between a 6 and 16 ms lag. To ensure that our regression results did not depend on our choice of a 16 ms wide time window, we performed additional analyses in which we performed regressions using the EMG signal from 8, 12, and 20 ms before the time of acoustic measurement. As described in Results, these alternate latencies yielded nearly identical results to those obtained using a 16 ms latency. Previous analysis using principal component analysis has identified fundamental frequency, amplitude, and spectral entropy as important axes of acoustic variation in Bengalese finch song (Sober et al., 2008). These acoustic parameters were therefore selected to reveal the behavioral effect of trial-by-trial variations in EMG activity. For each syllable, we defined a measurement time relative to syllable onset (Fig. 1a, vertical red line) that corresponded to a well defined spectral feature. Syllable onsets were defined based on amplitude threshold crossings. Fundamental frequency was quantified at the selected measurement time by quantifying peaks in spectral power as described previously (Sober et al., 2008). Amplitude was defined as the RMS amplitude in the 16 ms window surrounding the time of fundamental frequency measurement. 
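As a minimal sketch of the premotor EMG measure described above, the following Python function averages the smoothed, rectified EMG over the window immediately preceding the acoustic measurement time. The function name and the sample-index interface are hypothetical, and the window is assumed to end at (and exclude) the measurement sample.

```python
def premotor_emg(smoothed_emg, measure_idx, fs_hz=32000, window_ms=16.0):
    """Mean smoothed, rectified EMG in the window just before the measurement sample.

    Assumption: the window covers samples [measure_idx - n, measure_idx).
    """
    n = int(round(fs_hz * window_ms / 1000.0))  # 512 samples at 32 kHz
    start = measure_idx - n
    if start < 0:
        raise ValueError("not enough EMG samples before the measurement time")
    window = smoothed_emg[start:measure_idx]
    return sum(window) / len(window)
```

Changing `window_ms` to 8, 12, or 20 would correspond to the alternate windows used as a robustness check.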
This amplitude measurement was then converted to sound pressure level (dB SPL) by using a sound level calibrator (CAL73; BK Precision), which generated a 1 kHz sound wave at 94 dB SPL re 20 μPa, at the microphone. The following equation was used to calculate sound pressure level (dB SPL): $\mathrm{SPL} = 94 + 20 \log_{10}\left(V_{\mathrm{measured}} / V_{\mathrm{calibrated,\ 94\ dB}}\right)$, where SPL is the sound pressure level, Vmeasured is the measured RMS voltage, and Vcalibrated, 94 dB is the measured RMS voltage during calibration with a 94 dB SPL re 20 μPa, 1 kHz sound wave. We calculated sound pressure at the microphone (i.e., measured level) and not at the bird's beak (i.e., source level). When time-varying sound amplitude waveforms were calculated, a sliding 16 ms window was used to calculate RMS throughout the waveform before being converted to dB SPL. Spectral entropy was defined as the entropy of spectral power at the time of fundamental frequency measurement within a one octave window centered on the peak power according to the equation $E = -\sum_{f = f_{\min}}^{f_{\max}} P(f) \log P(f)$, where E is the spectral entropy, P(f) is the probability distribution of spectral power, and fmin and fmax are the frequency bounds surrounding the first harmonic (one half octave below and above the first harmonic, respectively).
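The calibration to dB SPL and the spectral entropy measure can be illustrated with a short Python sketch. The function names are hypothetical, the logarithm base for the entropy is not stated in the text (base 2 is assumed here, giving bits), and `power` is assumed to hold the spectral power values already restricted to the one-octave window.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a window of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def spl_db(v_measured_rms, v_calibrated_rms, cal_level_db=94.0):
    """Sound pressure level relative to a calibration tone of known level."""
    return cal_level_db + 20.0 * math.log10(v_measured_rms / v_calibrated_rms)

def spectral_entropy(power, base=2.0):
    """Shannon entropy of spectral power normalized to a probability distribution.

    Assumption: base-2 logarithm; bins with zero power contribute nothing.
    """
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log(p, base) for p in probs)
```

A measured RMS voltage ten times the calibration voltage corresponds to a level 20 dB above the calibration tone, and a flat spectrum over four bins has an entropy of 2 bits under the base-2 assumption.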
Linear regressions were computed between EMG activity and each of the three acoustic parameters for each muscle–syllable pair (Fig. 1e). Here, a muscle–syllable pair refers to the EMG activity from one muscle associated with one vocal gesture. Therefore, if a bird sang five different syllables, while having two muscles implanted, it would contribute 10 muscle–syllable pairs to the dataset. Because no significant difference in proportion of significant regressions (regression slope significantly different from zero, p < 0.05) was found between muscles on the left and right side, they were combined for our analysis.
Because of the large number of linear regressions performed (336), we needed to confirm that the number of significant regressions we discovered was greater than that expected by chance. To do so, we permuted each of the four measured parameters (fundamental frequency, amplitude, spectral entropy, and EMG) for each muscle–syllable pair to break any correlations that may exist in the dataset. We then calculated a linear regression for every muscle–syllable pair in the permuted datasets and measured the proportion of significant regressions. This process was repeated 1000 times to establish the 95th percentile for the proportion of significant regressions. This value was then compared with the proportion found in the original dataset to determine whether the number of significant regressions found in the study was greater than would be expected by chance. This process was also used to determine whether the numbers of muscle–syllable pairs with significant regressions with two, three, and multiple (i.e., two or three) acoustic parameters in our dataset were greater than expected by chance. All analysis was conducted in MATLAB (MathWorks).
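The logic of this permutation procedure can be sketched in Python as follows. This is not the authors' MATLAB code, and several choices are assumptions made to keep the example self-contained: significance is approximated with a Fisher z-transform of the Pearson correlation rather than the exact t test on the regression slope, only the acoustic values are shuffled within each pair (which is sufficient to break the pairing with EMG), and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def is_significant(x, y, alpha=0.05):
    """Approximate two-sided test of zero correlation (Fisher z; an assumption)."""
    r = max(min(pearson_r(x, y), 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p < alpha

def proportion_significant(pairs):
    """Fraction of (emg, acoustic) pairs with a significant relationship."""
    return sum(is_significant(emg, ac) for emg, ac in pairs) / len(pairs)

def chance_threshold(pairs, n_perm=1000, percentile=95, seed=0):
    """Percentile of the proportion of significant pairs under shuffling."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = []
        for emg, ac in pairs:
            perm = list(ac)
            rng.shuffle(perm)  # break the trial-by-trial pairing
            shuffled.append((emg, perm))
        null.append(proportion_significant(shuffled))
    null.sort()
    idx = max(0, math.ceil(percentile * len(null) / 100) - 1)
    return null[min(idx, len(null) - 1)]
```

The observed proportion from `proportion_significant` on the real pairs would then be compared with the value returned by `chance_threshold`.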
Muscle stimulation
Muscle stimulation experiments were conducted using the same type of electrodes implanted for recording experiments, as described above. For stimulation experiments, nine vocal muscles (four left VS, three left DTB, and two EXP) were implanted in nine adult male Bengalese finches, in which five syllables were stimulated across all VS experiments, six syllables across all DTB experiments, and two syllables across all EXP experiments. Using custom LabView code (Tumer and Brainard, 2007), a specific syllable was detected using a spectral template. The template was created by taking the average spectrum of the target syllable across hundreds of unperturbed trials. A template match triggered the onset of stimulation (train burst width: 18.5 ms, pulse duration: 0.5 ms, interpulse period: 3 ms, 7 pulses) at a latency calibrated so that stimulation would perturb the subsequent syllable. Note that because our analysis focuses on the effects of stimulation at a minimal latency rather than maximal effects (see below), the duration of the stimulation train is unlikely to significantly affect our results. A stimulator (A-M Systems, Model 2100) was used to pass current through the electrodes to perturb the implanted muscle in 50% of the trials (to allow comparison with control, or “catch” trials). We determined the appropriate current level by first titrating the current amplitude to the lowest level at which we could observe acoustic effects in any parameter. Our stimulation experiments encompassed currents ranging from 75 to 500 μA. Our ex vivo data (see Results) show 500 μA was a suitable maximum for avoiding activation of neighboring muscles. Though there are differences between the in vivo and ex vivo paradigms that could cause the recruited volume of muscle tissue to differ at the same stimulating current (see Discussion), we believe that the ex vivo preparation suggests a reasonable range for stimulating currents. 
Acoustic data from stimulation experiments were measured in the same manner described in the correlation approach above.
To avoid the possibility of neural reflex loops affecting our results, we determined the minimum delay after stimulation onset at which the acoustic effects of stimulation could be observed. Because of timing variability in template recognition and natural variability in the song, the timing of stimulation fluctuated relative to the time of acoustic measurement (Fig. 2). As a result, we were able to analyze the effect of stimulation timing on acoustics. We were able to observe acoustic effects as early as 5 ms after stimulation (Fig. 2b), which corresponds well to maximum force generation at 4–5 ms in isolated zebra finch vocal muscle fibers (Elemans et al., 2008a). At longer latencies, the absolute magnitude of acoustic effects could increase, but remained the same sign in all cases. Acoustic effects for stimulation experiments were grouped by the delay between the time of stimulation and the time of acoustic measurements. To minimize the risk of capturing reflex effects, we used data in the range of a 5–20 ms delay. In one of the 11 stimulation experiments, there were not enough data points in this range, so we expanded the range up to 5–30 ms. This range is still smaller than the latency of a reflex loop estimated to be between 35 and 70 ms in a songbird study of somatosensory reflexes in the abdominal expiratory muscle group (Suthers et al., 2002). Furthermore, our analysis window falls below the delay for auditory feedback to influence vocal motor output, which based on prior studies we estimate to be at least 40 ms (Fee et al., 2004; Sakata and Brainard, 2008). Therefore, analyses like the one shown in Figure 2b were used to make sure that acoustic effects at longer latencies were not different in sign from those at a shorter latency. All stimulation trials that took place within the appropriate delay interval were compared with the catch trials using a t test on all three acoustic parameters. 
Finally, to determine whether stimulation effects were fixed for a given acoustic parameter, the fraction of stimulation experiments for a given muscle that elicited a significant increase in a given acoustic parameter as opposed to a decrease were tested against a binomial distribution with a probability of 0.5. Finding significance in this test indicated whether the effect distribution was significantly different from a 50/50 chance of getting an increase or decrease in that acoustic parameter.
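A binomial test of this kind can be sketched in Python using only the standard library. The function name is hypothetical, and the test is assumed to be two-sided and exact (summing the probabilities of all outcomes no more likely than the observed one); the text does not state which variant was used.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p value for k 'increase' outcomes in n experiments.

    Assumption: two-sided test that sums the probability of every outcome
    that is no more likely than the observed count.
    """
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)
```

For instance, if all five of five experiments produced an increase, the two-sided p value under a 50/50 null would be 2 × (1/2)^5 = 0.0625.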
In contrast to muscles in the syrinx, stimulation of EXP did not evoke measurable effects on vocal output. We believe that this was the case because the magnitude of stimulation necessary to measurably perturb vocal output would have caused significant discomfort to our subjects. Current amplitudes as high as 1 mA applied to EXP did not produce detectable changes in song output. However, at this current level, birds often truncated their song bouts even after repeated exposure to stimulation, a potential indication of discomfort. We consequently did not increase stimulation currents higher than 1 mA. Therefore, we were unable to assess whether EXP might drive modulations in multiple acoustic parameters using in vivo stimulation.
Ex vivo syrinx
We developed an ex vivo assay (Fig. 3a) to quantify (1) the muscle recruitment specificity as a function of currents used in the in vivo stimulation paradigm and (2) the resulting acoustic effects of targeted stimulation. We will briefly describe the setup here, while a more detailed description can be found in Elemans et al. (2015).
Experimental setup.
Animals were killed with isoflurane, and then the syrinx and associated blood vessels were removed from the animal. The syrinx was then moved to a covered Petri dish (Sylgard; Dow Corning), which sat on ice and contained oxygenated Ringer's solution. The bronchi and trachea were connected to tubing (Instech Solomon) in the chamber with 10-0 nylon suture (S&T). The syrinx was then perfused through its vasculature using oxygenated Ringer's solution. The syrinx was mounted in the chamber (Fig. 3a) with the ventral side up for VS experiments and the dorsal side up for DTB experiments. This chamber was covered by an airtight glass lid that allowed pressurization and visualization of the syrinx. Bronchial and chamber pressures were controlled separately via dual valve differential pressure PID controllers (Model PCD, 0–10 kPa; Alicat Scientific), referenced to atmospheric pressure. Imaging of the syrinx was performed with a high-speed camera (MotionPro-X4, 12-bit CMOS sensor; IDT) mounted on a stereomicroscope (M165-FC; Leica Microsystems). During experiments, a 1/4 inch microphone (Model 4939 with preamplifier 2669; Bruel & Kjaer) was placed 4 cm from the trachea to record sound, which was then bandpass filtered (20 Hz–22.4 kHz) and amplified (Nexus 2690-OS2; Bruel & Kjaer). The acoustic, camera trigger, and muscle stimulator trigger signals were digitized at 50 kHz (USB 6259, 16 bit; National Instruments). All control and analysis software was written in LabView (National Instruments) or MATLAB.
Muscle recruitment specificity.
We perturbed VS and DTB ex vivo using electrical stimulation to ensure that the currents used during in vivo stimulation specifically recruited the targeted muscle. We used the same electrode design and placement as those used in vivo. We used length change of muscle fibers to assess specificity of muscle recruitment. Fiber length was quantified ex vivo by placing spherical carbon beads of 40–200 μm diameter onto muscle fibers. Carbon beads are light, adhere well, provide stark visual contrast, and precisely follow length changes of muscle fibers (Elemans et al., 2011). The position of the carbon beads was determined using an automated autocorrelation function adapted from Elemans et al. (2011), which then tracked the position of the bead frame by frame throughout the video (Fig. 3b). On both the stimulated muscle and the neighboring muscle, we selected and tracked the distance between two spots to measure muscle contraction during the stimulation. The neighboring muscles for VS and DTB were m. tracheobronchialis ventralis (VTB) and m. syringealis dorsalis lateralis (LDS; Düring et al., 2013), respectively. Muscle length was low-pass filtered with a cutoff of 50 Hz to remove the effects of labial oscillation. We calculated muscle shortening during stimulation as Lagrangian strain, i.e., the percentage difference between muscle length at 40 ms after stimulation onset and at 135 ms after stimulation onset: $\varepsilon = \frac{L_{40\,\mathrm{ms}} - L_{135\,\mathrm{ms}}}{L_{135\,\mathrm{ms}}} \times 100\%$, where ε is the percentage strain, L40 ms is the length at 40 ms, and L135 ms is the length at 135 ms. The value 40 ms was chosen because it reflected the full amount of contraction whereas 135 ms was selected because contraction had already returned to baseline for tens of milliseconds (Fig. 3c). We selected a time after stimulation as opposed to before stimulation because the high-speed camera was triggered at stimulation onset. 
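As a worked illustration of the strain measure, the following Python sketch computes percentage strain from the two length measurements. The function and argument names are hypothetical, and the length at 135 ms is assumed to serve as the reference (baseline) length, so that muscle shortening yields a negative strain.

```python
def percent_strain(length_40ms, length_135ms):
    """Percentage Lagrangian strain relative to the post-contraction baseline.

    Assumption: the length at 135 ms is the reference length, so a muscle
    that has shortened at 40 ms gives a negative value.
    """
    return 100.0 * (length_40ms - length_135ms) / length_135ms
```

A fiber segment measuring 0.95 mm at 40 ms and 1.00 mm at 135 ms would therefore have a strain of about -5%.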
The strain was then plotted over several trials at currents ranging from 200 μA to 8 mA (note that all current amplitudes used in vivo were <1 mA). To test muscle recruitment specificity, we compared muscle length changes for the neighboring muscles (VTB and LDS) relative to the targeted muscles (VS and DTB, respectively). We used a t test to determine whether any length changes of the stimulated and/or neighboring muscles were significant.
Ex vivo acoustic effects.
The acoustic effects of muscle activation ex vivo were quantified in a manner similar to that used in vivo. Acoustic parameters were quantified (see above) 30 ms after stimulation onset. This delay was slightly larger than the one used in vivo because the temperature of the syrinx was lower in the ex vivo assay (∼26°C) compared with in vivo (41°C) and thus muscle fiber shortening took slightly longer (Rall and Woledge, 1990). The parameters measured here were compared with the parameters measured 20 ms before stimulation onset. Changes in amplitude and spectral entropy were measured as the difference between stimulation and control values. To allow us to combine data from different ex vivo preparations (where baseline fundamental frequency varied somewhat across specimens), each iteration's fundamental frequency (in Hertz) was converted to the fractional change from the control fundamental frequency (in cents) as defined below: $\Delta FF = 1200 \log_{2}\left(h_{s} / h_{c}\right)$, where ΔFF is the change in fundamental frequency in cents, hs is the fundamental frequency 30 ms after stimulation onset in Hertz, and hc is the control (unstimulated) fundamental frequency in Hertz. A shift of 100 cents corresponds to one semitone (an ∼6% change in fundamental frequency). Shifts in all three parameters were then regressed against current size for current levels up to 1000 μA using linear regression.
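The conversion from Hertz to cents follows the standard definition of 1200 cents per octave, which is consistent with the statement that 100 cents equals one semitone. A minimal Python sketch, with hypothetical function and argument names, is:

```python
import math

def cents_change(ff_stim_hz, ff_control_hz):
    """Change in fundamental frequency in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ff_stim_hz / ff_control_hz)

def cents_to_fraction(cents):
    """Fractional frequency change corresponding to a shift in cents."""
    return 2.0 ** (cents / 1200.0) - 1.0
```

Under this definition a doubling of fundamental frequency is a shift of 1200 cents, and a 100 cent shift corresponds to a fractional change of roughly 0.059, i.e., about 6%.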
Results
Vocal muscles control multiple acoustic parameters
To examine the relationship between the effects of vocal muscle activity and vocal behavior, we implanted EMG electrodes into 28 vocal muscles (11 VS, 8 DTB, and 9 EXP) in 20 adult male Bengalese finches. Following electrode implantation, we assessed any postsurgical changes in the acoustic structure of song by visually assessing spectrographic representations of vocal output. If surgery resulted in gross changes in acoustic structure such that presurgery song syllables could not be discerned following implantation, the resulting data were excluded from subsequent analysis. Data from two DTB cases and one VS case were excluded for this reason, resulting in 25 total muscles across 17 birds being used for our analysis. In these remaining cases, implantation had minimal effect on vocal acoustics. By calculating linear regressions between EMG and each of three acoustic parameters (fundamental frequency, amplitude, and spectral entropy), we determined how small variations in vocal muscle activity related to changes in acoustics. In these experiments, 78% (87 of 112 muscle–syllable pairs) of all pairs had a significant linear regression between EMG activity and at least one acoustic parameter (p < 0.05, linear regression slope significantly different from zero; Fig. 4). Here, a muscle–syllable pair referred to the EMG activity from one muscle associated with one vocal gesture. Therefore, if a bird sang five different syllables, while having two muscles implanted, it would contribute 10 muscle–syllable pairs to the dataset. These results demonstrate that our EMG and acoustic analyses were sufficiently sensitive to detect significant relationships between trial-by-trial variations in muscle activity and vocal behavior.
To investigate whether individual vocal muscles control multiple acoustic parameters, we counted the number of cases in which a muscle had significant regressions with at least two of the measured acoustic parameters. Importantly, 41% of all pairs (53% of pairs with at least one correlation) had significant regressions between EMG activity and at least two of the measured acoustic parameters (Fig. 4, purple and yellow regions). There were instances of pairs with multiple significant regressions across all recorded muscles indicating that this observation was not specific to one of the three selected vocal muscles. Randomly permuting the dataset and recalculating the number of significant regressions revealed that the proportion of significant linear regressions found in our data was significantly greater than chance (p < 0.001, permutation test; see Materials and Methods). Using the same permutation test, we found that the proportions of significant regression slopes between EMG activity and two, three, and multiple (i.e., two or three) acoustic parameters were also significant (p < 0.001 in all cases). Furthermore, although our primary analysis quantified the relationship between vocal acoustics and EMG activity in a 16 ms wide premotor window (see Materials and Methods), calculating EMG activity in a premotor window of 8, 12, or 20 ms width yielded results that were qualitatively identical to those of our primary analysis. These data suggest that individual vocal muscles are capable of modifying multiple acoustic parameters during song.
Because correlations typically exist between different vocal muscles' activity during singing (Goller and Suthers, 1996a, 1996b), a significant regression between a muscle and an acoustic parameter might have resulted from that muscle's activity having been correlated with the activity of another muscle that drove that parameter. To solve this problem, we developed a targeted stimulation technique (see Materials and Methods). This technique allowed us to perturb the activity of a single vocal muscle without altering other muscles' activity (see below, Ex vivo syrinx tests stimulation specificity), ensuring any acoustic effect was due to the marginal increase in the targeted muscle's activity. In these experiments, the syllable before the syllable of interest was detected using a spectral template (see Materials and Methods). Upon matching the template, half of the trials (“catch” trials) resulted in no stimulation (Fig. 5a). In the other half of trials, a short train of biphasic current between 75 and 500 μA was passed between the EMG electrodes within the implanted muscle (Fig. 5b). Figure 5c shows the difference between the catch and stimulated trials' spectrograms, with a distinct upward shift in frequency toward the end of the syllable. Significant acoustic changes, measured in the 5–20 ms following stimulation (see Materials and Methods), appeared in all three acoustic parameters (Fig. 6a–c) for the syllable shown in Figure 5 when the implanted muscle was stimulated with a current of 75 μA (note that Fig. 5 depicts the effects of stimulation at 500 μA to illustrate a larger effect). Across all stimulation experiments, we stimulated using the minimum current capable of evoking detectable acoustic effects (75–500 μA) and found that significant changes (p < 0.05, two-tailed, two-sample t test) were driven in at least two acoustic parameters in 80% of the experiments where VS was stimulated and 100% of the experiments where DTB was stimulated (Fig. 6d). 
Although we cannot be certain that the spread of stimulating current in vivo was equal to that ex vivo (see Discussion), we believe that the stimulation likely activated nerve fibers within the implanted muscle, which in turn activated associated motor units, but did not activate muscle fibers directly. This notion is supported by a previous in vitro study (Elemans et al., 2008a), which directly activated isolated syringeal muscle fibers and required current sizes much greater than those used in the present study. The stimulation results suggest that single vocal muscles in the syrinx can drive changes in multiple acoustic parameters, in agreement with the regression-based analysis described above.
Ex vivo syrinx tests stimulation specificity
Because we wanted to test the effects of stimulation on individual muscles in our in vivo experiments, we needed to ensure that the stimulating current only caused contraction in the targeted muscle. The ex vivo syrinx assay (see Materials and Methods) allowed us to directly visualize the effects of electrical stimulation and assess whether the stimulation parameters used in our in vivo studies produced muscle contractions that were restricted to the implanted muscles. Figure 3, b and c, show a trial where VS was stimulated with a current of 400 μA. While the length of VS changed appreciably (t(16) = 7.8, p < 0.01, two-tailed, paired t test), the neighboring muscle VTB showed no significant changes in length (Fig. 3c). Across all trials (Fig. 3d), the change in length of VTB was not significantly different from zero for currents of 500 μA or smaller (t(16) = 0.51, p = 0.62, two-tailed, paired t test). This finding suggests that 500 μA was an appropriate maximum current for in vivo perturbations of VS. This approach was also used to compare changes in muscle length for DTB with the neighboring LDS muscle (Düring et al., 2013). Similarly, changes in length of LDS were not found to be significant for currents of 600 μA or smaller (t(10) = 1.5, p = 0.92, two-tailed, paired t test), while stimulation drove significant changes in the length of DTB (t(10) = 5.5, p < 0.01, two-tailed, paired t test). This suggests that 600 μA is an appropriate maximum current for in vivo perturbations of DTB. Therefore, although the in vivo and ex vivo conditions differed in some respects (including electrode implant duration, temperature, etc.; see Discussion), these analyses suggest that the stimulation current amplitudes used in our in vivo experiments were sufficiently small not to evoke contraction in nonimplanted muscles.
Stimulation of single muscles ex vivo drives changes in multiple acoustic parameters
Additionally, we compared the acoustic effects of VS perturbation ex vivo to those found in vivo. This was an important experiment because the ex vivo paradigm allowed us to eliminate the descending neural control and feedback loops, which could affect the observed acoustic effects in vivo. While the ex vivo syrinx produced sound, VS was perturbed using the same stimulation parameters from the in vivo experiments (Fig. 7a). Ex vivo activation of VS showed a positive relationship between current size and fundamental frequency (p < 0.001, linear regression) and a negative relationship between current size and amplitude (p < 0.001, linear regression; Fig. 7b). These results were similar to those of our in vivo stimulation experiments, in which activation of vocal muscles drove changes in multiple acoustic parameters. There was no significant relationship between current size and spectral entropy.
Vocal muscles can have context-dependent effects on acoustic parameters
In addition to testing whether vocal muscles could control multiple acoustic parameters, we asked whether the transformation from vocal muscle activity to vocal behavior was constant across different vocal gestures or changed depending on the gesture produced. First, we investigated whether the sign of the regression slope between the activity of a single muscle and a given acoustic parameter changed across syllables within the same bird. While detecting cases with positive and negative regression slopes is a very simple analysis and might fail to capture more subtle context-dependent differences in muscle function, finding that a given muscle increased fundamental frequency during one syllable but decreased it during another syllable, for example, would provide unambiguous evidence of context dependency. Figure 8a shows a recording of EXP in one individual during song. During syllable A, EMG activity had a positive relationship with fundamental frequency (p < 0.001, linear regression), while EMG activity had a negative relationship with fundamental frequency (p < 0.001, linear regression) during syllable D (Fig. 8b). Across all muscles recorded from birds that produced more than one syllable with quantifiable acoustic structure, 50% (10 of 20) of those muscles exhibited context dependency for at least one acoustic parameter (i.e., had at least one syllable with a significant, positive linear regression slope and at least one with a significant, negative linear regression slope). Second, we tested this hypothesis by examining the variation in stimulation effects within the same muscle. In the syllable of one individual, VS stimulation caused amplitude to significantly decrease (t(1537) = 16.2, p < 0.01, two-tailed, two-sample t test), while in another individual's syllable, perturbation of VS caused amplitude to significantly increase (t(511) = 2.8, p < 0.01, two-tailed, two-sample t test; Fig. 9a–d). 
Similarly, DTB perturbation caused a significant decrease in fundamental frequency (t(108) = 3.5, p < 0.01, two-tailed, two-sample t test), while the same DTB stimulation during a syllable in another bird caused a significant increase in fundamental frequency (t(109) = 4.3, p < 0.01, two-tailed, two-sample t test; Fig. 9e–h). In one case, we were able to stimulate DTB during three different syllables in the same bird. While stimulation significantly increased entropy in two cases, it significantly decreased that parameter in the third case. Together, the results from EMG recording and stimulation data suggest that single vocal muscles can have context-dependent effects on acoustic parameters.
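The regression-based criterion for context dependency used above (at least one syllable with a significant positive slope and at least one with a significant negative slope for the same acoustic parameter) can be sketched as follows. This is an illustrative sketch only: the `Fit` record, the function name, the example values, and the significance threshold `ALPHA` are assumptions, since the threshold is not stated in this section.

```python
from dataclasses import dataclass

ALPHA = 0.05  # assumed significance threshold (hypothetical)


@dataclass
class Fit:
    """One regression of an acoustic parameter on a muscle's EMG activity."""
    syllable: str
    parameter: str
    slope: float
    p_value: float


def context_dependent_parameters(fits, alpha=ALPHA):
    """Return parameters with significant slopes of both signs across syllables."""
    signs = {}
    for fit in fits:
        if fit.p_value < alpha and fit.slope != 0:
            # Record whether this significant slope is positive (True) or negative (False).
            signs.setdefault(fit.parameter, set()).add(fit.slope > 0)
    return sorted(p for p, seen in signs.items() if seen == {True, False})


# Hypothetical fits for a single muscle recorded during two syllables.
fits = [
    Fit("A", "fundamental_frequency", 0.8, 0.0005),
    Fit("D", "fundamental_frequency", -0.6, 0.0005),
    Fit("A", "amplitude", 0.3, 0.01),
    Fit("D", "amplitude", 0.2, 0.40),
]

result = context_dependent_parameters(fits)
print(result)
```

Under this rule a muscle counts as context dependent if the returned list is non-empty. As noted above, the sign-based criterion is deliberately conservative and will miss differences in slope magnitude that do not change sign.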
Transformation from vocal muscle activation to vocal behavior is fixed for some acoustic parameters and context dependent for others
Despite the presence of context dependency in some cases, we asked whether there were any transformations that did not vary across syllables. While a vocal muscle might exert a context-dependent effect on one acoustic parameter, that muscle might have an effect on another parameter that is constant across vocal gestures. Across all stimulation experiments, VS perturbation always significantly increased fundamental frequency, but the stimulation could increase, decrease, or have no effect on amplitude and spectral entropy (Fig. 10a). The relationship with fundamental frequency was found to be significantly different from an even distribution between increases and decreases (p = 0.031, binomial test). DTB perturbation always significantly decreased amplitude, but the stimulation could increase, decrease, or have no effect on fundamental frequency and spectral entropy (Fig. 10b). This relationship was found to be significantly different from an even distribution (p = 0.016, binomial test). These results suggest that a given muscle's activity can have a fixed relationship with one acoustic parameter (e.g., VS and fundamental frequency), while having a context-dependent relationship with other parameters.
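The binomial tests above ask how likely it is that every significant effect would share the same sign if increases and decreases were equally probable. A minimal sketch of an exact two-sided binomial test follows; the function name is hypothetical, and the assumptions that the test was two-sided and that each experiment counts as one independent outcome are ours. The reported values (p = 0.031 and p = 0.016) coincide with what this calculation gives for 6 of 6 and 7 of 7 same-signed outcomes, respectively, but the number of experiments is not stated in this section, so those counts are an inference rather than a reported figure.

```python
from math import comb


def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial p value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return sum(x for x in pmf if x <= observed * (1 + 1e-9))


# p value when all n effects share one sign, for several hypothetical n.
for n in range(5, 9):
    print(n, two_sided_binomial_p(n, n))
```

When all n outcomes share a sign and p = 0.5, the result reduces to 2 × 0.5^n, so at least six same-signed outcomes are needed to fall below 0.05.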
Discussion
We investigated whether individual vocal muscles can control multiple acoustic parameters during song. We found that 41% of muscle–syllable pairs had significant regressions between EMG activity and more than one acoustic parameter (Fig. 4). Furthermore, in vivo stimulation showed that 90% of vocal muscle perturbations drove effects in multiple acoustic parameters (Fig. 6d). Ex vivo perturbation of single muscles drove significant acoustic effects in both amplitude and fundamental frequency (Fig. 7b). These results suggest that although songbirds can modulate single acoustic parameters independently (Tumer and Brainard, 2007), individual vocal muscles drive changes in multiple parameters.
We also asked whether the relationship between muscle activity and an acoustic parameter depended on the vocal gesture produced. Our regression analysis showed that 50% of recorded muscles displayed context dependency with at least one acoustic parameter (Fig. 8). Furthermore, in vivo stimulation experiments suggested that the acoustic effects of muscle perturbation varied by syllable (Fig. 9). Interestingly, despite the prevalence of context dependency, individual muscles drove constant-signed effects in some acoustic parameters. VS stimulation always increased fundamental frequency, and DTB stimulation decreased amplitude in every case (Fig. 10). Although such constant-signed effects are consistent with prior models of syringeal function (Goller and Suthers, 1996b; Elemans et al., 2008a), our results suggest the transformation from muscle activity to vocal behavior can be context dependent for some parameters and fixed for others.
Isolating the functional role of single vocal muscles
Our finding that individual muscles affect a range of acoustic parameters constrains the functional architecture of neural control. A selective modification of fundamental frequency would require the modulation of muscles that also control other acoustic parameters, demanding extensive coordination across muscles (and the neurons that control them) to avoid unwanted acoustic changes. The activity of multiple muscles could be coordinated in fixed proportions using a single neural control signal through a muscle synergy (Bernstein, 1967), in which fewer control signals are needed than vocal muscles. While synergies have been studied in other behaviors (Weiss and Flanders, 2004; Ting and Macpherson, 2005; Tresch et al., 2006), the computational technique has yet to be applied to EMG activity in vocal control and may further inform how the brain regulates vocal output.
To our knowledge, this study is the first to illustrate the effects of small variations in EMG activity during the production of individual song syllables. Previous studies have investigated the relationship between EMG activity and vocal output by combining data across many vocal gestures (Goller and Suthers, 1996a, 1996b). While that transformation appears complex and highly nonlinear, investigating smaller variations across renditions of the same syllable allows us to examine the marginal effects of single vocal muscles in shaping particular gestures.
This study implements on-line perturbation of vocal muscles via electrical stimulation to drive short-latency acoustic effects. The most obvious advantage to this approach is the elimination of the effects of correlations between muscles on our analysis. For instance, if a muscle's activity was highly correlated with the activity of another muscle that controlled fundamental frequency, a standard regression analysis would suggest that both muscles control fundamental frequency, when that may not be the case. Comparing stimulation to catch trials allows us to investigate the marginal effect of increasing a single vocal muscle's activity while, on average, holding other motor parameters constant. On-line stimulation could be used to design more complex experiments investigating how muscles work together by altering the relative timing and amplitude of stimulation across multiple muscles.
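The confound described above can be illustrated with a toy simulation. In this sketch, which is not the authors' analysis and uses entirely hypothetical numbers, two muscles share a common drive, but only muscle B affects fundamental frequency. Regressing fundamental frequency on muscle A alone yields a clearly nonzero slope, whereas adding activity to muscle A on "stimulated" trials and comparing with catch trials reveals no effect.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate_trial(stim_a=0.0):
    """One rendition: muscle B drives F0; muscle A only co-varies with B."""
    shared = random.gauss(0, 1)  # common drive to both muscles
    a = shared + random.gauss(0, 0.3) + stim_a
    b = shared + random.gauss(0, 0.3)
    f0 = 600.0 + 20.0 * b + random.gauss(0, 5)  # Hz; A has no causal effect
    return a, b, f0


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Correlational view: regress F0 on muscle A across unstimulated renditions.
trials = [simulate_trial() for _ in range(2000)]
slope_a = ols_slope([t[0] for t in trials], [t[2] for t in trials])

# Perturbation view: compare stimulated trials with catch trials.
catch = [simulate_trial()[2] for _ in range(2000)]
stim = [simulate_trial(stim_a=1.0)[2] for _ in range(2000)]
stim_effect = statistics.fmean(stim) - statistics.fmean(catch)

print(f"regression slope for muscle A: {slope_a:.1f} Hz per unit activity")
print(f"stimulation effect of muscle A: {stim_effect:.1f} Hz")
```

The regression attributes an apparent effect to muscle A because of its correlation with muscle B, while the stimulation contrast stays near zero because the perturbation changes muscle A alone and leaves other motor parameters unchanged on average.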
As described in Results, although correlation analyses suggested that EXP modulates multiple acoustic parameters, we were unable to confirm this using electrical stimulation. Because the relationship between EXP activity and air sac pressure is poorly understood (see Materials and Methods), we must exercise caution when interpreting such correlative data. Notably, in vivo and modeling studies have shown that perturbing pressure can affect both the fundamental frequency and amplitude of vocal output (Elemans et al., 2008b; Amador and Margoliash, 2013), corroborating the results of our regression analysis on EXP activity (Fig. 4) and suggesting that EXP activity might modulate multiple parameters. Therefore, although future work will be necessary to establish the effects of EXP perturbation in vivo, EXP might, via its effect on air sac pressure, affect multiple acoustic parameters.
Additionally, we used an ex vivo paradigm to visualize the consequences of electrical muscle stimulation and determine a range of current amplitudes that would restrict stimulation to a single muscle. Although we aimed to match the conditions of ex vivo and in vivo stimulation (electrode type and placement, current amplitude/duration, etc.) as closely as possible, potential differences between these conditions limit our ability to exactly duplicate ex vivo the stimulation parameters used in vivo. First, as the muscle reacts to the implanted electrode over time in vivo, electrode impedance will likely increase, whereas changes in electrode impedance are presumably less prevalent over the short time course of acute ex vivo experiments. The potentially greater impedance in vivo suggests that a smaller volume of tissue would be activated (i.e., greater muscle specificity) than that observed ex vivo. Second, our ex vivo tests were performed at room temperature rather than at body temperature, potentially altering the spread of current and reducing the efficacy of axonal activation, and electrical stimulation itself may have altered muscle temperature. Although it is therefore impossible to be certain whether the same number of motor units were activated in the in vivo and ex vivo conditions, given that the currents used in vivo were far smaller than those which evoked significant recruitment of nearby muscles ex vivo, we consider it unlikely that accidental recruitment of adjacent muscles significantly influenced our results.
We found significant effects on fundamental frequency and amplitude from in vivo and ex vivo perturbations of VS. Interestingly, only in vivo stimulation produced changes in spectral entropy. One possible cause of this discrepancy is the presence of a rigid tube instead of the upper vocal tract in the ex vivo assay (Fig. 3a). In both songbirds and humans, the upper vocal tract, which includes the trachea, tongue, and beak/jaw, filters the acoustic source and can alter its resonance to accentuate certain harmonics in the acoustic signal (Fant, 1970; Daley and Goller, 2004). The amount of filtering applied to certain harmonics could affect spectral entropy, which measures the noisiness of the spectral content. By substituting a rigid tube for the upper vocal tract, we may have prevented changes in spectral entropy due to stimulation. Additionally, the stimulation-induced changes in spectral entropy observed in vivo may reflect a mismatch between the source (the syrinx) and the filter (the upper vocal tract). If perturbing a vocal muscle results in a higher fundamental frequency, the resonance of the upper vocal tract may no longer match what is intended by the bird, possibly resulting in changes in spectral entropy in addition to fundamental frequency. At the very least, the ex vivo assay suggests that the technique used for in vivo stimulation can specifically perturb individual muscles and confirms that changes in fundamental frequency and sound amplitude are direct results of muscle perturbation rather than results of reflex mechanisms.
Vocal muscle function is context dependent
In many cases, the relationship between vocal muscle activity and individual acoustic parameters depended on the vocal gesture produced (Figs. 8–10). Similarly, our prior work has shown that the relationship between spiking activity in single neurons in vocal motor cortex and acoustic output can vary across vocal gestures (as when increases in spiking appear to increase fundamental frequency in one syllable but decrease fundamental frequency in a different syllable; Sober et al., 2008). These results suggest that intersyllable differences in neural tuning in motor cortex reflect the context dependency of vocal muscle function.
The context dependency of vocal muscle function could result from nonlinearities in the force-producing properties of the muscles themselves or in the mechanics of the vocal motor system. Force generation is strongly affected by parameters such as muscle length and shortening velocity (Huxley, 1957), which will vary across vocal gestures, and by a muscle's recent history of contraction (Herzog and Leonard, 2000). For the EXP muscle group, context dependency might arise because the relationship between EXP activity and air sac pressure depends on the volume in the air sac and syringeal gating of airflow as noted above (Goller and Cooper, 2004). A recent study that modeled air sac pressure and labial tension concluded that the relationship between air pressure and frequency can take on different signs depending on where a given syllable is produced in the pressure-tension parameter space (Alonso et al., 2014). Similarly, in zebra finches, EMG activity in the VS muscle is correlated with frequency at low frequencies but no relationship is observed at higher frequencies (Goller and Riede, 2013). In human vocal studies, stimulation of the thyroarytenoid muscle caused both increases and decreases in fundamental frequency depending on whether the fundamental frequency and intensity of the vocalization were high or low (Titze et al., 1989), suggesting that context dependency is a common feature of vocal control in humans and songbirds. Examples of context dependency can also be found in nonvocal motor systems such as in cockroach locomotion (Sponberg et al., 2011) and in mammalian joints (De Luca and Mambrito, 1987). Context dependency therefore appears to be a general feature of complex motor control across species and systems.
Footnotes
This work was supported by National Institutes of Health grants P30NS069250, R01NS084844, and F31DC013753; Danish Research Council (FNU); Carlsberg Foundation; and the QuanTM program at Emory University. We thank Franz Goller for his invaluable lessons performing EMG surgeries, Diala Chehayeb and Harshila Ballal for animal care, and Robert Liu for an equipment loan.
The authors declare no competing financial interests.
Correspondence should be addressed to Samuel J. Sober, Department of Biology, Emory University, Room 2006, 1510 Clifton Road NE, Atlanta, GA 30322. samuel.j.sober@emory.edu