Abstract
While hearing in noise is a complex task, even in high levels of noise humans demonstrate remarkable hearing ability. Binaural hearing, which involves the integration and analysis of incoming sounds from both ears, is an important mechanism that promotes hearing in complex listening environments. Analyzing inter-ear differences helps differentiate between sound sources, a key mechanism that facilitates hearing in noise. Even when both ears receive the same input, known as diotic hearing, speech intelligibility in noise is improved. Although musicians have better speech-in-noise perception compared with non-musicians, we do not know to what extent binaural processing contributes to this advantage. Musicians often demonstrate enhanced neural responses to sound, however, which may undergird their speech-in-noise perceptual enhancements. Here, we recorded auditory brainstem responses in young adult musicians and non-musicians to a speech stimulus for which there was no musician advantage when presented monaurally. When presented diotically, musicians demonstrated faster neural timing and greater intertrial response consistency relative to non-musicians. Furthermore, musicians' enhancements to the diotically presented stimulus correlated with speech-in-noise perception. These data provide evidence for musical training's impact on biological processes and suggest binaural processing as a possible contributor to more proficient hearing in noise.
Introduction
The mammalian auditory system adaptably encodes the sounds around us. One way it achieves this is through binaural processing: the ability to simultaneously integrate and analyze input from both ears. The auditory system is tuned to detect minute differences between the ears on the order of tens to hundreds of microseconds (Hudspeth, 1997; Grothe, 2003). Such precision is necessary for hearing in noise, which relies on timing mechanisms to segregate concurrent acoustic streams according to slight deviations in their locations, pitches, or sound qualities. Even when both ears receive identical auditory input (i.e., diotic stimuli that do not contain different level, timing, localization, or pitch cues), hearing thresholds in noise and intelligibility are improved (Plomp and Mimpen, 1979; Kaplan and Pickett, 1981; Davis and Haggard, 1982; Davis et al., 1990). Knowing how auditory experience refines diotic processing may help us better understand this facet of binaural hearing, thus guiding the development of habilitation and remediation approaches for communication abilities for which binaural hearing is a primary concern, including speech perception in noise.
While acoustic input is first relayed ipsilaterally, contralateral projections facilitate the integration of sensory input to both ears early in the central processing stream: binaural interactions are initiated within the superior olivary complex, the nuclei of the lateral lemniscus, and the inferior colliculus of the auditory brainstem (for review, see Moore, 1991). Binaural hearing proficiency does not rely on these subcortical mechanisms alone: during hearing, left and right ear inputs compete within both right and left auditory cortices (Fujiki et al., 2002). Degraded auditory experiences, such as those associated with prior reduced hearing ability (Hogan and Moore, 2003), and cognitive functions, such as attention (Shinn-Cunningham et al., 2005), influence how the inputs from both ears interact. Over the course of their training, musicians develop the ability to make sense of complex auditory environments and demonstrate enhanced perceptual learning abilities (Kühnis et al., 2013; Shook et al., 2013). Growing evidence indicates that musical experience is associated with strengthened perception and neural encoding of speech in the presence of noise (Zendel and Alain, 2009; Parbery-Clark et al., 2009a, 2011; Strait et al., 2012, 2013b; Strait and Kraus, 2013), in addition to cognitive abilities that modulate both perception and neural response properties (Parbery-Clark et al., 2009b, 2011; Kraus and Chandrasekaran, 2010; Strait et al., 2010, 2013a; Hanna-Pladdy and MacKay, 2011). Musicians' cognitive and speech-in-noise perceptual benefits have only been measured using binaural approaches and have never been directly compared across monaural and diotic conditions.
Despite the headway we have made toward delineating markers of musicianship on central auditory processing, the extent to which these enhancements involve diotic sound processing, if at all, remains unknown. To this aim, we compared musicians' and non-musicians' speech-evoked auditory brainstem responses (ABRs) across monaural and diotic listening conditions. While musician enhancements have previously been documented in ABRs collected in both monaural (Wong et al., 2007; Bidelman and Krishnan, 2010; Bidelman et al., 2011a,b; Strait et al., 2012, 2013a) and diotic conditions (Musacchia et al., 2007; Parbery-Clark et al., 2009a, 2012a,b; Strait et al., 2013a), we selected a stimulus for which there is no musician advantage when presented monaurally (see Results). In light of the importance of neural response timing for both binaural hearing and hearing in noise (Rance et al., 2007; Tzounopoulos and Kraus, 2009; Anderson et al., 2010), we centered our analyses on measures of neural timing (i.e., latency and consistency) in addition to neural response amplitude. We hypothesized that musicians' enhanced auditory processing reflects strengthened diotic processing. Accordingly, we predicted that musicians would demonstrate faster neural timing and increased response magnitudes as measured by the timing and amplitudes of discrete response peaks (i.e., earlier and larger peaks) and increased neural response consistency (i.e., higher between-trial response similarity) in the diotic relative to the monaural condition. We further predicted that these enhancements in diotic processing would relate to musicians' advantages for hearing in noise.
Materials and Methods
Participants
Thirty subjects (mean age 20 ± 2 years, 11 males) were recruited from the Chicago area. All subjects were native English speakers, had normal hearing thresholds (≤15 dB HL from 250 to 8000 Hz), and reported no histories of learning or neurological disorders. Participants were screened for normal IQ as measured by the Test of Nonverbal Intelligence (Brown et al., 1997). Subjects were categorized as musicians and non-musicians (N = 15 each). Musicians were self-categorized and had consistently practiced an instrument at least three times a week since 7 years of age. Non-musicians had <3 years of musical training at any point in their lives. See Table 1 for each group's musical experience. Groups were matched for age (F(1,29) = 0.855, p = 0.477), nonverbal IQ (F(1,29) = 2.13, p = 0.217), hearing thresholds (F(1,29) = 1.017, p = 0.552), and sex (χ²(1,29) = 0.741, p = 0.389). Groups had equivalent neural timing to a 100 μs click stimulus (Wave V: p > 0.1) presented at 31.3 Hz and 70 dB sound pressure level (SPL). All subjects gave informed consent before participating in accordance with the Northwestern University Institutional Review Board and were paid for their participation.
Electrophysiology
Stimuli and recording parameters.
Auditory brainstem responses were elicited by a 40 ms speech syllable, /da/, at 70 dB SPL under three conditions: monaural right, monaural left, and diotic presentation. The five-formant speech stimulus was synthesized at a sampling rate of 10 kHz using a Klatt-based synthesizer (Klatt, 1980). The stimulus comprised an initial 10 ms onset burst and voiced formant transition between the consonant and the vowel with a fundamental frequency that linearly increased from 103 to 125 Hz. Voicing began at 5 ms. The first formant increased from 220 to 720 Hz. The second and third formants decreased from 1700 and 2580 to 1240 Hz and 2500 Hz, respectively. The fourth and fifth formants were constant at 3600 and 4500 Hz. While the stimulus was short and did not contain a steady-state vowel, it is perceived as a consonant-vowel syllable.
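The parameter trajectories described above can be sketched as simple start-to-end interpolations. This is only an illustration of the stated values, not the Klatt synthesizer itself: the text specifies a linear rise for the fundamental frequency only, so the linear glide for the formants and the 0–40 ms span of every glide are assumptions, and the names `TRACKS`, `track_value`, and `n_samples` are hypothetical.

```python
# Illustrative sketch of the /da/ parameter tracks (not a speech synthesizer).
# Start/end frequencies and the 10 kHz rate come from the stimulus description;
# the linear formant glides and the 0-40 ms glide span are assumptions.
DURATION_MS = 40.0
SAMPLE_RATE_HZ = 10_000

# (start_hz, end_hz) for the fundamental and the five formants
TRACKS = {
    "f0": (103.0, 125.0),
    "F1": (220.0, 720.0),
    "F2": (1700.0, 1240.0),
    "F3": (2580.0, 2500.0),
    "F4": (3600.0, 3600.0),
    "F5": (4500.0, 4500.0),
}


def track_value(name, t_ms):
    """Return the interpolated frequency (Hz) of one track at time t_ms."""
    if not 0.0 <= t_ms <= DURATION_MS:
        raise ValueError("t_ms must lie within the 40 ms stimulus")
    start_hz, end_hz = TRACKS[name]
    return start_hz + (end_hz - start_hz) * (t_ms / DURATION_MS)


def n_samples():
    """Number of samples in the stimulus at the stated sampling rate."""
    return int(round(DURATION_MS / 1000.0 * SAMPLE_RATE_HZ))
```

Under these assumptions, `track_value("f0", 20.0)` gives the midpoint of the fundamental-frequency glide, and `n_samples()` gives the stimulus length in samples.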
The responses were collected at a 20 kHz sampling rate using a NeuroScan Acquire 4.3 recording system (Compumedics) with four Ag-AgCl scalp electrodes in a vertical montage (Cz active, forehead ground, and linked-earlobe reference). Electrodes were coupled to the skin with Ten20 conductive paste (Weaver) and affixed with medical tape. Contact impedance was 2 kΩ or less across all electrodes. Responses were bandpass filtered off-line from 70 to 2000 Hz with a 12 dB/octave filter roll-off. Sweeps with activity exceeding ±35 μV were considered artifacts and excluded. The speech stimulus was presented through insert earphones (ER-3; Etymotic Research) in all three conditions with alternating polarities to limit the inclusion of stimulus artifact and cochlear microphonic (Gorga et al., 1985; Aiken and Picton, 2008; Skoe and Kraus, 2010). For each stimulus polarity in each condition, two subaverages representing 3000 artifact-free responses were averaged to generate an average response comprising 6000 response trials for each condition.
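The artifact-rejection and averaging steps can be illustrated with a minimal sketch. It assumes each sweep is a list of voltages in microvolts that has already been filtered; the filtering itself and the handling of stimulus polarity are omitted. The function names are hypothetical and this is not the recording software's own procedure.

```python
# Minimal sketch: reject sweeps exceeding +/-35 microvolts, then average the rest.
# Assumes sweeps are equal-length lists of voltages (in microvolts), already filtered.
ARTIFACT_LIMIT_UV = 35.0


def is_artifact(sweep, limit_uv=ARTIFACT_LIMIT_UV):
    """True if any sample in the sweep exceeds the +/- limit."""
    return any(abs(v) > limit_uv for v in sweep)


def average_sweeps(sweeps):
    """Point-by-point mean across sweeps."""
    if not sweeps:
        raise ValueError("no sweeps to average")
    n = len(sweeps)
    return [sum(s[i] for s in sweeps) / n for i in range(len(sweeps[0]))]


def clean_average(sweeps, limit_uv=ARTIFACT_LIMIT_UV):
    """Return (average of artifact-free sweeps, number of sweeps kept)."""
    kept = [s for s in sweeps if not is_artifact(s, limit_uv)]
    return average_sweeps(kept), len(kept)
```

In a recording of the kind described, collection would continue until the kept count reached the target number of artifact-free sweeps.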
Diotic and monaural conditions were randomized across participants, ruling out contributions of neural fatigue or adaptation to any between-condition effects observed. Participants watched captioned movies of their choice to facilitate a calm yet wakeful recording session.
Data analysis
Timing and response magnitude.
To assess neural response timing and magnitude, we identified major response peaks corresponding to the stimulus onset (peaks V and A, occurring at ∼7 and 8 ms), spectrotemporally dynamic consonant-vowel transition (i.e., frequency following response, or FFR; peaks D–F, occurring at ∼23, 31, and 40 ms, respectively) and offset (peak O, occurring at ∼47 ms; see Fig. 1). Peaks were first identified by two independent peak pickers, after which the first author compared their judgments. The two peak pickers' judgments were identical across all peaks in all subjects except two instances; in these cases, the first author arbitrated. All peak pickers were blind to subjects' groups. All peaks were easily identifiable in all participants in all conditions, with peak minima extending beyond the magnitudes of the interpeak intervals. Peak minima also extended beyond the noise floor (i.e., magnitude of the prestimulus period).
Response consistency.
The consistency of an individual's response to the speech stimulus was measured over the 11–42 ms portion of the recording period—a period encompassing the FFR. Three hundred iterations of randomly selected, subtracted-polarity pairs of 3000 subaverages were created and the degree of correlation (Pearson's r) between the pairs was calculated. These correlations were then averaged to form a composite response consistency score. All processing was performed in MATLAB (MathWorks). Higher correlation coefficients indicate greater neural response consistency. All statistical analyses were performed on Fisher transformed z-values.
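A simplified version of this consistency measure is sketched below. It assumes single-trial responses are available as equal-length lists already restricted to the 11–42 ms window, and it splits the trials into random halves without the polarity handling described above, so it illustrates the general split-and-correlate idea rather than the exact MATLAB procedure. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subaverage(trials):
    """Point-by-point mean across a set of trials."""
    n = len(trials)
    return [sum(t[i] for t in trials) / n for i in range(len(trials[0]))]


def response_consistency(trials, n_iter=300, seed=0):
    """Mean Pearson r between subaverages of random, non-overlapping trial halves."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    idx = list(range(len(trials)))
    half = len(idx) // 2
    rs = []
    for _ in range(n_iter):
        rng.shuffle(idx)
        first = subaverage([trials[i] for i in idx[:half]])
        second = subaverage([trials[i] for i in idx[half:2 * half]])
        rs.append(pearson_r(first, second))
    return sum(rs) / len(rs)


def fisher_z(r):
    """Fisher r-to-z transform, applied before statistical testing."""
    return math.atanh(r)
```

Values near 1 indicate that independent subsets of trials yield nearly the same waveform; `fisher_z` converts the composite score to the scale used for the statistical comparisons.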
Speech in noise
Hearing in noise.
The Hearing in Noise Test (Biologic Systems; Nilsson et al., 1994) is an adaptive speech-in-noise test that uses Bamford–Kowal–Bench phonetically balanced sentences (Bench et al., 1979) superposed on a speech-shaped noise masker (65 dB SPL). The noise is acoustically fixed, being identical both within and across trials. Participants are instructed to ignore the noise and repeat 20 short semantically and syntactically simple sentences (e.g., “Sugar is very sweet”) that are presented from a loudspeaker placed 1 m directly ahead of the participant. Sentences are counted as correct only when all the words are repeated correctly. The intensity level of the target sentence varies based on the performance of the participant. Performance is assessed by determining the signal-to-noise ratio, defined as the difference between the intensity of the target and that of the background noise, at which a participant can repeat 50% of the target items correctly. A lower score reflects greater speech-in-noise ability.
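The logic of an adaptive procedure of this kind can be illustrated with a generic one-down/one-up staircase, which converges on the level yielding 50% correct. The sketch below is not the Hearing in Noise Test's actual scoring rule: the starting signal-to-noise ratio, the fixed 2 dB step, the number of discarded early trials, and all function names are assumptions made for illustration.

```python
# Generic 1-down/1-up staircase sketch (assumed parameters, not the HINT protocol).
NOISE_LEVEL_DB = 65.0  # fixed masker level stated in the text


def sentence_correct(target_words, response_words):
    """A sentence counts as correct only if every word is repeated correctly."""
    return [w.lower() for w in target_words] == [w.lower() for w in response_words]


def run_staircase(responses_correct, start_snr_db=0.0, step_db=2.0):
    """Return the SNR (dB) presented on each trial, given per-trial correctness."""
    snr = start_snr_db
    track = []
    for correct in responses_correct:
        track.append(snr)
        # Harder (lower SNR) after a correct sentence, easier after an error.
        snr += -step_db if correct else step_db
    return track


def target_level_db(snr_db):
    """Target presentation level implied by an SNR against the fixed noise."""
    return NOISE_LEVEL_DB + snr_db


def threshold_snr(track, skip=4):
    """Mean presented SNR after discarding the first `skip` trials."""
    used = track[skip:]
    return sum(used) / len(used)
```

Because the target level falls after each correct sentence and rises after each error, the presented levels oscillate around the signal-to-noise ratio at which half of the sentences are repeated correctly; a lower mean therefore indicates better performance.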
Statistical analysis
Statistical analyses were conducted in SPSS 19.0 (SPSS). Two 3 (condition) × 6 (peak) × 2 (group) repeated-measures ANOVAs (RMANOVAs) were conducted to assess effects of condition on (1) peak latencies and (2) peak amplitudes in musicians and non-musicians. A 3 (condition) × 2 (group) RMANOVA was conducted to assess effects of condition on response consistency in musicians and non-musicians. Post hoc one-way ANOVAs and paired t tests were conducted to define statistically significant (p < 0.05) interaction effects. A one-way ANOVA was used to compare musicians' and non-musicians' speech-in-noise perception. Relationships between speech-in-noise perception and neural response characteristics were explored using Pearson's correlations. Assumptions of normality, linearity, outliers, and multicollinearity were met for all analyses, as assessed by normality plots, the Shapiro–Wilk test, Mahalanobis distances, and the Variance Inflation Factor, respectively. All reported statistics reflect two-tailed significance values (α = 0.05).
Results
Summary of results
While musicians demonstrated faster neural timing and more consistent auditory brainstem responses to diotically presented sounds, musicians and non-musicians were not distinct in response to the same sound when presented monaurally. Furthermore, musicians demonstrated greater enhancements from monaural to diotic conditions relative to non-musicians, evident in musicians' faster neural timing, greater response magnitudes, and more consistent responses to diotic relative to monaural stimulation. Musicians' better speech-in-noise perception correlated with faster neural timing and greater response consistency in the diotic but not the monaural conditions.
Brainstem response
Timing
There was a significant main effect of peak (F(5,24) > 166,500.58, p < 0.0001), with no main effects of condition or group. There were significant interactions between condition and peak (F(10,19) = 3.32, p < 0.02) and between condition and group (F(2,27) = 4.62, p < 0.02) as well as a three-way interaction between condition, peak, and group (F(10,19) = 5.76, p < 0.001). Musicians had faster responses than non-musicians to diotic but not to right- or left-monaural stimulation for all three peaks corresponding to the neural response to the formant transition (Fig. 1 and Tables 2 and 3; diotic peak D: F(1,29) = 2.38, p < 0.001; peak E: F(1,29) = 2.00, p = 0.06; peak F: F(1,29) = 1.83, p < 0.02; monaural transition peaks, all F < 0.5, p > 0.25). Musicians and non-musicians did not differ in response to the speech sound onset or offset in any of the three conditions (all F < 0.5, all p > 0.1). Post hoc paired t tests indicated that musicians' responses to two of the transition peaks (i.e., D and E) and the offset peak (i.e., O) in the response to diotically presented stimuli occurred earlier than those to monaurally presented stimuli (diotic vs right presentation, peak D: t(14) = 3.8, p < 0.005; peak E: t(14) = 2.6, p < 0.02; peak O: t(14) = 2.3, p < 0.05; diotic vs left presentation, peak D: t(14) = 3.6, p < 0.005; peak E: t(14) = 2.6, p < 0.02; peak O: t(14) = 2.3, p < 0.05). Non-musicians' responses did not differ for any peak other than F for the diotic versus right monaural comparison and E for the diotic versus left monaural comparison; in both cases, this peak occurred earlier in the diotic than in the monaural condition (peak F, diotic vs right presentation: t(14) = 3.0, p < 0.02; peak E, diotic vs left presentation: t(14) = 2.8, p < 0.02). Responses to diotically and monaurally presented stimuli did not differ for onset peaks V and A in either group, nor did right- and left-monaural responses differ (all t ≤ 2.0, p ≥ 0.07).
We further quantified the degree of timing shift invoked by diotic stimulation by subtracting the latencies for peaks D–F and O in the left- and right-monaural conditions from the latencies of these same peaks in the diotic condition. Additionally, we computed a mean of D–F shifts to characterize a global formant-transition shift. Negative values indicate that responses to diotically presented stimuli precede those to monaurally presented stimuli. A one-way ANOVA confirmed greater timing shifts between monaural and diotic conditions in musicians relative to non-musicians (peak D: F(29) = 16.5, p < 0.001; peak E: F(29) = 5.7, p < 0.05; peak F: F(29) = 3.2, p = 0.08; transition composite: F(29) = 17.0, p < 0.001; peak O: F(29) = 6.5, p < 0.05). Figure 1D displays the transition composite shift for musicians and non-musicians.
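The shift computation amounts to a per-peak subtraction followed by an average over the transition peaks. The sketch below takes latencies for a single monaural condition; how the left and right conditions were combined is not specified here, so that step is left out. The function names are hypothetical and the latencies in the usage note are invented purely for illustration.

```python
# Sketch of the diotic-minus-monaural timing shift; negative = diotic earlier.
TRANSITION_PEAKS = ("D", "E", "F")


def timing_shifts(diotic_ms, monaural_ms, peaks=("D", "E", "F", "O")):
    """Per-peak latency shift (ms): diotic latency minus monaural latency."""
    return {p: diotic_ms[p] - monaural_ms[p] for p in peaks}


def transition_composite(diotic_ms, monaural_ms):
    """Mean shift across the formant-transition peaks D, E, and F."""
    shifts = timing_shifts(diotic_ms, monaural_ms, peaks=TRANSITION_PEAKS)
    return sum(shifts.values()) / len(shifts)
```

For example, with made-up latencies `{"D": 22.5, "E": 31.0, "F": 39.5, "O": 47.0}` for the diotic condition and `{"D": 23.0, "E": 31.5, "F": 39.5, "O": 47.5}` for a monaural condition, peaks D, E, and O shift by −0.5 ms, peak F does not shift, and the transition composite is about −0.33 ms.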
Response magnitude
There were significant main effects of condition and peak (both F > 20.0, p < 0.001), with all six peaks having greater magnitudes in the diotic relative to either monaural condition and four of the six response peaks being larger with left-ear relative to right-ear stimulation. Furthermore, we observed a three-way interaction between condition, peak, and group (F(10,19) = 2.4, p < 0.05). Post hoc paired-samples t tests indicated that, relative to non-musicians, musicians had marginally significant enhancements in response magnitudes to diotically relative to right-ear, but not left-ear, monaurally presented stimuli for three of the six response peaks (diotic vs right presentation, peak A: F(1,29) = 3.7, p < 0.06; peak D: F(1,29) = 8.9, p < 0.01; peak O: F(1,29) = 3.9, p < 0.06; diotic vs left presentation, all F < 0.3, p > 0.1).
Response consistency
There were main effects of condition (F(2,27) = 32.1, p < 0.0001), with response consistency being greater in the diotic than in the monaural conditions, and of group (F(1,28) = 6.6, p < 0.02), with musicians demonstrating greater neural response consistency (r = 0.24, SD = 0.132) across conditions relative to non-musicians (r = 0.16, SD = 0.116). There was also a significant group × condition interaction (F(2,27) = 13.5, p < 0.001). Post hoc one-way ANOVAs indicated that musicians' greater response consistency across conditions was driven by an enhancement in the diotic condition; musicians had greater neural response consistency than non-musicians in response to diotic but not monaural stimulation (Fig. 2; diotic: F(1,29) = 33.4, p < 0.001; right and left: both F < 0.5, p > 0.5).
Relationships between neural measures and speech-in-noise perception
Musicians demonstrated superior hearing-in-noise ability compared with age-matched non-musicians (F(1,29) = 6.606, p < 0.01). Across all subjects, earlier peak timing and greater neural response consistency in response to diotically presented stimuli related to better speech-in-noise perception (peak E timing: r = 0.46, p < 0.02; response consistency: r = −0.591, p < 0.001). These relationships were exclusive to the diotic condition (right- and left-monaural conditions: all p > 0.1).
Discussion
We herein demonstrate musicians' enhancements for processing diotic sounds, with musicians demonstrating faster, larger, and more consistent responses to speech sounds presented simultaneously to both ears but not to the same sounds presented alone to the right or left ears. Furthermore, these physiological indices relate to speech-in-noise perception, suggesting that musicians' advantages may be driven, at least in part, by better processing of diotically presented sounds. Musicians' enhancements to the diotically presented stimulus may reflect music listening's persistent reliance on binaural sound processing; localization of a sound source in ensemble playing or conducting, for example, requires the precise and robust encoding of sounds by both ears to differentiate sounds presented closer to one ear from those presented closer to the other. This extensive auditory experience may equip musicians to preferentially process diotically presented sounds even in the absence of interaural differences providing timing or level cues.
Binaural processing and its malleability with sensory experience
Acoustic signals that arrive at the two ears are initially transmitted ipsilaterally along the auditory pathway. While sound input to each ear at first remains separate, it quickly reaches three points of convergence: the superior olive, the nuclei of the lateral lemniscus, and the inferior colliculus (Bocca, 1955; for review, see Moore, 1991; Wallace et al., 1996; McAlpine et al., 2000). At these points, the auditory system integrates information, detecting differences in the phase, intensity, and timing of the signals from the two ears (Moore, 1991). While cortical auditory structures receive integrated sound input from both ears, their respective weightings within auditory cortex are not predetermined—they can be modulated by auditory experiences early in life (Hogan and Moore, 2003) and the engagement of cognitive functions during the listening process (Shinn-Cunningham et al., 2005).
While the diotic stimulus presentation used here does not tap into traditional binaural processing cues such as timing and level differences, it may be that musicians' diotic advantage reflects binaural processing's known experience-related malleability. Although the structural development of the human auditory brainstem is thought to be complete within the first two years of life (Moore et al., 1995; Moore and Linthicum, 2007) and is guided by genetics (Clopton and Silverman, 1977; Taniguchi, 1981), how the auditory system makes use of this circuitry to achieve sound identification and localization is experience dependent. The functional organization of brainstem nuclei involved in binaural hearing is modulated by sensory experiences that occur during development (Clopton and Silverman, 1977; Knudsen et al., 1984a, b; Knudsen, 1985; for review, see King et al., 2000) and, to a lesser extent, adulthood (Kacelnik et al., 2006). Because of this, even the mature auditory system is capable of adapting to changes in the balance between the two ears (Bauer et al., 1966; Florentine, 1976; Hofman et al., 1998; Shinn-Cunningham et al., 1998; Van Wanrooij and Van Opstal, 2005). Experience-related changes in subcortical binaural hearing structures may stem, at least in part, from the structural and functional organization of auditory cortex that continues into adolescence (Gleeson and Walsh, 2000; Moore and Guan, 2001; Moore and Linthicum, 2007) via top-down modulation of neuronal response properties. In fact, deactivating descending inputs to inferior colliculus prevents sound localization using binaural hearing cues, evident in conditions where the balance in hearing between the two ears has been experimentally altered (Bajo et al., 2010).
While we have interpreted our results in the context of training-related changes in musicians' binaural sound processing, group comparisons cannot disentangle innate and training-related factors contributing to musicians' binaural processing enhancements. Future studies, most notably longitudinal work, should define both the developmental trajectory of monaural and binaural auditory processing advantages in musicians and their direct relationships to training over and above innate predispositions. Additionally, it is possible that our stimulus presentation mode—lacking interaural timing or level cues—enabled musicians to take advantage of their “better ear.” This alternative interpretation would support the possibility that, when presented with diotic stimulation, musicians preferentially benefited from the use of their dominant ear, resulting in an enhanced physiological response.
Interpretation according to previous monaural and binaural approaches in musicians
While these results evidence enhanced diotic but not monaural processing in musicians, musician enhancements have previously been documented in ABRs collected in both monaural (Wong et al., 2007; Strait et al., 2012, 2013a) and diotic conditions (Musacchia et al., 2007; Parbery-Clark et al., 2009a, 2012a,b; Strait et al., 2013a). To test our hypothesis that musicians' enhanced speech-in-noise perception reflects strengthened processing to diotically presented sounds, the present study intentionally used a stimulus that did not elicit a musician advantage when presented monaurally. This stimulus was unique in that it did not contain the acoustically rich vowel portion of the syllable, which is known to elicit more robust neural encoding of the spectral components of speech in musicians (Musacchia et al., 2007; Wong et al., 2007; Parbery-Clark et al., 2009a, 2012a; Strait et al., 2012; Strait and Kraus, 2013). The absence of this vowel region could account for the lack of monaural enhancement observed in musicians to this stimulus due to decreased acoustic content and less backward masking of the formant transition by the broadband vowel. Still, further work should disentangle acoustic parameters that successfully induce musician enhancements in both monaural and diotic conditions from those that do not. Outcomes may shed light on musical training as a model of experience-related neuroplasticity, specifying the markers of musicianship against aspects of auditory processing that remain unaffected.
Practical applications
These results stimulate further investigation into how auditory training might be used in clinical populations that demonstrate compromised binaural processing. Binaural hearing's relationship to speech-in-noise perception (Nábělek and Robinson, 1982; Ellermeier and Hellbrück, 1998; Strouse et al., 1998; Hawley et al., 2004) encourages its relevance for school-aged children given that their everyday learning occurs in noisy classroom environments; in fact, a typical in-session elementary school classroom is ∼60 dB (Rosenberg et al., 1999; Bradley, 2005; Cameron et al., 2006). Given that conversational voice levels are ∼50 dB, it is not surprising that low signal-to-noise ratios can make hearing what the teacher is saying a challenging task (Barton, 1989; Blair, 1990). Our results suggest that strengthening the neural pathways involved in binaural processing through auditory training, such as music lessons, may alleviate some of the classroom difficulties faced by children that involve understanding speech in noise.
Future directions
Future work aimed at defining additional behavioral and neural assessments of binaural processing differences in musicians and non-musicians, including assessments comprising more traditional interaural timing and level difference paradigms, will expand our understanding of the extent to which binaural processing is experience dependent. Furthermore, this work might consider subgroups of musicians, differentiated by the degree to which they depend on binaural hearing cues in their musical practice. While musicians have enhanced sound localization relative to non-musicians (Tervaniemi et al., 2006), there is also the possibility of between-musician group differences. Conductors, for example, must regularly locate sound sources within a busy orchestra and demonstrate strengthened sound localization (Münte et al., 2001); accordingly, conductors may demonstrate binaural processing enhancements above and beyond those reported here in performing musicians. Similarly, orchestral musicians (e.g., string players) may demonstrate enhancements relative to soloists (e.g., pianists). Throughout this work, musicians' cognitive and speech-in-noise perceptual benefits might be assessed using both binaural and monaural approaches, in addition to assessing binaural unmasking and sound localization, which are both dependent on hearing with both ears.
Notes
Supplemental information can be found online at http://www.soc.northwestern.edu/brainvolts/.
Footnotes
- Received December 11, 2012.
- Revision received September 5, 2013.
- Accepted September 11, 2013.
This work was supported by an Undergraduate Academic Research Grant to E.H. and the Knowles Hearing Center and National Science Foundation 0921275 to N.K. We thank the subjects for their participation in this study and Dylan Levy for his assistance with data analysis.
- Correspondence should be addressed to Dr. Nina Kraus, Auditory Neuroscience Laboratory, Northwestern University, 2240 Campus Drive, Evanston, IL 60208. nkraus{at}northwestern.edu. www.brainvolts.northwestern.edu.
- Copyright © 2013 the authors 0270-6474/13/3316741-07$15.00/0