Abstract
Pitch, one of the primary auditory percepts, is related to the temporal regularity or periodicity of a sound. Previous functional brain imaging work in humans has shown that the level of population neural activity in centers throughout the auditory system is related to the temporal regularity of a sound, suggesting a possible relationship to pitch. In the current study, functional magnetic resonance imaging was used to measure activation in response to harmonic tone complexes whose temporal regularity was identical, but whose pitch salience (or perceptual pitch strength) differed, across conditions. Cochlear nucleus, inferior colliculus, and primary auditory cortex did not show significant differences in activation level between conditions. Instead, a correlate of pitch salience was found in the neural activity levels of a small, spatially localized region of nonprimary auditory cortex, overlapping the anterolateral end of Heschl's gyrus. The present data contribute to converging evidence that anterior areas of nonprimary auditory cortex play an important role in processing pitch.
Introduction
Pitch, the perceptual correlate of sound periodicity, is critical to music and speech perception and the analysis of complex auditory scenes (Darwin and Carlyon, 1995). For music, pitch underlies our ability to perceive melodies; for speech, pitch provides important prosodic and semantic cues in certain languages. Functional imaging studies in humans have revealed brain areas engaged in processing pitch or pitch changes (Zatorre et al., 1994, 2002; Griffiths et al., 1998, 2001; Gutschalk et al., 2002; Patterson et al., 2002; Krumbholz et al., 2003; Warren and Griffiths, 2003; Warren et al., 2003). However, none has localized representations of pitch strength or salience, an important variable for everyday listening. Increasing pitch strength leads to a more musical sensation and improves listeners' abilities to make judgments of small changes in periodicity (Houtsma and Smurzynski, 1990). Previous imaging studies (Griffiths et al., 1998, 2001) have shown that stimuli of increasing temporal regularity, going from noisy to highly periodic, produce increasing activity in structures from cochlear nucleus to cortex. This finding may relate to pitch salience because increasing the temporal regularity of a stimulus also increases its perceived pitch strength. However, the previous work leaves unresolved whether (physical) stimulus regularity or (perceived) pitch salience is reflected in brain activation levels.
The present functional magnetic resonance imaging (fMRI) study investigates neural representations of pitch salience using harmonic complex tones with varying degrees of pitch salience but identical temporal regularity. We achieved a dissociation between salience and regularity by filtering the tone complexes to remove either low- or high-numbered harmonics. Tones consisting of only high-numbered harmonics elicit a much weaker pitch salience than tones with low-numbered harmonics while still maintaining the same temporal regularity in the sense that they remain perfectly periodic (Carlyon and Shackleton, 1994). The decrease in pitch salience as the lowest present harmonic number is increased seems to be related to the filtering that occurs in the cochlea (Houtsma and Smurzynski, 1990; Bernstein and Oxenham, 2003). The absolute bandwidth of cochlear tuning increases with increasing center frequency. Therefore, the first few harmonics of a complex tone are individually represented, or resolved, in the auditory periphery, whereas harmonics beyond approximately the 10th are unresolved, combining in the auditory periphery to form complex waveforms that repeat at the fundamental frequency (F0) (see Fig. 1a-c).
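The resolved/unresolved distinction described above can be illustrated numerically. The following is a minimal sketch, not the authors' analysis: it assumes the widely used Glasberg and Moore equivalent rectangular bandwidth (ERB) approximation for auditory filter width, which is not part of this study, and the function names are hypothetical. It compares the estimated filter bandwidth at a harmonic's frequency with the spacing between adjacent harmonics (which equals F0).

```python
def erb_hz(center_hz):
    """Approximate auditory filter bandwidth in Hz.

    Assumption: Glasberg and Moore ERB approximation, used here only
    for illustration; the paper does not specify a filter model.
    """
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def bandwidth_to_spacing_ratio(f0_hz, harmonic_number):
    """Ratio of filter bandwidth at harmonic n to the harmonic spacing (F0).

    Values well below 1 suggest that a filter centered on the harmonic
    passes mainly that one component; larger values suggest that several
    components fall within the same filter.
    """
    return erb_hz(f0_hz * harmonic_number) / f0_hz


if __name__ == "__main__":
    f0 = 80.0  # Hz, lowest F0 used in conditions 1 and 2
    for n in (2, 5, 10, 15, 20):
        ratio = bandwidth_to_spacing_ratio(f0, n)
        print(f"harmonic {n:2d} at {f0 * n:6.0f} Hz: bandwidth/spacing = {ratio:.2f}")
```

Because the ratio grows with harmonic number for a fixed F0, the sketch reproduces the qualitative trend only. It does not predict the exact harmonic number at which resolvability is lost.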
Complex-tone stimuli with fixed temporal regularity and different degrees of pitch salience. The schematic diagrams illustrate time (a) and frequency (b) representations of a complex tone (vertical lines). Auditory filters (solid and dotted curves in b) are excited by single harmonic components at low frequencies. Consequently, firing patterns after cochlear filtering represent individual low-numbered harmonics, and hence the components are resolved (c, top). In contrast, because filter bandwidth increases with center frequency, several components excite high-frequency filters. The harmonics interact within the filter such that firing patterns after cochlear filtering result in complex waveforms whose repetition rates correspond to the fundamental frequency (c, bottom). Thus, the harmonic components are unresolved by the auditory system. Perceptually, complex tones comprising only unresolved harmonics produce weaker pitch than those comprising resolved harmonics. d, Because of our choice of F0 ranges and spectral regions, the complex tones in condition 2 included only unresolved harmonics and hence produced weaker pitch salience than the rest of the tonal conditions. Although a total of four tones were created per condition, for ease of visualization, only one is shown here. The wide-band background noise that accompanies the tones was added to mask distortion products at lower frequencies arising from nonlinear interactions between components at the level of the cochlea.
We compared activity between four main stimulus conditions. Two were formed by filtering harmonic tone complexes into low and high spectral regions to produce stimuli of strong (condition 1) and weak (condition 2) pitch salience, respectively (see Fig. 1d). Two more conditions were also formed by filtering complex tones into low and high spectral regions, but the fundamental frequency of the tones was sufficiently high for both regions to comprise low-numbered harmonics with a similar (strong) pitch salience (see Fig. 1d, conditions 3, 4). To supplement the tone data, some experiments used two extra conditions that used noise stimuli and thus had a pitch salience lower than any of the tone conditions.
Materials and Methods
Subjects and stimuli. The experimental protocol was approved by the Institutional Review Boards of the Massachusetts General Hospital, Massachusetts Eye and Ear Infirmary, and Massachusetts Institute of Technology. Subjects (five males, one female; between 23 and 28 years of age) had normal hearing thresholds and provided written informed consent to participate in the study.
Complex tones (harmonic components added in sine phase) used in conditions 1 and 2 (Fig. 1d) were obtained by bandpass filtering harmonic tones with F0s at semitone intervals between 80 and 95 Hz into low (340-1100 Hz) or high (1200-2000 Hz) spectral regions, respectively. Harmonic tones with F0s at semitone intervals between 240 and 285 Hz were filtered into the same low and high spectral regions to yield conditions 3 and 4. Two additional conditions (5, 6) were created from wide-band white noise (12 kHz), again filtered into the same low and high spectral regions. Filtering was accomplished with a 256-point Hanning window in the frequency domain. The cutoff frequencies were defined as the points at which the filter gain was attenuated by 6 dB. Both tones and noise were presented as 300 msec bursts (including 10 msec cosine ramps) at a rate of 1.5 per sec. Within each tone condition (1-4), tones with different F0s were ordered randomly.
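The stimulus parameters above determine which harmonic numbers fall within each spectral region. The sketch below tabulates them under two simplifying assumptions that are not stated in the text: the four F0s per condition are taken to be exact equal-tempered semitone steps upward from the lowest F0, and the passband is treated as having sharp edges at the 6 dB cutoff frequencies (the actual filter had sloping skirts). The function names are hypothetical.

```python
import math


def semitone_f0s(lowest_hz, count=4):
    """F0s in Hz at equal-tempered semitone steps above lowest_hz (assumed spacing)."""
    return [lowest_hz * 2 ** (k / 12) for k in range(count)]


def harmonics_in_band(f0_hz, low_hz, high_hz):
    """Harmonic numbers whose frequencies lie within [low_hz, high_hz].

    Assumption: idealized sharp passband edges at the stated cutoffs.
    """
    first = math.ceil(low_hz / f0_hz)
    last = math.floor(high_hz / f0_hz)
    return list(range(first, last + 1))


if __name__ == "__main__":
    regions = {"low": (340.0, 1100.0), "high": (1200.0, 2000.0)}
    conditions = {1: (80.0, "low"), 2: (80.0, "high"),
                  3: (240.0, "low"), 4: (240.0, "high")}
    for number, (lowest_f0, region) in conditions.items():
        low, high = regions[region]
        for f0 in semitone_f0s(lowest_f0):
            harmonics = harmonics_in_band(f0, low, high)
            print(f"condition {number}, F0 {f0:6.1f} Hz: "
                  f"harmonics {harmonics[0]} to {harmonics[-1]}")
```

Under these assumptions, only condition 2 has a lowest harmonic number above 10 for every F0, which matches the description of condition 2 as the only condition made up entirely of unresolved harmonics.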
To mask distortion products at lower harmonic frequencies caused by nonlinear interactions at the level of the cochlea, the tones were embedded at 20 dB above their masked threshold (estimated from mean data in pilot psychophysical measurements to be within 2-4 dB of each other in all conditions) in continuous Gaussian wide-band background noise. Sounds were presented at an overall level of 69 dB sound pressure level (SPL) over headphones. The different sound conditions were ordered randomly and presented using a standard on-off stimulation paradigm in which sequences of tone or noise bursts (embedded in background noise) were presented for 32 sec, followed by a silent interval of the same duration. Each condition was presented between 8 and 12 times per experiment.
Imaging protocol. Functional imaging was performed on a 3 tesla scanner (Allegra; Siemens, Munich, Germany) using an echo-planar imaging sequence [gradient echo; echo time (TE), 30 msec; flip angle, 90°]. Eleven parallel near-coronal slices (thickness, 4 mm; gap, 1.32 mm) were imaged to span the entire superior temporal lobe. The orientation of the imaging planes was chosen to include the inferior colliculus and first Heschl's gyrus in the fifth (from the most posterior) imaging slice. Anatomical images [repetition time (TR), 700 msec; TE, 12 msec; flip angle, 70°] of the functionally imaged slices were also obtained. Additional anatomical images optimized for gray matter-white matter contrast were acquired in five of six subjects in a separate imaging session at 1.5 tesla (Sonata; Siemens) (inversion time, 1100 msec; TR, 2539 msec; TE, 3.25 msec; flip angle, 7°; matrix, 256 × 256; slice thickness, 1 mm). (One subject, who had participated previously in two functional imaging sessions and in whom our findings were replicated, was unable to participate in the anatomical scanning session required for the inflation analysis. Data for this subject could not be included in the group analysis.)
An important component of the functional imaging was the handling of the scanner acoustic noise. Every image acquisition produced a tone-like transient (∼115 dB SPL at 1.4 kHz, measured as described by Ravicz et al., 2000) that can mask the stimulus sound and reduce the dynamic range of image signal changes. To minimize these confounding effects, the 11 imaged slices were acquired in very close temporal proximity (clusters) after a long interacquisition interval (∼8 sec) (Edmister et al., 1999; Hall et al., 1999). The effects of the noise were further reduced by the 30 dB attenuation provided by the headphone system used for sound delivery (Ravicz and Melcher, 2001). Acoustic noise produced by the scanner coolant pump was eliminated entirely by turning the pump off.
To improve the detection of activation in brainstem structures, the data were acquired using cardiac triggering. We monitored the subject's electrocardiogram and synchronized the onset of each image cluster to the same point in the subject's cardiac cycle (Guimaraes et al., 1998).
Data analysis. The functional image data were first corrected for motion and normalized such that the time-average signal level was the same for all subjects. One type of activation map was created by comparing the images of each condition with those of the immediately following period of silence using a t test. Differential activation maps were created by comparing images corresponding to different stimulus conditions (e.g., condition 1 vs 2, condition 2 vs 1, and so on). Activation was quantified as percentage signal change = (S - Ss)/[(S + Ss) × 0.5], where S is the mean signal during a stimulus condition, and Ss is the mean signal during the immediately following period of silence. Analyses were performed at the individual and group levels.
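The activation measure defined above can be written as a short function. This is a minimal sketch with hypothetical names and made-up signal values. The factor of 100 is an assumption added here to express the ratio as a percentage, because the formula as printed gives a fraction.

```python
def percent_signal_change(stimulus_mean, silence_mean):
    """Signal change relative to the average of the stimulus and silence means.

    Implements (S - Ss) / [(S + Ss) x 0.5] from the text.
    Assumption: the result is multiplied by 100 to give a percentage.
    """
    baseline = 0.5 * (stimulus_mean + silence_mean)
    return 100.0 * (stimulus_mean - silence_mean) / baseline


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    s = 102.0    # mean signal during a stimulus condition
    s_s = 100.0  # mean signal during the following silent period
    print(f"{percent_signal_change(s, s_s):.3f} % signal change")
```

Dividing by the mean of the two signals, rather than by the silence signal alone, makes the measure symmetric: swapping the two inputs changes only the sign of the result.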
For the group analysis, the functional data of the individuals were coregistered into a common spherical coordinate system. This was done using a computational tool that extracts and unfolds the gray matter-white matter interface to visualize the cortical surface in an inflated format (Fischl et al., 1999). Specifically, the anatomical volumes acquired at 1.5 tesla were reconstructed and inflated, and each hemisphere was mapped into a unit-radius sphere on which the sulcal and gyral patterns of each subject are superimposed. Functional data were then mapped to the spherical coordinate system and combined across subjects to generate composite differential activation maps.
Results
Stations throughout the auditory pathway (cochlear nucleus, inferior colliculus, and auditory cortex) responded strongly to the stimuli as seen in contrasts between each stimulus condition and periods of silence (see Fig. 4a,b). To investigate possible sensitivity to pitch and other stimulus attributes, we contrasted individual and group data between stimulus conditions.
Cochlear nucleus and inferior colliculus showed robust activation that did not depend on pitch salience. a, b, Activation maps for contrasts between condition 1 and silence for a representative subject. The color activation maps are superimposed on grayscale anatomical images for the same subject. c, d, Average ± SEM normalized percentage signal change produced by tone conditions, relative to silence, within cochlear nuclei (c) and inferior colliculi (d). Percentage signal change was quantified for the lowest p value voxel in each structure and hemisphere, normalized to the condition yielding the highest percentage change, and then averaged across subjects. In both cochlear nucleus and inferior colliculus, activation did not differ significantly between conditions (c, d) and was therefore independent of pitch salience.
Nonprimary cortical activity was lowest for sounds with weak pitch salience
Group data for complex tones
To reveal cortical areas that were sensitive to pitch salience, we contrasted the group data (n = 5) between the four complex tone conditions. Contrasting conditions of high and low pitch salience (1 vs 2, 4 vs 2) revealed areas of differential activation at the anterolateral end of Heschl's gyrus bilaterally, with some spread onto the superior temporal gyrus in the left hemisphere (Fig. 2a). All of these areas showed greater activity (i.e., image strength) during the conditions of high, compared with low, pitch salience (although the analyses were capable of detecting areas in which the reverse was true). To control for spectral and fundamental frequency differences between conditions in these initial contrasts, two control contrasts were performed. Because conditions 1 and 2 differed in spectral content as well as pitch salience, we tested the possibility that the areas arising from this contrast could be related to differences in sound spectral content alone by comparing two conditions that also differed in spectral content but not in pitch salience (conditions 3 and 4). Likewise, by comparing conditions 3 and 1, we addressed the possibility that areas arising from contrasting conditions 4 and 2 reflected a difference in fundamental frequency. The control contrasts did not result in any differential activation. Therefore, from an analysis of the group data, activity in anterolateral auditory cortex was specifically sensitive to pitch salience and not to other differences between conditions.
Low activation for complex tones with weak pitch salience in anterior auditory cortex. a, Differential activation maps based on data from five subjects superimposed on averaged anatomy and displayed in an inflated format. Light and dark gray regions correspond to gyri and sulci, respectively. White indicates areas of significant difference (p < 0.001) between conditions that differed in harmonic resolvability and hence in pitch salience. The qualitative center of mass for these areas (indicated by asterisks) had Talairach coordinates of 48.28, -10.52, 3.35 (right hemisphere) and -54.94, -4.83, 2.65 (left hemisphere). No differential activation was found in the contrasts between conditions that differed in F0 or frequency range alone. b, c, Average ± SEM normalized percentage signal change for each tone condition (b) or bandpass-filtered noise bursts (c), relative to silence, within the activation areas shown in a. Within complex-tone stimuli, condition 2, which produced weak pitch salience, elicited the lowest level of activation. Consistent with the interpretation that activity in these areas reflects pitch salience, noise bursts (no pitch) elicited activation levels that were lower than, or comparable with, those produced by the condition of weak pitch salience (c). Unlike the case of tones, different levels of activation for the noise conditions reflected differences in spectral region.
To quantify the relationship between pitch salience and cortical activity, the areas shown in Figure 2a were mapped to each individual and used to calculate percentage signal changes produced by each stimulus condition relative to silence. On average, percentage change for the condition of weak pitch salience was significantly less (p ≤ 0.01; Fisher's least significant difference test) than for the other three conditions (with strong pitch salience). Furthermore, there was no significant difference among these other three conditions (Fig. 2b). Hence, weaker salience was reflected by lower activation levels in anterolateral auditory cortex.
Individual data for complex tones
The data were examined at the level of individual subjects by first assessing intersubject variability in the location of the areas of differential activation. Contrasts between conditions that differed in pitch salience (1 vs 2, 4 vs 2), as well as the comparisons that controlled for spectral and fundamental frequency differences (3 vs 4, 3 vs 1), were performed to generate maps such as those in Figure 2a for each subject (n = 6). Maps contrasting conditions of weak and strong pitch salience showed differential activation in anterolateral Heschl's gyrus in 9 of 10 hemispheres, although there was usually some spread to more posterior areas and onto the superior temporal gyrus (Fig. 3). Unlike the group data, the individual data sometimes showed areas that reflected spectral or fundamental frequency differences between conditions (data not shown). In general, the location of such areas was highly variable, and their extent was small relative to the extent of the differential activation produced by contrasting conditions that differed in pitch salience. Thus, the most robust and repeatable activation occurred when contrasting conditions of different pitch salience.
Differential activation maps in individual subjects showing areas sensitive to pitch salience. The map for each subject was constructed as in Figure 2. Asterisks mark the Talairach coordinates of the activation “center of mass” from the group analysis. In each of the individual hemispheres, the asterisk lies at the anterolateral end of first Heschl's gyrus. The majority of hemispheres show differential activation on anterolateral (first) Heschl's gyrus with some showing additional activation in posterior and lateral areas. For the purposes of this display, a less stringent activation threshold was used for subject 3 (p < 0.01) than for the other subjects (p < 0.001). Note that, within a region of interest defined by the group activation (Fig. 2), all hemispheres showed the lowest percentage signal change for condition 2 (even the right hemisphere of subject 3, which shows no differential activation).
The data for individual subjects were quantified in two steps. First, we defined a region of interest for each subject comprising all areas that emerged from a pairwise comparison of conditions. Second, in the region of interest, we calculated the percentage signal change produced by each condition versus silence. In every subject, the condition with weak pitch showed the lowest level of activation, in agreement with the group data. To test the repeatability of our findings, we imaged one subject in two separate sessions. In both tests, the activation produced by the condition with weak pitch salience was lowest.
Data for noise stimuli
In four of the subjects, activation in response to noise, filtered into the same two spectral regions as those used for the tones, was recorded within the same scanning sessions. Both noise conditions produced robust activation throughout the auditory pathways compared with silence. Considering cortical regions of interest (i.e., areas of differential activation identified by the group analysis of the tone data), the low-spectrum noise produced significantly higher activation than did the high-spectrum noise (Fig. 2c), in contrast to the findings with tonal stimuli in which the spectral region had no significant effect overall.
Comparing across conditions within each spectral region, post hoc paired comparisons (n = 4) revealed that the high spectral-region noise produced significantly lower activation than condition 2 (low pitch salience; p = 0.013), which in turn produced significantly lower activation than condition 4 (high pitch salience; p < 0.0001). The low spectral-region noise produced significantly lower activation than both conditions 1 and 3 (both high pitch salience; p < 0.01), which did not differ significantly from one another (p > 0.5). In other words, noise, which had the lowest pitch salience of all, produced the lowest level of activity when comparisons were broken down by spectral region.
Subcortical activity was not dependent on pitch salience
Both the cochlear nuclei and the inferior colliculi showed robust activation when each of the stimulus conditions (tone or noise) was compared with silence (Fig. 4a,b). However, in contrast to cortical areas, the activation in these regions did not differ significantly among the tone conditions (Fig. 4c,d) or between the noise conditions (data not shown). Hence, neither the cochlear nuclei nor the inferior colliculi showed a sensitivity to pitch salience for the stimuli used in this study.
Discussion
A representation for pitch salience in anterior nonprimary auditory cortex
Using fMRI as an indicator of neural activity, we identified regions of auditory cortex in which the level of activity depends on pitch salience. This dependence, demonstrated using complex tones, was also supported by data using noise, indicating a generalization of the dependence across stimuli. The exact locations showing sensitivity to pitch salience varied somewhat across individuals but consistently included a region immediately anterolateral to primary auditory cortex (i.e., anterior Heschl's gyrus bilaterally and, in the left hemisphere, adjacent superior temporal gyrus). Overall, the data indicate a representation for pitch salience in neural activity levels of anterior nonprimary auditory cortex.
An important aspect of our experimental design is that it allows the changes in neural activity on anterolateral Heschl's gyrus to be attributed to perceptual, rather than physical, stimulus differences. This dissociation of physical and perceptual stimulus characteristics was achieved using stimuli with very different pitch salience but identical stimulus regularity and by controlling for the physical differences between conditions (i.e., in spectral region and F0). It of course remains to be seen whether the identified neural activity changes represent processing at a stage before or subsequent to perceptual awareness.
Anterior auditory cortex and the processing of pitch
Previous imaging studies have investigated the representation of various pitch features in auditory cortex (Warren et al., 2003), but none has directly examined pitch salience. Studies by Griffiths et al. (1998, 2001) bear indirectly on the representation of pitch salience in that they showed increasing cortical activity with increasing stimulus temporal regularity (and, correspondingly, increasing pitch salience). However, they did not resolve whether the activity increases reflect (physical) temporal regularity or (perceptual) pitch salience changes because the two variables covaried.
The cortical correlate of pitch salience identified in the present study is located in a region that has also been implicated in the processing of pitch changes (e.g., as occurs in melodies). Specifically, several fMRI studies have identified a region immediately anterior to primary auditory cortex that shows greater activity for stimuli with changing, compared with fixed, pitch (Griffiths et al., 2001; Patterson et al., 2002; Warren and Griffiths, 2003; Warren et al., 2003). Together, the previous and present data converge in indicating that anterior areas of nonprimary auditory cortex are heavily engaged in pitch processing.
No identified representation for pitch salience in subcortical structures
In contrast to cortical areas, we found that activation in the inferior colliculus and cochlear nucleus did not differ significantly across stimulus conditions and hence showed no relationship to pitch salience. These results are consistent with proposals that subcortical structures do not encode pitch per se and instead encode physical variables related to pitch (Griffiths et al., 1998, 2001; Patterson et al., 2002). It should be noted, however, that, although consistent with this view, our data provide only weak support for it because they do not rule out the existence of subcortical representations for pitch or pitch salience on a finer spatial or temporal scale (submillimeter or subsecond) than provided by fMRI.
Griffiths et al. (2001) showed that increasing the temporal regularity of a stimulus produces an increase in activation in subcortical structures. It was assumed, but not shown, that the activation increase reflected the increase in temporal regularity rather than the covarying perceptual property (pitch salience). The present study provides support for this assumption in that there were no differences in subcortical activation across our tone conditions despite their differences in pitch salience. However, the present study contrasts with Griffiths et al. in that it did not show a relationship between activation and temporal regularity. Specifically, we found that brainstem activation for noise was not significantly different from responses to tones despite the greater temporal regularity of the tone conditions. One reason for this result may be our use of low-level, wide-band noise, which was present in all conditions (tone and noise). This background noise allowed us to rule out any possible influence of auditory distortion products (a potential confound ignored in previous imaging studies) and may have been the dominant determinant of subcortical response levels.
Implications for theories of pitch perception
Several pitch extraction mechanisms have been proposed that exploit specific sound physical features. Of interest here is a proposal that the high-salience pitch produced by low-numbered (resolved) harmonics is processed via a different mechanism than the low-salience pitch produced by high-numbered (unresolved) harmonics (Carlyon and Shackleton, 1994). In particular, it is believed that the high-salience pitch with individual peripherally resolved harmonics may exploit both temporal (fine-structure) and tonotopic (or place) codes, whereas low-salience pitch, with only peripherally unresolved harmonics, exploits only a temporal code based on the temporal envelope (Oxenham et al., 2004). Our experimental design does not allow us to explore this issue directly. However, because the low- and high-salience conditions produced activity within the same focal region of nonprimary auditory cortex, it appears that the processing of both forms of pitch converges in this area.
Footnotes
This work was supported by the National Institutes of Health-National Institute on Deafness and Other Communication Disorders Grants R01 DC 05216 (A.J.O.) and P01 DC 00119 (J.R.M.). This work was also supported in part by Consejo Nacional de Ciencia y Tecnologia Grant 145843 (H.P.). We thank Irina Sigalovsky for substantial help in the acquisition of the data and Christophe Micheyl for comments on a previous version of this manuscript.
Correspondence should be addressed to Hector Penagos, Massachusetts Institute of Technology, Division of Health Sciences and Technology, 77 Massachusetts Avenue, Room E18-310, Cambridge, MA 02139. E-mail: penagos{at}mit.edu.
Copyright © 2004 Society for Neuroscience 0270-6474/04/246810-06$15.00/0