Auditory pitch patterns are significant ecological features to which nervous systems have exquisitely adapted. Pitch patterns are found embedded in many contexts, enabling different information-processing goals. Do the psychological functions of pitch patterns determine the neural mechanisms supporting their perception, or do all pitch patterns, regardless of function, engage the same mechanisms? This issue is pursued in the present study by using ¹⁵O-water positron emission tomography to study brain activations when two subject groups discriminate pitch patterns in their respective native languages, one of which is a tonal language and the other of which is not. In a tonal language, pitch patterns signal lexical meaning. Native Mandarin-speaking and English-speaking listeners discriminated pitch patterns embedded in Mandarin and English words and also passively listened to the same stimuli. When Mandarin listeners discriminated pitch embedded in Mandarin lexical tones, the left anterior insular cortex was the most active. When they discriminated pitch patterns embedded in English words, the homologous area in the right hemisphere activated as it did in English-speaking listeners discriminating pitch patterns embedded in either Mandarin or English words. These results support the view that neural responses to physical acoustic stimuli depend on the function of those stimuli and implicate anterior insular cortex in auditory processing, with the left insular cortex especially responsive to linguistic stimuli.
In addition to being a major component of music and vocalizations in many species, pitch conveys information about several aspects of spoken human language, including speaker identity, affect, intonation, phonemic stress, and word meaning. The present study evaluated the following two hypotheses on the neural basis of human pitch perception. The “functional” hypothesis proposes that the psychological function of fundamental frequency (F0) patterns determines which neural mechanisms are engaged during pitch perception. The “acoustic” hypothesis proposes that, regardless of psychological functions, all F0 patterns elicit responses from the same neural mechanisms. Empirical support is available for both functional (Ross et al., 1992) and acoustic hypotheses (Zatorre and Belin, 2001; Warrier and Zatorre, 2004) (for review, see Wong, 2002). In the present study, we test these hypotheses by examining the perception of pitch patterns that are used to contrast word meaning. Such pitch patterns, called lexical tones, are present in most languages of the world (Fromkin, 2000). For example, in Mandarin Chinese, the syllable /ma/ spoken in a high pitch means “mother,” but the same syllable spoken with a falling pitch pattern means “to scold.”
Lexical tone perception research has mainly focused on psychological processes (Xu, 1994; Wong and Diehl, 2003) and gross hemispheric specialization (Gandour et al., 1997). Recently, several neuroimaging studies have been conducted (Gandour et al., 2000) and reported activations in left inferior frontal regions. In contrast, nonlexical pitch processing studies (Zatorre et al., 1992) found homologous activations in the right hemisphere. Together, these studies provide evidence for the functional hypothesis, because the brain area engaged is determined by the psychological function of pitch.
However, all previous imaging studies of lexical tone perception possess a potentially serious confound of lexical tone processing and semantic processing. Specifically, the lexical tones were discriminated by native speakers of the tone language and by a control group unfamiliar with that language. Thus, for the former group but not the latter, the F0 patterns were lexically relevant and were embedded in meaningful words. This leaves open the possibility that the left hemisphere responses were attributable to semantic processing rather than to the perception of pitch as such.
The present study uses positron emission tomography (PET), which offers an optimal acoustic environment for auditory studies, to compare native Mandarin- and English-speaking listeners discriminating pitch patterns embedded in Mandarin and English word pairs and passively listening to the same sets of stimuli. This design permitted a comparison of lexical tone processing and nonlexical pitch processing. Previous related studies led to the hypothesis that the frontal operculum and anterior insula would be intensely activated by these tasks (Zatorre et al., 1992). The functional hypothesis predicts the activation of those areas in the left hemisphere (relative to passive listening) in listeners for whom the pitch patterns are lexically meaningful (i.e., in the condition where Mandarin listeners discriminate pitch embedded in Mandarin words). The acoustic hypothesis predicts that right hemisphere homologues should be active for both groups of listeners regardless of the linguistic status of the stimuli.
Materials and Methods
Subjects. The subjects were seven adult male native Mandarin speakers who also speak English and seven adult male native speakers of American English who had no previous exposure to Mandarin. All Mandarin-speaking subjects ranked Mandarin as their dominant language relative to English and spoke mostly Mandarin (or in some cases, Mandarin plus another Chinese dialect) in their childhood. The Mandarin speakers ranged from 18 to 32 years of age, with a mean of 25 and SD of 5.1; the English speakers ranged from 18 to 27 years of age, with a mean of 21 and SD of 3.2. The self-rated proficiency in Mandarin for all of the Mandarin-speaking subjects was 10, on a scale of 1 (not proficient) to 10 (extremely proficient); their self-rated proficiency in English ranged from 6 to 9, with a mean of 7 and SD of 1.2. All subjects were right handed, musically untrained, and had normal hearing. Right-handedness was assessed by the Edinburgh Handedness Inventory (Oldfield, 1971). Musical ability was assessed by a simple music-reading task, which involves sight singing or playing, in relative pitch, a six-note melody in C major with no accidentals (sharps or flats). A traditional Western music score and a conversion of the score into numerals (e.g., 1, C; 3, E) were provided to the subjects. All subjects failed this music test.
Stimuli. Mandarin and English stimulus sets were resynthesized from naturally produced utterances of two male Mandarin speakers and two male English speakers. F0s of these spoken words did not differ significantly between the two language groups. Resynthesis was performed with the Analysis/Synthesis Laboratory module of Computerized Speech Laboratory (Kay Elemetrics, Lincoln Park, NJ). Mandarin speakers were asked to produce a list of Mandarin words, all of which carried the high level tone (Tone 1). English speakers were asked to produce a list of English words in a high pitch. For every word produced, three speech tokens were resynthesized with pitch patterns resembling the high level tone (Tone 1), the rising tone (Tone 2), and the falling tone (Tone 4) of Mandarin. Tone 3, the dipping tone, was not included because it has been shown perceptually to be the most confusable tone both to native Mandarin speakers (Chuang et al., 1972) and to second-language learners of Mandarin (Kiriloff, 1969). Pitch contours were interpolated linearly through the voiced portion of each stimulus. The starting and ending pitch points for Tone 1 were identical. This value was the mean F0 of the list of words produced by each speaker. The ending pitch point of Tone 2 was the same as Tone 1, and the starting pitch point of Tone 2 was 26% lower than its ending point. The starting pitch point of Tone 4 was 10% higher than Tone 1 and fell by 82%. These pitch contours were modeled on the values obtained by Shih (1988). Three native informants of each language group transcribed the resynthesized speech tokens of their native language with an accuracy of 98% for the English set and 96% for the Mandarin set. The difference in transcription accuracy was not statistically significant. Mandarin informants were required to write the Chinese character, not the phonetic transcription (pin-yin) associated with the word they heard. 
For the Mandarin speech token set, only tokens that are actual Mandarin words were included in this intelligibility task and in the actual stimulus set. For example, /min1/ was not included because it is not an actual Chinese word (although it can be transcribed and pronounced by Mandarin speakers based on its pin-yin).
The stimuli on each trial consisted of pairs of resynthesized words originally spoken by the same speaker. For example, the /fei2/ and /wei2/ stimuli were based on the /fei1/ and /wei1/ productions of the same speaker. Half of the stimulus pairs had the same tone (or pitch patterns for the English stimuli) in each word of the pair, and the other half had different tones in each word of the pair. The frequency of occurrence for each tone category was equal, and the order of tone presentation was counterbalanced. Except for the aforementioned acoustic properties, all properties within a word pair (e.g., the vowel) were identical. Stimuli were 72 monosyllables (36 in each language) of ∼350 msec in duration each. Because of ceiling effects on discrimination accuracy observed in pilot studies, white noise, with an amplitude 2 dB lower than that of the stimuli, was added to the stimuli to increase the difficulty of the task. The aforementioned intelligibility testing was performed before the addition of noise.
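As an illustration of the contour specification given in the Stimuli section, the following Python sketch computes start and end F0 values for the three resynthesized tones and interpolates linearly between them. It is a sketch under stated assumptions, not the resynthesis procedure actually used: the function names are hypothetical, and reading "fell by 82%" as a drop of 82% of the Tone 4 starting value is only one possible interpretation of the text.

```python
def tone_endpoints(mean_f0_hz):
    """Return (start, end) F0 values in Hz for Tones 1, 2, and 4.

    mean_f0_hz is the speaker's mean F0 over the produced word list,
    which the text uses as the level of Tone 1.
    """
    # Tone 1 (high level): identical start and end points.
    tone1 = (mean_f0_hz, mean_f0_hz)
    # Tone 2 (rising): ends at the Tone 1 level, starts 26% below its end.
    tone2_end = mean_f0_hz
    tone2 = (tone2_end * (1 - 0.26), tone2_end)
    # Tone 4 (falling): starts 10% above the Tone 1 level.
    tone4_start = mean_f0_hz * 1.10
    # ASSUMPTION: "fell by 82%" is read as losing 82% of the start value.
    tone4 = (tone4_start, tone4_start * (1 - 0.82))
    return {"tone1": tone1, "tone2": tone2, "tone4": tone4}


def linear_contour(start_hz, end_hz, n_points):
    """Linearly interpolate n_points F0 values from start_hz to end_hz."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (end_hz - start_hz) / (n_points - 1)
    return [start_hz + i * step for i in range(n_points)]
```

For a given speaker, `tone_endpoints` would be called once with that speaker's mean F0, and `linear_contour` would then be applied across the voiced portion of each token.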
Experimental procedures. There were five conditions: a rest condition, two passive listening conditions, and two active listening conditions. Except for the rest condition, each condition was repeated twice, and a PET scan was performed for each repetition. The order of presentation of conditions was pseudorandomized for each subject. The four passive listening scans (two conditions, each presented twice) were always presented before the active listening ones to reduce the possibility of active discrimination of the stimulus pairs by the subjects even in the passive conditions. Anatomical magnetic resonance images (MRIs) were acquired of the brain of each subject for accurate spatial normalization and localization of regional cerebral blood flow (rCBF) foci.
To reduce head motion during scanning, each subject was fitted with a thermal plastic facemask. A computer keyboard, for task responses, was placed within comfortable reach of the subject. All responses were made with the middle and index fingers of the left hand, pressing the 1 and 4 keys on the number pad, respectively. During all conditions, subjects lay supine in the scanning instrument with their eyes closed. All stimuli were presented via Sony Fontopia-Earbud headphones (Sony, Tokyo, Japan).
During the discrimination tasks, subjects judged whether the tone patterns in Mandarin word pairs or the pitch patterns in English word pairs were the same or different and responded by key press. During the passive listening tasks, subjects were instructed to listen to, but not attempt to discriminate, either Mandarin or English word pairs and to press the response keys alternately after each word pair had been presented. During the rest condition, no stimuli were presented.
During PET trials, subjects began performing the task ∼30 sec before the onset of the 40 sec scan. On average, a word pair trial lasted ∼1 sec, with an intra-pair interval of 300 msec. A response from the subjects triggered the presentation of the next stimulus pair. If subjects did not respond within 1250 msec, the next stimulus pair was presented. A relatively short inter-stimulus interval was used in an attempt to facilitate attention to the experimental task. Inter-scan interval was ∼10 min.
Imaging procedures and analysis. PET scans were performed on a GE 4096 camera, which has a pixel spacing of 2.0 mm, an inter-plane, center-to-center distance of 6.5 mm, 15 scan planes, and a z-axis field of view of 10 cm. Correction for radiation attenuation was made by means of a transmission scan collected before the first scan using a ⁶⁸Ge/⁶⁸Ga pin source. Cerebral blood flow was measured with H₂¹⁵O (half-life, 123 sec) administered as an intravenous bolus of 8-10 ml of saline containing 60 mCi. At the start of a scanning session, an intravenous cannula was inserted into the subject's right forearm for injection of each tracer bolus. A 40 sec scan was triggered as the radioactive tracer was detected in the field of view (the brain) by increases in the coincidence-counting rate. During this scan, the subject was in the rest condition or performed a task in one of the four experimental conditions. Immediately after this, a 50 sec scan was acquired as the subject lay with his eyes closed without performing a task. The latter rest PET images, in which task-related rCBF changes were still occurring in specific brain areas, were combined with the task PET images to enhance detection of relevant activations. A 10 min inter-scan interval was sufficient for isotope decay (5 half-lives) and return to resting state levels of regional blood flow within activated regions. Images were reconstructed using a Hann filter, resulting in images with a spatial resolution of ∼7 mm (full-width at half-maximum).
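The statement that a 10 min interval corresponds to about five half-lives can be checked with the standard exponential decay law. The short Python sketch below is purely illustrative; the function names are hypothetical, and the only value taken from the text is the 123 sec half-life.

```python
HALF_LIFE_S = 123.0  # half-life of the tracer as quoted in the text


def half_lives_elapsed(elapsed_s, half_life_s=HALF_LIFE_S):
    """Number of half-lives contained in elapsed_s seconds."""
    return elapsed_s / half_life_s


def remaining_fraction(elapsed_s, half_life_s=HALF_LIFE_S):
    """Fraction of the initial activity left after elapsed_s seconds.

    Standard decay law: N(t) / N(0) = 0.5 ** (t / half_life).
    """
    return 0.5 ** (elapsed_s / half_life_s)


if __name__ == "__main__":
    interval_s = 10 * 60  # 10 min inter-scan interval
    print(half_lives_elapsed(interval_s))
    print(remaining_fraction(interval_s))
```

Under these values, 600 sec is just under five half-lives, so only a few percent of the injected activity remains when the next bolus is given.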
Significant changes in cerebral blood flow indicating neural activity were detected using a regions-of-interest (ROIs)-free image subtraction strategy. High-resolution MRI scans were acquired for each subject. Inter-scan, intra-subject movement was assessed and corrected using the Woods' algorithm (Woods et al., 1993). Semiautomatic registration of PET to matched, spatially normalized MRI (in Talairach space) was performed using in-house spatial normalization software implementing an algorithm using a nine-parameter, affine transformation (Talairach and Tournoux, 1988; Lancaster et al., 1995).
The data were smoothed with an isotropic 3 × 3 × 3 Gaussian kernel to yield a final image resolution of 8 mm. By use of moderate smoothing, resolution was not lost; control for false positives was provided by the p value criteria used in later stages of our analysis stream. The data were then analyzed using the Fox et al. (1988) algorithm as implemented in Multiple Image Processing Station (MIPS) (Research Imaging Center, University of Texas Health Science Center San Antonio, San Antonio, TX, USA). Intra-subject image averaging was performed within conditions, and the resulting image data were analyzed by an omnibus (whole-brain) test.
For this analysis, local extrema (positive and negative) were identified within each image using a three-dimensional search algorithm (Mintun et al., 1989). Each set of local extrema data was plotted as a frequency histogram (for visual inspection). Then, a β-2 statistic measuring kurtosis and a β-1 statistic measuring skewness of the histogram of the difference images (change distribution curve) (Fox and Mintun, 1989) were used as omnibus tests to assess overall significance (D'Agostino et al., 1990). Critical values for β statistics were chosen at p < 0.01. The β-1 and β-2 tests, which are implemented in the MIPS software (Research Imaging Center, University of Texas Health Science Center at San Antonio, San Antonio, TX) in a manner similar to the use of the gamma-1 and gamma-2 statistics (Fox and Mintun, 1989), improve on the gamma statistic by using a better estimate of the degrees of freedom (Worsley et al., 1992).
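To make the omnibus statistics more concrete, the following Python sketch computes textbook moment-based skewness and kurtosis for a list of local-extremum values. This is not the MIPS implementation: it omits the degrees-of-freedom adjustment mentioned above, and the function names are hypothetical.

```python
def central_moment(values, order):
    """Return the population central moment of the given order."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def sample_skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (0 for a symmetric sample)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return central_moment(values, 3) / m2 ** 1.5


def sample_kurtosis(values):
    """Moment-based kurtosis: m4 / m2 ** 2 (about 3 for a normal sample)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return central_moment(values, 4) / m2 ** 2
```

In an analysis of this kind, a distribution of extrema with heavy tails or marked asymmetry relative to a normal distribution would be taken as evidence of task-related change somewhere in the image, which then licenses the regional test.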
The null hypothesis of omnibus significance was rejected, so a post hoc (regional) test was done (Fox et al., 1988; Fox and Mintun, 1989). In this algorithm, the pooled variance of all brain voxels was used as the reference for computing significance. This method is distinct from methods that compute the variance at each voxel and is more sensitive (Strother et al., 1997), particularly for small samples, than the voxel-wise variance methods of Friston et al. (1991) and others. In this analysis, a maxima and minima search was conducted to identify local extrema within a search volume of 125 mm3 (Mintun et al., 1989). Cluster size was determined based on the number of significant, contiguous voxels within the search cube of 125 mm3. The statistical parameter images were converted to Z values by dividing each image voxel by the average SD of the null distribution. p values were assigned from the Z distribution. Only Z values >2.96 (p < 0.001) were reported. The critical value threshold for regional effects (Z >2.96; p < 0.001) was not raised to correct for multiple comparisons (e.g., the number of image resolution elements). This is because omnibus statistics were established before post hoc analysis. The scanning methods used at the Research Imaging Center have been described previously (Raichle et al., 1983; Fox et al., 1985; Fox and Mintun, 1989; Lancaster et al., 1995).
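The conversion to Z values and the reporting threshold can be sketched in a few lines of Python. This is a simplified illustration, not the published algorithm: the null-distribution SD is passed in as a given number rather than estimated, only increases are thresholded, and all names are hypothetical.

```python
Z_THRESHOLD = 2.96  # reporting threshold quoted in the text


def to_z(diff_values, null_sd):
    """Divide each difference-image value by a single pooled SD.

    null_sd stands for the average SD of the null distribution; how it
    is estimated is outside the scope of this sketch (ASSUMPTION).
    """
    if null_sd <= 0:
        raise ValueError("null_sd must be positive")
    return [v / null_sd for v in diff_values]


def suprathreshold_indices(z_values, threshold=Z_THRESHOLD):
    """Indices of voxels whose Z value exceeds the threshold (increases only)."""
    return [i for i, z in enumerate(z_values) if z > threshold]
```

Because one SD is shared by every voxel, the Z image is simply a rescaled difference image, which is what distinguishes this approach from voxel-wise variance methods.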
Gross anatomical labels were applied to the detected local maxima using a volume occupancy-based, anatomical-labeling strategy as implemented in the Talairach Daemon (Lancaster et al., 2000), except for activations in cerebellum, which were labeled manually with reference to an atlas of the cerebellum (Schmahmann et al., 1999).
Anatomical MRI scans were performed on an Elscint 1.9 T Prestige system. The scans used three-dimensional gradient recalled acquisitions in the steady state (3D GRASS), with a repetition time of 33 msec, an echo time of 12 msec, and a flip angle of 60° to obtain a 256 × 192 × 192 volume of data at a spatial resolution of 1 mm3.
A two-way, repeated measures ANOVA performed on response accuracy (Fig. 1) revealed a main effect for subject group, with Mandarin-speaking subjects showing greater accuracy in discriminating tone and pitch patterns than English-speaking ones (F(1, 12) = 8.29; p < 0.05). A two-way, repeated measures ANOVA performed on response times (Fig. 2) yielded a significant main effect of task (F(1, 12) = 11.14; p < 0.01), with subjects slower on the tone task than the pitch task. Within subject groups, there appeared to be a speed-accuracy trade-off such that both subject groups were faster but less accurate in the pitch task relative to the tone task.
PET results: cerebral blood flow increases
Distinct patterns of activation were observed for each task and each group. Tables 1 and 2 itemize the regions of increased cerebral blood flow for the tone and pitch tasks relative to their corresponding passive listening tasks. Only Z values >2.96 (p < 0.001) and forming contiguous clusters >60 mm3 are reported. In the descriptions, figures, and tables that follow, activations that resulted spuriously from subtractions of deactivations are excluded. That is, all activations were verified by subtracting the values in the rest condition from those during the relevant active tasks, and false-positive activations caused by subtractions of deactivations were eliminated.
Comparison of the Mandarin tone task to Mandarin passive listening for the Mandarin speakers yielded significant rCBF increases predominately in the left hemisphere. The strongest activation was in the left anterior insula (Fig. 3, left panel). Another very strong activation was present in the left basal ganglia (putamen). Other left hemispheric areas activated were fusiform gyrus [Brodmann's Area (BA) 36, 37], posterior insula, inferior supplementary motor area (SMA; medial BA 6), and dorsolateral premotor cortex (BA 6). There were right hemispheric activations in somatosensory cortex (BA 3) and dorsolateral premotor cortex (BA 6). In addition, there were bilateral activations in vermal or midline regions of cerebellum, mostly in the posterior hemisphere.
The same comparison for the English speakers yielded similar activations in each hemisphere but no cerebellar activity. Of particular note is that English speakers intensely activated anterior insula in the right hemisphere, homologous to that activated by the Mandarin speakers (Fig. 3, right panel). Although activations were strong in both hemispheres when English subjects discriminated Mandarin stimuli, left hemisphere activity was more diffuse, compared with right hemisphere activity, which clustered around the insular and the frontal cortices. In the left hemisphere, there were activations in middle frontal cortex (BA 10), anterior cingulate (BA 24), supramarginal gyrus (BA 40), middle and superior temporal gyrus (BA 39, 22), basal ganglia (caudate nucleus), and somatosensory and premotor frontal cortex (BA 2, 6, 44). In the right hemisphere, apart from the insula activation described earlier, there was strong activation in dorsolateral premotor frontal cortex (BA 6) as well as considerable activation in other frontal areas, including primary motor (BA 4), inferior frontal (BA 47), and somatosensory cortices (BA 2, 3). In addition, there were right hemispheric activations in anterior cingulate cortex.
Interestingly, when the active English pitch task was contrasted with its passive counterpart, both Mandarin and English speakers showed primarily right hemispheric activations. For the Mandarin speakers, activity was observed in right hemisphere in insula (Fig. 4, left panel) in a region approximately homologous (24, 17, 4) to that activated in the left hemisphere during the active Mandarin tone task (-27, 24, 4). In addition, there was right hemispheric activity in SMA (medial BA 6), pons, midbrain, anterior cingulate cortex, inferior frontal gyrus (BA 44/6), posterior insula, dorsolateral frontal cortex (BA 6), lentiform nucleus, and thalamus. In the left hemisphere, there were two foci only, one each in anterior cingulate cortex (BA 32) and dorsolateral premotor frontal cortex (BA 6). The right cerebellum was activated in midline and lateral areas in the posterior hemisphere.
For the English speakers, there was strong activation in right insula (30, 22, 0) near that observed for the Mandarin speakers (Fig. 4, right panel). In addition, there was right hemispheric activation in superior (BA 10) and inferior frontal cortex (BA 47, 44/6, 46) as well as somatosensory (BA 2, 3) and primary motor and premotor cortex (BA 4, 6). Other right areas activated included anterior cingulate (BA 24, 32), prefrontal cortex (BA 44/9), middle temporal cortex (BA 39), insula (posterior to the focus described earlier), cuneus and extrastriate occipital cortex (BA 18, 19), and basal ganglia (caudate nucleus). In the left hemisphere, there were several foci in anterior cingulate (BA 24, 32). There was one activation in the left midline cerebellum.
English speakers only, during both Mandarin and English pitch discrimination tasks, exhibited activations in bilateral parietal cortex [left angular gyrus (BA 39), left supramarginal gyrus (BA 40), right inferior parietal regions (BA 39 and 40)] as well as in right visual cortex [medial occipital gyrus (BA 19) and cuneus (BA 18)]. These areas are not typically viewed as related to speech and pitch processing, although activations in these areas have been reported in speech (Zatorre et al., 1992) and melody perception (Zatorre et al., 1994). These activations may be attributable to subjects' use of resources outside the regular speech and pitch processing domains. English-speaking subjects, for whom identifying pitch patterns at the word level is an unfamiliar task, could be using strategies such as visualizing the written forms of the stimuli they heard or mentally drawing the pitch patterns of the stimuli.
To further assess the activity in insular cortex for groups and conditions, a ROI analysis was performed. Mean activity within a ROI of 5 × 5 × 5 voxels (voxel size, 2 × 2 × 2 mm) was measured at the extrema point within both the left and right anterior insular cortex in both subject groups for both experimental conditions. A repeated measures ANOVA was performed on the resulting ROI data. The analysis revealed a group × condition × hemisphere interaction (F(1, 13) = 6.01; p < 0.05); the left anterior insula was activated to a greater degree in the tone condition in the Mandarin-speaking subjects (Fig. 5). Other than a marginal main effect of subject group (F(1, 13) = 4.19; p < 0.07), no other main effects or interactions were found. The group × condition × hemisphere interaction remained when the search volume was decreased from 1000 mm3 (5 × 5 × 5 voxel ROI) to 216 mm3 (3 × 3 × 3 voxel ROI) and 8 mm3 (1 × 1 × 1 voxel ROI).
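The three search volumes quoted above follow directly from the ROI edge lengths if each voxel is a cube 2 mm on a side, which is the voxel size implied by the stated volumes. The Python sketch below reproduces the arithmetic; the function name is hypothetical.

```python
VOXEL_EDGE_MM = 2.0  # ASSUMPTION: cubic voxels, 2 mm per side


def roi_volume_mm3(edge_voxels, voxel_edge_mm=VOXEL_EDGE_MM):
    """Volume in cubic millimetres of a cubic ROI edge_voxels wide."""
    return (edge_voxels * voxel_edge_mm) ** 3


if __name__ == "__main__":
    # Edge lengths of the three ROIs used in the analysis.
    for edge in (5, 3, 1):
        print(edge, roi_volume_mm3(edge))
```

With these inputs, the 5, 3, and 1 voxel ROIs correspond to 1000, 216, and 8 mm3, matching the values in the text.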
These findings indicate that the brain areas engaged during pitch pattern perception depend on the psychological functions for those patterns. Unlike previous neuroimaging studies of lexical tone processing, our subject groups discriminated pitch patterns of words in their respective native languages, and the patterns of brain activations were different depending on the functions of these pitch patterns. When pitch patterns had a lexical function, activations were predominately in the left hemisphere, including areas that have been shown to be active during language tasks. When Mandarin-speaking subjects discriminated the same pitch patterns embedded in English words, activations were primarily in the right hemisphere, including the inferior frontal gyrus, which has been shown to be active during pitch processing. Likewise, when English-speaking subjects discriminated pitch patterns of English word pairs without a lexical function, the right hemisphere, including the right inferior frontal gyrus, was strongly activated.
These results are inconsistent with an acoustic hypothesis predicting that the same brain areas subserve a particular stimulus category independent of its function. However, these data are not necessarily inconsistent with other hypotheses that propose localized, specialized, acoustic processing. Thus, one hypothesis proposes that left primary auditory cortex is specialized for temporal cues, whereas that in the right hemisphere is specialized for spectral cues (Zatorre and Belin, 2001). Our findings may be consistent with this hypothesis if the activity we observed occurs downstream from such acoustically specialized processing.
Our findings provide clear evidence that there is a region of anterior insula that supports aspects of pitch pattern perception. Furthermore, our results imply that on the left side, this insular region supports pitch pattern perception when the patterns carry lexical information, whereas on the right side, it supports the pitch pattern perception when the patterns are not lexically informative. The insula spans essentially the length of primary language areas in the brain and connects to orbital, frontal opercular, lateral premotor, ventral granular, and superior temporal cortices among other areas (Mesulam and Mufson, 1985; Augustine, 1996). Regions of the insula have been implicated by studies in aspects of language-related functions (for review, see Ardila, 1999), including aphasia (Damasio and Damasio, 1980), dyslexia (Paulesu et al., 1996), affective and nonaffective prosody (Borod, 2000), and auditory agnosia (Habib et al., 1995).
A recent cross-language functional MRI (fMRI) study found bilateral insular activity during phonological and nonphonological speech processing (Jacquemot et al., 2003). French and Japanese speakers discriminated vowel duration and epenthetic vowels embedded in pseudowords; vowel duration is phonologically relevant to Japanese but not French speakers, whereas the reverse is true for epenthetic vowels. Their use of pseudowords likely accounts for the absence of the laterality effect seen here with lexical and nonlexical pitch processing in real words. Their results further provide evidence supporting the functional account of pitch perception that emphasizes the importance of context (i.e., lexical tones are most saliently lexical when they are embedded in real words). In an fMRI study of phonetic learning, Golestani and Zatorre (2004) examined the discrimination of native and non-native phonemes as monolingual English-speaking subjects learned to perceive Hindi consonants. For both native and acquired non-native consonant discriminations, activity was detected in regions of bilateral anterior insula slightly lateral to those activated here in the context of linguistic (left) or acoustic-only (right) features. However, because subjects were required to attend to individual phonemes (e.g., /ta/) isolated from lexical or broader linguistic contexts (e.g., an actual word “tea”), it is likely that they closely attended to the acoustic characteristics of the stimuli, because the spectrotemporal features of the stimuli were being varied. Thus, it is unclear whether a pure psychoacoustic strategy was used, or whether the potential lexical-linguistic relevance of these stimuli played any role. The bilateral insular activation (as opposed to a left-lateralized activation) may reflect a combination of these strategies.
The insula is often active in neurological studies of pitch processing and vocalization (Table 3) (for review, see Bamiou et al., 2003). Neuroimaging studies of sentence intonation processing have found activation in the frontal operculum extending to the anterior insula (Kotz et al., 2003; Meyer et al., 2003). Magnetoencephalography studies have observed a right-lateralized frontal region to be associated with nonlexical intonation processing (Herrmann et al., 2003). An event-related potential study investigating the processing of nonlexical pitch patterns in German words failed to show neural distinctions between the detection of correct and incorrect pitch contours (Friedrich et al., 2001), implying that the lexical status of pitch is instrumental in localizing the associated neural substrates. When subjects process lexical tones, left insula is activated when a linguistic component (whether native or non-native) is present (i.e., words minus rest, or words minus pitch), whereas the right insula is activated when nonlexical pitch is present (i.e., pitch minus rest). These activations have sometimes been attributed to subvocal rehearsal (Paulesu et al., 1996). As discussed later, evidence does not favor a subvocal account of our observed insular activations but an account more consistent with speech perception and phonological processing.
Insular cortex also has a role in overt vocal production. Perry et al. (1999) found that singing a single pitch on a vowel, relative to passive listening to complex tones, activated the insular cortex bilaterally. Their left anterior activation (-42, -1, 8) was linked to the production aspect of singing, and the right side (39, -6, -6) was associated with the self-regulation-pitch perception aspect of singing. Likewise, Riecker et al. (2000) found that overt speaking and overt nonlyrical singing elicited bilateral insula activity. However, both activations were in the anterior insula (-35, 18, 4 for speech; 32, 16, -6 for singing), unlike the results of the study by Perry et al. (1999). No such activations were observed in covert speaking and singing of the same stimuli. Left anterior insula activity was present when amateur musicians sang either a monotonic pitch, a melody, or a harmonization to a melody (Brown et al., 2004).
A unique feature of the present study is that the stimuli were embedded in noise (to eliminate a behavioral ceiling effect). Noise may increase subjects' tendency to rehearse the stimuli subvocally, and subvocal rehearsal might in turn have led to the strong insular activations. However, several aspects of the results suggest otherwise. Subvocal rehearsal is associated with activation on the lateral area of BA 44 and the inferior and superior aspects of BA 6 (Smith et al., 1998). When our subjects performed the nonlexical pitch tasks, only right insular activations were detected. Because the stimuli included consonants, vowels, and actual words in addition to the pitch patterns, subvocal rehearsal should induce left insular activations in addition to the right activations. Thus, the absence of activation in left insula appears inconsistent with the use of subvocal rehearsal. Nevertheless, the addition of noise introduced an extra cognitive-perception demand (i.e., signal-to-noise extraction), which may be reflected in our activations.
There were a variety of other brain areas activated in these studies. The activations in the motor (BA 4), premotor (BA 6), and primary sensory (BA 2, 3) areas were likely to be related to subjects' left-handed button press responses. Bilateral anterior cingulate activations were observed, similar to other studies of language and pitch processing [e.g., orally producing words, Tan et al. (2001); word discrimination, Kiehl et al. (1999); identifying words from multiple talkers, Wong et al. (2004)]. Perry et al. (1999) found that relative to passive listening to complex tones, singing a single pitch activated the right anterior cingulate gyrus. Zatorre et al. (1994) found right and midline anterior cingulate activations in pitch discrimination. Thus, left and midline anterior cingulate cortex are implicated in the production and perception of ordinary language, whereas the right and midline anterior cingulate cortex appear to support pitch production.
Basal ganglia activations were observed in both tasks and both subject groups. Similar activations were observed in other language and pitch processing studies (Brown et al., 2004). Lesions in the thalamus are associated with language deficits (Nadeau and Crosson, 1997). Specifically, the pulvinar-lateral posterior complex of the thalamus is implicated in language because of its extensive connections with temporal and parietal language areas (Jones, 1985). Not surprisingly, we observed thalamic activations, especially in the pulvinar. Thalamic activations were observed in other functional imaging studies of tasks with lexical tones (Hsieh et al., 2001).
Investigations of lexical tone processing, like that here, report cerebellar activations in midline and right-sided lateral and posterior areas (Gandour et al., 2000; Li et al., 2003). It is not yet clear exactly what functions are supported by the cerebellar activity during auditory and nonmotor language tasks, although new findings suggest a role for cerebellum in supporting auditory sensory processing (Bower and Parsons, 2003) and working memory (Desmond and Fiez, 1998).
Overall, these findings imply that brain activations associated with pitch perception depend on the function of those stimuli and suggest that left anterior insula is important for lexically distinctive prosodic information. These results indicate that speech and phonological processing may involve neural resources beyond those associated with basic auditory processing, regardless of signal complexity.
This work was supported by National Science Foundation Grant BCS-9986246 (R.D., P.W.), National Institutes of Health Grant R01 DC00427-13, 14 (R.D.), and a grant from the Research Imaging Center (L.P.). We thank Jack Gandour, Ani Patel, and Kai Alter for their comments on a previous version of this manuscript.
Correspondence should be addressed to Patrick C. M. Wong, Speech Research Laboratory, Department of Communication Sciences and Disorders, Northwestern University, 2240 Campus Drive, Evanston, IL 60208.
L. M. Parsons's and M. Martinez's present address: Department of Psychology, University of Sheffield, S10 2TP, UK.
Copyright © 2004 Society for Neuroscience 0270-6474/04/249153-08$15.00/0