Neuropsychologia

Volume 47, Issue 4, March 2009, Pages 1096-1106

The role of spectral and durational properties on hemispheric asymmetries in vowel perception

https://doi.org/10.1016/j.neuropsychologia.2008.12.033

Abstract

The aim of the current study is to investigate potential hemispheric asymmetries in the perception of vowels and the influence of different time scales on such asymmetries. Activation patterns for naturally produced vowels were examined at three durations encompassing a short (75 ms), medium (150 ms), and long (300 ms) integration time window in a discrimination task. A set of five corresponding non-speech sine wave tones was created with frequencies matching the second formant of each vowel. Consistent with earlier hypotheses, there was a right hemisphere preference in the superior temporal gyrus (STG) for the processing of spectral information for both vowel and tone stimuli. However, the observed laterality differences for vowels and tones were a function of heightened right hemisphere sensitivity to long integration windows, whereas the left hemisphere showed sensitivity to both long and short integration windows. Although there were a number of similarities in the processing of vowels and tones, differences also emerged, suggesting that even fairly early in the processing stream, at the level of the STG, different mechanisms are recruited for processing vowels and tones.

Introduction

Understanding the neural basis of speech perception is essential for mapping out the neural systems underlying language processing. An important outstanding question in this research concerns the functional role the two hemispheres play in decoding the speech signal. Recent neuroimaging experiments suggest a hierarchical organization of the phonetic processing stream, with early auditory analysis of the speech signal occurring bilaterally in Heschl's gyri and the superior temporal lobes, and later stages of phonetic processing occurring in the middle and anterior superior temporal gyrus (STG) and superior temporal sulcus (STS) of the left, dominant language hemisphere (Liebenthal, Binder, Spitzer, Possing, & Medler, 2005; Scott, Blank, Rosen, & Wise, 2000).

Speech sounds themselves, however, are not indissoluble wholes; they are composed of a set of acoustic properties or phonetic features. For example, the perception of place of articulation in stop consonants requires the extraction of rapid spectral changes occurring over tens of milliseconds at the release of the consonant. The perception of voicing in stop consonants requires the extraction of a number of acoustic properties (Lisker, 1978), among them voice-onset time, corresponding to the timing between the release of the stop consonant and the onset of vocal cord vibration. The perception of vowel quality requires the extraction of quasi-steady-state spectral properties associated with the resonant properties of the vocal tract. What is less clear is the set of neural substrates underlying the mapping of the different acoustic properties or features that give rise to these speech sounds.

Several studies have explored potential differences in lateralization for spectral and temporal properties using synthetic non-speech stimuli with spectral and temporal properties similar to speech (Boemio, Fromm, Braun, & Poeppel, 2005; Hall et al., 2002; Jamison, Watkins, Bishop, & Matthews, 2005) and have found differential effects of both parameters on hemispheric processing. Results suggest that at early stages of auditory processing, the functional role of the two hemispheres may differ (Ivry & Robertson, 1998; Poeppel, 2001; Zatorre et al., 2002a, 2002b). In particular, although both hemispheres process both spectral and temporal information, they show differential sensitivity to this information. It has been shown in fMRI studies (Boemio et al., 2005; Zatorre & Belin, 2001) as well as in studies of aphasia (Van Lancker & Sidtis, 1992) that the right hemisphere has a preference for encoding pitch or spectral change information, and that it does so most efficiently over long integration time windows, whereas the left hemisphere has a preference for encoding temporal information, and particularly for integrating rapid spectral changes. Moreover, it has been shown that this preference may be modulated by task demands (Brechmann & Scheich, 2005). Evidence from intracerebral evoked potentials further suggests that processing of fine-grained durational properties of both speech and non-speech stimuli is localized to the left Heschl's gyrus (HG) and planum temporale (Liégeois-Chauvel, de Graaf, Laguitton, & Chauvel, 1999). Other evidence suggests that this division may not be so clear-cut. For instance, in an fMRI study of non-speech stimuli, bilateral HG responded to spectral variation, whereas in the STG temporal information was left-lateralized and spectral properties were right-lateralized (Schönwiesner, Rübsamen, & von Cramon, 2005).

Poeppel's (2003) Asymmetric Sampling in Time (AST) theory hypothesizes that laterality effects in processing auditory stimuli arise from different temporal specializations of the two hemispheres. In particular, the AST proposes that the two hemispheres have different temporal integration windows, with the left hemisphere preferentially extracting information from short (25–50 ms) windows and the right hemisphere preferentially extracting information from long (150–300 ms) windows. To test this hypothesis, Poeppel and colleagues (Boemio et al., 2005) designed an fMRI study in which subjects listened to narrow-band noise segments with segment durations ranging from 25 to 300 ms. They found that, relative to a constant frequency tone, the segmented noise stimuli produced increasing right hemisphere activation as both the segment duration and the number of different pitch variations increased. However, their results only partially support the AST model. While an increase in right hemisphere activity was observed with increasing duration, no corresponding increase in left hemisphere activity was found for short durations.
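
The trade-off that motivates the AST can be made concrete with a short spectral analysis: a short analysis window localizes events in time but bins frequency coarsely, while a long window does the reverse. The following is a minimal illustrative sketch in Python; the sample rate, test signal, and window lengths are our own choices for illustration, not stimuli from any of the studies cited above.

    import numpy as np

    FS = 16000  # assumed sample rate (Hz), chosen for illustration

    # A 300 ms signal whose frequency steps from 1000 to 1200 Hz at 150 ms.
    t = np.arange(int(0.300 * FS)) / FS
    sig = np.where(t < 0.150,
                   np.sin(2 * np.pi * 1000 * t),
                   np.sin(2 * np.pi * 1200 * t))

    for win_ms in (25, 300):  # 'short' vs 'long' integration window
        n = int(win_ms / 1000 * FS)
        spectrum = np.abs(np.fft.rfft(sig[:n] * np.hanning(n)))
        freqs = np.fft.rfftfreq(n, d=1 / FS)
        # Frequency resolution is FS / n: the 25 ms window can track events
        # tens of ms apart but bins frequency only to 40 Hz, whereas the
        # 300 ms window resolves ~3 Hz but averages over the entire signal.
        print(f"{win_ms:>3} ms window: {freqs[1]:5.1f} Hz per bin, "
              f"peak at {freqs[np.argmax(spectrum)]:.0f} Hz")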

Two questions are raised by these studies. The first is whether the acoustic properties of speech are processed by a domain-general spectro-temporal mechanism (for discussion see Zatorre & Gandour, 2007). If this is the case, then a speech property that favors spectral analysis over a relatively long time domain should recruit right hemisphere mechanisms. The second is whether manipulating the time window over which spectral information is integrated affects hemispheric preferences.

The studies described above demonstrated a right hemisphere preference using non-speech stimuli varying in their long-term spectral properties. What is less clear is whether a similar right hemisphere preference would emerge for vowels, which require the extraction and integration of spectral properties over a particular time window, and what effect the size of that window has on hemispheric laterality. Given the hypothesized differences in the computational processes of the two hemispheres, there are two properties of vowels that are likely to recruit right hemisphere processing mechanisms. First, the perception of vowel quality requires the extraction of spectral properties over a relatively long, quasi-steady-state interval. For example, differences in vowel quality, such as [i] as in beet versus [u] as in boot, are determined by the first two formant frequencies. Second, the spectral properties of vowels tend to be fairly stable over 150 ms or more and should therefore engage a 'long' integration window. Thus, presumably at early stages of processing, there should be a right hemisphere preference for the processing of vowels.
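
For concreteness, classic average formant measurements (approximate adult male values after Peterson & Barney, 1952; illustrative only, not the measurements of the present study) show how the [i]/[u] contrast is carried almost entirely by a steady-state spectral property, the second formant:

    # Approximate average formant frequencies (Hz) for adult male speakers,
    # after Peterson & Barney (1952); illustrative values only, not the
    # stimuli measured in this study.
    FORMANTS = {
        "i": {"F1": 270, "F2": 2290},  # as in 'beet'
        "u": {"F1": 300, "F2": 870},   # as in 'boot'
    }

    # F1 differs by only ~30 Hz, while F2 differs by ~1420 Hz: the vowel
    # quality contrast rides on a quasi-steady-state spectral cue.
    for formant in ("F1", "F2"):
        diff = abs(FORMANTS["i"][formant] - FORMANTS["u"][formant])
        print(f"{formant}: |[i] - [u]| = {diff} Hz")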

Nonetheless, the hypotheses that there are functional differences in the processing of speech sounds in the two hemispheres are largely based on the results of neuroimaging studies using non-speech stimuli (Boemio et al., 2005; Jamison et al., 2005; Zatorre & Belin, 2001). While it is assumed that at early stages of processing both non-speech and speech engage similar computational mechanisms (Uppenkamp, Johnsrude, Norris, Marslen-Wilson, & Patterson, 2006), the question remains as to what effect the presence of a linguistic stimulus may have on hemispheric laterality (Narain, Scott, Wise, & Rose, 2003).

Indeed, the early literature exploring the hemispheric processing of auditory stimuli using the dichotic listening technique showed hemispheric differences as a function of the type of stimulus. Linguistic stimuli such as numbers, words, nonsense syllables, and even consonants showed a right ear/left hemisphere advantage, whereas non-linguistic stimuli such as music and sound effects showed a left ear/right hemisphere advantage (Kimura, 1961, 1964; Spellacy & Blumstein, 1970; Studdert-Kennedy & Shankweiler, 1970). Similar dichotomies in the hemispheric processing of linguistic and non-linguistic stimuli have also been shown in patients with left and right temporal lobectomies: left temporal lobectomies produced deficits in processing speech sounds, whereas right temporal lobectomies produced deficits in processing pure tones and other aspects of musical processing (Milner, 1962). Interestingly, results for vowel stimuli were mixed; a small right hemisphere advantage was shown in some experiments (Shankweiler & Studdert-Kennedy, 1967), a left hemisphere advantage in others (Godfrey, 1974; Weiss & House, 1973), and no ear advantage in still others (Spellacy & Blumstein, 1970).

A few functional neuroimaging studies have been conducted using real and synthetic vowel stimuli, but none has directly analyzed potential hemispheric asymmetries in the processing of vowels. One study investigated vowel processing and showed bilateral activation for both vowels and non-speech control stimuli of equal complexity, with greater activation for vowels than for the non-speech stimuli (Uppenkamp et al., 2006). Obleser et al. (2006) also showed bilateral activation for vowels in a study investigating the topographical mapping of vowel features associated with differences in formant frequencies, and hence in the spectral patterns of the stimuli. Their results suggested stronger right hemisphere activation for the processing of these vowel features, although laterality differences were not tested statistically (cf. also Guenther, Nieto-Castanon, Ghosh, & Tourville, 2004).

The aim of the current study is to investigate potential hemispheric asymmetries in the perception of vowel quality and the influence of different time scales on such asymmetries. To this end, activation patterns for naturally produced vowels, which have a quasi-steady-state, constant spectral formant pattern, will be examined at three time scales or durations, encompassing a short (75 ms), medium (150 ms), and long (300 ms) integration time window. If, as discussed above, there is a right hemisphere domain-general mechanism for extracting relatively steady-state spectral properties of auditory stimuli, then a right hemisphere preference for the processing of vowels should emerge. Moreover, the magnitude of the asymmetry should be influenced by vowel duration, with an increased right hemisphere preference for long vowels. Models such as Poeppel's AST also predict an increased left hemisphere preference for short vowels. Nonetheless, given the results of Boemio et al. (2005), it is not clear whether there will be increased left hemisphere activation for vowels at the shorter (75 ms) time scale. The locus of this asymmetry for vowels should emerge in the STG and STS, and, in particular, in the anterior STG and STS, reflecting the recruitment of the auditory 'what' processing stream relating to the perception of 'auditory objects' or speech sounds (Hickok & Poeppel, 2000; Obleser et al., 2006).

A discrimination task with a short (50 ms) interstimulus interval (ISI) will be utilized. There are two reasons for using this paradigm. First, discrimination judgments will be based on the perception of differences in the spectral properties (i.e. the formant frequencies) of the vowel stimuli. Second, it is generally assumed that a discrimination task taps early stages of phonetic processing since it focuses attention on potential differences in the acoustic properties of the stimuli, rather than on the phonetic category membership of the stimuli (Liberman, Harris, Hoffman, & Griffith, 1957).
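
As a concrete sketch of the trial structure this implies, the two members of a discrimination pair can be concatenated with 50 ms of silence between them. The sample rate below is an assumption for illustration, not a parameter reported in this excerpt:

    import numpy as np

    FS = 16000  # assumed sample rate (Hz); not specified in this excerpt

    def make_trial(stim_a: np.ndarray, stim_b: np.ndarray,
                   isi_ms: float = 50.0) -> np.ndarray:
        """Build one discrimination trial: stimulus, silent ISI, stimulus.

        A 'same' trial passes the same waveform twice; a 'different' trial
        passes two stimuli differing in spectral content (vowel quality
        for vowels, frequency for tones).
        """
        silence = np.zeros(int(isi_ms / 1000 * FS))
        return np.concatenate([stim_a, silence, stim_b])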

A tone discrimination task will be used as a control task. Tone discrimination tasks have been used as a control condition in other studies investigating the perception of speech (Burton, Blumstein, & Small, 2000; Jäncke, Wüstenberg, Scheich, & Heinze, 2002; Sebastian & Yasin, 2008). Although the discrimination of tones reflects the perception of differences in pitch and the discrimination of vowels reflects the perception of differences in vowel quality, both share the acoustic property of periodicity, with tones being fully periodic and vowels being quasi-periodic. Importantly, tone discrimination has shown right hemisphere lateralization (Binder et al., 1997). The tone stimuli will be single-frequency sine wave tones that match the second formant frequency and the duration parameters (75, 150, 300 ms) of the vowel stimuli. Given the hypothesis described above that there is a right hemisphere domain-general mechanism for extracting steady-state properties of auditory stimuli, simple sine wave tones should show lateralization patterns similar to those of vowels. In general, there should be right hemisphere lateralization for the discrimination of pitch contrasts between the tone stimuli (cf. Binder et al., 1997; Milner, 1962), and the patterns of asymmetry as a function of duration should mirror those for vowels, with an increased right hemisphere preference for long duration tones. Relative to the tone stimuli, however, the vowels may show less right hemisphere activation owing to their linguistically relevant properties.
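
A minimal sketch of such a control stimulus follows; the sample rate and the onset/offset ramp are assumptions added to avoid clicks, not parameters taken from the paper:

    import numpy as np

    FS = 16000  # assumed sample rate (Hz)

    def sine_tone(freq_hz: float, dur_ms: float,
                  ramp_ms: float = 5.0) -> np.ndarray:
        """Single-frequency sine wave tone with linear onset/offset ramps.

        freq_hz would be set to the second formant (F2) of the matched
        vowel and dur_ms to 75, 150, or 300 ms, as described above. The
        5 ms ramp is an assumption to avoid onset/offset clicks.
        """
        n = int(dur_ms / 1000 * FS)
        tone = np.sin(2 * np.pi * freq_hz * np.arange(n) / FS)
        r = int(ramp_ms / 1000 * FS)
        envelope = np.ones(n)
        envelope[:r] = np.linspace(0.0, 1.0, r)
        envelope[n - r:] = np.linspace(1.0, 0.0, r)
        return tone * envelope

    # e.g., a 'medium' tone at the F2 of [u] (~870 Hz for a male speaker)
    medium_u_tone = sine_tone(870, 150)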

Section snippets

Subjects

Fifteen healthy volunteers (11 females, 4 males), aged 18–54 (mean = 23 ± 9 years), participated in the study. All subjects were native English speakers and right-handed according to the Edinburgh Handedness Inventory (Oldfield, 1971). Participants gave written consent prior to participation in accordance with guidelines established by the Human Subjects Committee of Brown University and Memorial Hospital. Each participant received moderate monetary compensation for their time. In addition to the

Behavioral

Fig. 2 shows the performance and Fig. 3 shows the RT results for the behavioral data. As Fig. 2 shows, overall performance was above 90% for both vowels and tones. “Different” and “same” performance responses were submitted to separate 2-way ANOVAs with factors of duration and stimulus type (see Farell, 1985 for discussion about potential differences in processing mechanisms for same and different responses). For “different” responses, only a main effect of duration (F(2, 28) = 7.905, p < 0.002) was
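
The analysis described here corresponds to a standard two-way repeated-measures ANOVA. A minimal sketch in Python/statsmodels follows, with hypothetical column and file names; the software actually used for the analysis is not named in this excerpt:

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # Hypothetical long-format table: one score per subject per cell, with
    # columns 'subject', 'duration' (75/150/300), 'stim_type' (vowel/tone),
    # and 'accuracy'. The file name is a placeholder.
    df = pd.read_csv("different_responses.csv")

    # Two-way repeated-measures ANOVA with within-subject factors of
    # duration and stimulus type, as described for the behavioral data.
    result = AnovaRM(df, depvar="accuracy", subject="subject",
                     within=["duration", "stim_type"]).fit()
    print(result)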

Discussion

The results of this study provide some further insight into the computational mechanisms underlying the processing of both speech and non-speech. Although previous studies have examined hemispheric processing of spectral and temporal information in non-speech stimuli, these parameters have typically covaried (cf. Hall et al., 2002; Zatorre & Belin, 2001) making it difficult to determine the role of each of these parameters separately and to assess potential interactions between them. In the

Acknowledgements

This research was supported in part by the Dana Foundation, NIH Grant DC006220 to Brown University and the Ittelson Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Deafness and Other Communication Disorders or the National Institutes of Health. Reprint requests and correspondence should be sent to Sheila Blumstein, Department of Cognitive and Linguistic Sciences, Brown University, 190 Thayer

References (53)

  • J.R. Binder et al. (1997). Human brain language areas identified by functional magnetic resonance imaging. Journal of Neuroscience.
  • J.R. Binder et al. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex.
  • A. Boemio et al. (2005). Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nature Neuroscience.
  • A. Brechmann et al. (2005). Hemispheric shifts of sound representation in auditory cortex with conceptual listening. Cerebral Cortex.
  • M.W. Burton et al. (2000). The role of segmentation in phonological processing: An fMRI investigation. Journal of Cognitive Neuroscience.
  • R.W. Cox et al. (1997). Software tools for analysis and visualization of fMRI data. NMR in Biomedicine.
  • T.H. Crystal et al. (1990). Articulation rate and the duration of syllables and stress groups in connected speech. Journal of the Acoustical Society of America.
  • M.H. Davis et al. (2003). Hierarchical processing in spoken language comprehension. The Journal of Neuroscience.
  • H.M. Duvernoy (1999). The human brain: Surface, blood supply, and three-dimensional sectional anatomy.
  • B. Farell (1985). "Same"-"different" judgments: A review of current controversies in perceptual comparisons. Psychological Bulletin.
  • A.L. Giraud et al. (2004). Contributions of sensory input, auditory search and verbal comprehension to cortical activity during speech processing. Cerebral Cortex.
  • J.J. Godfrey (1974). Perceptual difficulty and the right ear advantage for vowels. Brain and Language.
  • F.H. Guenther et al. (2004). Representation of sound categories in auditory cortical maps. Journal of Speech, Language, and Hearing Research.
  • D.A. Hall et al. (1999). "Sparse" temporal sampling in auditory fMRI. Human Brain Mapping.
  • D.A. Hall et al. (2002). Spectral and temporal processing in human auditory cortex. Cerebral Cortex.
  • G. Hickok et al. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences.
1 Now at University of Illinois, Urbana-Champaign.
