Articles, Behavioral/Systems/Cognitive

Functional Integration across Brain Regions Improves Speech Perception under Adverse Listening Conditions

Jonas Obleser, Richard J. S. Wise, M. Alex Dresner and Sophie K. Scott
Journal of Neuroscience 28 February 2007, 27 (9) 2283-2289; DOI: https://doi.org/10.1523/JNEUROSCI.4663-06.2007

Abstract

Speech perception is supported by both acoustic signal decomposition and semantic context. This study, using event-related functional magnetic resonance imaging, investigated the neural basis of this interaction with two speech manipulations, one acoustic (spectral degradation) and the other cognitive (semantic predictability). High compared with low predictability resulted in the greatest improvement in comprehension at an intermediate level of degradation, and this was associated with increased activity in the left angular gyrus, the medial and left lateral prefrontal cortices, and the posterior cingulate gyrus. Functional connectivity between these regions was also increased, particularly with respect to the left angular gyrus. In contrast, activity in both superior temporal sulci and the left inferior frontal gyrus correlated with the amount of spectral detail in the speech signal, regardless of predictability. These results demonstrate that increasing functional connectivity between high-order cortical areas, remote from the auditory cortex, facilitates speech comprehension when the clarity of speech is reduced.

  • speech
  • semantics
  • auditory cortex
  • prefrontal cortex
  • functional connectivity
  • fMRI

Introduction

Everyday speech perception is successful despite listening conditions that are usually less than ideal (e.g., the speech signal may be degraded by being embedded in ambient noise or echoes or may be subject to spectral reductions or compression down telephone lines). Despite these distortions, the listener is usually unaware of any difficulty in understanding what has been said. It has been recognized since the 1950s that an important factor supporting speech comprehension is the semantic context in which it is heard. Thus, words embedded in sentences are usually better understood than isolated words, and in noisy environments, comprehension improves once the listener knows the topic of a conversation (Miller et al., 1951; Boothroyd and Nittrouer, 1988; Grant and Seitz, 2000; Stickney and Assmann, 2001; Davis et al., 2005). Therefore, both intuitive reasoning and objective evidence from these psychoacoustic investigations indicate that speech comprehension is the result of an interaction between bottom-up processes, involving decoding of the speech signal along the auditory pathway, and top-down processes informed by semantic context.

The brain regions that interact to match acoustic information with context are not well understood. Functional imaging studies have advanced our understanding of the functional anatomy of the auditory processes supporting speech perception (Binder et al., 1996; Scott et al., 2000; Davis and Johnsrude, 2003; Zekveld et al., 2006) (for review, see Scott and Johnsrude, 2003; Poeppel and Hickok, 2004; Xu et al., 2005), with good evidence for an auditory processing stream along the superior temporal gyrus that is sensitive to intelligibility. Noise vocoding is an effective technique to manipulate the spectral detail of speech (Shannon et al., 1995) and render it more or less intelligible in a graded manner (Scott et al., 2000, 2006; Davis and Johnsrude, 2003; Warren et al., 2006). In a previous behavioral experiment, we demonstrated that a contextual manipulation was most effective at an intermediate level of quality of speech signal (J. Obleser, L. Alba-Ferrara, and S. K. Scott, unpublished observation). We used sentences that varied in semantic predictability, so that the strength of semantic associations between the key words was either high or low (e.g., “He caught a fish in his net” vs “Sue discussed the bruise”) (Kalikow et al., 1977; Stickney and Assmann, 2001). Semantic predictability has been shown previously to influence speech perception (Boothroyd and Nittrouer, 1988; Pichora-Fuller et al., 1995; Stickney and Assmann, 2001), and in our study, we demonstrated that it affected accuracy of comprehension, with an improvement from 50 to 90%, when used with spectrally degraded, noise-vocoded sentences of intermediate degree (eight frequency channels) (Obleser, Alba-Ferrara, and Scott, unpublished observation).

In this study, subjects listened to sentences varying in acoustic degradation and in semantic predictability in an event-related functional magnetic resonance imaging (fMRI) experiment. The aim was to identify the interdependency between bottom-up and top-down processes during speech comprehension, specifically to investigate which brain regions mediate successful yet effortful speech comprehension through contextual information under adverse acoustic conditions and how these brain regions interact.

Materials and Methods

Subjects.

Sixteen right-handed, monolingual native speakers of British English (seven females; mean age, 24 ± 6 years SD) were recruited. None had a history of a neurological, psychiatric, or hearing disorder. No subject had previous experience of noise-vocoded or spectrally rotated speech, and all were naive to the purpose of the study. The total duration of the procedure was <1 h. The study had prior approval from the local ethics committee of the Hammersmith Hospitals Trust.

Stimulus material.

The stimulus material consisted of 180 spoken sentences from the SPIN (speech intelligibility in noise) test (Kalikow et al., 1977) (forms 2.1, 2.3, 2.5, and 2.7), half low-predictability sentences (e.g., "Sue discussed the bruise" or "They were considering the gang") and half high-predictability sentences (e.g., "His boss made him work like a slave" and "The watchdog gave a warning growl"), matched for phonetic and linguistic variables such as phonemic features, number of syllables, and content words. The sentences were recorded by a phonetically trained female speaker of British English in a soundproof chamber [using a Brüel & Kjaer (Naerum, Denmark) 2231 sound-level meter fitted with a 4165 cartridge; sampling rate, 44.1 kHz]. The final set of stimuli was created off-line by down-sampling the audio recordings to 20 kHz (9 kHz bandpass), editing the sentences at zero-crossings before and after each sentence, and applying 5 ms linear fades to onsets and offsets. Sentence recordings were normalized with respect to average root-mean-squared amplitude and had an average duration of 2.2 s.
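
For concreteness, these preparation steps can be sketched in a few lines of Python (NumPy/SciPy). This is a minimal illustration, not the authors' actual processing script; the target RMS level is an arbitrary assumption.

```python
# Sketch of the stimulus preparation described above: down-sample 44.1 kHz
# recordings to 20 kHz, apply 5 ms linear onset/offset fades, and normalize
# average RMS amplitude. The target RMS value is an illustrative assumption.
import numpy as np
from scipy.signal import resample_poly

def prepare_sentence(x, fade_ms=5.0, fs_out=20000, target_rms=0.05):
    # 44.1 kHz -> 20 kHz; resample_poly takes an integer ratio (200/441).
    y = resample_poly(np.asarray(x, dtype=np.float64), up=200, down=441)
    # Linear fades at onset and offset to avoid clicks at the edit points.
    n = int(fade_ms * 1e-3 * fs_out)
    ramp = np.linspace(0.0, 1.0, n)
    y[:n] *= ramp
    y[-n:] *= ramp[::-1]
    # Normalize to a common root-mean-squared amplitude.
    return y * (target_rms / np.sqrt(np.mean(y ** 2)))
```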

Each of the 180 sentence recordings was submitted to a noise-vocoding routine (Shannon et al., 1995) with 2, 8, or 32 filter bands, resulting in six conditions (Fig. 1a): two predictability levels (low, high) at three intelligibility levels (logarithmically varying spectral degradation through 2-, 8-, or 32-band noise vocoding), with 30 stimuli each. A seventh condition served as an entirely unintelligible control: an additional set of 30 noise-vocoded stimuli (32 bands) was presented after spectral rotation (Blesser, 1972). This control condition has been used in previous imaging studies (Scott et al., 2000; Narain et al., 2003; Obleser et al., 2007); it leaves the temporal envelope unaltered and preserves the spectrotemporal complexity, while the signal is rendered unintelligible by inverting the frequency spectrum.
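
A minimal sketch of the two signal manipulations follows, assuming SciPy and logarithmically spaced analysis bands; the band edges, filter orders, and Hilbert-transform envelope extraction are illustrative choices rather than the published parameters, and the rotation here mirrors the spectrum with an FFT as a simple stand-in for Blesser's (1972) modulation method.

```python
# Noise vocoding after Shannon et al. (1995): filter speech into n log-spaced
# bands, extract each band's amplitude envelope, and re-impose it on noise
# filtered into the same band. Spectral rotation mirrors the spectrum about
# the band midpoint. Filter settings here are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo, hi], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)

def noise_vocode(x, fs=20000, n_bands=8, f_lo=70.0, f_hi=9000.0, seed=0):
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)       # log-spaced band edges
    noise = rng.standard_normal(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = np.abs(hilbert(bandpass(x, lo, hi, fs)))  # band envelope
        out += env * bandpass(noise, lo, hi, fs)        # envelope-modulated noise
    return out

def spectrally_rotate(x, fs=20000, f_max=9000.0):
    # Map each frequency f within 0..f_max to f_max - f by mirroring the FFT
    # bins: temporal envelope and spectrotemporal complexity are broadly
    # preserved while intelligibility is destroyed.
    X = np.fft.rfft(x)
    band = np.fft.rfftfreq(len(x), d=1.0 / fs) <= f_max
    Y = np.zeros_like(X)
    Y[band] = X[band][::-1]
    return np.fft.irfft(Y, n=len(x))
```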

Figure 1.

a, Illustration of the basic experimental design. Predictability (i.e., the internal semantic coherence of a sentence, or how well one key word predicts the others) and intelligibility of the signal (varied through noise vocoding, which reduces the spectral detail) are manipulated orthogonally. b, Behavioral pretesting unequivocally identified predictability as most effective on speech comprehension at the intermediate signal quality of eight-band noise-vocoded speech. ***p < 0.0001 [data based on n = 18 (replotted from Obleser, Alba-Ferrara, and Scott, unpublished observation)]. Error bars indicate SEM.

The exact levels of noise vocoding were chosen after a series of behavioral pretests (Obleser, Alba-Ferrara, and Scott, unpublished observation). Eight-band noise-vocoded speech had been identified as the condition in which predictability had the largest influence (for a signal frequency range of 0–9 kHz). At such an intermediate signal quality, the context provided by semantic predictability proved most effective in improving performance. In two experiments that closely matched the design of this imaging study, identification of key words within sentences improved by almost 40% for sentences with high predictability compared with sentences with low predictability. The same sentence material was used and presented randomly, and we orthogonally varied predictability and intelligibility over a wide range of spectral degradation (noise-vocoding) levels. In contrast, there was no influence of predictability on key word recognition either with two-band noise-vocoded speech or with normal speech. In the current imaging study, we approximated normal speech with 32-band noise-vocoded speech to avoid pop-out effects (Fig. 1b).

Experimental procedures.

Subjects were in the supine position in a 3.0T Philips (Best, The Netherlands) Intera scanner equipped with a six-element SENSE head coil for radiofrequency signal detection, fitted with a B0-dependent auditory stimulus delivery system (MR-Confon, Magdeburg, Germany). In a short familiarization period, all subjects listened to 30 examples of noise-vocoded speech with three levels of degradation and two levels of predictability; neither these degradation levels nor these sentences were used in the subsequent fMRI study. There were also 30 trials of spectrally rotated speech. All subjects recognized the noise-vocoded stimuli as speech, despite the varying degrees of spectral degradation and intelligibility.

In the scanner, subjects were instructed to lie still and listen attentively. They were told to be prepared to answer questions on the material afterward, but no further task was introduced. A series of 240 MR-volume scans (echo-planar imaging) was obtained. Trials of all seven conditions (30 trials and volumes per condition) and 30 additional silent trials were presented in a pseudorandomized, interleaved manner. An MR volume consisted of 32 axial slices, obliquely oriented to cover the entire brain (an in-plane resolution of 2.5 × 2.5 mm2 with a slice thickness of 3.25 mm and a 0.75 mm gap). Scans were acquired with SENSE factor 2 and second-order shim gradients to reduce blurring and signal loss associated with susceptibility gradients adjacent to the ear canals. Volume scans were acquired using temporal sparse sampling (Hall et al., 1999) with a repetition time of 9.0 s, an acquisition time of 2.0 s, and a single stimulus presented, in silence, 5 s before the next volume scan. The exact stimulus onset time was jittered randomly by ±500 ms to sample the blood oxygenation level-dependent (BOLD) response more robustly. The functional run lasted 36 min and was followed by a high-resolution T1-weighted scan to obtain a structural MR image for each subject.
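
The sparse-sampling timing can be made concrete with a short sketch (parameters taken from the text; the random seed is arbitrary):

```python
# Sparse temporal sampling (Hall et al., 1999) as described above: one 2 s
# acquisition every 9 s, each stimulus presented in the silent gap ~5 s
# before the next acquisition, with +/-500 ms random jitter.
import numpy as np

TR, TA, LEAD, JITTER = 9.0, 2.0, 5.0, 0.5      # all in seconds
N_TRIALS = 240                                  # 7 conditions + silence, 30 each

rng = np.random.default_rng(0)                  # arbitrary seed
scan_start = np.arange(1, N_TRIALS + 1) * TR    # onset of each acquisition
scan_end = scan_start + TA                      # end of the 2 s acquisition
stim_onset = scan_start - LEAD + rng.uniform(-JITTER, JITTER, N_TRIALS)
# A ~2.2 s sentence beginning ~5 s before the scan ends well before the
# acquisition, so the BOLD response is sampled near its peak, in silence.
```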

Data analysis.

Using SPM5 (Wellcome Department of Imaging Neuroscience, London, UK), images were corrected for slice timing, realigned, coregistered, normalized to a T1 template using parameters from gray matter segmentation, and smoothed (8 mm3 Gaussian kernel). For each subject, seven regressors of interest (seven conditions) and six realignment parameters were modeled using a finite impulse response model (Gaab et al., 2007), with the silent trials forming an implicit baseline. At the second (group) level, a random-effects within-subjects ANOVA with seven conditions (each condition contrasted with silence for each subject) was calculated. Unless stated otherwise, all group inferences are reported at an uncorrected level of p < 0.005 and a cluster extent of >30 voxels. Coordinates of peak activations were transformed into Talairach coordinates and labeled according to the Talairach Daemon Database (Lancaster et al., 2000).
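
Schematically, and leaving aside SPM5's internals, this subject-level model amounts to a general linear model whose design matrix holds one column per condition plus the realignment parameters; with sparse sampling (one volume per trial) an FIR basis reduces to such indicator columns, and silence is the implicit baseline. A plain-NumPy sketch, not the SPM implementation:

```python
# Schematic subject-level GLM (the study used SPM5; this is a plain-NumPy
# sketch). With one volume per trial, the FIR model reduces to one indicator
# column per condition; silent trials have no regressor and therefore form
# the implicit baseline absorbed by the intercept.
import numpy as np

def fit_glm(Y, condition, motion, n_conditions=7):
    """Y: (n_vols, n_voxels) data. condition: (n_vols,) int labels in
    0..n_conditions-1, or -1 for silent trials. motion: (n_vols, 6)."""
    n_vols = Y.shape[0]
    X = np.zeros((n_vols, n_conditions))
    for c in range(n_conditions):
        X[condition == c, c] = 1.0                     # condition indicator
    X = np.column_stack([X, motion, np.ones(n_vols)])  # nuisance + intercept
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[:n_conditions]   # condition effects vs. the silent baseline
```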

Analysis of functional connectivity.

For the clusters of activated voxels in the contrast of high- with low-predictability sentences using eight-band noise-vocoded speech (the level of degradation at which semantic predictability had the greatest behavioral effect) (Fig. 1a), a correlation analysis was planned to investigate the strength of functional connectivity between clusters. Using the MarsBaR toolbox within SPM5, with the first eigenvector summarizing the activation of a cluster across voxels, time courses for all significant clusters were extracted from each subject's data across all conditions. The condition- and cluster-specific time courses of all subjects were then collapsed into a median time course that represents the average activity time course of a given brain region in a given condition (with 30 sampling points for the 30 trials of each condition; because it is averaged across subjects, it also has an enhanced signal-to-noise ratio). Correlations of activity time courses between brain regions were then analyzed separately for each condition and assessed statistically using Pearson's correlation coefficient.
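
In outline (a NumPy/SciPy sketch, not the MarsBaR code itself), the analysis reduces to three steps: an eigenvector summary per cluster, a median across subjects, and Pearson correlations between region pairs:

```python
# Sketch of the connectivity pipeline described above (not the MarsBaR code):
# summarize each cluster by the first eigenvector of its voxel time courses,
# collapse subjects by the median, and correlate region pairs per condition.
import numpy as np
from scipy.stats import pearsonr

def first_eigenvector(cluster_ts):
    """cluster_ts: (n_trials, n_voxels) activity of one cluster, one subject,
    one condition. Returns the first principal time course (n_trials,)."""
    centered = cluster_ts - cluster_ts.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def region_correlation(ts_a, ts_b):
    """ts_a, ts_b: (n_subjects, n_trials) eigenvector series of two regions
    for one condition. Median across subjects, then Pearson's r (n = 30)."""
    return pearsonr(np.median(ts_a, axis=0), np.median(ts_b, axis=0))
```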

Results

We found extensive bilateral temporal activation in response to all stimuli in all subjects. Therefore, all subjects were included in the random-effects group analysis. First, an F-contrast for the main effect of intelligibility was assessed (i.e., we looked for brain regions that showed a change in activity with increasing spectral detail in the speech signal, regardless of predictability). This analysis revealed extensive bilateral activation in the temporal lobes, with the peak voxels in each hemisphere located in the anterior superior temporal sulcus (STS), with extension on the left into the inferior frontal gyrus (Fig. 2). All of these regions showed a quasi-monotonic increase in BOLD signal with increasing intelligibility of the signal. In contrast, the medial parietal cortex (precuneus) and the left posterior inferior parietal cortex demonstrated decreasing activation with increasing signal quality [see Table 1 for the stereotactic coordinates, in MNI (Montreal Neurological Institute) space, for the peak voxels].

Figure 2.

Main effect of intelligibility. Results from an F-contrast for the main effect of intelligibility (number of bands) are shown, yielding extensive bilateral clusters in the temporal lobes, with peak voxels (contrast bar graphs for all seven conditions are shown below) in the left and right STS that are not modulated by predictability. In the left hemisphere, the cluster also incorporates the inferior frontal cortex. The posterior parietal activations show the reverse pattern (decreasing activation with increasing intelligibility). All activations are p < 0.005; the cluster extent is >30 voxels. Error bars indicate SEM. lo, Low; hi, high; rot., rotated.

Table 1.

Overview of significant clusters in random-effects analysis (p < 0.005; cluster extent, >30 voxels)

Second, the influence of sentence predictability at an intermediate degradation level was investigated, informed by the known behavioral effect at intermediate signal quality (eight-band noise-vocoded speech), the condition in which predictability had the greatest influence on speech comprehension (Fig. 1). The left and right anterior STS showed no difference in activity between low- and high-predictability eight-band noise-vocoded sentences. However, activity with high-predictability sentences extended posteriorly into the left posterior temporal and inferior parietal cortices and anteriorly into the left temporal pole and ventral inferior frontal cortex (Fig. 3). There were additional activations outside the temporal lobes, in the medial prefrontal and posterior cingulate cortices.

Figure 3.

Overview of differential activation for high- versus low-predictability sentences at intermediate (eight-band) signal quality. The top and middle panels of activation overlays display the two conditions separately (in red and blue, respectively) compared with spectrally rotated speech. The bottom panels of brain overlays show the direct comparison of the two. Bar graphs show activations in the cluster peak voxels of the direct comparison. BA 39, BA 8, and BA 30 (left panels) and BA 9 and BA 47 (right panels) all exhibit their strongest activation for high-predictability eight-band speech. All activations are p < 0.005; the cluster extent is >30 voxels. Error bars indicate SEM. lo, Low; hi, high; rot., rotated; hi-pred, high predictability; lo-pred, low predictability.

A direct comparison of brain activity in response to sentences of high and low predictability with the eight-band noise-vocoded sentences confirmed the activation of these cortical areas (Fig. 3, Table 1). Five brain regions, four lateralized to the left hemisphere and one midline in the anterior prefrontal cortex, demonstrated increased activity in response to degraded yet highly predictable speech. The left dorsolateral prefrontal cortex, angular gyrus, and posterior cingulate cortex did so only under this condition of effortful yet successful speech comprehension. Importantly, activity returned to baseline when the sentences were both highly predictable and readily intelligible; therefore, these regions were not demonstrating a response simply to success at comprehension, although this effect was observed in the medial prefrontal cortex and in the left inferior frontal gyrus, where activity did not differ between 8- and 32-band noise-vocoded speech when sentences were of high predictability.

Third, the functional connectivity between these five cortical areas was determined, as described in Materials and Methods. Because these cortical areas were engaged when the degraded speech signal was heard within the context of high predictability (i.e., when comprehension was enhanced by semantic context), a change in functional connectivity was expected (expressed as an across-trials correlation between one cortical area and another). This prediction was confirmed as an increase in correlation of the activity between cortical areas when the eight-band noise-vocoded sentences were of high predictability (Table 2). Notably, the correlation of the responses between the left angular gyrus and prefrontal cortex changed from being not significant (r = 0.12 and r = 0.25 for the lateral and medial prefrontal cortex, respectively) when the sentences were of low predictability to significant (r = 0.68 and r = 0.71, respectively; p < 0.0001) when sentences were of high predictability (Fig. 4). Although the hypothesis-led analyses were directed at the comparison of sentences of high versus low predictability heard at an intermediate signal quality, we also observed that the correlation of activity between brain regions was not significant when subjects heard unintelligible rotated speech. The correlations between activated areas within the prefrontal cortex and between anterior and posterior midline areas were high, regardless of whether the eight-band noise-vocoded sentences were of high or low predictability (i.e., predictability did not modulate the functional connectivity between these areas) (Table 2).
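
For the comparison of correlations across conditions (e.g., r = 0.12 vs r = 0.68 for the angular gyrus and lateral prefrontal cortex), a standard option, though not necessarily the authors' exact test, is Fisher's z transform for two independent correlations over the 30 trials per condition:

```python
# Fisher z test for the difference between two independent correlations,
# e.g., angular gyrus-prefrontal coupling at low vs high predictability
# (r = 0.12 vs r = 0.68, n = 30 trials each). Illustrative only; not
# necessarily the test used in the paper.
import numpy as np
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    z1, z2 = np.arctanh(r1), np.arctanh(r2)         # Fisher z transform
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # SE of z2 - z1
    z = (z2 - z1) / se
    return z, 2.0 * norm.sf(abs(z))                 # two-sided p value

z, p = compare_correlations(0.12, 30, 0.68, 30)      # z ~ 2.6, p ~ 0.009
```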

Table 2.

Changes in correlation comparing high-predictability and low-predictability eight-band conditions (Δr)

Figure 4.

a, Overview of the changes in correlation between brain regions for the high-predictability eight-band condition (red) and the low-predictability eight-band condition (blue). Numbers in clusters indicate BAs of cluster peak voxels. All arrows shown indicate an increase in the positive correlation for the high-predictability condition, whereas no brain areas are more strongly interlinked for low predictability. +p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001. The dashed blue lines indicate nonsignificant correlations. For details of the changes in correlation (Δr), see Table 2. b, Scatterplots with linear regression slopes exemplifying the increase in the positive correlation between the left posterior temporal/inferior parietal cortex (angular gyrus, BA 39; x-axis) and the left dorsolateral prefrontal cortex (superior frontal gyrus, BA 8; y-axis). Values plotted are condition-specific first-eigenvector time series from the clusters identified in the random-effects group analysis, averaged across subjects (see Materials and Methods).

Discussion

We have demonstrated how changes in functional integration across widely distributed brain regions improve speech perception under acoustically suboptimal conditions. A design that orthogonally varied intelligibility (Shannon et al., 1995) and semantic predictability (Stickney and Assmann, 2001) revealed functional connections between areas in the temporal, inferior parietal, and prefrontal cortices that strengthened and supported comprehension of sentences with high semantic predictability but intermediate signal quality. Because the signal quality was constant (eight-band noise-vocoded speech) across sentences of both low and high predictability, this effect could be attributed to the modulating effect of semantic context (Kalikow et al., 1977; Stickney and Assmann, 2001) (Obleser, Alba-Ferrara, and Scott, unpublished observation). This was further confirmed by the observation that the strengthened functional connections were between areas that were not responding simply to increased sentence comprehension, because activity was not maintained in response to the easily understood 32-band noise-vocoded sentences.

Activation in the contrast of high- with low-predictability eight-band noise-vocoded sentences was most evident in the left angular gyrus. Activity in this area, and at the other extreme of the left temporal lobe, in the anterior temporal cortex extending into the ventral inferior frontal gyrus, showed this effect of high relative to low predictability at intermediate signal quality. Additional areas demonstrating the same effect were the left dorsolateral and medial prefrontal cortices and the left posterior cingulate cortex. In contrast, the lateral temporal neocortex within the superior temporal gyrus and sulcus was sensitive to increasing spectral detail across all stimuli.

The enhanced activity was seen only in the high-predictability eight-band condition; when speech comprehension was effortless in response to 32-band speech, activity in the angular gyrus, posterior cingulate cortex, and dorsolateral prefrontal cortex returned to baseline level (Fig. 3). In other words, only if speech comprehension succeeds despite adverse acoustic conditions are these regions involved. Changes in predictability in the absence of signal degradation (i.e., for 32-band signals) were not accompanied by substantial increases in brain activation, nor did they lead to differences in speech recognition in the behavioral pretests. The conclusion is that the influence of semantic context when listening to short sentences only becomes crucial once the signal is compromised.

Functional connectivity analysis among the activated clusters yielded positive correlations between their time courses of activity. This functional integration was most evident when the subjects listened to eight-band noise-vocoded speech that could be decoded because of semantic context, compared with the corresponding signal when semantic context was absent. As summarized in Table 2, the connectivity between the angular gyrus and the other four activation clusters showed the greatest increase (Fig. 4). Interestingly, this strength of connectivity between the angular gyrus and the frontal lobe is reduced in developmental dyslexia (Horwitz et al., 1998), and a recent study demonstrated parietal-frontal "underconnectivity" during sentence comprehension in autistic subjects (Kana et al., 2006). The strengthened connectivity along the temporal lobe (between the angular gyrus and temporal pole) fits well with recent evidence on anatomical links between these areas (middle longitudinal fasciculus), as does the link between the angular gyrus and lateral prefrontal cortex (Schmahmann and Pandya, 2006). A recent study on written text comprehension also identified an increase in activity in the angular gyrus [Brodmann's area (BA) 39] in a contrast of real-word sentences with sentences comprising pseudowords (Ferstl and von Cramon, 2002). This is additional evidence that the angular gyrus is a resource for semantic processing, activated in our study when semantics had a decisive influence on speech perception.

Because all of the regions are distributed across the frontal and parietal cortices, their contribution to speech comprehension is likely to be of higher order than basic acoustic processing. These widespread activations are likely to represent a number of supporting cognitive mechanisms, among them aspects of working memory and attention. One hypothesis is that the contribution of the angular gyrus is through its role in verbal working memory, and it is a region that has frequently been implicated in explicit semantic decision tasks (Binder et al., 2003; Scott et al., 2003; Sharp et al., 2004). Other processes such as phonological memory and auditory–motor transformation processes (Jacquemot et al., 2003; Hickok and Poeppel, 2004; Warren et al., 2005; Jacquemot and Scott, 2006) that might contribute to recovering meaning from degraded speech signals are located in adjacent but separate temporo-parietal areas. Working memory operations in the left dorsolateral prefrontal cortex [Petrides, 1994; Owen, 1997; for a review on working memory, see Owen et al. (2005)] might support comprehension under adverse conditions by permitting the manipulation of the degraded stimuli within short-term memory. Thus, components of the degraded sentence that do not map automatically onto meaning can be reconstructed by reprocessing them within the context of semantic predictability.

The fronto-parietal network seen here might also reflect monitoring and selection processes more commonly associated with attention than with the mere maintenance of information in short-term memory (Lebedev et al., 2004). This interpretation postulates that the prefrontal cortex has a role in directing attention to relevant auditory features, to guide both short-term memory and access to long-term memory representations (for review, see Miller and Cohen, 2001). Most relevant to comprehending distorted speech in a sentence context is the concept of competition among lexical and phonological candidates, because signal degradation introduces considerable ambiguity. A recently suggested framework for the prefrontal cortex and conceptual selection problems (Kan and Thompson-Schill, 2004) would imply that the system for speech perception has to solve problems associated with lexical selection. Because of the acoustic ambiguity, each key word might activate multiple possible word candidates. With high semantic coherence, top-down influences guide correct lexical selection, but with low semantic coherence, lexical selection will be much less successful. For such selection and competition processes, the left prefrontal cortex is engaged (Tippett et al., 2004). Also, top-down control of selective attention in our study might encompass the on-line formation of increasingly specific hypotheses about which sentence-final word to expect in a degraded yet predictable sentence. This in turn would enable more thorough and, ultimately, more successful (re-)analysis of the noise-vocoded signal as more elements of the sentence become available.

Finally, the facilitation through context in the current stimulus set is likely to entail a range of possible subordinate lexical mechanisms by which predictability supports comprehension. Both verb semantics (by narrowing down the context) and semantic associations in general (by "priming" other word candidates) are possible influences here (Kalikow et al., 1977; Friederici and Kotz, 2003), a matter to be disentangled in additional studies.

To summarize, successful speech perception under less than ideal listening conditions depends on greater functional integration across a widely distributed left-hemispheric network of cortical areas. Therefore, speech perception is facilitated when high-order cognitive subsystems become engaged, and it cannot be considered the product of processing within the unimodal auditory cortex alone (Jacquemot and Scott, 2006).

A clear lateralization to the left was observed. Left-hemisphere predominance is often absent from studies of speech perception and comprehension, and it was by and large absent in the main effect of intelligibility results in the present study (Fig. 2). Thus, it appears that only once higher-order processes interact with downstream perceptual systems is the left lateralization established. Notably, left- and right-hemispheric temporal cortices also varied in their degree of responsiveness to the entirely unintelligible sounds of rotated speech: as can be seen in Figure 2, right STS areas show near-baseline activation in response to rotated speech, whereas the left STS area clearly exhibits a relative deactivation [compare previous findings on rotated speech (Scott et al., 2000; Narain et al., 2003)].

Interestingly, Broca's area (BA 44/45) was not activated differentially by high and low predictability, despite its known role in rule-based processing, including in speech. Its response pattern was statistically indistinguishable from that of the anterolateral STS (Fig. 2). Thus, structural processing of the language content in our set of stimuli may only have become possible once a certain level of signal quality was attained, and may have increased further as perceptual ambiguity was overcome.

By using a factorial design, parametric degradation of the speech signal, and an acoustically matched unintelligible baseline, we have not only been able to confirm the network of cortical areas that respond to speech intelligibility (Binder et al., 2000; Scott et al., 2000, 2004; Davis and Johnsrude, 2003; Obleser et al., 2006; Zekveld et al., 2006) but have also demonstrated the dynamic changes within the network that occur when intelligibility and semantic context interact. This captures the nature of speech perception under real-life conditions.

In conclusion, the combination of two established manipulations of speech, one acoustic (spectral degradation through noise vocoding) and the other linguistic (semantic predictability), allowed us to demonstrate a widely distributed left-hemisphere array of cortical areas that can establish speech perception under adverse listening conditions. These areas are remote from the unimodal auditory cortex in high-order heteromodal and amodal cortices. Their functional connectivity is strengthened when semantic context has a beneficial influence on speech perception.

Footnotes

  • This work was supported by the Wellcome Trust (S.K.S.), the Medical Research Council (R.J.S.W.), and the Landesstiftung Baden-Württemberg Germany (J.O.). We are grateful to Lucy Alba-Ferrara for help with acquiring the data and to two anonymous reviewers for their helpful suggestions.

  • Correspondence should be addressed to Dr. Jonas Obleser, Institute of Cognitive Neuroscience, University College London, 17 Queen Square, London WC1N 3AR, UK. j.obleser@ucl.ac.uk

References

Binder JR, Frost JA, Hammeke TA, Rao SM, Cox RW (1996) Function of the left planum temporale in auditory and linguistic processing. Brain 119:1239–1247.

Binder JR, Frost JA, Hammeke TA, Bellgowan PS, Springer JA, Kaufman JN, Possing ET (2000) Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex 10:512–528.

Binder JR, McKiernan KA, Parsons ME, Westbury CF, Possing ET, Kaufman JN, Buchanan L (2003) Neural correlates of lexical access during visual word recognition. J Cogn Neurosci 15:372–393.

Blesser B (1972) Speech perception under conditions of spectral transformation. I. Phonetic characteristics. J Speech Hear Res 15:5–41.

Boothroyd A, Nittrouer S (1988) Mathematical treatment of context effects in phoneme and word recognition. J Acoust Soc Am 84:101–114.

Davis MH, Johnsrude IS (2003) Hierarchical processing in spoken language comprehension. J Neurosci 23:3423–3431.

Davis MH, Johnsrude IS, Hervais-Adelman A, Taylor K, McGettigan C (2005) Lexical information drives perceptual learning of distorted speech: evidence from the comprehension of noise-vocoded sentences. J Exp Psychol Gen 134:222–241.

Ferstl EC, von Cramon DY (2002) What does the frontomedian cortex contribute to language processing: coherence or theory of mind? NeuroImage 17:1599–1612.

Friederici AD, Kotz SA (2003) The brain basis of syntactic processes: functional imaging and lesion studies. NeuroImage 20(Suppl 1):S8–S17.

Gaab N, Gabrieli JDE, Glover GH (2007) Assessing the influence of scanner background noise on auditory processing. I. An fMRI study comparing three experimental designs with varying degrees of scanner noise. Hum Brain Mapp, in press.

Grant KW, Seitz PF (2000) The recognition of isolated words and words in sentences: individual variability in the use of sentence context. J Acoust Soc Am 107:1000–1011.

Hall DA, Haggard MP, Akeroyd MA, Palmer AR, Summerfield AQ, Elliott MR, Gurney EM, Bowtell RW (1999) "Sparse" temporal sampling in auditory fMRI. Hum Brain Mapp 7:213–223.

Hickok G, Poeppel D (2004) Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92:67–99.

Horwitz B, Rumsey JM, Donohue BC (1998) Functional connectivity of the angular gyrus in normal reading and dyslexia. Proc Natl Acad Sci USA 95:8939–8944.

Jacquemot C, Scott SK (2006) What is the relationship between phonological short-term memory and speech processing? Trends Cogn Sci 10:480–486.

Jacquemot C, Pallier C, LeBihan D, Dehaene S, Dupoux E (2003) Phonological grammar shapes the auditory cortex: a functional magnetic resonance imaging study. J Neurosci 23:9541–9546.

Kalikow DN, Stevens KN, Elliott LL (1977) Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. J Acoust Soc Am 61:1337–1351.

Kan IP, Thompson-Schill SL (2004) Selection from perceptual and conceptual representations. Cogn Affect Behav Neurosci 4:466–482.

Kana RK, Keller TA, Cherkassky VL, Minshew NJ, Just MA (2006) Sentence comprehension in autism: thinking in pictures with decreased functional connectivity. Brain 129:2484–2493.

Lancaster JL, Woldorff MG, Parsons LM, Liotti M, Freitas CS, Rainey L, Kochunov PV, Nickerson D, Mikiten SA, Fox PT (2000) Automated Talairach atlas labels for functional brain mapping. Hum Brain Mapp 10:120–131.

Lebedev MA, Messinger A, Kralik JD, Wise SP (2004) Representation of attended versus remembered locations in prefrontal cortex. PLoS Biol 2:e365.

Miller EK, Cohen JD (2001) An integrative theory of prefrontal cortex function. Annu Rev Neurosci 24:167–202.

Miller GA, Heise GA, Lichten W (1951) The intelligibility of speech as a function of the context of the test materials. J Exp Psychol 41:329–335.

Narain C, Scott SK, Wise RJ, Rosen S, Leff A, Iversen SD, Matthews PM (2003) Defining a left-lateralized response specific to intelligible speech using fMRI. Cereb Cortex 13:1362–1368.

Obleser J, Boecker H, Drzezga A, Haslinger B, Hennenlotter A, Roettinger M, Eulitz C, Rauschecker JP (2006) Vowel sound extraction in anterior superior temporal cortex. Hum Brain Mapp 27:562–571.

Obleser J, Zimmermann J, Van Meter JW, Rauschecker JP (2007) Multiple stages of auditory speech perception reflected in event-related fMRI. Cereb Cortex, in press.

Owen AM (1997) The functional organization of working memory processes within human lateral frontal cortex: the contribution of functional neuroimaging. Eur J Neurosci 9:1329–1339.

Owen AM, McMillan KM, Laird AR, Bullmore E (2005) N-back working memory paradigm: a meta-analysis of normative functional neuroimaging studies. Hum Brain Mapp 25:46–59.

Petrides M (1994) Frontal lobes and behaviour. Curr Opin Neurobiol 4:207–211.

Pichora-Fuller MK, Schneider BA, Daneman M (1995) How young and old adults listen to and remember speech in noise. J Acoust Soc Am 97:593–608.

Poeppel D, Hickok G (2004) Towards a new functional anatomy of language. Cognition 92:1–12.

Schmahmann JD, Pandya DN (2006) Fiber pathways of the brain. Oxford: Oxford UP.

Scott SK, Johnsrude IS (2003) The neuroanatomical and functional organization of speech perception. Trends Neurosci 26:100–107.

Scott SK, Blank CC, Rosen S, Wise RJ (2000) Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123:2400–2406.

Scott SK, Leff AP, Wise RJ (2003) Going beyond the information given: a neural system supporting semantic interpretation. NeuroImage 19:870–876.

Scott SK, Rosen S, Wickham L, Wise RJ (2004) A positron emission tomography study of the neural basis of informational and energetic masking effects in speech perception. J Acoust Soc Am 115:813–821.

Scott SK, Rosen S, Lang H, Wise RJ (2006) Neural correlates of intelligibility in speech investigated with noise vocoded speech—a positron emission tomography study. J Acoust Soc Am 120:1075–1083.

Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M (1995) Speech recognition with primarily temporal cues. Science 270:303–304.

Sharp DJ, Scott SK, Wise RJ (2004) Monitoring and the controlled processing of meaning: distinct prefrontal systems. Cereb Cortex 14:1–10.

Stickney GS, Assmann PF (2001) Acoustic and linguistic factors in the perception of bandpass-filtered speech. J Acoust Soc Am 109:1157–1165.

Tippett LJ, Gendall A, Farah MJ, Thompson-Schill SL (2004) Selection ability in Alzheimer's disease: investigation of a component of semantic processing. Neuropsychology 18:163–173.

Warren JD, Scott SK, Price CJ, Griffiths TD (2006) Human brain mechanisms for the early analysis of voices. NeuroImage 31:1389–1397.

Warren JE, Wise RJ, Warren JD (2005) Sounds do-able: auditory-motor transformations and the posterior temporal plane. Trends Neurosci 28:636–643.

Xu J, Kemeny S, Park G, Frattali C, Braun A (2005) Language in context: emergent features of word, sentence, and narrative comprehension. NeuroImage 25:1002–1015.