Neuropsychologia

Volume 50, Issue 9, July 2012, Pages 2154-2164

Note
Auditory skills and brain morphology predict individual differences in adaptation to degraded speech

https://doi.org/10.1016/j.neuropsychologia.2012.05.013

Abstract

Noise-vocoded speech is a spectrally highly degraded signal, but it preserves the temporal envelope of speech. Listeners vary considerably in their ability to adapt to this degraded speech signal. Here, we hypothesised that individual differences in adaptation to vocoded speech should be predictable by non-speech auditory, cognitive, and neuroanatomical factors. We tested 18 normal-hearing participants in a short-term vocoded speech-learning paradigm (listening to 100 4-band-vocoded sentences). Non-speech auditory skills were assessed using amplitude modulation (AM) rate discrimination, with modulation rates centred on the speech-relevant rate of 4 Hz. Working memory capacities were evaluated (digit span and nonword repetition), and structural MRI scans were examined for anatomical predictors of vocoded speech learning using voxel-based morphometry. Listeners who learned to understand the degraded speech more quickly also showed lower thresholds in the AM rate discrimination task. This ability to adjust to degraded speech was furthermore reflected anatomically in increased grey matter volume in an area of the left thalamus (pulvinar) that is strongly connected to the auditory and prefrontal cortices. Thus, individual non-speech auditory skills and left thalamus grey matter volume can predict how quickly a listener adapts to degraded speech.

Highlights

► Individuals differ in their perceptual adaptation to degraded speech.
► Auditory non-speech skills predict short-term adaptation to vocoded speech.
► Envelope sensitivity is a more robust predictor than other cognitive skills.
► Brain morphology (left thalamic volume) is also predictive of perceptual adaptation.

Introduction

Noise-vocoding has been used to simulate cochlear-implant (CI) transduced speech in normal-hearing listeners (Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995). Vocoding removes most of the spectral properties of the auditory signal, while its temporal envelope cues remain largely preserved. Normal-hearing listeners learn to understand this type of degraded speech surprisingly quickly on the first exposure (Davis et al., 2005, Eisner et al., 2010). However, there is considerable variability across listeners in their ability to perceptually adapt to this degraded auditory input. Although there are larger individual differences in speech perception performance in CI users than in normal-hearing subjects listening to a CI simulation, normal-hearing listeners still differ considerably in learning to understand vocoded speech (Shannon, Galvin, & Baskent, 2002). Currently it is unclear what drives the adaptation to this degraded speech signal. In cochlear-implanted children, demographic factors such as age at implantation, duration of deafness, residual hearing before implantation or the number of electrodes of the CI can only explain about half of the variance in outcome (Blamey et al., 2001, Sarant et al., 2001).
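
To make the manipulation concrete, the sketch below implements a minimal noise vocoder in Python: the signal is split into a few frequency bands, the slow amplitude envelope of each band is extracted, and that envelope is imposed on band-limited noise before the bands are summed. The band edges, filter orders, and envelope cutoff are illustrative placeholders, not the parameters used in this study or in Shannon et al. (1995).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_bands=4, f_lo=100.0, f_hi=8000.0, env_cutoff=30.0):
    """Minimal n-band noise vocoder: discard spectral detail within each band
    but keep the band's slow temporal envelope (parameter values are illustrative;
    assumes fs > 2 * f_hi)."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges (assumption)
    carrier = np.random.randn(len(speech))          # broadband noise carrier
    lp = butter(4, env_cutoff, btype='low', fs=fs, output='sos')
    vocoded = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        bp = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfiltfilt(bp, speech)
        envelope = sosfiltfilt(lp, np.abs(hilbert(band)))  # slow amplitude envelope
        envelope = np.clip(envelope, 0.0, None)
        vocoded += envelope * sosfiltfilt(bp, carrier)     # envelope on band-limited noise
    # Match the overall RMS level of the input.
    vocoded *= np.sqrt(np.mean(speech ** 2) / (np.mean(vocoded ** 2) + 1e-12))
    return vocoded
```

With n_bands=4, the output keeps roughly the spectro-temporal resolution of the 4-band condition used here: spectral structure within each band is destroyed while the band's temporal envelope survives.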

Phonological working memory is one cognitive factor that has been linked to the development of speech recognition abilities (Gathercole, Willis, Baddeley, & Emslie, 1994). According to the model proposed by Baddeley and colleagues, working memory relies on two processes: the “buffer”, which holds memory traces for a few moments, and the “phonological loop”, which can be imagined as a subvocal rehearsal process (Baddeley, 2012). Pisoni and colleagues have investigated the relationship between phonological working memory and individual speech recognition performance in CI users, and found that working memory scores, as measured by digit span, significantly correlated with spoken word recognition scores in paediatric CI users, even after statistical partialling out of demographic factors (Pisoni & Cleary, 2003).

In the present study, we wanted to examine whether we could replicate these results in normal-hearing adults listening to vocoded speech. Digit span requires the participant to hold online representations of perceived auditory information, taxing the “buffer” in Baddeley's working memory model. However, as digit span requires memorising highly familiar items that have a long-term semantic representation, it may not be a reliable test of phonological working memory (Jacquemot & Scott, 2006), especially if semantic and phonological representations have separable stores, as some patient studies suggest (Romani & Martin, 1999). To test phonological working memory more specifically, we therefore additionally employed a nonword repetition test in the current study. Degraded speech recognition challenges phonological analysis and requires the maintenance of auditory input in working memory for a short period of time. Thus, we expected that both tests of working memory, digit span and nonword repetition, should independently explain some of the variability associated with vocoded speech perception.

From an acoustic perspective, noise-vocoded speech has very poor spectral resolution. This forces listeners to rely more on temporal envelope cues for speech recognition. The utilisation of these cues might be easier for listeners with a higher sensitivity to envelope fluctuations in an auditory signal. We therefore hypothesised that the individual capability to adapt to vocoded speech is related to amplitude modulation (AM) rate discrimination thresholds. AM discrimination performance provides a measure of the auditory system's ability to encode temporal information in a waveform's envelope (Wakefield & Viemeister, 1990). The most prominent low-frequency amplitude modulations in the temporal envelope of speech are near the syllabic rate of approximately 4 Hz (Giraud et al., 2000, Houtgast and Steeneken, 1985). As these low AM frequencies are essential for speech perception, we tested listeners' sensitivity to AM rates centred on 4 Hz. In the present study, we deliberately chose to assess sensitivity to AM rate rather than AM depth, which is traditionally used to determine temporal modulation transfer functions (Viemeister, 1979); rate thresholds capture the temporal aspect of envelope encoding that is emphasised when listening to vocoded speech.
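
As an illustration of the stimulus class, the sketch below generates sinusoidally amplitude-modulated noise at a given modulation rate; in a rate-discrimination trial, a 4 Hz standard would be compared with a stimulus whose rate differs by a small increment that is adapted towards threshold. The carrier, duration, modulation depth, and the 1 Hz increment are assumptions for illustration, not the study's actual settings.

```python
import numpy as np

def sam_noise(rate_hz, dur_s=1.0, fs=44100, depth=1.0, seed=None):
    """Sinusoidally amplitude-modulated white noise (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur_s * fs)) / fs
    carrier = rng.standard_normal(len(t))                      # white-noise carrier
    modulator = 1.0 + depth * np.sin(2 * np.pi * rate_hz * t)  # envelope at the AM rate
    x = modulator * carrier
    return x / np.max(np.abs(x))                               # peak-normalise

# One hypothetical trial: a 4 Hz standard versus a faster comparison.
standard = sam_noise(4.0, seed=1)
comparison = sam_noise(4.0 + 1.0, seed=2)  # 1 Hz rate increment, to be adapted over trials
```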

Hence we anticipated that, in addition to phonological working memory scores, auditory processing skills that are not speech-specific (i.e., AM rate sensitivity) could account for part of the individual differences in perceptual learning of vocoded speech.

A small number of studies have investigated the relationship between brain activation patterns and perceptual adaptation to degraded speech. Notably, Adank and Devlin (2010) conducted a functional MRI study of adaptation to time-compressed speech. They observed adaptation-related activity bilaterally in the auditory cortex (posterior superior temporal sulcus) and in the left ventral premotor cortex. The authors concluded that perceptual learning of degraded speech involves mapping novel acoustic patterns onto articulatory motor plans (Adank & Devlin, 2010).

Using a short-term vocoded speech-learning paradigm, Eisner et al. (2010) found that individual performance improvement in vocoded speech perception was correlated with the functional MRI signal change in the inferior frontal gyrus (IFG). Additionally they observed a positive correlation between activity in the angular gyrus (AG) and individual learning curves over the course of 100 noise-vocoded sentences (but did not have continuous measures of performance throughout the scan period).

Golestani and colleagues explored the structural brain correlates of novel speech sound learning. They found that faster phonetic learners had increased white matter density in the parieto-occipital sulcus (Golestani, Paus, & Zatorre, 2002). However, to our knowledge, structural correlates of vocoded speech learning have not been investigated so far. Here, we examine whether it is possible to predict perceptual learning of vocoded speech from a listener's brain structure.

The present study tested normal-hearing adults in a short-term (ca. 20 min long) vocoded speech-learning paradigm. Tests of digit span, nonword repetition, and AM rate discrimination were administered. Additionally, we analysed structural MRI scans using voxel-based morphometry (VBM). The aim of the study was to elucidate whether and to what extent these cognitive, non-speech auditory, and neuroanatomical measures could predict how well a listener perceptually adapts to vocoded speech.
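
The sketch below illustrates one way such measures could be related to adaptation, assuming that each listener's learning is summarised as the slope of report scores over sentence position and then correlated with a behavioural predictor (AM-rate threshold, digit span, or nonword repetition). This is one simple choice of index, not necessarily the exact analysis reported here.

```python
import numpy as np
from scipy import stats

def learning_slope(report_scores):
    """Per-listener adaptation index: linear slope of report score over
    sentence position (a simple summary; the paper's index may differ)."""
    positions = np.arange(len(report_scores))
    slope, _intercept = np.polyfit(positions, report_scores, 1)
    return slope

def predictor_correlation(slopes, predictor):
    """Pearson correlation between per-listener learning slopes and a
    behavioural predictor such as AM-rate thresholds or digit span."""
    return stats.pearsonr(slopes, predictor)
```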

Section snippets

Participants

Eighteen participants (nine females, age range 22–30 years, mean 26.2 years) took part in this study. Participants were recruited from the participant database of the Max Planck Institute for Human Cognitive and Brain Sciences. All were right-handed, monolingual speakers of German with no known hearing or language impairments or neurological problems. They were naïve to noise-vocoded speech and had not participated in a vocoded speech perception experiment before. Participants received

Adaptation to vocoded speech

First, we established that participants exhibited performance increases in vocoded speech recognition over the course of the experiment. A paired-sample t-test showed that participants' mean report scores in the first half of the experiment were significantly lower than those in the second half (t(17)=−7.64, p<0.001; see Fig. 1A). Another index of participants' overall performance improvement over time is the correlation of mean report scores with sentence position within the
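
Both indices mentioned in this snippet can be computed directly from per-sentence report scores; a minimal sketch is given below, assuming a listeners-by-sentences score matrix (the exact scoring unit, e.g. keywords per sentence, is not reproduced here).

```python
import numpy as np
from scipy import stats

def adaptation_indices(scores):
    """scores: array of shape (n_listeners, n_sentences) of report scores.
    Returns the paired t-test comparing first and second experiment halves,
    and the correlation of the mean score with sentence position."""
    n_sentences = scores.shape[1]
    first_half = scores[:, : n_sentences // 2].mean(axis=1)
    second_half = scores[:, n_sentences // 2 :].mean(axis=1)
    t, p_t = stats.ttest_rel(first_half, second_half)    # e.g. t(17) with 18 listeners
    r, p_r = stats.pearsonr(np.arange(n_sentences), scores.mean(axis=0))
    return (t, p_t), (r, p_r)
```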

Discussion

In the present study, we tested individual short-term adaptation to degraded (i.e., noise-vocoded) speech. We hypothesised that individual differences in adaptation to vocoded speech should be predictable by non-speech auditory, cognitive, and neuroanatomical factors. The main finding of the present study is the predictive potential of AM rate discrimination ability and morphology of the left thalamus in individual perceptual learning of vocoded speech.

Acknowledgements

Research was supported by the Max Planck Society (Max Planck Research Group fund to J.O.) and F.E. was supported by NWO grant 275-75-009. Zoe Schlüter helped acquire and analyse the data; Lars Meyer provided scripts and was very helpful with the VBM analysis; Antje Strauß helped develop the sentence material; Claudia Männel provided an adapted version of the nonword repetition test. Two anonymous reviewers contributed constructive comments and helped improve this manuscript.

References (63)

  • D.S. Lazard et al. Evolution of non-speech sound memory in postlingual deafness: implications for cochlear implant rehabilitation. Neuropsychologia (2011)
  • D.S. Lazard et al. Phonological processing in post-lingual deafness and cochlear implant outcome. Neuroimage (2010)
  • S.L. Mattys et al. Recognizing speech under a processing load: dissociating energetic from informational factors. Cognitive Psychology (2009)
  • J.E. Peelle et al. Adjusting for global effects in voxel-based morphometry: gray matter decline in normal aging. Neuroimage (2012)
  • T. Piquado et al. Effects of degraded sensory input on memory for speech: behavioral data and a test of biologically constrained computational models. Brain Research (2010)
  • A. Tomoda et al. Pseudohypacusis in childhood and adolescence is associated with increased gray matter volume in the medial frontal gyrus and superior temporal gyrus. Cortex (2012)
  • K. von Kriegstein et al. Task-dependent modulation of medial geniculate body is behaviorally relevant for speech recognition. Current Biology (2008)
  • S. Amitay et al. Auditory frequency discrimination learning is affected by stimulus variability. Perception and Psychophysics (2005)
  • J.R. Anderson et al. Reflections of the environment in memory. Psychological Science (1991)
  • A. Baddeley. Working memory: theories, models, and controversies. Annual Review of Psychology (2012)
  • T.R. Barkat et al. A critical period for auditory thalamocortical connectivity. Nature Neuroscience (2011)
  • T.E. Behrens et al. Non-invasive mapping of connections between human thalamus and cortex using diffusion imaging. Nature Neuroscience (2003)
  • V.R. Bejjanki et al. Perceptual learning as improved probabilistic inference in early sensory areas. Nature Neuroscience (2011)
  • P.J. Blamey et al. Relationships among speech perception, production, language, hearing loss, and age in children with impaired hearing. Journal of Speech, Language, and Hearing Research (2001)
  • M. Brett, J. Anton, R. Valabregue, & J. Poline. Region of interest analysis using an SPM toolbox.... (2002)
  • R.A. Burkholder et al. Effects of a cochlear implant simulation on immediate memory in normal-hearing adults. International Journal of Audiology (2005)
  • M.H. Davis et al. Lexical information drives perceptual learning of distorted speech: evidence from the comprehension of noise-vocoded sentences. Journal of Experimental Psychology: General (2005)
  • R. Drullman et al. Effect of reducing slow temporal modulations on speech reception. Journal of the Acoustical Society of America (1994)
  • R. Drullman et al. Effect of temporal envelope smearing on speech reception. Journal of the Acoustical Society of America (1994)
  • E. Dupoux et al. Perceptual adjustment to highly compressed speech: effects of talker and rate changes. Journal of Experimental Psychology: Human Perception and Performance (1997)
  • F. Eisner et al. Inferior frontal gyrus activation predicts individual differences in perceptual learning of cochlear-implant simulations. Journal of Neuroscience (2010)