NeuroImage, Volume 186, 1 February 2019, Pages 33-42

Late cortical tracking of ignored speech facilitates neural selectivity in acoustically challenging conditions

https://doi.org/10.1016/j.neuroimage.2018.10.057

Highlights

  • Continuously varying signal-to-noise ratio modulates the selective neural response to continuous speech.

  • Beyond the enhanced neural tracking of attended speech, we also observed a clean representation of ignored speech.

  • The late cortical representation of ignored speech reflects enhanced top-down selection in areas beyond auditory cortex.

Abstract

Listening requires selective neural processing of the incoming sound mixture, which in humans is borne out by a surprisingly clean representation of attended-only speech in auditory cortex. How this neural selectivity is achieved even at negative signal-to-noise ratios (SNR) remains unclear. We show that, under such conditions, a late cortical representation (i.e., neural tracking) of the ignored acoustic signal is key to successful separation of attended and distracting talkers (i.e., neural selectivity). We recorded and modeled the electroencephalographic response of 18 participants who attended to one of two simultaneously presented stories, while the SNR between the two talkers varied dynamically between +6 and −6 dB. The neural tracking showed an increasing early-to-late attention-biased selectivity. Importantly, acoustically dominant (i.e., louder) ignored talkers were tracked neurally by late involvement of fronto-parietal regions, which contributed to enhanced neural selectivity. This neural selectivity, by way of representing the ignored talker, poses a mechanistic neural account of attention under real-life acoustic conditions.

Introduction

Human listeners comprehend speech surprisingly well in the presence of distracting sound sources (Cherry, 1953). The ubiquitous question is how competing acoustic events capture bottom-up attention (e.g., by being dominant, that is, louder than the background), and how in turn top-down selective attention can overcome this dominance (e.g., listening to a certain talker against varying levels of competing talkers or noise; Kaya and Elhilali, 2017).

Auditory selective neural processing has mainly been attributed to auditory cortex regions. It is by now well established that the auditory cortical system selectively represents the (spectro-)temporal envelope of attended, but not ignored, speech (i.e., neural phase-locking; magnetoencephalography: Ding and Simon, 2012; electroencephalography: Kerlin et al., 2010; Power et al., 2012; Horton et al., 2013; O'Sullivan et al., 2014). Accordingly, auditory cortical responses allow for reconstruction of the speech spectrogram and detection of the attended talker (e.g., Mesgarani and Chang, 2012; Zion Golumbic et al., 2013). In sum, selective neural processing in auditory cortices establishes an isolated, distraction-invariant spectro-temporal representation of the attended talker.
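For orientation, the broadband temporal envelope that these studies phase-lock to can be approximated directly from the speech waveform. The Python sketch below uses a simple Hilbert-envelope approach; the sampling rates and low-pass cutoff are illustrative assumptions, and the cited work often relies on more elaborate, auditory-inspired extraction (e.g., Biesmans et al., 2016).

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample_poly

def temporal_envelope(audio, fs_audio=44100, fs_eeg=125, lp_hz=8.0):
    """Broadband temporal envelope of `audio`, resampled to the EEG rate."""
    env = np.abs(hilbert(audio))                 # instantaneous amplitude
    b, a = butter(3, lp_hz / (fs_audio / 2))     # low-pass at ~8 Hz (the 1-8 Hz band of interest)
    env = filtfilt(b, a, env)                    # zero-phase filtering
    return resample_poly(env, fs_eeg, fs_audio)  # downsample to the EEG sampling rate
```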

However, degradation of the acoustic signal attenuates neural phase-locking to speech. Experimental degradations have included artificial transformations of the temporal fine structure (Ding et al., 2014; Kong et al., 2015) or rhythmicity (Kayser et al., 2015), reverberation (Fuglsang et al., 2017), and decreased signal-to-noise ratio (SNR; Kong et al., 2014; Ding and Simon, 2013; Giordano et al., 2017). Moreover, neural selection of speech appears weakened in people with hearing loss (Petersen et al., 2016). In sum, these studies suggest that the strength of neural phase-locking predicts behavioral performance, such as speech comprehension.

Additionally, higher-order, non-auditory neural mechanisms facilitate speech comprehension. The supra-modal fronto-parietal attention network is a likely candidate for top-down selective neural processing during demanding listening tasks (Woolgar et al., 2016). Beyond phase-locking in lower frequency bands (i.e., ∼1–8 Hz; Wang et al., 2018; Pomper and Chait, 2017), top-down selective neural processing has also been associated with changes in the power of induced alpha oscillations (i.e., ∼8–12 Hz; Obleser et al., 2012; Kayser et al., 2015; Wöstmann et al., 2016). Specifically, increased parietal alpha power is related to enhanced suppression of distracting input (Wöstmann et al., 2017). Thus, besides the neural spectro-temporal enhancement of the attended talker, suppression of the ignored talker has been ascribed a crucial role in top-down selective neural processing.

Neural signatures of suppression can be two-fold. First, suppression can attenuate the neural response to an ignored talker relative to an attended talker, as has been found in neural phase-locking from latencies of around 100 ms (Ding and Simon, 2012; Wang et al., 2018). Second, active suppression can add or increase components in the neural response to the ignored talker, provided that this response is dissociable from the response to the attended talker (e.g., a louder ignored talker evoking a stronger neural response of opposite polarity to the response evoked by a louder attended talker). Here, we asked how the components of the phase-locked neural response are affected by selective attention under varying SNR.

The phase-locked neural response to broad-band continuous speech can be obtained from EEG by estimating the (delayed) covariance of the temporal speech envelope and the EEG, which results in a linear model of the cortical response: the temporal response function (TRF; Lalor et al., 2009; Crosse et al., 2016). Analogous to the event-related potential (ERP), the components of the TRF can be interpreted as reflecting a sequence of neural processing stages, where later components reflect higher-order processes within the hierarchy of the auditory system (Davis and Johnsrude, 2003; Picton, 2013; Di Liberto et al., 2015).
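As a minimal sketch of this approach, a forward TRF can be estimated by ridge regression (Hoerl and Kennard, 1970) of the EEG on time-lagged copies of the speech envelope, in the spirit of the mTRF toolbox (Crosse et al., 2016). The Python code below is illustrative only; the function name, lag window, and regularization strength are assumptions, not the study's settings.

```python
import numpy as np

def estimate_trf(envelope, eeg, fs, tmin=-0.1, tmax=0.5, lam=1e2):
    """Forward TRF via ridge regression.

    envelope: (n_samples,) speech envelope at the EEG sampling rate fs.
    eeg:      (n_samples, n_channels) EEG recording.
    Returns weights of shape (n_lags, n_channels); rows index time lag.
    """
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    # Design matrix: one column per time-lagged copy of the envelope.
    # np.roll shifts circularly; in practice, edge samples are discarded.
    X = np.column_stack([np.roll(envelope, lag) for lag in lags])
    # Ridge regression: regularize the near-singular autocovariance of the
    # heavily auto-correlated lagged stimulus before inverting.
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return w
```

Reading out the row of `w` at, say, a 100-ms lag then yields a channel topography that can be interpreted like an ERP component at that latency.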

Here, we use a listening scenario in which two concurrent talkers undergo continuous SNR variation. Our results demonstrate differential effects of bottom-up acoustics vs. top-down selective neural processing on earlier vs. later neural response components, respectively. Source localization reveals that not only auditory cortex regions are involved in the selective neural processing of concurrent speech, but that a fronto-parietal attention network contributes to selective neural processing through late suppression of the ignored talker.

Section snippets

Participants

Eighteen native speakers of German (9 females) were invited from the participant database of the Department of Psychology, University of Lübeck, Germany. We recruited participants aged between 23 and 68 years at the time of testing (mean: 49, SD: 17), so that conclusions from this challenging listening scenario would extend to middle-aged and older adults. All reported normal hearing and no history of neurological disorders. Incomplete data due to recording hardware failure were obtained …

Results

We asked participants to listen to one of two simultaneously presented audiobooks under varying signal-to-noise ratio (Fig. 1A and B; −6 to +6 dB SNR). After each of twelve five-minute blocks, subjects rated the difficulty of listening to the to-be-attended talker on a color bar ranging from red (difficult = 1) to green (easy = 10). Average difficulty ratings varied strongly between subjects (mean: 5.2, SD: 2.2, range: 2.3–8.9). No difference in difficulty ratings for listening …
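The exact stimulus construction is not detailed in this snippet; as a hypothetical illustration, a time-varying SNR can be imposed on a two-talker mixture by RMS-equalizing both talkers and then applying a sample-by-sample gain that follows the desired dB trajectory. All names and the sinusoidal trajectory below are assumptions for illustration, not the study's actual SNR time course.

```python
import numpy as np

def mix_with_dynamic_snr(attended, ignored, snr_db):
    """Mix two equal-length talker signals at a time-varying SNR (in dB)."""
    attended = attended / np.sqrt(np.mean(attended ** 2))  # RMS-equalize
    ignored = ignored / np.sqrt(np.mean(ignored ** 2))
    g = 10.0 ** (snr_db / 40.0)         # split the dB gain symmetrically so
    return g * attended + ignored / g   # the overall mixture level stays stable

# Illustrative trajectory sweeping between -6 and +6 dB, as in the study.
fs = 44100
t = np.arange(10 * fs) / fs                      # 10 s of audio
snr_db = 6.0 * np.sin(2 * np.pi * 0.05 * t)      # slow sinusoidal sweep
```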

Discussion

In the present study, human listeners attended to one of two concurrent talkers under continuously varying signal-to-noise ratio (SNR). We asked to what extent a late cortical representation (i.e., neural tracking) of the ignored acoustic signal is key to the successful separation of to-be-attended and distracting talkers (i.e., neural selectivity) under such demanding listening conditions.

Forward modeling of the EEG response revealed neural responses to the temporal envelopes of individual …

Acknowledgments

Research was supported by the European Research Council (ERC-CoG-2014 646696 to JO) and the Oticon Foundation (NEURO-CHAT).

References (59)

  • E.M. Zion Golumbic et al., Mechanisms underlying selective neuronal tracking of attended speech at a "cocktail party", Neuron (2013)
  • R.A. Bentler et al., Hearing-in-Noise: comparison of listeners with normal and (aided) impaired hearing, J. Am. Acad. Audiol. (2004)
  • W. Biesmans et al., Auditory-inspired speech envelope extraction methods for improved EEG-based auditory attention detection in a cocktail party scenario, IEEE Trans. Neural Syst. Rehabil. Eng. (2016)
  • C. Brodbeck et al., Transformation from auditory to linguistic representations across auditory cortex is rapid and attention dependent for continuous speech, bioRxiv (2018)
  • M.P. Broderick et al., Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech, Curr. Biol. (2018)
  • E.C. Cherry, Some experiments on the recognition of speech, with one and with two ears, J. Acoust. Soc. Am. (1953)
  • T. Chi et al., Multiresolution spectrotemporal analysis of complex sounds, J. Acoust. Soc. Am. (2005)
  • E. Combrisson et al., Exceeding chance level by chance: the caveat of theoretical chance levels in brain signal classification and statistical assessment of decoding accuracy, J. Neurosci. Methods (2015)
  • M.J. Crosse et al., The multivariate temporal response function (mTRF) toolbox: a MATLAB toolbox for relating neural signals to continuous stimuli, Front. Hum. Neurosci. (2016)
  • M.H. Davis et al., Hierarchical processing in spoken language comprehension, J. Neurosci. (2003)
  • N. Ding et al., Neural coding of continuous speech in auditory cortex during monaural and dichotic listening, J. Neurophysiol. (2012)
  • N. Ding et al., Adaptive temporal encoding leads to a background-insensitive cortical representation of speech, J. Neurosci. (2013)
  • W. Van Drongelen et al., A spatial filtering technique to detect and localize multiple sources in the brain, Brain Topogr. (1994)
  • B. Efron, Bootstrap methods: another look at the jackknife, Ann. Stat. (1979)
  • L. Fiedler et al., Single-channel in-ear-EEG detects the focus of auditory attention to concurrent tone streams and mixed speech, J. Neural Eng. (2017)
  • B.L. Giordano et al., Contributions of local speech encoding and functional connectivity to audio-visual speech perception, eLife (2017)
  • L.S. Hamilton et al., The revolution will not be controlled: natural stimuli in speech neuroscience, Lang. Cogn. Neurosci. (2018)
  • I. Hertrich et al., Magnetic brain activity phase-locked to the envelope, the syllable onsets, and the fundamental frequency of a perceived speech signal, Psychophysiology (2012)
  • A.E. Hoerl et al., Ridge regression: biased estimation for nonorthogonal problems, Technometrics (1970)