
Multistage audiovisual integration of speech: dissociating identification and detection

Research Article · Experimental Brain Research

Abstract

Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion, in which seeing the talking face alters the auditory phonetic percept, and by the audiovisual detection advantage, in which seeing the talking face improves the detectability of the acoustic speech signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli: sine wave speech (SWS), an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. Whereas the McGurk illusion occurred only for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech, in which the many attributes of the audiovisual speech signal are integrated by separate processes.
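The stimuli deserve a brief technical gloss. Sine wave speech, introduced by Remez and colleagues, reduces a natural utterance to a few time-varying sinusoids that track the center frequencies and amplitudes of its formants, removing the broadband cues that normally mark a signal as speech. As a rough illustration of the synthesis step, here is a minimal sketch in Python; it assumes the formant tracks have already been estimated (e.g., by LPC analysis), and the function name and array layout are illustrative, not taken from the paper:

```python
import numpy as np

def synthesize_sws(formant_freqs, formant_amps, frame_rate, sample_rate=16000):
    """Synthesize sine wave speech from formant tracks.

    formant_freqs: (n_frames, n_formants) center frequencies in Hz,
        typically the first three formants of a natural utterance.
    formant_amps:  (n_frames, n_formants) linear amplitudes.
    frame_rate:    analysis frames per second.
    """
    n_frames, n_formants = formant_freqs.shape
    n_samples = int(n_frames * sample_rate / frame_rate)
    t_frames = np.arange(n_frames) / frame_rate
    t_samples = np.arange(n_samples) / sample_rate

    out = np.zeros(n_samples)
    for k in range(n_formants):
        # Interpolate the frame-rate tracks up to the audio sample rate.
        f = np.interp(t_samples, t_frames, formant_freqs[:, k])
        a = np.interp(t_samples, t_frames, formant_amps[:, k])
        # Accumulate phase so each sinusoid glides smoothly as its
        # formant frequency changes, avoiding phase discontinuities.
        phase = 2.0 * np.pi * np.cumsum(f) / sample_rate
        out += a * np.sin(phase)

    return out / np.max(np.abs(out))  # normalize to avoid clipping

# Hypothetical three-formant glide, 0.5 s at a 100 Hz frame rate.
frames = 50
freqs = np.stack([np.linspace(500, 700, frames),
                  np.linspace(1500, 1200, frames),
                  np.linspace(2500, 2400, frames)], axis=1)
amps = np.ones_like(freqs) * np.array([1.0, 0.5, 0.25])
signal = synthesize_sws(freqs, amps, frame_rate=100)
```

The resulting signal sounds like whistles gliding in pitch; as the abstract notes, listeners typically do not hear it as speech unless told of its speech-like nature.

On the detection side, performance in such paradigms is commonly summarized by the sensitivity index d′ from signal detection theory; an audiovisual detection advantage would then show up as higher sensitivity when the talking face is visible. A minimal computation, with hypothetical hit and false-alarm rates (the paper itself may report thresholds or other measures):

```python
from scipy.stats import norm

def dprime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hits) - z(false alarms)."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

print(dprime(0.75, 0.25))  # audio-only: ~1.35
print(dprime(0.85, 0.25))  # audiovisual: ~1.71, a detection advantage
```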




Author information

Corresponding author

Correspondence to Kasper Eskelund.


About this article

Cite this article

Eskelund, K., Tuomainen, J. & Andersen, T.S. Multistage audiovisual integration of speech: dissociating identification and detection. Exp Brain Res 208, 447–457 (2011). https://doi.org/10.1007/s00221-010-2495-9

