Cognition

Volume 96, Issue 1, May 2005, Pages B13-B22

Brief article

Audio–visual speech perception is special

https://doi.org/10.1016/j.cognition.2004.10.004

Abstract

In face-to-face conversation, speech is perceived by ear and eye. We studied the prerequisites of audio–visual speech perception by using perceptually ambiguous sine wave replicas of natural speech as auditory stimuli. When the subjects were not aware that the auditory stimuli were speech, they showed only negligible integration of auditory and visual stimuli. When the same subjects learned to perceive the same auditory stimuli as speech, they integrated the auditory and visual stimuli in a similar manner as they did natural speech. These results demonstrate the existence of a multisensory speech-specific mode of perception.


Subjects

Ten students of the Helsinki University of Technology were studied. All reported normal hearing and normal or corrected-to-normal vision. None of the subjects had prior experience with sine wave speech (SWS) stimuli. Two subjects were excluded from the subject pool because they reported perceiving the SWS stimuli as speech before being instructed about their speech-like nature.

Stimuli

Four auditory stimuli (natural /omso/ and /onso/ and their sine wave replicas) and digitized video clips of a male face articulating
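Sine wave replicas of this kind are typically constructed by replacing the first few formants of a natural utterance with time-varying sinusoids that track the formants' frequencies and amplitudes. The following is a minimal synthesis sketch, assuming the formant tracks have already been estimated from a recording (e.g. by LPC analysis); the function name, frame-level interface, and placeholder tracks are illustrative assumptions, not the study's actual procedure.

```python
import numpy as np

def synthesize_sws(formant_freqs, formant_amps, frame_rate=100, sample_rate=22050):
    """Sum one time-varying sinusoid per formant track (sine wave speech sketch).

    formant_freqs: array (n_frames, n_formants), frequency in Hz per frame
    formant_amps:  array (n_frames, n_formants), linear amplitude per frame
    """
    n_frames, n_formants = formant_freqs.shape
    n_samples = n_frames * int(sample_rate / frame_rate)
    frame_times = np.arange(n_frames) / frame_rate
    sample_times = np.arange(n_samples) / sample_rate
    out = np.zeros(n_samples)
    for k in range(n_formants):
        # Interpolate the frame-level tracks up to audio rate.
        f = np.interp(sample_times, frame_times, formant_freqs[:, k])
        a = np.interp(sample_times, frame_times, formant_amps[:, k])
        # Accumulate phase so frequency changes stay continuous (click-free).
        phase = 2 * np.pi * np.cumsum(f) / sample_rate
        out += a * np.sin(phase)
    return out / np.max(np.abs(out))  # normalize to [-1, 1]

# Invented placeholder tracks: three "formants" gliding over 0.5 s.
frames = 50
freqs = np.stack([np.linspace(500, 700, frames),
                  np.linspace(1500, 1100, frames),
                  np.linspace(2500, 2300, frames)], axis=1)
amps = np.ones_like(freqs) * np.array([1.0, 0.6, 0.3])
waveform = synthesize_sws(freqs, amps)
```

Because such replicas preserve the formant dynamics but discard the harmonic voice source, they are intelligible as speech only once the listener knows to hear them that way.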

Experiment 2

In Experiment 1, the different tasks were always performed in the same order, so that the non-speech mode always preceded the speech mode for the SWS stimuli. The reason for this was that once a subject “enters speech mode”, it is impossible to hear the SWS stimuli as non-speech. However, this procedure might have created a learning effect, so that subjects might have become more accustomed to the SWS stimuli. Then at least part of the large integration effect observed with the incongruent stimuli could have

Discussion

Our results demonstrate that acoustic and visual speech were integrated strongly only when the perceiver interpreted the acoustic stimuli as speech. If the SWS stimuli had always been processed in the same way, the influence of visual speech should have been the same in both the speech and non-speech modes. This result does not depend on the amount of practice with listening to SWS stimuli, as confirmed by the results of Experiment 2.
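One way to make this comparison concrete: for incongruent audio-visual trials, count how often responses follow the visible articulation rather than the acoustics, separately for the speech and non-speech modes. A minimal analysis sketch over invented data (the trial coding and the numbers are placeholders, not the study's results):

```python
from collections import defaultdict

# Invented example trials: (perceptual mode, did the response follow the
# visual articulation on an incongruent audio-visual trial?).
trials = [
    ("speech", True), ("speech", True), ("speech", True), ("speech", False),
    ("non-speech", False), ("non-speech", False),
    ("non-speech", False), ("non-speech", True),
]

counts = defaultdict(lambda: [0, 0])  # mode -> [visually influenced, total]
for mode, followed_visual in trials:
    counts[mode][0] += int(followed_visual)
    counts[mode][1] += 1

for mode, (hits, total) in counts.items():
    print(f"{mode}: visual influence = {hits}/{total} = {hits / total:.2f}")
```

A markedly higher proportion of visually influenced responses in the speech mode than in the non-speech mode, for physically identical stimuli, is the signature of mode-dependent integration described above.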

We suggest that when SWS stimuli were perceived as

Acknowledgements

The research of T.S.A. was supported by the European Union Research Training Network “Multi-modal Human–Computer Interaction”. Financial support from the Academy of Finland to the Research Centre for Computational Science and Engineering and to M.S. is also acknowledged. We thank Ms Reetta Korhonen for help in data collection and Riitta Hari (Low Temperature Lab, HUT) for valuable comments on the manuscript.
