Prediction and constraint in audiovisual speech perception

Jonathan E Peelle; Mitchell S Sommers

doi:10.1016/j.cortex.2015.03.006

Cortex. 2015 Jul:68:169-81. doi: 10.1016/j.cortex.2015.03.006. Epub 2015 Mar 20.

Authors

Jonathan E Peelle¹, Mitchell S Sommers²

Affiliations

¹ Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO, USA. Electronic address: peellej@ent.wustl.edu.
² Department of Psychology, Washington University in St. Louis, St. Louis, MO, USA.

Abstract

During face-to-face conversational speech, listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately, it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms.

Keywords: Audiovisual speech; Multisensory integration; Predictive coding; Predictive timing; Speech perception.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Review

MeSH terms

Anticipation, Psychological / physiology
Auditory Perception / physiology*
Humans
Speech Perception / physiology*
Visual Perception / physiology*
