Research report
Seeing and hearing others and oneself talk

https://doi.org/10.1016/j.cogbrainres.2004.11.006

Abstract

We studied the modification of auditory perception in three different conditions in twenty subjects. Observing another person's discordant articulatory gestures impaired identification of acoustic speech stimuli and modified the auditory percept, causing a strong McGurk effect. A similar effect was found when the subjects watched their own silent articulation in a mirror while acoustic stimuli were simultaneously presented to their ears. Interestingly, a smaller but significant effect was obtained even when the subjects just silently articulated the syllables without visual feedback. On the other hand, observing another person's or one's own concordant articulation, as well as silently articulating a concordant syllable, improved identification of the acoustic stimuli. We suggest that the modifications of auditory percepts caused by seeing speech and by silently articulating it are both due to altered activity in the auditory cortex. Our findings support the idea of a close relationship between speech perception and production.

Introduction

In normal conversation, we both hear our companion's speech and see some of the corresponding articulatory gestures. Perceptually, the audiovisual nature of speech is manifested in two ways. First, concordant visual information improves the intelligibility of auditory speech. This is especially evident when speech is presented with a poor signal-to-noise ratio [27], [36], but seeing speech also improves identification of a difficult acoustic message even when it is presented without any noise [2]. Second, observing discordant articulatory gestures can change the auditory percept phonetically, as occurs in the McGurk effect [19]. When the acoustic syllable /pa/ is dubbed onto the visual presentation of the articulatory gestures of /ka/, subjects typically hear /ta/ or /ka/ [35]. This change in perception occurs even when the acoustic syllables are identified perfectly when presented alone. In many experiments on the McGurk effect, the proportion of correctly identified acoustic syllables, which indicates the strength of the effect, is 10% or less [16], [35]. The change in perception is clearly auditory in nature, and subjects seldom recognize the discrepancy between the auditory and visual components. However, the strength of the effect varies considerably across individuals and depends on the specific stimuli used in the experiments [35].

Observing the articulatory movements of a talker has been shown to modulate activity of the human auditory (primary and/or non-primary) cortex, which accords with the perceptual “auditoriness” of the McGurk effect. Magnetoencephalographic (MEG) studies suggest that visual information about a speaking face has access to the auditory cortex within 200 ms of stimulus onset during audiovisual speech perception [21], [33], [34]. Functional magnetic resonance imaging (fMRI) studies have also shown that lip-reading may modify activity in the primary and secondary auditory cortices during audiovisual speech perception [5]. Thus, visual speech could influence auditory speech perception by modifying the activation of auditory cortical areas. Evidence of primary auditory cortex activation during silent lip-reading has been found in some, but not all, recent imaging studies [3], [4], [28].

MEG studies have also indicated that a speaker's own utterances can modulate the reactivity of the human auditory cortex [7], [13], [24], [25]. When subjects read aloud, auditory cortex responses to probe tones were small and delayed in comparison with those obtained when subjects read silently [25]. Curio and coworkers [7] recorded MEG responses to self-produced vowels: the M100 response, peaking at 100 ms after voice onset, was delayed in the left hemisphere relative to the right, and no such asymmetry was observed when the utterances were taped and replayed to the subjects. Numminen and Curio [24] showed that even subjects' silent articulation can influence the processing of speech sounds in the auditory cortex. The M100 response was damped in the left auditory cortex when the subjects silently produced the same vowel as the one presented to their ears. The effect was specific to the stimulus type and was not found when the articulated and presented vowels did not match.

The activation of auditory cortical areas during speech production has also been demonstrated in PET studies [12], [29], [30]. The rate at which subjects whispered syllables correlated with increased cerebral blood flow in the left planum temporale and the left posterior perisylvian cortex, even when the auditory input was totally masked by white noise [30]. These regions contain secondary auditory areas and are known to be involved in the perception of speech sounds [31]. Paus and coworkers [30] suggested that Broca's area and/or the left primary face motor area modulates activity in the secondary auditory cortical areas.

In the present psychophysical study, we examined whether auditory perception of speech stimuli is modified by the subjects' own silent articulation. Such a modification would be expected on the basis of the neurophysiological studies described above. We also studied whether auditory percepts are modified when subjects silently articulate while also seeing their own articulation in a mirror. For comparison, our subjects also identified both concordant (acoustic and visual /pa/, acoustic and visual /ka/) and discordant (acoustic /pa/ dubbed onto /ka/ articulation) audiovisual utterances of another speaker. Seeing a concordant utterance was expected to improve identification of the acoustic syllable, whereas seeing a discordant utterance was expected to impair identification and produce a strong McGurk effect.


Methods

Twenty healthy volunteers (native speakers of Finnish; 8 females; 21–33 years old; two left-handed) with normal or corrected-to-normal vision participated in the experiments. None of them was aware of the purpose of the experiment. They silently articulated, or observed the articulation of, the Finnish syllables /ka/ or /pa/ while the acoustic /ka/ or /pa/ was simultaneously presented via earphones. The observed or articulated syllables were either concordant with the acoustic syllable
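
To make the factorial structure of the design explicit, the following sketch enumerates the stimulus pairings described above and labels each as concordant or discordant. This is only an illustrative reconstruction; the condition names and data layout are ours, not the authors' experimental code.

```python
from itertools import product

# Illustrative (hypothetical) enumeration of the stimulus pairings described
# in Methods: an acoustic /pa/ or /ka/ is paired with an observed or silently
# articulated /pa/ or /ka/ in each of the three experimental conditions.
ACOUSTIC = ["pa", "ka"]
OBSERVED_OR_ARTICULATED = ["pa", "ka"]
CONDITIONS = [
    "audiovisual (other talker)",
    "mirror (own articulation)",
    "silent articulation",
]

for condition, acoustic, paired in product(CONDITIONS, ACOUSTIC, OBSERVED_OR_ARTICULATED):
    relation = "concordant" if acoustic == paired else "discordant"
    print(f"{condition:28s}  acoustic /{acoustic}/ + /{paired}/ -> {relation}")
```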

Results

Fig. 2 illustrates the proportions of correctly identified acoustic /pa/ syllables in different experimental conditions (white bars = discordant stimuli, grey bars = concordant stimuli). The horizontal grey line indicates that in the baseline condition the subjects correctly identified 68 ± 6% (mean ± SEM) of the syllables.
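
As a worked illustration of how a group figure such as 68 ± 6% (mean ± SEM) is obtained, the sketch below computes per-subject proportions of correctly identified /pa/ syllables and their mean and standard error across the twenty subjects; the trial counts and responses are invented solely for illustration and are not the study's data.

```python
import numpy as np

# Hypothetical per-subject counts of correctly identified acoustic /pa/
# syllables in the baseline condition (out of n_trials presentations).
rng = np.random.default_rng(0)
n_subjects, n_trials = 20, 20
correct = rng.binomial(n_trials, 0.68, size=n_subjects)      # invented data

proportions = correct / n_trials                              # per-subject proportion correct
mean = proportions.mean()                                     # group mean
sem = proportions.std(ddof=1) / np.sqrt(n_subjects)           # standard error of the mean

print(f"baseline identification: {100 * mean:.0f} ± {100 * sem:.0f}% (mean ± SEM)")
```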

Visual inspection of Fig. 2 shows that seeing the articulatory gestures, either of another person (the audiovisual condition) or of oneself (the mirror condition), had a very

Discussion

As expected, observing another person's articulation improved identification of concordant acoustic syllables and impaired identification of discordant ones. Seeing oneself articulate in a mirror produced very similar effects. Moreover, a similar but slightly smaller effect was obtained when the subjects only silently articulated the stimuli: discordant utterances decreased, and concordant ones improved, identification of the acoustic syllables. Complicating the interpretation of the results,

Acknowledgments

This study is dedicated to the late Alvin Liberman. The authors thank Yurii Alexandrov and Riitta Hari for comments on the manuscript. The study was supported by the Academy of Finland.

References (40)

  • M. Sams et al., McGurk effect in Finnish syllables, isolated words, and words in sentences: effects of word meaning and sentence context, Speech Commun. (1998)
  • K.E. Watkins et al., Seeing and hearing speech excites the motor system involved in speech production, Neuropsychologia (2003)
  • P.K. Anokhin, Biology and neurophysiology of the conditioned reflex and its role in adaptive behavior (1974)
  • P. Arnold et al., Bisensory augmentation: a speechreading advantage when speech is clearly audible and intact, Br. J. Psychol. (2001)
  • L. Bernstein et al., Visual speech perception without primary auditory cortex activation, NeuroReport (2002)
  • G. Calvert et al., Activation of auditory cortex during silent lipreading, Science (1997)
  • G.A. Calvert et al., Response amplification in sensory-specific cortices during crossmodal binding, NeuroReport (1999)
  • G. Curio et al., Speaking modifies voice-evoked activity in the human auditory cortex, Hum. Brain Mapp. (2000)
  • L. Fadiga et al., Speech listening specifically modulates the excitability of tongue muscles: a TMS study, Eur. J. Neurosci. (2002)
  • P. Ferrari et al., Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex, Eur. J. Neurosci. (2003)