Audiovisual integration of emotional signals in voice and face: An event-related fMRI study
Introduction
Taking part in social interactions requires us to integrate a variety of inputs from several sense organs into a single percept of the situation we are dealing with. An inability to perceive and understand non-verbal emotional signals (e.g. speech melody, facial expression, gestures) within this process will often result in impaired communication.
Over the past years a plethora of neuroimaging studies have addressed the processing of faces (e.g. Haxby et al., 1994, Kanwisher et al., 1997, Sergent et al., 1992) and emotional facial expressions (e.g. Blair et al., 1999, Breiter et al., 1996, Morris et al., 1996, Phillips et al., 1997; reviewed in Posamentier and Abdi, 2003) as well as the processing of voices (e.g. Belin et al., 2000, Belizaire et al., 2007, Fecteau et al., 2004, Giraud et al., 2004, Kriegstein and Giraud, 2004) and emotional prosody (e.g. Buchanan et al., 2000, Ethofer et al., 2006b, Ethofer et al., 2006c, George et al., 1996, Grandjean et al., 2005, Imaizumi et al., 1997, Kotz et al., 2003, Mitchell et al., 2003, Wildgruber et al., 2002, Wildgruber et al., 2004, Wildgruber et al., 2005; reviewed in Wildgruber et al., 2006) identifying neural networks subserving the perception of visual and auditory non-verbal emotional signals. Yet, to date only very few neuroimaging studies on audiovisual integration of non-verbal emotional communication are available (Dolan et al., 2001, Ethofer et al., 2006a, Ethofer et al., 2006d, Pourtois et al., 2005). Behavioural studies demonstrated that congruence between facial expression and prosody facilitates reactions to stimuli carrying emotional information (De Gelder and Vroomen, 2000, Dolan et al., 2001, Massaro and Egan, 1996). This parallels findings from audiovisual integration of non-emotional information which indicate shortened response latencies and heightened perceptual sensitivity upon audiovisual stimulation (Miller, 1982, Schroger and Widmann, 1998). Moreover, affective signals received within one sensory channel can affect information processing in another. For instance, the perception of a facial expression can be altered by accompanying emotional prosody (Ethofer et al., 2006a, Massaro and Egan, 1996). 
Since these crossmodal biases occur mandatorily and irrespective of attention (De Gelder and Vroomen, 2000, Ethofer et al., 2006a, Vroomen et al., 2001), one might assume that the audiovisual integration of non-verbal affective information is an automatic process. This assumption gains further support from the results of electrophysiological experiments providing evidence for multisensory crosstalk during an early perceptual stage, about 110–220 ms after stimulus presentation (De Gelder et al., 1999, Pourtois et al., 2000, Pourtois et al., 2002).
Neuroimaging data on audiovisual integration of non-verbal emotional information highlight stronger activation in the left middle temporal gyrus (MTG) (Pourtois et al., 2005) and left posterior superior temporal sulcus (pSTS) (Ethofer et al., 2006d) during audiovisual as compared to either unimodal stimulation. These findings of activations adjacent to the superior temporal sulcus (STS) correspond well with reports of enhanced responses in pSTS during audiovisual presentation of animals (Beauchamp et al., 2004b) and tools (Beauchamp et al., 2004a, Beauchamp et al., 2004b), speech (Calvert et al., 2000, van Atteveldt et al., 2004, Wright et al., 2003) and letters (van Atteveldt et al., 2004).
Moreover, data from several electrophysiological and functional neuroimaging studies (Ghazanfar et al., 2005, Giard and Peronnet, 1999, von Kriegstein and Giraud, 2006) document early audiovisual integration processes within modality-specific cortices. To date, it remains controversial to what extent these audiovisual response modulations arise from feedback connections of the STS to the respective unimodal auditory and visual cortices or from direct crosstalk between auditory and visual associative cortices.
So far, none of the neuroimaging studies on audiovisual integration of emotional prosody and facial expression employed dynamic visual stimuli. Also, stimulus material in these studies portrayed only two exemplary emotions (happy, fearful) and did not contain emotionally neutral stimuli.
In the present study, we used functional magnetic resonance imaging (fMRI) to delineate the audiovisual integration site for dynamic non-verbal emotional information. Participants were scanned while they performed a classification task on audiovisual (AV), auditory (A) or visual (V) presentation of people speaking single words expressing different emotional states in voice and face. Dynamic stimulation in combination with a broad variety of emotions was chosen in order to approximate real life conditions of social communication. The classification task was applied to ascertain constant attention to the stimuli and to acquire a behavioural measure of the audiovisual integration effect. In a prestudy, stimuli used in the fMRI experiment were tested for a relevant behavioural integration effect during the bimodal condition.
This experiment was designed to investigate audiovisual integration of non-verbal communication signals in the context of an explicit emotional categorization task. This task treats the class of stimuli with emotionally neutral prosody and facial expression as one further “emotional” category. In other words, the main focus of the present study lies on audiovisual integration of non-verbal communication under the top-down influence of an emotional classification task, rather than on bottom-up (stimulus-driven) effects of emotional content on the audiovisual integration of non-verbal communication.
For the identification of brain regions contributing to multimodal integration of emotional signals, responses during bimodal stimulation were compared to both unimodal conditions. Areas characterized by stronger responses to audiovisual than to either unimodal stimulation were considered candidate regions for the integration process.
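The comparison described above amounts to a conjunction of the two contrasts AV > A and AV > V (the “max criterion” used in several audiovisual integration studies). A minimal sketch of how candidate voxels could be flagged from condition-wise response estimates; the array values and names are purely illustrative, not data from this study:

```python
import numpy as np

# Illustrative per-voxel response estimates (e.g. GLM betas) for the
# three conditions; shape (n_voxels,). Values are invented.
beta_av = np.array([2.1, 0.9, 1.5, 3.0])
beta_a  = np.array([1.0, 1.1, 0.4, 1.2])
beta_v  = np.array([1.4, 0.8, 1.6, 2.0])

# Max criterion: a voxel is a candidate integration site only if its
# audiovisual response exceeds BOTH unimodal responses (conjunction of
# AV > A and AV > V). In practice each contrast is thresholded
# statistically rather than compared point-wise like this.
candidates = (beta_av > beta_a) & (beta_av > beta_v)
print(candidates)  # -> [ True False False  True]
```

Only voxels surpassing both unimodal responses survive; a voxel driven by one modality alone (e.g. the third voxel, where V alone exceeds AV) is excluded.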
We expected that in such a region, associated with audiovisual integration, the gain in classification accuracy under audiovisual stimulation as compared to either unimodal stimulation might be paralleled by increased cerebral activation.
Moreover, Macaluso and colleagues (2000) demonstrated that a perceptual gain during bimodal as compared to unimodal stimulation was paralleled by enhanced effective connectivity between associative sensory cortices and supramodal integration sites for vision and touch.
Accordingly, we expected enhanced effective connectivity between putative audiovisual integration areas and voice-sensitive (Belin et al., 2000) as well as face-sensitive (Haxby et al., 1994, Kanwisher et al., 1997, Sergent et al., 1992) areas during the audiovisual condition as compared to either unimodal condition as a possible correlate of the perceptual gain during congruent audiovisual integration.
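Context-dependent effective connectivity of this kind is typically assessed with a psychophysiological interaction (PPI) analysis (Friston et al., 1997), in which the regressor of interest is the product of a seed region's timecourse and a psychological context vector. The following is a deliberately simplified sketch with synthetic data; it omits the hemodynamic deconvolution step the full method requires, and all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans = 200

# Illustrative seed timecourse (e.g. extracted from a putative pSTS
# integration site) and a psychological vector coding the context:
# +1 during audiovisual blocks, -1 during unimodal blocks.
seed = rng.standard_normal(n_scans)
context = np.where(np.arange(n_scans) % 40 < 20, 1.0, -1.0)

# PPI regressor: element-wise product of seed signal and context.
# (The full method first deconvolves the seed to the neuronal level.)
ppi = seed * context

# GLM including seed, context and their interaction. A reliably
# nonzero interaction weight in a target region indicates stronger
# coupling with the seed during audiovisual than unimodal blocks.
X = np.column_stack([np.ones(n_scans), seed, context, ppi])
target = 0.3 * seed + 0.5 * ppi + rng.standard_normal(n_scans) * 0.1
betas, *_ = np.linalg.lstsq(X, target, rcond=None)
```

With these synthetic weights, the fitted interaction coefficient (`betas[3]`) recovers the simulated context-dependent coupling of 0.5.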
A final point of interest was whether the putative integration sites exhibit a different sensitivity to stimuli with emotional prosody/facial expression as compared to stimuli with neutral prosody/facial expression.
In summary, our fMRI study was designed to delineate brain areas specifically involved in the process of audiovisual integration of dynamic emotional signals from voice and face on the basis that they exhibit stronger responses to audiovisual than to either unimodal stimulation.
Further expectations on the response pattern of regions subserving audiovisual integration of non-verbal emotional communication were:
a) an increase of cerebral responses in correlation with the gain in classification accuracy under audiovisual as compared to either unimodal stimulation;
b) enhanced effective connectivity with voice-sensitive and face-sensitive cortices during bimodal as compared to either unimodal stimulation.
In order to gain additional information about a possible differential sensitivity of the integration sites to the emotionality of stimulus content, we compared the responses to stimuli with emotional non-verbal content with those to stimuli with neutral non-verbal content.
Based on recent neuroimaging studies on audiovisual integration (Beauchamp et al., 2004a, Beauchamp et al., 2004b, Calvert et al., 2000, Ethofer et al., 2006d, van Atteveldt et al., 2004, Wright et al., 2003) we hypothesised that a region featuring the aforementioned characteristics might be located in the pSTS.
Subjects
Thirty right-handed subjects (15 male, 15 female; mean age 23, S.D. 3 years) participated in the behavioural prestudy. Twenty-four right-handed volunteers (12 male, 12 female; mean age 26, S.D. 5 years) who had not taken part in the behavioural experiment were included in the fMRI experiment. All participants were native speakers of German and had no history of neurological or psychiatric illness, substance abuse, or impaired vision or hearing. None of the participants
Behavioural data
Mean unbiased hit rates (Hu) in the classification task (± standard error of the mean, S.E.M.) were 0.35 ± 0.02 (A), 0.58 ± 0.02 (V) and 0.76 ± 0.02 (AV), corresponding to 56% (A), 75% (V) and 86% (AV) correct classifications. A two-way ANOVA with experimental condition (A, V, AV) and stimulus type (emotional vs. neutral) as within-subject factors indicated significant differences regarding experimental condition (F(1.8, 43.2) = 94.0, P < 0.001) and stimulus type (F(1, 24) = 24.0, P < 0.001) while
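The unbiased hit rate Hu reported above (Wagner, 1993) corrects raw hit rates for response biases: for each category, the squared number of correct responses is divided by the product of how often that category was presented and how often it was chosen. A sketch of the computation from a confusion matrix; the matrix values are invented for illustration:

```python
import numpy as np

# Invented confusion matrix: rows = presented emotion categories,
# columns = chosen response categories (counts).
conf = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [1, 2, 7],
], dtype=float)

# Wagner's (1993) unbiased hit rate for category i:
# Hu_i = correct_i**2 / (n presented as i * n responded as i)
correct = np.diag(conf)
hu = correct**2 / (conf.sum(axis=1) * conf.sum(axis=0))
```

A category that is chosen indiscriminately (inflating its raw hit rate) is penalised by the large column total in the denominator, which is why Hu values (e.g. 0.35 for A) sit below the corresponding raw percentages (56% for A).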
Behavioural integration
At the behavioural level, we found a perceptual gain in the emotional classification task when contrasting the bimodal condition with either of the unimodal conditions. This accords well with results from a series of experiments conducted by De Gelder and Vroomen (2000), which demonstrate that emotional facial expressions are more easily classified when accompanied by congruent emotional prosody. One of the most noticeable differences between the present study and the one performed by de Gelder
Conclusion
Bilateral pSTG and right thalamus reacted stronger to audiovisual than to visual and auditory stimuli. Of these areas, the left pSTG conformed best to the further a priori assumed characteristics of an integrative brain region for non-verbal emotional information, namely a positive linear relationship of the BOLD response under audiovisual stimulation with the behaviourally documented perceptual gain during the bimodal condition and an enhanced effective connectivity with auditory as well as
Acknowledgments
This study was supported by the Deutsche Forschungsgemeinschaft (SFB 550/B10).
References (73)
- Integration of auditory and visual information about objects in superior temporal sulcus. Neuron (2004)
- Response and habituation of the human amygdala during visual processing of facial expression. Neuron (1996)
- Recognition of emotional prosody and verbal components of spoken language: an fMRI study. Brain Res. Cogn. Brain Res. (2000)
- Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr. Biol. (2000)
- The preparation and execution of self-initiated and externally-triggered movement: a study of event-related fMRI. NeuroImage (2002)
- From preparation to online control: reappraisal of neural circuitry mediating internally generated and externally guided actions. NeuroImage (2006)
- Cerebral pathways in processing of affective prosody: a dynamic causal modeling study. NeuroImage (2006)
- Investigating audiovisual integration of emotional signals in the human brain. Prog. Brain Res. (2006)
- Is voice processing species-specific in human auditory cortex? An fMRI study. NeuroImage (2004)
- Psychophysiological and modulatory interactions in neuroimaging. NeuroImage (1997)
- Classical and Bayesian inference in neuroimaging: applications. NeuroImage
- Modeling regional and psychophysiologic interactions in fMRI: the importance of hemodynamic deconvolution. NeuroImage
- Neocortical modulation of the amygdala response to fearful stimuli. Biol. Psychiatry
- Phonetic perception and the temporal cortex. NeuroImage
- The neurophysics of consciousness. Brain Res. Brain Res. Rev.
- On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang.
- Distinct functional substrates along the right superior temporal sulcus for the processing of voices. NeuroImage
- Task instructions modulate neural responses to fearful facial expressions. Biol. Psychiatry
- Divided attention: evidence for coactivation with redundant signals. Cogn. Psychol.
- The neural response to emotional prosody, as revealed by functional magnetic resonance imaging. Neuropsychologia
- Valid conjunction inference with the minimum statistic. NeuroImage
- The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia
- Facial expressions modulate the time course of long latency auditory brain potentials. Brain Res. Cogn. Brain Res.
- Perception of facial expressions and voices and of their combination in the human brain. Cortex
- Action sets and decisions in the medial prefrontal cortex. Trends Cogn. Sci.
- Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res.
- Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage
- Integration of letters and speech sounds in the human brain. Neuron
- Dynamic brain activation during processing of emotional intonation: influence of acoustic parameters, emotional valence, and sex. NeuroImage
- Identification of emotional intonation evaluated by fMRI. NeuroImage
- Cerebral processing of linguistic and emotional prosody: fMRI studies. Prog. Brain Res.
- Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nat. Neurosci.
- Voice-selective areas in human auditory cortex. Nature
- Cerebral response to ‘voiceness’: a functional magnetic resonance imaging study. NeuroReport
- Dissociable neural responses to facial expressions of sadness and anger. Brain
- Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. J. Comput. Assist. Tomogr.