Research report

Cross-modal interactions between human faces and voices involved in person recognition
Introduction
Human social interactions are shaped by our ability to identify individuals, a process to which face and voice recognition contribute both separately and jointly. Much research has been devoted to unimodal face recognition. Neuroimaging studies have shown that human faces are mainly processed by temporo-occipital regions of the brain with a right-hemispheric dominance, in particular the fusiform gyrus (the so-called Fusiform Face Area, FFA; Sergent et al., 1992, Kanwisher et al., 1997, Rhodes et al., 2004). Studies with brain-damaged patients have revealed a selective impairment of face recognition, called prosopagnosia, associated with lesions of the right fusiform gyrus (De Renzi et al., 1994, Takahashi et al., 1995). Fewer studies have focused on voice recognition, which takes place bilaterally in the superior temporal cortex, with a particular recruitment of the anterior part of the right superior temporal sulcus (STS; Belin et al., 2000, Belin et al., 2002, von Kriegstein et al., 2003). Phonagnosia, the selective impairment of voice recognition, is predominantly associated with lesions of the right hemisphere (Neuner and Schweinberger, 2000), and Van Lancker et al. (1989) showed that impaired voice recognition was significantly correlated with right parietal lobe damage.
Although neuroanatomically segregated, faces and voices interact, not only at a perceptual level (Calvert et al., 1999, Olson et al., 2002, Sekiyama et al., 2003), but also during person recognition (Burton et al., 1990, Ellis et al., 1997). Such integration skills emerge very early in life (Bahrick et al., 2005), yet few studies have investigated cross-modal interactions between faces and voices in person identification. Schweinberger et al. (2007) showed that voice recognition was easier when the voice was presented simultaneously with the face of the same person, whereas it was hampered when the voice was presented with a face that did not share the same identity. This demonstrates that listeners cannot ignore a face presented in temporal synchrony with a voice. The effect was not observed with unfamiliar voices, which suggests that audio–visual integration in person recognition depends on multimodal representations of people established through experience (for a review, see Campanella and Belin, 2007). However, the brain processes by which voices and faces, processed in distinct cerebral regions, are integrated into a unique and coherent representation of a person remain largely unknown.
Here we performed a functional magnetic resonance imaging (fMRI) study to investigate the cerebral correlates of voice–face interactions in a recognition task. We expected that voices alone would elicit bilateral activation of the temporal cortex, in particular the anterior part of the right STS, and that faces alone would elicit activation of the right FFA, i.e., the classical areas dedicated to the processing of voices and faces respectively. On the basis of neuroimaging studies showing an involvement of unimodal and multimodal areas in cross-modal binding (Wada et al., 2003, Bushara et al., 2003), we also predicted that bimodal stimulation would activate both unimodal visual and auditory areas (as previously observed for auditory speech perception, Calvert et al., 1999), and multimodal areas such as the anterior part of the temporal lobes, the hippocampus (Brown and Aggleton, 2001, Kirwan and Stark, 2004) and the parietal cortex (Saito et al., 2005, Bernstein et al., 2008). The anterior parts of the temporal lobes are known to be involved in the cross-modal processing of personal identity (Gorno-Tempini et al., 1998, Gainotti et al., 2003, Tsukiura et al., 2005, Calder and Young, 2005), while the hippocampus is involved in the conjunction of features (Brown and Aggleton, 2001), in particular in the associative processes devoted to the recognition of faces (Kirwan and Stark, 2004). We also expected an activation of the left parietal cortex because (1) we had already observed its specific activation in a Positron Emission Tomography (PET) study investigating the associative processes between faces and written names (Campanella et al., 2001), and (2) the left parietal cortex is part of the heteromodal associative cortex (Niznikiewicz et al., 2000, Booth et al., 2002, Booth et al., 2003) involved in the binding of visual and auditory speech (Saito et al., 2005).
Section snippets
Participants
Fourteen healthy volunteers [7 females; mean age: 23.5 years, standard deviation (SD): 3.99] participated in the fMRI study. All were right-handed, native French speakers, had normal vision and hearing, and gave their written informed consent. The experimental protocol was approved by the Biomedical Ethical Committee of the Catholic University of Louvain.
Stimuli
The stimuli consisted of four associations. Each association formed a schematic person (or identity) and was composed of a female face (black and
Behavioral data
We observed significant latency differences between V, F and VF [F(2,26) = 31.27, p < .001]. Subsequent paired t-tests revealed that voices were identified more slowly than faces [t(13) = 6.47, p < .001] and than voice–face associations [t(13) = 5.55, p < .001], whereas these two latter conditions did not significantly differ [t(13) = 1.36, ns; Table 2 and Fig. 2].
The percentages of correct responses showed the same pattern of results, with significant differences between V, F and VF [F(2,26) = 22.93, p < .001].
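As an illustration, the analysis above — a one-way repeated-measures ANOVA over the three conditions (V, F, VF) followed by paired t-tests — can be sketched in a few lines. The reaction times below are invented for the sketch and are not the study's data; only the design (14 participants, three within-subject conditions) follows the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 14                                    # participants, as in the study
# Invented per-participant mean reaction times in ms (NOT the study's data)
V  = 900 + rng.normal(0, 50, n)           # voices: slowest condition
F  = 700 + rng.normal(0, 50, n)           # faces
VF = 690 + rng.normal(0, 50, n)           # voice-face associations

data = np.stack([V, F, VF])               # shape (conditions, subjects)
k = data.shape[0]
grand = data.mean()
cond_means = data.mean(axis=1)            # per-condition means
subj_means = data.mean(axis=0)            # per-subject means

# One-way repeated-measures ANOVA: condition effect tested against the
# condition-by-subject residual term
ss_cond = n * ((cond_means - grand) ** 2).sum()
resid = data - cond_means[:, None] - subj_means[None, :] + grand
ss_err = (resid ** 2).sum()
df_cond, df_err = k - 1, (k - 1) * (n - 1)    # 2 and 26, matching F(2,26)
F_stat = (ss_cond / df_cond) / (ss_err / df_err)

def paired_t(a, b):
    """Paired t statistic with df = n - 1 = 13, matching t(13)."""
    d = a - b
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

t_V_vs_F = paired_t(V, F)
print(f"F({df_cond},{df_err}) = {F_stat:.2f}, t(13) V vs. F = {t_V_vs_F:.2f}")
```

The degrees of freedom fall out of the design exactly as reported in the text: (k − 1) = 2 and (k − 1)(n − 1) = 26 for the ANOVA, and n − 1 = 13 for the paired comparisons.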
Discussion
The aim of the present study was to investigate the cerebral correlates of voice–face interactions involved in person recognition. Using a subtraction method between bimodal and unimodal conditions, we isolated the cerebral regions underpinning face–voice integration. We observed the activation of a cortical network including unimodal visual and auditory regions along with multimodal regions such as the hippocampus and the left angular gyrus.
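The subtraction logic can be illustrated with a minimal sketch: a voxel is flagged as an integration site when its bimodal response exceeds its unimodal responses. The beta values and function below are hypothetical, and the two decision rules ("max" and superadditive) are standard criteria from the multisensory fMRI literature (cf. "Statistical criteria in fMRI studies of multisensory integration" in the references), not this paper's exact analysis.

```python
# Hypothetical beta (effect-size) estimates at a single voxel for the three
# conditions; values and names are illustrative, not taken from the paper.
betas = {"V": 0.8, "F": 1.0, "VF": 1.7}

def is_integration_site(b, criterion="max"):
    """Flag a voxel as multisensory under two common fMRI criteria:
    'max'   -- the bimodal response exceeds the larger unimodal response;
    'super' -- the bimodal response exceeds the SUM of the unimodal ones
               (superadditivity, a stricter test)."""
    if criterion == "max":
        return b["VF"] > max(b["V"], b["F"])
    return b["VF"] > b["V"] + b["F"]

print(is_integration_site(betas))           # True  (1.7 > 1.0)
print(is_integration_site(betas, "super"))  # False (1.7 < 0.8 + 1.0)
```

The example shows why the choice of criterion matters: the same voxel passes the max test but fails the superadditive one, so the set of regions labeled "integrative" depends on which subtraction is used.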
Acknowledgments
This work was supported by grant No. 1.5.130.05F from the National Fund for Scientific Research (Belgium), and grant No. 01/06-267 from the Communauté Française de Belgique – Actions de Recherche Concertées (Belgium).
Frédéric Joassin and Pierre Maurage are Postdoctoral Researchers, and Salvatore Campanella and Mauro Pesenti Research Associates at the National Fund for Scientific Research (Belgium). We thank the Radiodiagnosis Unit at the Cliniques St. Luc (Brussels) for its support, and Ms. Sue
References

- Human temporal-lobe response to vocal sounds. Brain Research: Cognitive Brain Research (2002).
- Thinking the voice: Neural correlates of voice perception. Trends in Cognitive Sciences (2004).
- Spatiotemporal dynamics of audiovisual speech processing. NeuroImage (2008).
- Functional anatomy of intra- and cross-modal lexical tasks. NeuroImage (2002).
- Detection of audio–visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. NeuroImage (2001).
- Associations of the distinct visual representations of faces and names: A PET activation study. NeuroImage (2001).
- Integrating face and voice in person perception. Trends in Cognitive Sciences (2007).
- Time-locked multiregional retroactivation: A system-level proposal for neural substrates of recall and recognition. Cognition (1989).
- Prosopagnosia can be dissociated with damage confined to the right hemisphere – An MRI and PET study and a review of the literature. Neuropsychologia (1994).
- Psychophysiological and modulatory interactions in neuroimaging. NeuroImage (1997).
- When audition alters vision: An event-related potential study of the cross-modal interactions between faces and voices. Neuroscience Letters.
- Neuroanatomic overlap of working memory and spatial attention networks: A functional MRI comparison within subjects. NeuroImage.
- Cognitive neuroscience and the study of memory. Neuron.
- Convergence of unimodal and polymodal sensory input to the entorhinal cortex in the fascicularis monkey. Neuroscience.
- Neuropsychological impairments in the recognition of faces, voices, and personal names. Brain and Cognition.
- A comparison of bound and unbound audio–visual information processing in the human cerebral cortex. Cognitive Brain Research.
- Functional topography of working memory for face or voice identity. NeuroImage.
- Auditory–visual speech perception examined by fMRI and PET. Neuroscience Research.
- Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage.
- Prosopagnosia: A clinical and anatomical study of four patients. Cortex.
- Modulation of neural responses to speech by directing attention to voices or verbal content. Brain Research: Cognitive Brain Research.
- Audio–visual integration in temporal perception. International Journal of Psychophysiology.
- Functional imaging of human crossmodal identification and object recognition. Experimental Brain Research.
- The development of infant learning about specific face–voice relations. Developmental Psychology.
- Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience.
- Statistical criteria in fMRI studies of multisensory integration. Neuroinformatics.
- FMRI study of emotional speech comprehension. Cerebral Cortex.
- Voice-selective areas in human auditory cortex. Nature.
- Human brain language areas identified by functional magnetic resonance imaging. Journal of Neuroscience.
- Relation between brain activation and lexical performance. Human Brain Mapping.
- Recognition memory: What are the roles of the perirhinal cortex and hippocampus? Nature Reviews Neuroscience.
- Understanding face recognition with an interactive model. British Journal of Psychology.
- Modality-specific frontal and parietal areas for auditory and visual spatial localization in humans. Nature Neuroscience.
- Neural correlates of cross-modal binding. Nature Neuroscience.
- Understanding the recognition of facial identity and facial expression. Nature Reviews Neuroscience.