Cortex

Volume 47, Issue 3, March 2011, Pages 367-376

Research report

Cross-modal interactions between human faces and voices involved in person recognition

https://doi.org/10.1016/j.cortex.2010.03.003

Abstract

Faces and voices are key cues for person recognition, but the way the brain links them together is still unknown. In this study, we measured brain activity using functional magnetic resonance imaging (fMRI) while participants recognized previously learned static faces, voices, and voice–static face associations. Using a subtraction method between bimodal and unimodal conditions, we observed that voice–face associations activated both unimodal visual and auditory areas and specific multimodal regions located in the left angular gyrus and the right hippocampus. Moreover, a functional connectivity analysis confirmed the connectivity of the right hippocampus with the unimodal areas. These findings demonstrate that binding faces and voices relies on a cerebral network sustaining different aspects of integration, such as sensory input processing, attention and memory.

Introduction

Human social interactions are shaped by our ability to identify individuals, a process to which face and voice recognition contribute both separately and jointly. Much research has been devoted to unimodal face recognition. Neuroimaging studies have shown that human faces are mainly processed by temporo-occipital regions of the brain, with a right-hemispheric dominance and a particular involvement of the fusiform gyrus (the so-called Fusiform Face Area – FFA; Sergent et al., 1992, Kanwisher et al., 1997, Rhodes et al., 2004). Studies with brain-damaged patients have revealed a selective impairment of face recognition, called prosopagnosia, associated with lesions of the right fusiform gyrus (De Renzi et al., 1994, Takahashi et al., 1995). Fewer studies have focused on voice recognition. Voice recognition takes place bilaterally in the superior temporal cortex, with a particular recruitment of the anterior part of the right superior temporal sulcus (STS; Belin et al., 2000, Belin et al., 2002, Von Kriegstein et al., 2003). Phonagnosia, the selective impairment of voice recognition, is predominantly associated with lesions of the right hemisphere (Neuner and Schweinberger, 2000), and Van Lancker et al. (1989) showed that impaired voice recognition was significantly correlated with right parietal lobe damage.

Although neuroanatomically segregated, faces and voices interact, not only at a perceptual level (Calvert et al., 1999, Olson et al., 2002, Sekiyama et al., 2003), but also during person recognition (Burton et al., 1990, Ellis et al., 1997). Such integration skills emerge very early in life (Bahrick et al., 2005), yet only a few studies have investigated cross-modal interactions between faces and voices in person identification. Schweinberger et al. (2007) showed that voice recognition was easier when the voice was presented simultaneously with an associated face, whereas it was hampered when the voice was presented with a face of a different identity. This demonstrates that listeners cannot ignore a face presented in temporal synchrony with a voice. This effect was not observed with unfamiliar voices, which suggests that audio–visual integration in person recognition depends on multimodal representations of people established through experience (for a review, see Campanella and Belin, 2007). However, the brain processes by which voices and faces, though processed by distinct cerebral regions, are integrated into a unique and coherent representation of a person are still largely unknown.

Here we performed a functional magnetic resonance imaging (fMRI) study to investigate the cerebral correlates of voice–face interactions in a recognition task. We expected that voices alone would elicit bilateral activation of the temporal cortex, in particular the anterior part of the right STS, and that faces alone would elicit activation of the right FFA, i.e., the classical areas dedicated to the processing of voices and faces, respectively. On the basis of neuroimaging studies showing an involvement of unimodal and multimodal areas in cross-modal binding (Wada et al., 2003, Bushara et al., 2003), we also predicted that bimodal stimulation would activate both unimodal visual and auditory areas (as previously observed for auditory speech perception, Calvert et al., 1999) and multimodal areas such as the anterior part of the temporal lobes, the hippocampus (Brown and Aggleton, 2001, Kirwan and Stark, 2004) and the parietal cortex (Saito et al., 2005, Bernstein et al., 2008). The anterior part of the temporal lobes is known to be involved in the cross-modal processing of personal identity (Gorno-Tempini et al., 1998, Gainotti et al., 2003, Tsukiura et al., 2005, Calder and Young, 2005), and the hippocampus in the conjunction of features (Brown and Aggleton, 2001), in particular in the associative processes devoted to the recognition of faces (Kirwan and Stark, 2004). We also expected an activation of the left parietal cortex because (1) we had already observed its specific activation in a Positron Emission Tomography (PET) study investigating the associative processes between faces and written names (Campanella et al., 2001), and (2) the left parietal cortex is known to be part of the heteromodal associative cortex (Niznikiewicz et al., 2000, Booth et al., 2002, Booth et al., 2003) involved in the binding of visual and auditory speech (Saito et al., 2005).

Section snippets

Participants

Fourteen healthy volunteers [7 females; mean age: 23.5 years, standard deviation (SD): 3.99] participated in the fMRI study. All were right-handed native French speakers with normal vision and hearing, and all gave written informed consent. The experimental protocol was approved by the Biomedical Ethical Committee of the Catholic University of Louvain.

Stimuli

The stimuli consisted of four associations. Each association formed a schematic person (or identity) and was composed of a female face (black and

Behavioral data

We observed significant latency differences between V, F and VF [F(2,26) = 31.27, p < .001]. Subsequent paired t-tests revealed that voices were identified more slowly than faces [t(13) = 6.47, p < .001] or voice–face associations [t(13) = 5.55, p < .001], whereas these two latter conditions did not differ significantly [t(13) = 1.36, ns; Table 2 and Fig. 2].

The percentages of correct responses showed the same pattern of results, with significant differences between V, F and VF [F(2,26) = 22.93, p < .001]
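
For readers who want to see how these reported statistics map onto a standard analysis, the sketch below runs a one-way repeated-measures ANOVA over the three conditions followed by paired t-tests. This is not the authors' analysis code: the reaction times are simulated to mimic the reported pattern (slower voice identification), and all variable names are illustrative.

    import numpy as np
    import pandas as pd
    from scipy.stats import ttest_rel
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(0)
    n_sub = 14  # 14 participants -> df = (2, 26) for the omnibus F test

    # Simulated per-participant mean reaction times (ms): voices slower
    # than faces and voice-face associations, as in the reported data
    rt = pd.DataFrame({
        "subject": np.repeat(np.arange(n_sub), 3),
        "condition": np.tile(["V", "F", "VF"], n_sub),
        "rt_ms": np.concatenate(
            [rng.normal([950.0, 800.0, 780.0], 60.0) for _ in range(n_sub)]
        ),
    })

    # Omnibus repeated-measures ANOVA, reported above as F(2, 26)
    print(AnovaRM(rt, depvar="rt_ms", subject="subject",
                  within=["condition"]).fit())

    # Follow-up paired t-tests, df = 13
    wide = rt.pivot(index="subject", columns="condition", values="rt_ms")
    print(ttest_rel(wide["V"], wide["F"]))   # voices vs faces
    print(ttest_rel(wide["V"], wide["VF"]))  # voices vs associations
    print(ttest_rel(wide["F"], wide["VF"]))  # faces vs associations

With 14 participants, a three-level within-subject factor yields the (2, 26) degrees of freedom of the omnibus test and 13 degrees of freedom for each pairwise comparison, matching the values reported above.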

Discussion

The aim of the present study was to investigate the cerebral correlates of voice–face interactions involved in person recognition. Using a subtraction method between bimodal and unimodal conditions, we isolated the cerebral regions sustaining face–voice integration. We observed the activation of a cortical network including unimodal visual and auditory regions, along with multimodal regions such as the hippocampus and the left angular gyrus.
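
The subtraction logic can be made concrete with a minimal sketch. Assuming one activation map per condition (e.g., voxel-wise GLM beta estimates for V, F and VF), the bimodal-minus-unimodal criterion, and the alternative conjunction criterion discussed by Beauchamp (2005), reduce to simple voxel-wise comparisons. The toy data below are simulated, not from the study, and the maps are placeholders for real first-level estimates.

    import numpy as np

    rng = np.random.default_rng(0)
    shape = (4, 4, 4)  # toy voxel grid

    # Hypothetical per-condition activation maps (e.g., GLM betas)
    beta_V = rng.normal(0.0, 1.0, shape)   # voices alone
    beta_F = rng.normal(0.0, 1.0, shape)   # faces alone
    # Bimodal response built to be superadditive in this toy example
    beta_VF = beta_V + beta_F + rng.normal(0.3, 1.0, shape)

    # Subtraction criterion: voxels where the bimodal response exceeds
    # the sum of the unimodal responses, i.e., VF - (V + F) > 0
    superadditivity = beta_VF - (beta_V + beta_F)

    # Alternative conjunction criterion (Beauchamp, 2005): voxels where
    # the bimodal response exceeds each unimodal response separately
    conjunction = (beta_VF > beta_V) & (beta_VF > beta_F)

    print(f"{(superadditivity > 0).mean():.0%} of voxels superadditive")
    print(f"{conjunction.mean():.0%} of voxels pass the conjunction test")

In real data these comparisons are carried out as voxel-wise statistical tests on contrast estimates rather than raw differences; the sketch only illustrates the contrast logic behind the bimodal-versus-unimodal subtraction.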

Acknowledgments

This work was supported by grant No. 1.5.130.05F from the National Fund for Scientific Research (Belgium), and grant No. 01/06-267 from the Communauté Française de Belgique – Actions de Recherche Concertées (Belgium).

Frédéric Joassin and Pierre Maurage are Postdoctoral Researchers, and Salvatore Campanella and Mauro Pesenti Research Associates at the National Fund for Scientific Research (Belgium). We thank the Radiodiagnosis Unit at the Cliniques St. Luc (Brussels) for its support, and Ms. Sue

References (70)

  • F. Joassin et al., When audition alters vision: An event-related potential study of the cross-modal interactions between faces and voices. Neuroscience Letters (2004)
  • K.S. LaBar et al., Neuroanatomic overlap of working memory and spatial attention networks: A functional MRI comparison within subjects. NeuroImage (1999)
  • B. Milner et al., Cognitive neuroscience and the study of memory. Neuron (1998)
  • A. Mohedano-Moriano et al., Convergence of unimodal and polymodal sensory input to the entorhinal cortex in the fascicularis monkey. Neuroscience (2008)
  • F. Neuner et al., Neuropsychological impairments in the recognition of faces, voices, and personal names. Brain and Cognition (2000)
  • I.R. Olson et al., A comparison of bound and unbound audio–visual information processing in the human cerebral cortex. Cognitive Brain Research (2002)
  • P. Rämä et al., Functional topography of working memory for face or voice identity. NeuroImage (2005)
  • K. Sekiyama et al., Auditory–visual speech perception examined by fMRI and PET. Neuroscience Research (2003)
  • R.A. Stevenson et al., Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage (2009)
  • N. Takahashi et al., Prosopagnosia: A clinical and anatomical study of four patients. Cortex (1995)
  • K. Von Kriegstein et al., Modulation of neural responses to speech by directing attention to voices or verbal content. Cognitive Brain Research (2003)
  • Y. Wada et al., Audio–visual integration in temporal perception. International Journal of Psychophysiology (2003)
  • A. Amedi et al., Functional imaging of human crossmodal identification and object recognition. Experimental Brain Research (2005)
  • L.E. Bahrick et al., The development of infant learning about specific face–voice relations. Developmental Psychology (2005)
  • M.S. Beauchamp et al., Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience (2004)
  • M.S. Beauchamp, Statistical criteria in fMRI studies of multisensory integration. Neuroinformatics (2005)
  • V. Beaucousin et al., FMRI study of emotional speech comprehension. Cerebral Cortex (2007)
  • P. Belin et al., Voice-selective areas in human auditory cortex. Nature (2000)
  • J.R. Binder et al., Human brain language areas identified by functional magnetic resonance imaging. Journal of Neuroscience (1997)
  • J.R. Booth et al., Relation between brain activation and lexical performance. Human Brain Mapping (2003)
  • M.W. Brown et al., Recognition memory: What are the roles of the perirhinal cortex and hippocampus? Nature Reviews Neuroscience (2001)
  • A.M. Burton et al., Understanding face recognition with an interactive model. British Journal of Psychology (1990)
  • K.O. Bushara et al., Modality-specific frontal and parietal areas for auditory and visual spatial localization in humans. Nature Neuroscience (1999)
  • K.O. Bushara et al., Neural correlates of cross-modal binding. Nature Neuroscience (2003)
  • A.J. Calder et al., Understanding the recognition of facial identity and facial expression. Nature Reviews Neuroscience (2005)