
NeuroImage

Volume 48, Issue 2, 1 November 2009, Pages 475-485

The role of actions in auditory object discrimination

https://doi.org/10.1016/j.neuroimage.2009.06.041

Abstract

Action representations can interact with object recognition processes. For example, so-called mirror neurons respond both when an action is performed and when such actions are seen or heard. Investigations of auditory object processing have largely focused on categorical discrimination, which begins within the initial 100 ms post-stimulus onset and subsequently engages distinct cortical networks. Whether action representations themselves contribute to auditory object recognition, and precisely which kinds of actions recruit the auditory–visual mirror neuron system, remain poorly understood. We applied electrical neuroimaging analyses to auditory evoked potentials (AEPs) in response to sounds of man-made objects that were further subdivided between sounds conveying a socio-functional context and typically cuing a responsive action by the listener (e.g. a ringing telephone) and sounds that are not linked to such a context and do not typically elicit responsive actions (e.g. notes on a piano). This distinction was validated psychophysically by a separate cohort of listeners. Beginning ~300 ms post-stimulus onset, responses to context-related sounds significantly differed from those to context-free sounds in both the strength and the topography of the electric field. This latency is >200 ms after general categorical discrimination. Moreover, these topographic differences indicate that sounds of different action sub-types engage distinct configurations of intracranial generators. Statistical analysis of source estimations identified differential activity within premotor and inferior (pre)frontal regions (Brodmann's areas (BA) 6, BA8, and BA45/46/47) in response to sounds of actions typically cuing a responsive action. We discuss our results in terms of a spatio-temporal model of auditory object processing and the interplay between semantic and action representations.

Introduction

Recognition of visual, linguistic, and auditory stimuli can be influenced by associated actions (Gibson, 1979, Rizzolatti et al., 1996, Grèzes et al., 2003, Aziz-Zadeh et al., 2004, Barraclough et al., 2005, Pizzamiglio et al., 2005, Pulvermüller, 2005, Tettamanti et al., 2005, Hauk et al., 2006, Hauk et al., 2008, Lahav et al., 2007, Pazzaglia et al., 2008), and distinct neuronal response patterns or networks can be observed for objects linked to actions. In the case of sounds, these networks can include premotor and (pre)frontal cortices often, but not exclusively, attributed to the so-called audio–visual mirror neuron system (Kohler et al., 2002, Keysers et al., 2003). More generally, such activations are consistent with current anatomical models of the auditory ‘what’ pathway (Rauschecker, 1998, Rauschecker and Tian, 2000) that includes projections from auditory regions of the superior temporal cortex ultimately to prefrontal and premotor regions (Romanski et al., 1999a, Romanski et al., 1999b, Kaas and Hackett, 2000). In agreement, functional imaging studies have documented reliable activations within prefrontal cortices and elsewhere in response to environmental sounds and vocalizations (e.g. Lewis et al., 2005; Fecteau et al., 2005; Murray et al., 2006). One implication of this construct is that action representation is itself operating in concert with and perhaps guiding object recognition processes. However, the precise spatio-temporal relationship between object and action-related processes remains poorly understood, particularly with regard to sounds of environmental objects, and was the focus of the present electrical neuroimaging study that capitalized on the high temporal resolution of scalp-recorded electroencephalography as well as recent improvements in source estimations (Michel et al., 2004).

Auditory object recognition has been shown to include categorical discrimination, such that sounds of living (including the sub-category of vocalizations) and man-made environmental objects, for example, can engage distinct brain networks (Belin et al., 2000; Fecteau et al., 2005; Lewis et al., 2005, Altmann et al., 2007) at post-stimulus latencies as early as 70 ms and with different durations of activity (Murray et al., 2006; reviewed in Murray and Spierer, 2009). In particular, stronger responses have been observed to sounds of man-made objects within premotor and prefrontal cortices (Lewis et al., 2005, Murray et al., 2006), raising the possibility that sounds of man-made objects have stronger associations with action representations than sounds of living objects, which may instead have stronger associations with visual representations (e.g. Murray et al., 2004, Murray et al., 2005, Amedi et al., 2005). While the precise type(s) of actions necessary to elicit response modulations in these premotor and prefrontal regions remain undefined, it is noteworthy that the stimuli in the abovementioned studies included a wide variety of man-made objects, including tools (Lewis et al., 2005) as well as a mixture of musical instruments, household items/appliances, and alarms (Murray et al., 2006). Still others have documented responses within lateral ventral (pre)frontal cortices in response to vocalizations (Fecteau et al., 2005 for functional magnetic resonance imaging results; Murray et al., 2006 for electrical neuroimaging data).

This pattern of results led us to hypothesize that there is general activity within mirror neuron regions in response to sounds of objects that may in turn modulate as a function of sub-types of associated actions. One line of support for this hypothesis comes from research examining responsiveness of neurons within ventral (lateral) prefrontal cortices (vPFC) to animal vocalizations. These neurons differentially responded to vocalizations referring to food discovery vs. other communicative situations, irrespective of the quality of the foods to which they referred (Cohen et al., 2006). Such results are suggestive of a dichotomy in the responsiveness within vPFC (and perhaps elsewhere) between sound categories that may reflect their social and/or functional context as well as their cuing of the listener to react in a specific manner (e.g. partaking in the discovered food vs. greetings). The present study considers two sub-groups of sounds of actions: those conveying a specific social and/or functional context often cuing listeners to act in response, and those not necessarily linked to a specific context and not cuing a responsive action. We use the terms ‘context-related’ and ‘context-free’, respectively, as shorthand to refer to this distinction (see Seyfarth et al., 1980, Hauser, 1998 for similar varieties of distinctions).

Links between modulated activity within the mirror system and action representations elicited by sounds have been established (e.g. Kohler et al., 2002, Keysers et al., 2003, Hauk et al., 2006, Hauk et al., 2008, Kaplan and Iacoboni, 2006; Galati et al., 2008). For example, in their study that first described the responsiveness of ventral premotor mirror neurons to sounds of actions, Kohler et al. (2002) presented macaque monkeys with sounds of actions (e.g. paper ripping or a stick hitting the floor), animal vocalizations, and noise bursts. They found that while these neurons reliably responded to sounds of actions they failed to exhibit robust responses to sounds of vocalizations or noise bursts (see also Keysers et al., 2003). In addition, these authors reported a near-perfect correspondence between a single neuron's selectivity for a given action when presented as a sound and when presented visually. This selectivity and inter-sensory correspondence would suggest that these modulations are not reflecting simple semantic analysis. However, specification of the spatio-temporal brain dynamics of semantic and action-related processes remains to be fully established.

Investigations in humans that studied the interplay between environmental sound recognition and action representations are relatively rare and have thus far generated discordant conclusions regarding the temporal dynamics of these processes. On the one hand, Pizzamiglio et al. (2005) reported effects starting at ~300 ms post-stimulus onset using a masked repetition priming paradigm with sounds produced by human beings or not (e.g. hands clapping vs. water boiling). By contrast, Hauk et al. (2006) reported effects as early as ~100 ms post-stimulus onset using an adaptation of a multi-deviant mismatch negativity (MMN) paradigm, comparing responses to clicks produced by the finger or tongue both with each other and with acoustically controlled synthetic variants. While the use of an MMN paradigm allowed Hauk et al. to also assess whether action representations are accessed pre-attentively, a potential limitation of their contrast, which the authors themselves acknowledge, is that complex spectral features were only present in the naturalistic stimuli and could have elicited larger MMNs than the control sounds (though this would not account for the topographically distinct and somatotopic effects they observed between finger and tongue sounds). That is, larger MMNs have been reported for meaningful than for meaningless control stimuli (e.g. Frangos et al., 2005; also Hauk et al., 2006 for discussion). More generally, the difference in the latency of effects reported in these studies could stem from numerous sources, including but not limited to task-related effects (i.e. explicit discrimination of actions in Pizzamiglio et al., 2005 vs. passive listening in Hauk et al., 2006). As such, it remains unresolved both when action representations are accessed, in particular with respect to superordinate-level object discrimination, and whether such access occurs pre-attentively.

A further complication for generating a synthesis in terms of the necessary conditions for observing response modulations within the human auditory mirror neuron system is that action-related differences between stimuli are often confounded by semantic differences. For example, response differences between the sound of paper being ripped and a non-speech vocalization may either reflect action-related processes and/or man-made vs. living categorization. The present study sought to circumvent this confound by comparing different sub-types of sounds of man-made environmental objects that were further sorted between ‘context-related’ and ‘context-free’ actions. Specifically, we applied electrical neuroimaging analyses (Murray et al., 2008a) to auditory evoked potentials (AEPs) in response to distracter trials during a living vs. man-made discrimination task in order to identify the spatio-temporal mechanism whereby representations of responsive actions impact sound discrimination and situate such with respect to current models of auditory object processing (Griffiths and Warren, 2004, Murray and Spierer, 2009).
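The distinction drawn above between modulations of response strength and of response topography is typically quantified in electrical neuroimaging with two standard measures: global field power (GFP; the spatial standard deviation of the average-referenced scalp map) and global map dissimilarity (the RMS difference between GFP-normalized maps, which is non-zero only when the underlying generator configurations differ). The sketch below illustrates both measures; the 64-electrode maps are simulated and purely illustrative, not the study's data.

```python
import numpy as np

def gfp(v):
    """Global field power: spatial standard deviation of the
    average-referenced scalp map."""
    v = v - v.mean()
    return np.sqrt((v ** 2).mean())

def dissimilarity(u, v):
    """Global map dissimilarity: RMS difference between two
    GFP-normalized, average-referenced maps. Zero for identical
    topographies; non-zero values imply distinct configurations
    of intracranial generators."""
    u = u - u.mean()
    v = v - v.mean()
    return np.sqrt((((u / gfp(u)) - (v / gfp(v))) ** 2).mean())

# Hypothetical 64-electrode maps at a single latency (illustrative only).
rng = np.random.default_rng(0)
map_context_related = rng.standard_normal(64)
map_context_free = 2.0 * map_context_related  # same topography, stronger field

# A scaled copy of a map differs in GFP but has zero dissimilarity:
print(gfp(map_context_free) / gfp(map_context_related))  # → 2.0
print(round(dissimilarity(map_context_related, map_context_free), 6))  # → 0.0
```

Because dissimilarity is computed on GFP-normalized maps, the two measures are independent: a pure gain change between conditions modulates GFP alone, whereas a topographic change (as reported here from ~300 ms) forces a change in the generator configuration.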

Subjects

Ten healthy, right-handed individuals (7 female), aged 21–34 years, participated. All subjects provided written, informed consent to participate in the study, the procedures of which were approved by the Ethics Committee of the University of Geneva. None had a history of neurological or psychiatric illness, and all reported normal hearing. None were musicians. Data from these individuals have been previously published in an investigation of living versus man-made categorical discrimination (

Behavioral results

Participants accurately performed the target detection task (see Murray et al., 2006 for details). In terms of their performance with context-free and context-related sounds, the mean (±s.e.m.) percentages of correct responses were 94.0 ± 3.1% and 93.5 ± 1.9%, respectively, and did not significantly differ (t(9) = 0.25; p > 0.8). Likewise, reaction times to context-free versus context-related sounds were 883 ± 41 ms and 903 ± 38 ms, respectively, and did not significantly differ (t(9) = 0.76; p > 0.45).
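The within-subject comparisons above (t(9) with 10 participants) are paired t-tests on per-subject means. A minimal sketch of that computation follows; the per-subject reaction times below are hypothetical placeholders, since the paper reports only group means and s.e.m.

```python
import numpy as np

def paired_t(a, b):
    """Paired t-test statistic with n - 1 degrees of freedom:
    t = mean(d) / (sd(d) / sqrt(n)), where d are per-subject differences."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1

# Hypothetical per-subject mean reaction times (ms) for 10 listeners.
rt_context_free = [850, 901, 872, 940, 865, 880, 910, 855, 895, 862]
rt_context_related = [870, 915, 880, 955, 860, 905, 925, 850, 915, 880]

t, df = paired_t(rt_context_free, rt_context_related)
print(df)  # → 9
```

With 10 subjects the test has 9 degrees of freedom, matching the t(9) values reported; the obtained t would then be compared against the two-tailed critical value.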

Discussion

We identified the timing and neurophysiologic mechanism by which sounds of man-made environmental objects, all of which involve actions for their generation, are discriminated from one another. To determine the role of action representations in object discrimination, we focused here on the impact of whether or not the sounds typically cue the production of an action in response by the listener. Electrical neuroimaging analyses revealed that AEPs to context-related sounds (i.e. those that also

Acknowledgments

Cartool software (http://brainmapping.unige.ch/Cartool.htm) has been programmed by Denis Brunet, from the Functional Brain Mapping Laboratory, Geneva, Switzerland, and is supported by the Electroencephalography Brain Mapping Core of the Center for Biomedical Imaging (http://www.cibm.ch) of Geneva and Lausanne. Christoph Michel and Jean-François Knebel provided additional analysis tools. Financial support was provided by the Swiss National Science Foundation (grants K-33K1_122518/1 to MDL,

References (68)

  • Murray, M.M., et al. Rapid discrimination of visual and multisensory memories revealed by electrical neuroimaging. NeuroImage (2004).
  • Murray, M.M., et al. The brain uses single-trial multisensory memories to discriminate without awareness. NeuroImage (2005).
  • Murray, M.M., et al. Plasticity in representations of environmental sounds revealed by electrical neuroimaging. NeuroImage (2008).
  • Perrin, F., et al. Mapping of scalp potentials by surface spline interpolation. Electroencephalogr. Clin. Neurophysiol. (1987).
  • Pazzaglia, M., et al. The sound of actions in apraxia. Curr. Biol. (2008).
  • Pizzamiglio, L., et al. Separate neural systems for processing action- or non-action-related sounds. NeuroImage (2005).
  • Plonsey, R. The nature of the sources of bioelectric and biomagnetic fields. Biophys. J. (1982).
  • Rizzolatti, G., et al. Premotor cortex and the recognition of motor actions. Brain Res. Cogn. Brain Res. (1996).
  • Senkfor, A.J. Memory for pantomimed actions versus actions with real objects. Cortex (2008).
  • Toepel, U., et al. The brain tracks the energetic value in food images. NeuroImage (2009).
  • von Kriegstein, K., et al. Neural representation of auditory size in the human voice and in sounds from other resonant sources. Curr. Biol. (2007).
  • Altmann, C.F., et al. Selectivity for animal vocalizations in the human auditory cortex. Cereb. Cortex (2007).
  • Amedi, A., et al. Functional imaging of human crossmodal identification and object recognition. Exp. Brain Res. (2005).
  • Aziz-Zadeh, L., et al. Left hemisphere motor facilitation in response to manual action sounds. Eur. J. Neurosci. (2004).
  • Barraclough, N.E., et al. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J. Cogn. Neurosci. (2005).
  • Belin, P., et al. Voice-selective areas in human auditory cortex. Nature (2000).
  • Bergerbest, D., et al. Neural correlates of auditory repetition priming: reduced fMRI activation in the auditory cortex. J. Cogn. Neurosci. (2004).
  • Cohen, Y.E., et al. Spontaneous processing of abstract categorical information in the ventrolateral prefrontal cortex. Biol. Lett. (2006).
  • De Lucia, M., et al. Single-subject EEG analysis based on topographic information. Int. J. Bioelectromagn. (2007).
  • De Lucia, M., Michel, C.M., Clarke, S., Murray, M.M., 2007b. Single-trial topographic analysis of human EEG: a new...
  • De Lucia, M., et al. Perceptual and semantic contributions to repetition priming of environmental sounds. Cereb. Cortex (2009).
  • di Pellegrino, G., et al. Understanding motor events: a neurophysiological study. Exp. Brain Res. (1992).
  • Doehrmann, O., et al. Probing category selectivity for environmental sounds in the human auditory brain. Neuropsychologia (2008).
  • Fabbri-Destro, M., et al. Mirror neurons and mirror systems in monkeys and humans. Physiology (Bethesda) (2008).