The role of actions in auditory object discrimination
Introduction
Recognition of visual, linguistic, and auditory stimuli can be influenced by associated actions (Gibson, 1979, Rizzolatti et al., 1996, Grèzes et al., 2003, Aziz-Zadeh et al., 2004, Barraclough et al., 2005, Pizzamiglio et al., 2005, Pulvermüller, 2005, Tettamanti et al., 2005, Hauk et al., 2006, Hauk et al., 2008, Lahav et al., 2007, Pazzaglia et al., 2008), and distinct neuronal response patterns or networks can be observed for objects linked to actions. In the case of sounds, these networks can include premotor and (pre)frontal cortices often, but not exclusively, attributed to the so-called audio–visual mirror neuron system (Kohler et al., 2002, Keysers et al., 2003). More generally, such activations are consistent with current anatomical models of the auditory ‘what’ pathway (Rauschecker, 1998, Rauschecker and Tian, 2000), which includes projections from auditory regions of the superior temporal cortex ultimately to prefrontal and premotor regions (Romanski et al., 1999a, Romanski et al., 1999b, Kaas and Hackett, 2000). In agreement with these models, functional imaging studies have documented reliable activations within prefrontal cortices and elsewhere in response to environmental sounds and vocalizations (e.g. Lewis et al., 2005; Fecteau et al., 2005; Murray et al., 2006). One implication of this framework is that action representations operate in concert with, and perhaps guide, object recognition processes. However, the precise spatio-temporal relationship between object-related and action-related processes remains poorly understood, particularly with regard to sounds of environmental objects. This relationship was the focus of the present electrical neuroimaging study, which capitalized on the high temporal resolution of scalp-recorded electroencephalography as well as on recent improvements in source estimations (Michel et al., 2004).
Auditory object recognition has been shown to include categorical discrimination, such that sounds of living (including the sub-category of vocalizations) and man-made environmental objects, for example, can engage distinct brain networks (Belin et al., 2000; Fecteau et al., 2005; Lewis et al., 2005; Altmann et al., 2007) at post-stimulus latencies as early as 70 ms and with different durations of activity (Murray et al., 2006; reviewed in Murray and Spierer, 2009). In particular, stronger responses have been observed to sounds of man-made objects within premotor and prefrontal cortices (Lewis et al., 2005; Murray et al., 2006), raising the possibility that sounds of man-made objects have stronger associations with action representations than sounds of living objects, which may instead have stronger associations with visual representations (e.g. Murray et al., 2004, Murray et al., 2005, Amedi et al., 2005). While the precise type(s) of actions necessary to elicit response modulations in these premotor and prefrontal regions remain undefined, it is noteworthy that the stimuli in the abovementioned studies included a wide variety of man-made objects, including tools (Lewis et al., 2005) as well as a mixture of musical instruments, household items/appliances, and alarms (Murray et al., 2006). Still others have documented responses within lateral ventral (pre)frontal cortices to vocalizations (Fecteau et al., 2005 for functional magnetic resonance imaging results; Murray et al., 2006 for electrical neuroimaging data).
This pattern of results led us to hypothesize that there is general activity within mirror neuron regions in response to sounds of objects that may in turn be modulated as a function of sub-types of associated actions. One line of support for this hypothesis comes from research examining the responsiveness of neurons within ventral (lateral) prefrontal cortices (vPFC) to animal vocalizations. These neurons responded differentially to vocalizations referring to food discovery vs. other communicative situations, irrespective of the quality of the foods to which they referred (Cohen et al., 2006). Such results are suggestive of a dichotomy in responsiveness within vPFC (and perhaps elsewhere) between sound categories that may reflect their social and/or functional context as well as their cuing of the listener to react in a specific manner (e.g. partaking of discovered food vs. returning a greeting). The present study considers two sub-groups of sounds of actions: those conveying a specific social and/or functional context that often cue listeners to act in response, and those sounds not necessarily linked to a specific context and not cuing a responsive action. We use the terms ‘context-related’ and ‘context-free’, respectively, as shorthand for this distinction (see Seyfarth et al., 1980, Hauser, 1998 for similar distinctions).
Links between modulated activity within the mirror system and action representations elicited by sounds have been established (e.g. Kohler et al., 2002, Keysers et al., 2003, Hauk et al., 2006, Hauk et al., 2008, Kaplan and Iacoboni, 2006; Galati et al., 2008). For example, in the study that first described the responsiveness of ventral premotor mirror neurons to sounds of actions, Kohler et al. (2002) presented macaque monkeys with sounds of actions (e.g. paper ripping or a stick hitting the floor), animal vocalizations, and noise bursts. They found that while these neurons reliably responded to sounds of actions, they failed to exhibit robust responses to vocalizations or noise bursts (see also Keysers et al., 2003). In addition, these authors reported a near-perfect correspondence between a single neuron's selectivity for a given action when presented as a sound and when presented visually. This selectivity and inter-sensory correspondence suggest that these modulations do not reflect simple semantic analysis. However, the spatio-temporal brain dynamics of semantic and action-related processes remain to be fully specified.
Investigations in humans of the interplay between environmental sound recognition and action representations are relatively rare and have thus far generated discordant conclusions regarding the temporal dynamics of these processes. On the one hand, Pizzamiglio et al. (2005) reported effects starting at ~ 300 ms post-stimulus onset using a masked repetition priming paradigm with sounds produced by humans vs. by other sources (e.g. hands clapping vs. water boiling). By contrast, Hauk et al. (2006) reported effects as early as ~ 100 ms post-stimulus onset using an adaptation of a multi-deviant mismatch negativity (MMN) paradigm, comparing responses to clicks produced by the finger or tongue both with each other and with acoustically controlled synthetic variants. While the use of an MMN paradigm allowed Hauk et al. to also assess whether action representations are accessed pre-attentively, a potential limitation of their contrast, which the authors themselves acknowledge, is that complex spectral features were present only in the naturalistic stimuli and could have elicited larger MMNs than the control sounds (though this would not account for the topographically distinct and somatotopic effects they observed between finger and tongue sounds). That is, larger MMNs have been reported for meaningful than for meaningless control stimuli (e.g. Frangos et al., 2005; see also Hauk et al., 2006 for discussion). More generally, the difference in the latency of effects reported in these studies could stem from numerous sources, including but not limited to task-related effects (i.e. explicit discrimination of actions in Pizzamiglio et al., 2005 vs. passive listening in Hauk et al., 2006). As such, it remains unresolved both when action representations are accessed, in particular with respect to subordinate-level object discrimination, and whether such access occurs pre-attentively.
A further complication in establishing the necessary conditions for observing response modulations within the human auditory mirror neuron system is that action-related differences between stimuli are often confounded with semantic differences. For example, response differences between the sound of paper being ripped and a non-speech vocalization may reflect action-related processes and/or man-made vs. living categorization. The present study sought to circumvent this confound by comparing different sub-types of sounds of man-made environmental objects, further sorted between ‘context-related’ and ‘context-free’ actions. Specifically, we applied electrical neuroimaging analyses (Murray et al., 2008a) to auditory evoked potentials (AEPs) in response to distracter trials during a living vs. man-made discrimination task in order to identify the spatio-temporal mechanism whereby representations of responsive actions impact sound discrimination, and to situate this mechanism with respect to current models of auditory object processing (Griffiths and Warren, 2004, Murray and Spierer, 2009).
Subjects
Ten healthy, right-handed individuals (7 female), aged 21–34 years, participated. All subjects provided written, informed consent to participate in the study, the procedures of which were approved by the Ethics Committee of the University of Geneva. None had a history of neurological or psychiatric illness, and all reported normal hearing. None were musicians. Data from these individuals have been previously published in an investigation of living versus man-made categorical discrimination (Murray et al., 2006).
Behavioral results
Participants accurately performed the target detection task (see Murray et al., 2006 for details). In terms of their performance with context-free and context-related sounds, the mean (± s.e.m.) percentages of correct responses were 94.0 ± 3.1% and 93.5 ± 1.9%, respectively, and did not significantly differ (t(9) = 0.25; p > 0.8). Likewise, reaction times to context-free versus context-related sounds were 883 ± 41 ms and 903 ± 38 ms, respectively, and did not significantly differ (t(9) = 0.76; p > 0.45).
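For readers wishing to reproduce this kind of comparison, the paired t-tests above can be sketched in a few lines of Python. The per-subject accuracies below are hypothetical placeholders chosen to be roughly consistent with the group means reported in the text (only those means are real), and the `paired_t` helper is our own illustration, not the analysis code used in the study.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired (within-subject) t statistic: mean difference over its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical per-subject % correct for the two sound sub-types (n = 10 subjects)
context_free    = [94, 97, 90, 95, 92, 96, 93, 94, 95, 94]
context_related = [93, 95, 91, 94, 93, 95, 92, 94, 94, 94]

t = paired_t(context_free, context_related)
# For df = 9, |t| must exceed 2.262 for two-tailed p < .05; here |t| ≈ 1.63,
# i.e. no significant accuracy difference, mirroring the null result reported above
print(round(t, 2), abs(t) < 2.262)
```

The same helper applies unchanged to the reaction-time comparison; with real data one would typically use `scipy.stats.ttest_rel`, which also returns the p-value directly.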
Discussion
We identified the timing and neurophysiologic mechanism by which sounds of man-made environmental objects, all of which involve actions for their generation, are discriminated from one another. To determine the role of action representations in object discrimination, we focused here on the impact of whether or not the sounds typically cue the production of an action in response by the listener. Electrical neuroimaging analyses revealed that AEPs to context-related sounds (i.e. those that also
Acknowledgments
Cartool software (http://brainmapping.unige.ch/Cartool.htm) has been programmed by Denis Brunet, from the Functional Brain Mapping Laboratory, Geneva, Switzerland, and is supported by the Electroencephalography Brain Mapping Core of the Center for Biomedical Imaging (http://www.cibm.ch) of Geneva and Lausanne. Christoph Michel and Jean-François Knebel provided additional analysis tools. Financial support was provided by the Swiss National Science Foundation (grants K-33K1_122518/1 to MDL,
References (68)
- et al. Listening to action-related sentences modulates the activity of the motor system: a combined TMS and behavioral study. Brain Res. Cogn. Brain Res. (2005)
- et al. A selective representation of the meaning of actions in the auditory mirror system. NeuroImage (2008)
- et al. Empathy and the somatotopic auditory mirror system in humans. Curr. Biol. (2006)
- et al. Electrical neuroimaging based on biophysical constraints. NeuroImage (2004)
- et al. Activations related to “mirror” and “canonical” neurones in the human brain: an fMRI study. NeuroImage (2003)
- et al. The time course of action and action–word comprehension in the human brain as revealed by neurophysiology. J. Physiol. Paris (2008)
- Functional referents and acoustic similarity: field playback experiments with rhesus monkeys. Anim. Behav. (1998)
- et al. Effect of environmental sound familiarity on dynamic neural activation/inhibition patterns: an ERD mapping study. NeuroImage (1998)
- et al. Reference-free identification of components of checkerboard-evoked multichannel potential fields. Electroencephalogr. Clin. Neurophysiol. (1980)
- et al. EEG source imaging. Clin. Neurophysiol. (2004)
- Rapid discrimination of visual and multisensory memories revealed by electrical neuroimaging. NeuroImage
- The brain uses single-trial multisensory memories to discriminate without awareness. NeuroImage
- Plasticity in representations of environmental sounds revealed by electrical neuroimaging. NeuroImage
- Mapping of scalp potentials by surface spline interpolation. Electroencephalogr. Clin. Neurophysiol.
- The sound of actions in apraxia. Curr. Biol.
- Separate neural systems for processing action- or non-action-related sounds. NeuroImage
- The nature of the sources of bioelectric and biomagnetic fields. Biophys. J.
- Premotor cortex and the recognition of motor actions. Brain Res. Cogn. Brain Res.
- Memory for pantomimed actions versus actions with real objects. Cortex
- The brain tracks the energetic value in food images. NeuroImage
- Neural representation of auditory size in the human voice and in sounds from other resonant sources. Curr. Biol.
- Selectivity for animal vocalizations in the human auditory cortex. Cereb. Cortex
- Functional imaging of human crossmodal identification and object recognition. Exp. Brain Res.
- Left hemisphere motor facilitation in response to manual action sounds. Eur. J. Neurosci.
- Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J. Cogn. Neurosci.
- Voice-selective areas in human auditory cortex. Nature
- Neural correlates of auditory repetition priming: reduced fMRI activation in the auditory cortex. J. Cogn. Neurosci.
- Spontaneous processing of abstract categorical information in the ventrolateral prefrontal cortex. Biol. Lett.
- Single-subject EEG analysis based on topographic information. Int. J. Bioelectromagn.
- Perceptual and semantic contributions to repetition priming of environmental sounds. Cereb. Cortex
- Understanding motor events: a neurophysiological study. Exp. Brain Res.
- Probing category selectivity for environmental sounds in the human auditory brain. Neuropsychologia
- Mirror neurons and mirror systems in monkeys and humans. Physiology (Bethesda)
Cited by (35)
- Multisensory contributions to object recognition and memory across the life span. 2019, Multisensory Perception: From Laboratory to Clinic
- Auditory object perception: A neurobiological model and prospective review. 2017, Neuropsychologia
- From bird to sparrow: Learning-induced modulations in fine-grained semantic discrimination. 2015, NeuroImage. Citation excerpt: “Starting at approximately 170 ms, neural activity within the right superior temporal cortex distinguishes the (non-verbal) vocalizations of humans vs. animals (Renvall et al., 2012; De Lucia et al., 2010). Within the category of man-made sounds, those which typically cue for a responsive action by the listener (e.g. a ringing telephone) and those which do not yield different neural activity in the premotor and inferior prefrontal cortex starting at 300 ms (De Lucia et al., 2009). Despite this accumulating knowledge regarding the representation of different semantic categories of sounds, little is known about the mechanisms involved in the discrimination within narrow categories.”
- Roaring lions and chirruping lemurs: How the brain encodes sound objects in space. 2015, Neuropsychologia. Citation excerpt: “Although the ventral stream is the key structure for sound recognition, specific sound categories involve additional parts of the frontal cortex. In particular, action-related environmental sounds have been shown to activate parts of the motor, premotor and prefrontal cortices that are classically associated with the dorsal stream (Lahav et al., 2007; Lewis et al., 2005; Pizzamiglio et al., 2005; Gazzola et al., 2006; Hauk et al., 2006; Doehrmann et al., 2008; De Lucia et al., 2009). Furthermore, evoked muscular potentials induced by transcranial magnetic stimulation of the motor cortex are significantly increased when participants listen to action-related compared to action-unrelated sounds (Aziz-Zadeh et al., 2004; Bourquin et al., 2013b).”