The discovery of audiovisual mirror neurons in monkeys gave rise to the hypothesis that premotor areas are inherently involved not only when observing actions but also when listening to action-related sounds. However, the whole-brain functional organization underlying such “action–listening” is not fully understood. In addition, previous studies in humans have focused mostly on relatively simple and overexperienced everyday actions, such as hand clapping or door knocking. Here we used functional magnetic resonance imaging to ask whether the human action-recognition system responds to sounds found in a more complex sequence of newly acquired actions. To address this, we chose a piece of music as a model set of acoustically presentable actions and trained non-musicians to play it by ear. We then monitored brain activity in subjects while they listened to the newly acquired piece. Although subjects listened to the music without performing any movements, activation was found bilaterally in the frontoparietal motor-related network (including Broca's area, the premotor region, the intraparietal sulcus, and the inferior parietal region), consistent with neural circuits that have been associated with action observation and that may constitute the human mirror neuron system. Presentation of the practiced notes in a different order activated the network to a much lesser degree, whereas listening to an equally familiar but motorically unknown piece of music did not activate this network. These findings support the hypothesis of a “hearing–doing” system that is highly dependent on the individual's motor repertoire, is established rapidly, and includes Broca's area as its hub.
When we observe someone performing an action, our brain may produce neural activity similar to that seen when we perform it ourselves. This neural simulation happens via a multimodal mirror neuron system, originally found in the monkey ventral premotor cortex (Gallese et al., 1996; Rizzolatti et al., 1996) and recently extended to humans in a variety of action-observation tasks (Decety and Grezes, 1999; Nishitani and Hari, 2000; Buccino et al., 2001; Grezes et al., 2003; Kilner et al., 2003; Gangitano et al., 2004; Hamilton et al., 2004; Calvo-Merino et al., 2005; Haslinger et al., 2005; Iacoboni, 2005; Nelissen et al., 2005; Thornton and Knoblich, 2006). Mirror neurons, however, may not be triggered only by visual stimuli. A subgroup of premotor neurons in monkeys also responded to the sound of actions (e.g., peanut breaking), and it has recently been suggested that there might also be a cross-modal neural system that functionally orchestrates these neurons in humans (Kohler et al., 2002; Keysers et al., 2003). However, the whole-brain functional organization and mechanism underlying such “action–listening” are not fully understood.
Action listening plays a part in many of our daily activities. Consider, for example, listening to door knocking or finger snapping. You would probably recognize the sound, but at the same time, your brain may also simulate the action (Aziz-Zadeh et al., 2004). Previous studies, however, have mostly focused on actions that we all learn from infancy and that are typically overexperienced, for example, hand clapping (Pizzamiglio et al., 2005), tongue clicking (Hauk et al., 2006), and speech (Fadiga et al., 2002; Wilson et al., 2004; Buccino et al., 2005). One other concern, at least with regard to the sound of speech, is that speech is not representative of all sounds; it carries meaning and is limited to communicative mouth actions, which, taken together, may activate different types of neural circuits than nonspeech sounds do (Pulvermuller, 2001; Zatorre et al., 2002; Thierry et al., 2003; Schon et al., 2005; Ozdemir et al., 2006). Thus, in the present study, we ask whether and how the mirror neuron system responds to actions and sounds that do not have verbal meaning and, most importantly, are well controlled and newly acquired.
To address this, we chose piano playing as a model task and a piece of music as a set of acoustically presentable novel actions. The use of music making as a sensory–motor framework for studying the acquisition of actions has been demonstrated in several previous studies (Stewart et al., 2003; Bangert et al., 2006b). Typically, when playing the piano, auditory feedback is naturally involved in each of the player's movements, leading to an intimate coupling between perception and action (Bangert and Altenmuller, 2003; Janata and Grafton, 2003). We therefore hypothesized that music one knows how to play (even if only recently learned) may be strongly associated with the corresponding elements of the individual's motor repertoire and might activate an audiomotor network in the human brain (Fig. 1).
Materials and Methods
The present study had two stages. First, we trained musically naive subjects to play a novel piano piece (“trained-music”) and measured their learning progress over a period of 5 d. Next, on day 5, we performed functional magnetic resonance imaging (fMRI) scans on subjects while they passively listened to short passages taken from the newly acquired piece (this was considered the action–listening condition). Similar passages taken from two other original musical pieces that subjects had only listened to (to control for familiarity) but had never learned to play were used for the (two) control conditions (Fig. 2B).
Nine non-musicians (six females and three males; mean age, 22.4 ± 2.2 years old) participated in the study. All subjects were right-handed [assessed by a questionnaire (Oldfield, 1971)], had no previous musical training (including voice), and had no history of neurological, psychiatric, or auditory problems. This study was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center and Boston University, and all subjects gave written informed consent.
Drawing on previous MIDI-based interactive software tools (Bangert et al., 2001), we developed our own MIDI-based interactive software for learning to play by ear (Lahav et al., 2005) and trained subjects to play the piano part of a novel musical piece (trained-music) with no sight-reading required. This was done to eliminate the need for visuomotor translation of musical notation into key presses and to enhance auditory–motor learning. During piano training, subjects learned to play the piano part (along with a prerecorded accompaniment) using their right hand and a set of five keys in a fixed fingering position (F, G, A, Bb, C) (Fig. 1A, gray keys) (the same finger always hit the same key). To complete a piano session, subjects had to gradually work through a series of trials until reaching error-free performance of the musical piece, while a computer notified them after each playing trial of any note error (wrong key press) or timing error (deviation greater than a set fraction of a beat).
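The error-free practice criterion described above can be sketched in a few lines. The event format, tolerance value, and function names below are illustrative assumptions, not the actual training software (Lahav et al., 2005):

```python
# Sketch of the per-trial error check used during piano training.
# Each event is a (note_name, beat_time) pair; a trial is error-free
# only if every note matches the target and its timing deviation stays
# within a set fraction of a beat. The tolerance here is an assumption.

TIMING_TOLERANCE = 0.25  # beats; illustrative value, not from the study

def check_trial(target, played, tol=TIMING_TOLERANCE):
    """Return a list of error descriptions; an empty list means error-free."""
    errors = []
    if len(played) != len(target):
        errors.append("wrong number of notes")
        return errors
    for (t_note, t_beat), (p_note, p_beat) in zip(target, played):
        if p_note != t_note:
            errors.append(f"note error: expected {t_note}, got {p_note}")
        elif abs(p_beat - t_beat) > tol:
            errors.append(f"timing error on {t_note}: off by {abs(p_beat - t_beat):.2f} beats")
    return errors

def practice_until_error_free(target, play_trial, max_trials=100):
    """Repeat trials, reporting errors after each, until an error-free run.

    Returns the number of trials needed, or None if never reached."""
    for trial in range(1, max_trials + 1):
        if not check_trial(target, play_trial()):
            return trial
    return None
```

A session would then call `practice_until_error_free` with a function that records one playing attempt from the MIDI keyboard.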
Two behavioral tests to assess auditory–motor learning
Measuring learning time.
The learning time of the musical piece was a function of the number of errors made during a given session. We measured learning progress in subjects over 5 consecutive days (one session per day), specifically looking at the duration (minutes) it took for each subject to achieve error-free performance and mastery of the piece.
Pitch-recognition–production (PRP) test.
Subjects heard 30 single piano notes [from the set of five notes used in the trained-music: F, G, A, Bb, C] and had to press the corresponding piano key with the matching right-hand finger, one note at a time. The notes were played in random order, out of musical context. To rule out possible learning effects, subjects did not receive knowledge of results (auditory feedback) when pressing the piano keys. This test was done to assess pitch–key mapping ability and was performed before and after the 5 d piano-training period.
Musical stimuli.
Three musical pieces were involved in this study. Subjects learned to play only one musical piece (trained-music); in addition, they listened to (to control for familiarity), but did not physically train with, two other control musical pieces: (1) a musical piece composed of a completely different set of notes (F#, G#, B, C#, D#) than the one used in the trained-music (i.e., “untrained-different-notes-music”); and (2) a musical piece in which the same notes (F, G, A, Bb, C) were used in a different order to compose a new melody (i.e., untrained-same-notes-music). The auditory exposure time for all three musical pieces was equivalent. All musical pieces were novel and composed specifically for this study based on principles of western music; they were all of the same length (24 s, eight measures, 15 notes in total) and tempo (80 beats per minute) and were played on piano accompanied by guitar, bass, and drums. Short samples of the three musical pieces are shown in Figure 2B. The first three bars of the trained-music are shown in Figure 1A; for a fully orchestrated score and detailed description of the software setup, see Lahav et al. (2005).
To ensure motionless listening, we trained subjects to listen passively to music in a simulated scanning position (supine, palms facing up, fixated wrist) and digitally monitored their finger movements. We used a specially designed motion-tracking system plus a passive-marker glove, implemented in the EyesWeb development environment (www.eyesweb.org). Subjects wore a cotton glove with red markers on the fingertips and listened to music in a simulated scanning position just before the fMRI procedure. A camera (Quickcam Pro 4000; Logitech, Fremont, CA), facing downward, detected possible changes in finger position (<1 mm) based on real-time computation of pixel color change, collecting 352 × 288 pixel frames at a rate of up to 30 Hz in the RGB (red/green/blue) color space.
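As a rough illustration of this motion check, the following pure-Python sketch flags a frame when enough pixels change color between successive frames. The thresholds and data format are assumptions for illustration, not the EyesWeb implementation:

```python
# Toy version of the motion check: compare successive RGB frames and
# flag movement when enough pixels change color. Frames are nested lists
# of (r, g, b) tuples; both thresholds are illustrative assumptions.

def frame_changed(prev, curr, pixel_thresh=30, count_thresh=5):
    """Return True if at least `count_thresh` pixels changed by more than
    `pixel_thresh` (summed absolute RGB difference) between two frames."""
    changed = 0
    for row_p, row_c in zip(prev, curr):
        for (rp, gp, bp), (rc, gc, bc) in zip(row_p, row_c):
            if abs(rp - rc) + abs(gp - gc) + abs(bp - bc) > pixel_thresh:
                changed += 1
    return changed >= count_thresh
```

In practice the check would run on each incoming camera frame (up to 30 Hz), restricted to the regions around the red fingertip markers.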
fMRI acquisition and data analysis
A 3T GE whole-body system (GE Medical Systems, Milwaukee, WI) was used to acquire a set of high-resolution T1-weighted anatomical images (voxel size, 0.93 × 0.93 × 1.5 mm) and functional magnetic resonance images using a gradient echo-planar T2* sequence sensitive to the blood oxygenation level-dependent contrast (voxel size, 2 × 2 × 4 mm). To reduce scanner noise artifacts and interference with the music, we used a sparse temporal sampling technique [repetition time (TR), 18 s], acquiring 28 axial slices in a cluster [acquisition time (TA), 1.75 s]. MR images were acquired after listening to 5, 6, 7, and 8 s musical passages. A total of 32 fully orchestrated short passages extracted from three musical pieces were presented in a counterbalanced block design, with a total of 108 sets of axial images acquired during nine functional runs (see Fig. 2B,C for details). During each run, we acquired 12 sets of 28 axial slices (eight listening scans and four rest scans; total run time, 234 s). To reduce the chance of movement artifacts, subjects lay supine with their eyes closed and their palms facing up (an “uninviting” playing position) and followed instructions to stay as still as possible. fMRI data were analyzed using the SPM99 software package (Institute of Neurology, London, UK). Each set of axial images for each subject was realigned to the first image, coregistered with the corresponding T1-weighted data set, spatially normalized to the T1 template, and smoothed with an isotropic Gaussian kernel (8 mm full-width at half-maximum). Subject and condition effects were estimated using a general linear model. Global differences in scan intensity were removed by scaling each scan in proportion to its global intensity, and low-frequency drifts were removed using the default temporal high-pass filter.
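The sparse-sampling timing can be illustrated with a small scheduling sketch. Only the TR, TA, and passage lengths come from the text above; the rest-trial acquisition offset and the function interface are assumptions:

```python
# Sketch of one sparse-sampling run: with TR = 18 s, each musical passage
# (5-8 s) plays in the silent gap and the 1.75 s acquisition is timed to
# start right after the passage ends, so the hemodynamic response to the
# music is sampled without scanner noise masking the stimulus.

TR, TA = 18.0, 1.75  # repetition time and acquisition time, in seconds

def run_schedule(passage_lengths):
    """Return (trial_onset, acquisition_onset, acquisition_end) per trial.
    `None` marks a rest trial; its acquisition offset (8 s) is assumed."""
    events = []
    for i, dur in enumerate(passage_lengths):
        onset = i * TR
        acq_onset = onset + (dur if dur is not None else 8.0)
        events.append((onset, acq_onset, acq_onset + TA))
    return events
```

Running `run_schedule([5, 6, None])`, for instance, places every acquisition inside its own 18 s trial, well clear of the next passage.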
A behavioral control task during the fMRI procedure
To ensure attentive listening to music, we included a behavioral control task during the fMRI procedure. After listening to each musical passage, subjects heard a three-tone sequence and had to press a button with their left hand if these notes had appeared as a subsequence in the preceding musical passage they had heard. Approximately 50% of the time, the three-tone sequence was part of the musical passage heard before. We intentionally arranged the acquisition of images so that no acquisition would reflect the neural activity induced by this control task (for details of the task design, see Fig. 2C).
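The probe decision itself reduces to a contiguous-subsequence test, sketched here on note names only (the actual task presumably compared pitches; the list representation is an assumption):

```python
# Decide whether a three-tone probe appeared as a contiguous subsequence
# of the passage just heard. Notes are represented as name strings.

def probe_in_passage(passage, probe):
    """Return True if `probe` occurs contiguously within `passage`."""
    n = len(probe)
    return any(passage[i:i + n] == probe for i in range(len(passage) - n + 1))
```

This is the judgment subjects had to make by ear; by design, the probe matched the preceding passage on about half of the trials.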
Results
Action–sound training: learning the musical piece
In the first piano session, the time required to reach error-free performance with the musical piece was highly variable across subjects (mean, 29.11 min; SD, 5.53). The following sessions (2–5) were used to ensure mastery of the musical piece and to reduce performance variability between subjects during their fMRI scanning session (Fig. 2A). Performance plateaued in sessions 4–5, reaching (and remaining at) the minimum possible learning time (12 min) set by the software.
Results of the pitch-recognition–production (PRP) test indicate that, as a byproduct of learning to play by ear, subjects also developed (albeit not to perfection) a pitch–key mapping ability for the five pitch–key pairs used during piano training (i.e., the ability to recognize pitches and identify them in real time on the piano keyboard). Subjects significantly improved their PRP scores from 24% before the piano-training period (around chance level) to 77% after training was completed.
Behavioral control during fMRI
The three-tone recognition task revealed similar performance across all three musical pieces. The task was implemented to ensure that subjects were indeed listening and attending to the music (and not just hearing it in the background). The proportion of errors during this task did not differ across fMRI listening conditions, indicating that the differences in neural activation across music conditions were not caused by possible variations in the level of attention or auditory engagement with a given musical piece (Fig. 2D) [repeated-measures ANOVA with condition (trained-music, untrained-same-notes-music, untrained-different-notes-music) as a within-subjects factor; F(2,16) = 1.536; p = 0.489].
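Statistics of this form can be reproduced with a standard one-way repeated-measures ANOVA. The following is a generic textbook implementation in pure Python, not the authors' analysis pipeline:

```python
# One-way repeated-measures ANOVA on a subjects-by-conditions score matrix.
# Total variability is partitioned into condition, subject, and error terms;
# F = MS_condition / MS_error with df = (k - 1, (n - 1)(k - 1)).

def rm_anova(scores):
    """scores: list of per-subject lists, one value per condition.
    Returns (F, df_condition, df_error)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_cond - ss_subj  # residual within-subject variance
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_err / df_err)
    return f, df_cond, df_err
```

With nine subjects and three conditions, as here, the degrees of freedom come out to (2, 16), matching the reported F(2,16) statistics.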
Contrasting trained-music versus untrained-different-notes-music
All listening conditions (compared with rest) showed a similar activation pattern in primary and secondary auditory cortices bilaterally (Fig. 3A,B). However, when subjects listened to the trained-music (but not to the untrained-different-notes-music), activation was found in additional frontoparietal motor-related regions including, prominently, the posterior inferior frontal gyrus (IFG; Broca's region and Broca's homolog on the right; BA44, BA45) as well as the posterior middle premotor region and, to a lesser degree, the inferior parietal lobule (supramarginal gyrus and angular gyrus) bilaterally and cerebellar areas on the left (Fig. 3A,C) [trained-music > untrained-different-notes-music; p < 0.05, false discovery rate (FDR) corrected]. The complete lack of primary motor cortex (M1) activation in the hand/finger region during all conditions may be taken as evidence that subjects indeed did not physically move, as instructed (this was additionally verified by an observer in the scanning room).
Contrasting trained-music versus untrained-same-notes-music
To further investigate the tuning of the action-recognition system, we compared brain activity in subjects during listening to the trained-music versus the untrained-same-notes-music (in which the notes of the trained-music were arranged in a different order). Interestingly, we found significant activation in the left posterior premotor cortex, as well as in the posterior part of the IFG containing Broca's area and in its right-hemispheric homolog (Fig. 4B) (p < 0.05, FDR corrected). Yet, despite those differences, several premotor and parietal regions were still active bilaterally (although to a much lesser degree) even when subjects listened to the untrained-same-notes-music (Fig. 4A). In addition, a region-of-interest analysis of the pars opercularis of the IFG indicates significant peak activation on the left only when subjects listened to the trained-music, whereas the right IFG remained fairly active across listening conditions (Fig. 4C) [repeated-measures ANOVA shows a significant condition effect for the left IFG (F(2,16) = 10.324; p = 0.001) and no effect for the right IFG (F(2,16) = 0.026; p = 0.973)].
Discussion
A unique aspect of the present study is the use of completely novel actions for studying auditory operations of the mirror neuron system. Previous EEG (Pizzamiglio et al., 2005; Hauk et al., 2006) and transcranial magnetic stimulation (Aziz-Zadeh et al., 2004) studies have mainly focused on the sound of overexperienced actions, laying the groundwork for this field. The advantage of training subjects with novel actions, however, is twofold. First, it allows one to follow the neural formation of auditory–motor learning while eliminating possible influences from previous learning experiences. Second, it allows one to work with actions and sounds that are exclusively associated with a specific body part (here, the right hand). Such control is nearly impossible when investigating freestyle everyday actions, which could potentially be bimanual (paper tearing) or mixed-handed (door knocking).
In addition, the use of music as an audible sequence of actions reveals important sequential aspects of the human action-recognition system. One should bear in mind that during the fMRI procedure, subjects did not listen to the musical piece in its entirety but to only short segments of it (5–8 s each) containing only part of the entire sequence of actions. The fact that listening to such short subsequences was still enough to activate an audiomotor recognition network suggests high levels of detection for action-related sounds (Fig. 3A,C). It is also striking that these audiomotor activation patterns are, in fact, within the core regions of the frontoparietal mirror neuron circuit previously found in humans in a variety of action-observation tasks (Grezes et al., 2003; Iacoboni et al., 2005; Lotze et al., 2006) including music observations, such as watching chord progression on the guitar (Buccino et al., 2004a) or finger-playing movements on the piano (Haslinger et al., 2005).
When subjects listened to the trained-music, a sound–action network became active but actions were not executed (Fig. 3A,C). However, when subjects listened to music they had never played before, they could not match it with existing action representations, and thus auditory activation was entirely dominant (Fig. 3B). These findings point to the heart of the action–listening mechanism and are in keeping with previous evidence for increased motor excitability (D'Ausilio et al., 2006) and premotor activity (Lotze et al., 2003; Bangert et al., 2006a) during listening to a rehearsed musical piece. Furthermore, our findings may be analogous to studies in the visuomotor domain, in which the mirror neuron system was involved only when the observed action was part of the observer's motor repertoire, such as in the case of dancers watching movements from their own dance style (but not from other styles) (Calvo-Merino et al., 2005) or in the case of humans watching biting actions (but not barking) (Buccino et al., 2004b).
It is particularly interesting that the posterior IFG, including Broca's area, was active only when subjects listened to the music they knew how to play. Even the presentation of music made of the same notes in an untrained order did not activate this area (Fig. 4B,C). These results reinforce the previous literature in two important ways. First, Broca's area is the human homolog of area F5 (ventral premotor cortex) in the monkey, where mirror neurons have been located (Rizzolatti and Arbib, 1998; Binkofski and Buccino, 2006). Our findings thus support the view that Broca's area is presumably a central region (“hub”) of the mirror neuron network (Iacoboni et al., 1999; Nishitani and Hari, 2000; Hamzei et al., 2003; Rizzolatti and Craighero, 2004; Nelissen et al., 2005), demonstrating here its multifunctional role in action listening. Second, mounting evidence suggests that in addition to its “classical” involvement in language, Broca's area functions also as a sensorimotor integrator (Binkofski and Buccino, 2004, 2006; Baumann et al., 2005), manual imitator (Krams et al., 1998; Heiser et al., 2003), supramodal sequence predictor (Maess et al., 2001; Kilner et al., 2004; Iacoboni et al., 2005), and internal simulator (Platel et al., 1997; Nishitani and Hari, 2000; Schubotz and von Cramon, 2004) of sequential actions. It is therefore possible that the activity in Broca's area during the action–listening condition reflects sequence-specific priming of action representations along with unconscious simulations and predictions as to the next action/sound to come. Such implicit motor predictions may partially overlap with functional operations of the mirror neuron system and thus could hardly have been made while listening to the never-learned musical pieces.
The significant activity in Broca's area during the action–listening condition strongly supports recent evidence for left-hemisphere dominance for action-related sounds (Pizzamiglio et al., 2005; Aziz-Zadeh et al., 2006). Region-of-interest analysis confirms this laterality effect and also shows that the right IFG remained fairly active across all listening conditions (Fig. 4C). The observed activation in the right IFG therefore may not reflect action representation per se but rather operations of music perception, consistent with the role of the right hemisphere in melodic/pitch processing (Zatorre et al., 1992). However, despite the left-hemispheric lateralization of the IFG, our data suggest that, at least with regard to the overall frontoparietal activation pattern, actions may be represented across the two hemispheres (Fig. 3A,C). The ipsilateral premotor activity, although seemingly illogical (because only the right hand was trained), is in fact in accordance with previous evidence for bilateral premotor representation of finger movements (Kim et al., 1993; Hanakawa et al., 2005), as well as with the general view that the mirror neuron system is bihemispheric in nature regardless of the laterality of the involved hand (Rizzolatti and Craighero, 2004; Molnar-Szakacs et al., 2005; Aziz-Zadeh et al., 2006). Support for such bilateral operation has mainly come from action-observation studies, whereas evidence from action listening is still quite limited. Having abstract representations of actions in both hemispheres may be essential for making actions more generalizable (Grafton et al., 2002; Vangheluwe et al., 2005) and potentially transferable from one limb to another (Rijntjes et al., 1999).
One may wonder what could be the reason for the premotor activity seen when subjects listened to the untrained-same-notes-music (Fig. 4A), especially because they had never played that piece. A possible explanation is that this premotor activity reflects the ability of subjects to link some of the notes they heard to the matching fingers and piano keys (77% matching accuracy in the PRP test; see Materials and Methods) (Fig. 5). Strikingly, this pitch–key linking mechanism seemed to be purely implicit because subjects were completely unaware (presumably because of their musical naivety) that the piece was composed of the same notes as the trained-music (as confirmed in post-fMRI interviews). As a result of this linking ability, the notes themselves appeared motorically familiar, which in turn was sufficient to activate a small action–sound circuit. This circuit supposedly demonstrates basic operations of audiomotor recognition at the level of one single action (finger press) and one sound (piano pitch). Nevertheless, this audiomotor recognition of individual notes, without the complete motor representation of the melodic sequence, was not enough to fully engage the hearing–doing mirror neuron system for action listening.
To what extent does action listening involve an implicit mental rehearsal component of the heard action? This may be somewhat difficult to answer based on the present study, because we did not include an explicit imagery condition. Yet, the lack of activation in the “classic” (although still controversial) motor imagery network [including the contralateral (left) M1/S1, the supplementary motor area, and the ipsilateral cerebellum] (Fig. 3A) argues against the possibility that subjects consciously imagined themselves playing the music (Grafton et al., 1996; Porro et al., 1996; Langheim et al., 2002). Furthermore, imagery was very unlikely, because our auditory control task during fMRI scanning sidetracked subjects from playing the music in their minds (as confirmed by post-fMRI interviews). Moreover, evidence suggests that mental simulations and operations of the mirror neuron system might all be functionally related or possibly even forms of one another (Grezes and Decety, 2001; Patuzzo et al., 2003; Cisek and Kalaska, 2004). Additional studies are still needed to support this hypothesis.
Finally, what could be the functional advantage of having audiomotor recognition networks? It has been suggested that such audiomotor networks are essential for the acquisition of language, serving as a critical sensorimotor feedback loop during speech perception (Rizzolatti and Arbib, 1998; Theoret and Pascual-Leone, 2002). We suggest that, in addition, such networks may have evolved to promote the survival of all hearing organisms, allowing actions to be understood even when they cannot be observed but can only be heard (for example, the sound of footsteps in the dark).
In conclusion, this study indicates that the human action-recognition system is highly sensitive to the individual's motor experience and has the tuning capabilities needed to discriminate between the sound of newly acquired actions and the sound of actions that are motorically unknown. Thus, acquiring actions that have an audible output quickly generates a functional neural link between the sound of those actions and the presumably corresponding motor representations. These findings serve as an important brain imaging addition to a growing body of research on the auditory properties of the mirror neuron system and may encourage additional investigations in the realm of action listening.
This work was supported in part by grants from the National Institutes of Health–National Institute of Neurological Disorders and Stroke (E.S. and G.S.) and the International Foundation for Music Research (G.S.) and by a Dudley Allen Sargent Research Grant (A.L.). We thank Adam Boulanger for computer music support, Marc Bangert for discussions at initial stages, Eugene Wong for help with graphic design, and Galit Lahav and Daniel Segre for insightful comments.
Correspondence should be addressed to Amir Lahav, Department of Neurology, Beth Israel Deaconess Medical Center and Harvard Medical School, 330 Brookline Avenue, Boston, MA 02215.