Abstract
Previous studies have demonstrated that when we observe somebody else executing an action many areas of our own motor systems are active. It has been argued that these motor activations are evidence that we motorically simulate observed actions; this motoric simulation may support various functions such as imitation and action understanding. However, whether motoric simulation is indeed the function of motor activations during action observation is controversial, due to inconsistency in findings. Previous studies have demonstrated dynamic modulations in motor activity when we execute actions. Therefore, if we do motorically simulate observed actions, our motor systems should also be modulated dynamically, and in a corresponding fashion, during action observation. Using magnetoencephalography, we recorded the cortical activity of human participants while they observed actions performed by another person. Here, we show that activity in the human motor system is indeed modulated dynamically during action observation. The finding that activity in the motor system is modulated dynamically when observing actions can explain why studies of action observation using functional magnetic resonance imaging have reported conflicting results, and is consistent with the hypothesis that we motorically simulate observed actions.
Introduction
Previous studies have demonstrated that when we observe somebody else executing an action many areas of our own motor systems are active. Functional magnetic resonance imaging (fMRI) studies have demonstrated such activations in ventral and dorsal premotor cortices, inferior parietal lobule, and primary motor cortex (Rizzolatti and Craighero, 2004; Morin and Grèzes, 2008; Gazzola and Keysers, 2009; Kilner et al., 2009a; Rizzolatti and Sinigaglia, 2010). Whereas there is no doubt that motor areas are active during action observation, there is uncertainty as to whether these activations are either in part or entirely due to mirror neurons. Mirror neurons have been found in areas F5 and PF of the macaque monkey and discharge when an action of the same type is either executed or observed (di Pellegrino et al., 1992; Gallese et al., 1996). As some of the areas in humans that are active during action observation are believed to be the human homologues of areas of the macaque monkey where mirror neurons have been found, this network is sometimes referred to as the mirror neuron system. However, given that the presence of mirror neurons in humans remains controversial, and that not all areas active have been shown to have mirror neurons, we will refer to this network as the action observation network (AON).
The vast majority of studies that have investigated the functional role of activity in the AON have used fMRI. As a result we know a lot about which areas of the human brain are active when we observe an action, but very little about how this activity changes across time. The current study was designed to address this question. Previous studies that have used magnetoencephalography (MEG) have demonstrated dynamic modulations in the power of oscillatory activity in the 15–30 Hz (β) range during action execution (Kilner et al., 2000, 2003a), effects that originate in sensorimotor cortex, specifically primary motor cortex (Murthy and Fetz, 1992). For example, Kilner et al. (2000, 2003a) found that when participants moved a lever with their finger and thumb, β oscillations were attenuated when they were at the midpoints of action compared with when they were at the endpoints. Furthermore, studies using MEG and electroencephalography (EEG) have demonstrated that sensorimotor oscillatory activity in both the 8–12 Hz (μ) and β ranges (Cochin et al., 1998, 1999; Hari et al., 1998; Babiloni et al., 2002; Caetano et al., 2007; Kilner et al., 2009b) is attenuated when observing actions. However, it is not known whether the sensorimotor β oscillations are modulated dynamically during action observation, which would be consistent with the notion that we motorically simulate observed actions, nor, if they are modulated dynamically, which features of the observed action drive this modulation. To address these questions, the present study used MEG to measure β oscillations while participants watched videos of sinusoidal arm movements that had a human or point form and moved with human or constant velocity kinematics.
Materials and Methods
Participants
Fourteen paid healthy participants took part in this study (four male, mean age 22.5 years, range 18–29 years). All were right-handed, had normal or corrected-to-normal vision, were naive with respect to the purpose of the experiment, and gave informed consent. The experiment was performed with the approval of the ethics committee of University College London, and performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki.
Stimuli
Stimuli were generated by filming two models (one male and one female) executing sinusoidal up and down movements with their left or right arm, at 250 frames per second. There were four such stimuli (male left arm, male right arm, female left arm, female right arm). A black box was superimposed over the model's head, and a white fixation cross was added to the center of the videos. Videos started with a 1000 ms static of the first frame. The moving video then lasted for 5360–5480 ms, showing between 1.75 and 2.25 sinusoids of arm movement.
Offline the videos were edited to produce four different videos. These manipulations generated a 2 × 2 factorial design. The four videos were: (1) human BM (biological motion), created by selecting every 10th frame from these videos so as to preserve the biological velocity profile generated by the actors; (2) human CV (constant velocity), created by calculating the mean speed in the BM videos, according to the location of the index fingertip, and selecting frames such that the index fingertip moved at all times with this mean speed; (3) point BM, and (4) point CV stimuli, created by substituting the index fingertip with a round beige point, and presenting this single point on a dark background. The total luminance of the point videos was matched to the total luminance of the human videos, and the luminance of the beige point was matched to the luminance of the index fingertip. Four videos in four conditions generated 16 videos. Example frames from the human and point video types are shown in Figure 1.
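The two frame-selection rules described above can be illustrated with a minimal sketch. It assumes a hypothetical array of index fingertip coordinates, one row per frame of the 250 frames per second footage; the function names and the nearest-frame rule used for the constant velocity case are illustrative assumptions, not the editing procedure actually used.

```python
import numpy as np

def select_bm_frames(n_frames, step=10):
    """Biological motion: keep every `step`-th source frame, preserving the velocity profile."""
    return np.arange(0, n_frames, step)

def select_cv_frames(fingertip_xy, step=10):
    """Constant velocity: pick frames so the fingertip covers equal path length per output frame.

    fingertip_xy is a hypothetical (n_frames, 2) array of tracked fingertip positions.
    """
    fingertip_xy = np.asarray(fingertip_xy, dtype=float)
    # Distance travelled between consecutive source frames.
    seg = np.linalg.norm(np.diff(fingertip_xy, axis=0), axis=1)
    path = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative path length per frame
    n_out = (len(fingertip_xy) - 1) // step + 1      # same frame count as the BM video
    targets = np.linspace(0.0, path[-1], n_out)      # equal path-length steps (mean speed)
    # For each target distance, take the source frame whose path length is closest.
    return np.abs(path[None, :] - targets[:, None]).argmin(axis=1)
```

Because both selections return the same number of frames spanning the same total path, the constant velocity video has the same duration and mean fingertip speed as the biological motion video and differs only in how that speed is distributed over time.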
Analysis periods. The endpoints of the actions were found by taking the points of minimum velocity, and the midpoints were found by taking the points of maximum velocity. Two endpoints and two midpoints were found for each video type. A 600 ms time period was taken around these endpoints and midpoints (300 ms either side).
Procedure
Participants were tested individually in a dimly lit room. They were positioned in the scanner with the computer screen ∼50 cm away from their face. They were given two response buttons, one to be held in their left hand and one to be held in their right hand. Videos were presented with a 2000–3000 ms (mean = 2500 ms) intertrial interval. A fixation cross remained on the screen and participants were asked to maintain fixation throughout the experiment. An infra-red eyetracker (Tracksys Ltd.) was used to ensure that participants maintained fixation on the cross.
To ensure that participants paid attention to the videos, on ∼10% of the videos a red or blue dot was superimposed on the index fingertip (human conditions) or point (point conditions) at 1480 or 5480 ms into the movement phase, with equal numbers of red and blue dots, and equal numbers of early and late presentations. The dot was superimposed for 1000 ms. On the trials where a red or blue dot appeared, a question screen appeared at the end of the trial asking the participant whether they had seen a red or blue dot, and telling them whether they should press the left button for a blue dot and the right button for a red dot, or vice versa. Button assignments were not known in advance so that participants could not prepare a movement, and the number of left button presses for blue and red dots was equal. All response trials were excluded from analysis.
There were 272 trials (240 test trials and 32 response trials). The test trials consisted of 15 repetitions of each of the 16 videos. There were two response trials for each video type. These trials were presented in a different pseudo-randomized order for each participant; the only constraint being that a video would not be presented twice in a row. These trials were split into eight blocks of 34 trials, and participants were permitted to rest between blocks. Before testing commenced, participants completed 10 practice trials to ensure that they were able to maintain fixation on the cross and could perform the task.
We wished to define regions of interest (ROIs) based on the conjunction of areas involved in action observation and execution. Therefore, at the end of the observation blocks, participants executed sinusoidal, up and down, actions with their left and right arm. Half of the participants executed actions with their left arm first and the other half executed actions with their right arm first. A board was inserted between their torso and arm such that they could not observe their arm actions. They were instructed to rest their elbow on the armrest, to ensure that their head did not move, and to move their arm up and down taking ∼2 s for one complete cycle. Participants were told to perform this action continuously whenever a green fixation cross appeared on the screen and to hold their arm still whenever a red cross appeared on the screen. This fixation cross appeared at the same location as that in the observation condition, and the timing of red and green crosses reflected the trial and intertrial interval timing of the observation condition (green cross for 6450 ms, and red cross for 2000–3000 ms). Participants performed 20 trials (where one trial equaled a red cross followed by a green cross) with each arm.
At the end of the experiment, participants observed a video from each of the four categories, six times. After each video had played, they were asked to rate a statement according to how much they agreed with it, on a scale of 0–25 (0 = least agreement, 25 = most agreement), to ascertain how human they perceived the movement to be. The statements were as follows. (1) The movement appeared purposeful and goal-directed; (2) The image appeared to be moving by itself rather than driven by something else; (3) The movement appeared to be active rather than passive; (4) The movement appeared to be natural; (5) The movement appeared to be human; and (6) The movement appeared to be computer generated.
MEG recording and data analysis
Recording and preprocessing
MEG was recorded using 275 third-order axial gradiometers with the Omega 275 CTF MEG system (VSM Medtech) at a sampling rate of 480 Hz. All MEG analyses were performed in SPM8 (Wellcome Trust Centre for Neuroimaging, London, UK, www.fil.ion.ucl.ac.uk/spm). The data were epoched relative to the onset of the video clip, bandpass filtered between 1 and 45 Hz, and then downsampled to 100 Hz.
Sensor space analysis
Wavelet decomposition.
Quantification of the oscillatory activity was performed using a wavelet decomposition of the MEG signal. The wavelet decomposition was performed across a 1–45 Hz frequency range. The wavelet decomposition was performed for each trial, for each of the 275 sensors and for each participant. These time-frequency maps were averaged across trials of the same type (e.g., male left arm, human BM). The maps were subsequently log10 transformed to normalize, and averaged over 15–30 Hz, producing a single β power time course for each sensor for each participant, for each trial type.
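The sequence of steps described above (per-trial wavelet decomposition, averaging across trials, log10 transform, averaging over 15–30 Hz) can be sketched as follows for a single sensor. This is an illustration only: the analyses were run in SPM8, whose wavelet settings are not reported here, so the complex Morlet wavelet, the value of `n_cycles`, and the function names are assumptions.

```python
import numpy as np

def morlet_power(signal, sfreq, freqs, n_cycles=7.0):
    """Time-frequency power of a 1-D signal by convolution with complex Morlet wavelets.

    n_cycles (wavelet width) is an assumed value, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    power = np.empty((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)            # s.d. of the Gaussian envelope (s)
        half = int(np.ceil(4 * sigma_t * sfreq))          # wavelet support: +/- 4 s.d.
        t = np.arange(-half, half + 1) / sfreq
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        full = np.convolve(signal, wavelet, mode="full")
        # Trim so the output is aligned with, and as long as, the input signal.
        power[i] = np.abs(full[half:half + len(signal)]) ** 2
    return power

def beta_time_course(trials, sfreq, freqs=np.arange(1, 46), band=(15, 30)):
    """trials: (n_trials, n_samples) array for one sensor and one trial type."""
    # Average the time-frequency maps across trials, then log10 transform.
    tf = np.mean([morlet_power(tr, sfreq, freqs) for tr in trials], axis=0)
    log_tf = np.log10(tf)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return log_tf[in_band].mean(axis=0)                   # one beta time course
```

Applying `beta_time_course` to each sensor, participant, and trial type would yield the set of β power time courses that the subsequent analyses operate on.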
Analyses averaged over time range.
The time course was averaged from 500 to 4500 ms after the onset of the movement phase. This time window was chosen to capture β modulations during a period of movement observation that did not contain possible confounds of event related fields associated with the onset or offset of the observed movement. This time window was compared against a baseline at the start of each trial where no movement was observed, averaged from 500 ms before the static appeared to 500 ms after the onset of the static. This analysis produced one value per sensor, per participant, for each trial type. Trials in the same condition (human BM, human CV, point BM, and point CV) were averaged. For each participant and for each condition, 2D sensor space maps of these data were calculated, and then smoothed using a Gaussian kernel [full-width half-maximum (FWHM), 20 mm] (Kilner and Friston, 2010). Analyses of β power during action execution were performed in a similar way.
Analyses of dynamic effects.
As the movements made by the different actors differed in the period of the sinusoidal movement, the data had to be aligned before further analysis so that modulations in the kinematics of the observed action were coincident for the different videos. Two endpoints and two midpoints were defined for each video type that occurred in the central part of the videos. The endpoints were defined as points of minimum absolute velocity and the midpoints were defined as the points of maximum absolute velocity. The velocity varied slightly according to the video observed, but the issue of importance is that it was always higher at midpoints than at endpoints. These time points were defined according to the BM videos and applied to both the BM and CV videos. Although the CV videos by definition do not have a maximum or minimum absolute velocity, we cut the CV videos around the same points as the BM videos to control for a general effect of any modulations that might occur as a result of time during the movement. A 600 ms time period was taken around these endpoints and midpoints (300 ms either side; Fig. 1). For “averaged over ROI” analyses, averages were computed for the 300 ms before the endpoints, the 300 ms after the endpoints, the 300 ms before the midpoints, and the 300 ms after the midpoints. This analysis produced four values per sensor, per participant, for each trial type. Trials in the same condition (human BM, human CV, point BM, and point CV) were averaged. For each participant and for each condition, 2D sensor space maps of these data were calculated, and then smoothed using a Gaussian kernel (FWHM, 20 mm in space and 120 ms in time).
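The alignment and windowing logic can be sketched as below. The sketch assumes a hypothetical one-dimensional fingertip position trace and a β power time course sampled at the same rate; the local-extremum rule and the function names are illustrative assumptions rather than the procedure actually used.

```python
import numpy as np

def find_extrema_times(position, sfreq):
    """Sample indices of local minima (endpoints) and maxima (midpoints) of absolute velocity."""
    speed = np.abs(np.gradient(position) * sfreq)
    i = np.arange(1, len(speed) - 1)                      # interior samples only
    is_min = (speed[i] < speed[i - 1]) & (speed[i] <= speed[i + 1])
    is_max = (speed[i] > speed[i - 1]) & (speed[i] >= speed[i + 1])
    return i[is_min], i[is_max]

def pre_post_means(beta, event_idx, sfreq, half_window_s=0.3):
    """Mean beta power in the 300 ms before and the 300 ms after each event, averaged over events."""
    n = int(round(half_window_s * sfreq))
    pre = np.mean([beta[k - n:k].mean() for k in event_idx])
    post = np.mean([beta[k:k + n].mean() for k in event_idx])
    return pre, post
```

Calling `pre_post_means` once with the endpoint indices and once with the midpoint indices gives the four values (pre and post, for minimum and maximum velocity) described above. For the CV videos, the indices found on the corresponding BM video would be reused.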
ROI.
A spatial ROI was defined by calculating the conjunction of the areas with lower β power in the observation conditions and the execution condition in the analysis averaged over time, relative to their baselines, at t > 4.72. This ROI was necessary because it is a prerequisite of a system supporting motor simulation of observed action that it should be active during both action observation and action execution.
Contrasts of all images were taken to the second level with a design matrix including a participant-specific regressor to remove global differences in power between participants.
Source space analysis
A beamformer technique was implemented (e.g., Van Veen et al., 1997; Robinson and Vrba, 1999; Gross et al., 2001), based on areas with lower β power in the observation conditions and the execution condition, relative to baseline. Given that the beamformer analysis required time windows of equal durations, the baseline period of 1000 ms was compared against a randomly chosen 1000 ms during observation and execution (1500 ms–2500 ms after movement onset). Once the source had been estimated, a single time course for this derived source was calculated by weighted combination of the sensors contributing to it, across the whole time range. The wavelet decomposition, analyses averaged over time range, and dynamic analyses, were then conducted in the same way as the analyses performed in sensor space.
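As an illustration of how a source time course can be formed as a weighted combination of sensors, the following sketch computes unit-gain linearly constrained minimum variance (LCMV) weights in the style of Van Veen et al. (1997). It is not the SPM8 implementation used in the study: the fixed-orientation lead field vector, the optional regularization term, and the function names are assumptions made for the example.

```python
import numpy as np

def lcmv_weights(cov, leadfield, reg=0.0):
    """Unit-gain LCMV spatial filter w = C^-1 l / (l^T C^-1 l) for one fixed-orientation source.

    cov: (n_sensors, n_sensors) sensor covariance; leadfield: (n_sensors,) hypothetical
    forward-model vector for the source; reg: optional diagonal loading (assumed, default off).
    """
    cov = np.asarray(cov, dtype=float)
    lf = np.asarray(leadfield, dtype=float)
    loaded = cov + reg * np.trace(cov) / len(cov) * np.eye(len(cov))
    cinv_l = np.linalg.solve(loaded, lf)      # C^-1 l without forming the inverse explicitly
    return cinv_l / (lf @ cinv_l)

def source_time_course(weights, sensor_data):
    """Weighted combination of sensors; sensor_data is (n_sensors, n_samples)."""
    return weights @ sensor_data
```

The resulting single time course could then be passed through the same wavelet, time-averaged, and dynamic analyses as an individual sensor.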
Results
Unless otherwise stated, statistical tests were corrected for family-wise error rate (FWE). Initial nondynamic analyses are corrected across all sensors, and all subsequent analyses are corrected across the ROI defined on the basis of the nondynamic analyses.
Nondynamic effects
Before proceeding with an analysis of any dynamic modulation of β power during action observation, we performed a preliminary analysis to confirm that we could reproduce the previous finding that β power over central sensors is attenuated during action observation. When averaged across the four conditions, β power was significantly attenuated during action observation compared with baseline, over central sensors (see Materials and Methods, Sensor space analysis) (Fig. 2A; peak voxel: t(13) = 6.04, p < 0.05). The key property of the action observation network is that it is similarly modulated during action observation and action execution. Therefore, it is important to show an overlap in the analysis for both action execution and observation. To this end we performed the same analysis for action execution. Relative to baseline, the β power during action execution showed a similar pattern of attenuation in sensor space as did action observation (Fig. 2B; peak voxel: t(13) = 6.74, p < 0.05). A conjunction of the two contrasts for action observation and action execution showed significant overlap in sensor space of the location of β power attenuation (Fig. 2C; peak voxel: t(13) = 5.5, p < 0.05), over central sensors. This is consistent with previous studies that have reported that the same motor areas are recruited during action observation and execution (Rizzolatti et al., 1996; Buccino et al., 2001; Grèzes and Decety, 2001; Gazzola and Keysers, 2009; Kilner et al., 2009a).
Nondynamic effects of observation and execution. A, T and contrast sensor space statistical parametric maps of areas where the β power when observing action averaged over all four conditions (human BM, human CV, point BM, point CV) is lower than baseline, averaged over the time range of the trial. T maps represent the t-statistic at each sensor, and contrast maps represent the mean difference in power. B, T and contrast sensor space statistical parametric maps of areas where the β power when executing action is lower than baseline, averaged over the time range of the trial. C, T sensor space statistical parametric map of areas where the β power when observing action, and executing action, is lower than baseline, averaged over the time range of the trial. All maps are thresholded at t > 4.72.
Subsequent analysis of the β power averaged across the period of action observation showed that in sensor space there was only a significant main effect of form (whether a human or point stimulus was observed) (supplemental Fig. 1, available at www.jneurosci.org as supplemental material; peak voxel: t(13) = 6.5, p = 0.001). The main effect of kinematics (whether the velocity profile was BM or CV) and the interaction between form and kinematics were not significant anywhere in sensor space (all t < 2.7, all p > 0.3). The significant main effect of form showed a greater attenuation of β power at posterior sensors (supplemental Fig. 1, available at www.jneurosci.org as supplemental material) when observing human relative to point form videos, and did not appear to overlap substantially with the pattern of β attenuation found when both observing and executing action, relative to baseline. To test this we defined an ROI in sensor space based on the conjunction analysis of the sensor maps of β attenuation during action observation and execution (Fig. 2C). Averaged across our ROI in sensor space, there were no significant main effects of form (F(1,13) = 3.0, p = 0.1) or kinematics (F(1,13) = 1.8, p = 0.2), and no significant interaction between form and kinematics (F(1,13) = 4.3, p = 0.06). The significant main effect of form seen in the sensor space analysis most likely reflects the vast difference in the visual appearance of the two sets of stimuli, one a point and the other a human form. As these modulations lie away from motor areas we will not consider them further here.
Dynamic effects of observation
All subsequent analysis in sensor space focused on dynamic modulations in power during action observation. To this end, two endpoints and two midpoints were defined for each trial type, as described above. This resulted in a 2 × 2 × 2 factorial design where the factors were form (human or point), kinematics (BM or CV) and velocity (maximum velocity or minimum velocity). Conducting this repeated measures 2 × 2 × 2 ANOVA in sensor space and time, within the observation-execution conjunction mask (Fig. 2C), at a p < 0.001 uncorrected threshold, revealed an interaction between kinematics and velocity (peak voxel: t(1,13) = 3.9, p = 0.001 uncorrected, p = 0.2 FWE). The pattern of this effect in sensor space was consistent with activity in the sensorimotor cortex and occurred 240 ms before the time of minimum/maximum velocity (Fig. 3A–C). This interaction was generated by an effect of velocity for BM videos (t(13) = 3.1, p = 0.007), such that there was lower β power before the point of maximum velocity, relative to minimum velocity, but no effect for CV videos (t(13) = 1.3, p = 0.6). This shows that the β oscillations are modulated by the kinematics of the observed action. This modulation cannot simply be attributed to the phase of the observed action, namely whether it was a turning point or a straight movement, as there was no such modulation for the CV condition.
Dynamic effects of observation: sensor space analysis. A, T sensor space statistical parametric map of the interaction between velocity (minimum vs maximum) and kinematics (BM vs CV), at 240 ms before the point of maximum or minimum velocity. The map is thresholded at t > 3.01, and is masked by the observation and execution conjunction mask in Figure 2C. B, The t values for the 600 ms time window (−300 to 300 ms) for the peak voxel for this interaction (marked by the crosshair in A). C, The mean velocity across the 600 ms time window for the minimum and maximum velocity segments, averaged across all four videos. D, The averaged β power in the 300 ms before the point of minimum velocity (min) and the point of maximum velocity (max), for BM and CV videos.
To further investigate this effect we conducted an analysis averaged over ROI. The sensor-time maps analyzed above were first averaged across the spatial ROI described previously (Fig. 2C) and subsequently averaged across two windows, one from −300 to 0 ms (“pre”) and the second from 0 to 300 ms (“post”). This now formed a 2 × 2 × 2 × 2 ANOVA where the factors were form (human or point), kinematics (BM or CV), velocity (maximum velocity or minimum velocity) and time (pre or post). This analysis revealed a three-way interaction between kinematics, velocity and time (F(1,13) = 4.4, p = 0.05). This effect did not interact with the form of the stimulus (F(1,13) < 1). This interaction was generated by the presence of a velocity × time interaction for BM videos (F(1,13) = 11.9, p < 0.005) but no such interaction for CV videos (F(1,13) = 0.2, p = 0.7). For BM videos, there was an effect of velocity in the 300 ms before the point of minimum/maximum velocity (F(1,13) = 5.3, p < 0.05), such that there was lower β power in the 300 ms before the point of maximum velocity relative to minimum velocity, but not in the 300 ms after (F(1,13) = 1.4, p = 0.3; Fig. 3D).
Here we have demonstrated, both in a peak voxel analysis and averaged over ROI, that: (1) β power is modulated dynamically during action observation; (2) the pattern of this dynamic modulation is dependent upon the kinematics of the observed action; and (3) this pattern temporally predicts the dynamics that would be expected if executing the observed action. However, all of these effects were observed using a sensor space analysis. Although the spatial patterns are not inconsistent with generators in sensorimotor cortices, we cannot be certain that the modulations observed reflect sensorimotor activity (see Kilner and Friston, 2010). We have used an axial gradiometer MEG, which means that one should not interpret the peaks in the β power map as overlying the sources of activity (in fact, these peaks should lie away from the underlying source). To address whether modulations are found in sensorimotor cortex, we repeated the same analysis in source space.
Dynamic effects in source space
We performed two beamformer analyses; one revealing areas with lower β power when observing action relative to baseline, and the other revealing areas with lower β power when executing action. The conjunction of these two analyses revealed a source in the hand/arm area of sensorimotor cortex, with its peak at [−40.9, −29.0, 58.8], corresponding to the left postcentral gyrus (Fig. 4). This is consistent with a previous MEG study that found stronger effects in the left hemisphere regardless of whether the observed action was a left or right arm movement (Kilner et al., 2009b). This source analysis therefore also provides further evidence that action observation, like execution, activates sensorimotor cortex. The estimated time course of this source was used in all subsequent source analyses.
The conjunction of the sources identified as driving lower β power both in action observation and execution conditions, relative to baseline, in Brodmann area 4, on the basis of a beamformer analysis, thresholded at t > 3.63. The source identified as corresponding to the hand/arm area in sensorimotor cortex, with its peak in the left postcentral gyrus (coordinates = [−40.9, −29.0, 58.8]), is marked with a crosshair.
Similarly to analyses in sensor space, there were no main effects of form or kinematics, and no interaction, when analyses were performed in source space at the left postcentral gyrus (all F(1,13) < 1, all p > 0.45).
The dynamic analysis replicated the sensor space findings. The analysis across the entire 600 ms time window revealed a two-way interaction between kinematics and velocity at 210 ms before the point of minimum/maximum velocity (t(1,13) = 2.8, p < 0.04; Fig. 5A–C). This interaction was generated by the presence of an effect of velocity for BM videos (t(13) = 3.9, p = 0.001), such that there was lower β power before points of maximum velocity relative to minimum velocity, but not for CV videos (t(13) = 0.2, p = 0.4).
Dynamic effects of observation: source space analysis. A, T statistical parametric map of the interaction between velocity (minimum vs maximum) and kinematics (BM vs CV), across time, for the 600 ms time window (−300 to 300 ms), and across frequency, for 1–45 Hz, at the left postcentral gyrus source. The map is thresholded at t > 1.96. B, The t values for the power averaged across the β band for the 600 ms time window (−300 to 300 ms) for this source. C, The mean velocity across the 600 ms time window for the minimum and maximum velocity segments, averaged across all four videos. D, The averaged β power in the 300 ms before the point of minimum velocity (min) and the point of maximum velocity (max), for BM and CV videos.
The ROI analysis, averaged across the pre and post time windows, demonstrated the same effect, such that there was a three-way interaction between kinematics, velocity and time (F(1,13) = 18.6, p = 0.001), which did not interact with form (F(1,13) < 1). Again, this interaction was generated by the presence of a velocity × time interaction for BM videos (F(1,13) = 22.4, p < 0.001) but no such interaction for CV videos (F(1,13) = 2.3, p = 0.2). For BM videos, there was an effect of velocity in the 300 ms before the point of minimum/maximum velocity (F(1,13) = 16.5, p = 0.001), such that there was lower β power in the 300 ms before a point of maximum velocity relative to minimum velocity, but not in the 300 ms after (F(1,13) = 1.4, p = 0.3; Fig. 5D).
Statement ratings
The mean ratings of the statements at the end of the experiment were entered into an ANOVA (with the responses to question 6 inverted, such that a higher numerical response indicated that participants thought it was more human), with factors of form and kinematics. This ANOVA indicated a main effect of kinematics (F(1,13) = 6.4, p < 0.03), and a borderline effect of form (F(1,13) = 4.5, p = 0.054). There was no form × kinematics interaction (F < 1). Participants rated the human BM videos as most human (mean = 15.6, SEM = 1.1), the human CV (mean = 11.8, SEM = 1.4) and point BM (mean = 11.5, SEM = 1.6) videos as next most human, and the point CV videos as least human (mean = 8.8, SEM = 1.2).
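The scoring step can be sketched as follows. The sketch assumes ratings stored as a participants × conditions × statements array with conditions ordered human BM, human CV, point BM, point CV; this layout and the function names are illustrative. It also uses the fact that, in a 2 × 2 repeated-measures design, the F(1, n − 1) for a main effect equals the squared paired t-statistic on the per-participant difference of marginal means.

```python
import numpy as np

def humanness_scores(ratings, inverted_items=(5,), scale_max=25):
    """Mean rating per participant and condition after inverting reverse-keyed statements.

    ratings: (n_participants, n_conditions, n_statements) on a 0..scale_max scale.
    inverted_items=(5,) is the zero-based index of statement 6 ("computer generated").
    """
    r = np.array(ratings, dtype=float)            # copy, so the input is not modified
    for item in inverted_items:
        r[..., item] = scale_max - r[..., item]   # higher now means "more human"
    return r.mean(axis=-1)

def main_effect_F(scores, plus, minus):
    """F(1, n-1) for a 1-df within-participant contrast (square of the paired t-statistic)."""
    d = scores[:, plus].mean(axis=1) - scores[:, minus].mean(axis=1)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    return t ** 2
```

Under the assumed condition order, the main effect of kinematics corresponds to `main_effect_F(scores, plus=[0, 2], minus=[1, 3])` and the main effect of form to `main_effect_F(scores, plus=[0, 1], minus=[2, 3])`.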
Discussion
In the present study we tested the hypothesis that activity in the sensorimotor cortex is modulated dynamically during action observation in a similar way to that previously observed during action execution. MEG research has established a dynamic modulation in power of sensorimotor β oscillations during action execution over central sensors, with β oscillations greater at the endpoints of an executed action than the midpoints (Kilner et al., 2000, 2003a). Furthermore, neuronal activity in the primary motor cortex of the macaque monkey is modulated dynamically by the kinematics of the executed action (Stark et al., 2007). We reasoned that if sensorimotor activations found when observing actions reflect motoric simulation of that action then a similar dynamic modulation of β oscillations should be observed. This is precisely what we found: the results demonstrated that oscillatory activity generated in the sensorimotor cortex in the β range is significantly attenuated during action observation, confirming previous findings from MEG and EEG research, and this activity was modulated dynamically according to the phase of the observed action. These effects were found both in sensor and source space.
Observation of correspondence of the dynamics of sensorimotor cortex activation when observing and executing actions is consistent with what one would expect if there were a mirror neuron system (MNS) in humans. Mirror neurons, in the ventral premotor area F5 of macaque monkeys (di Pellegrino et al., 1992; Gallese et al., 1996; Rizzolatti and Luppino, 2001; Umiltà et al., 2001) and inferior parietal lobule, area PF (Gallese et al., 2002; Fogassi et al., 2005), discharge not only when the monkey executes an action of a certain type (e.g., pincer grip), but also when it observes the experimenter performing that action. A number of neuroimaging studies have found that homologous areas in the human brain are similarly active when observing and executing actions (Rizzolatti et al., 1996; Buccino et al., 2001; Grèzes and Decety, 2001; Gazzola and Keysers, 2009; Kilner et al., 2009a), and therefore have claimed that an MNS also exists in humans. However, over a decade since their discovery in the macaque, there is still controversy over whether humans have such a system (Dinstein et al., 2008; Kilner et al., 2009a; Lingnau et al., 2009). Here we report a dynamic modulation in β power during action observation that has its source in sensorimotor cortex, which is not one of the areas usually considered part of the human MNS. There are two ways in which the present findings can be considered consistent with the hypothesis of an MNS in humans. First, it has been argued that, given the anatomical connection between premotor cortex and sensorimotor cortex (Matelli et al., 1986; Dum and Strick, 2005), sensorimotor cortex is activated postsynaptically during periods of action observation, and therefore that the attenuation of the β oscillations during action observation is likely to have resulted from activation of an MNS in premotor cortex (Rizzolatti and Craighero, 2004). Second, mirror neurons have recently been reported in primary motor cortex (Dushanova and Donoghue, 2010) and as a result there is an argument that sensorimotor cortex may now be considered an intrinsic part of the MNS [it seems likely that if there are indeed mirror neurons in multiple areas (see also Mukamel et al., 2010), neurons in the different areas are mirroring distinct aspects of the observed actions].
Here we have shown that the amplitude of β power was modulated when observing movements with biological kinematics. The β power was lowest ∼200–250 ms before a midpoint relative to an endpoint. In other words, the modulations in β power across time did not coincide with the endpoints and midpoints of the observed action but preceded them. This is not what has been observed during action execution. In previous MEG studies, Kilner et al. (2000, 2003a) found that β power was attenuated during periods of maximum velocity and maximal during periods of zero velocity. In these studies, the maximal changes in β power occurred at, or slightly after, the endpoints and midpoints of the action. This finding that activity in M1 is modulated by the velocity of the executed action is supported by single cell studies of neurons in M1. Neurons in M1 increase their firing rate when the monkey moves with a faster, rather than slower, velocity (Stark et al., 2007). Therefore, the modulations in β power that we have reported here during action observation precede those that would be expected if one were executing the action. Although speculative, this pattern would fit with recent models of the AON that have suggested that activity in these regions is predictive. These include models that are based on active inference (Kilner et al., 2007a,b); models that employ forward modeling (Miall, 2003; Wolpert et al., 2003); models that explicitly claim a prospective prediction (Stadler et al., 2010); and those where activity reflects a learned sequence of visuomotor associations (Heyes, 2001, 2010), whereby observation of an action can activate visual and motor representations of the subsequent element in a learned sequence (Hollis, 1984; Bird and Heyes, 2005).
The dynamic modulation of motor activity was only found when participants observed actions where the arm moved with naturalistic kinematics, and not when it moved with constant velocity. This may suggest that the dynamic effects that we see are driven by differences in sensorimotor activation simply when observing fast and slow movements. Alternatively, it may suggest, in line with several previous studies, that observing human action activates motor codes to a greater extent than observing nonbiological motion (Kilner et al., 2003b; Tai et al., 2004; Press et al., 2005). Specific sensitivity to the kinematic information is consistent with findings in two previous fMRI studies (Dayan et al., 2007; Casile et al., 2010). These show that motor structures such as the dorsal premotor cortex are activated more when observing movements that obey the two-thirds power law, that is, movements that slow down at curved relative to straight parts of motion (Lacquaniti et al., 1983), than movements with the inverted kinematic profile. The present study adds to this literature by indicating that observing human action activates motor cortical representations to a greater extent than nonbiological movement, in a manner corresponding dynamically to that which would be expected if executing the action.
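As an illustration of the two-thirds power law mentioned above, the following is a minimal sketch and not part of the analyses reported here. It assumes the common tangential-speed form of the law, v = K·κ^(−1/3), where κ is path curvature; the elliptical path, the gain K = 1, and all function names are hypothetical choices made only for this example. It shows that the predicted speed is lower at the sharply curved ends of an ellipse than along its flatter sides.

```python
import math


def ellipse_curvature(a, b, theta):
    """Curvature of the ellipse x = a*cos(theta), y = b*sin(theta)."""
    denom = ((a * math.sin(theta)) ** 2 + (b * math.cos(theta)) ** 2) ** 1.5
    return (a * b) / denom


def power_law_speed(curvature, gain=1.0):
    """Tangential speed under the two-thirds power law: v = K * kappa**(-1/3).

    The gain K is an arbitrary illustrative constant.
    """
    return gain * curvature ** (-1.0 / 3.0)


# Hypothetical ellipse with semi-axes a = 2 (horizontal) and b = 1 (vertical).
a, b = 2.0, 1.0

# theta = 0 is the end of the major axis, where curvature is highest.
speed_at_tip = power_law_speed(ellipse_curvature(a, b, 0.0))

# theta = pi/2 is the end of the minor axis, where the path is flattest.
speed_at_side = power_law_speed(ellipse_curvature(a, b, math.pi / 2))

print(f"speed at curved tip: {speed_at_tip:.3f}")
print(f"speed at flat side:  {speed_at_side:.3f}")
```

A stimulus with the inverted kinematic profile would instead speed up at the curved tips, and a constant-velocity stimulus would ignore curvature altogether.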
The fact that, within overlapping observation-execution areas, there was no influence of form (human or point) suggests that it may be only the kinematics that determine the degree to which the motor system will process an observed action. Giese and Poggio (2003) have modeled processing of form and kinematic biological motion information in separate pathways, but acknowledge that the two pathways are likely to interact at several levels. If only the kinematics and not the form of the observed action influence motor activations, it is likely that there is little interaction between pathways before visual information feeds into motor areas (cf. J. Kilner et al., 2007). Vangeneugden et al. (2009) found evidence of separate processing of form and kinematic information in the superior temporal sulcus (STS) in the macaque monkey, which is in line with the hypothesis of separate processing in the motor system, given that the STS is known to feed into motor structures investigated in the present study (Keysers and Perrett, 2004).
Despite the number of studies that have found greater motor activations when observing human action relative to nonbiological motion, some studies have found no such biological specificity. For example, in an fMRI study, Gazzola et al. (2007) found that motor structures such as the inferior frontal gyrus were activated equally when observing humans and industrial robots performing arm actions. In fact, in the analyses in the present study where we averaged cortical activations over the time period of action observation, we also found no evidence of biological specificity. The differences only emerged when analyzing changes in cortical activation over time. Such dynamic analyses therefore appear to provide greater sensitivity for investigating the specificity of the AON, and could provide a useful tool for exploring other questions concerning this system in the future.
The present study found evidence that observation of action elicits changes in sensorimotor activation across time, according to the phase of the movement that is being observed. These changes are in line with those that would be expected if one were executing the observed action, indicating that observing action is automatically activating motor programs required for its execution. These effects were driven by the kinematics of the observed actions, as they were only present when the actor moved with biological kinematics and not constant velocity.
Footnotes
This research was supported by an Interdisciplinary Postdoctoral Fellowship awarded to C.P. by the Medical Research Council and the Economic and Social Research Council, and a Research Career Development Fellowship awarded to J.K. by the Wellcome Trust. J.C. is supported by a Wellcome Trust four-year PhD in neuroscience. S.J.B. is supported by the Royal Society. We are grateful to Daniel Wolpert and James Ingram for allowing us to use their camera for filming and for help with the filming equipment, to Gareth Barnes for help with the source space analyses, and to David Bradbury for technical assistance. We also thank Chris Frith and Karl Friston for comments on the design and analysis.
- Correspondence should be addressed to Clare Press, School of Psychology and Clinical Language Sciences, University of Reading, Whiteknights RG6 6AL, UK. c.m.press{at}reading.ac.uk