Abstract
Recent studies have highlighted cognitive and neural similarities between planning and perceiving actions. Given that action planning involves a simulation of potential action plans that depends on the actor's body posture, we reasoned that perceiving actions may also be influenced by one's body posture. Here, we test whether and how this influence occurs by measuring behavioral and cerebral (fMRI) responses in human participants predicting goals of observed actions, while manipulating postural congruency between their own body posture and postures of the observed agents. Behaviorally, predicting action goals is facilitated when the body posture of the observer matches the posture achieved by the observed agent at the end of his action (action's goal posture). Cerebrally, this perceptual postural congruency effect modulates activity in a portion of the left intraparietal sulcus that has previously been shown to be involved in updating neural representations of one's own limb posture during action planning. This intraparietal area showed stronger responses when the goal posture of the observed action did not match the current body posture of the observer. These results add two novel elements to the notion that perceiving actions relies on the same predictive mechanism as planning actions. First, the predictions implemented by this mechanism are based on the current physical configuration of the body. Second, during both action planning and action observation, these predictions pertain to the goal state of the action.
Introduction
Several studies have suggested that perception of others' actions engages the observer's motor system (Cattaneo et al., 2011; Press et al., 2011). More precisely, observed movements are thought to be simulated internally via forward models (Jeannerod, 2001; Oztop et al., 2005). Forward models are also computed during production of movements, imagined or actual (Shadmehr and Krakauer, 2008), and those computations are modulated by the spatial relationship between the current body posture and the action-goal posture, i.e., the body posture occurring when the action goal is achieved (Shenton et al., 2004; de Lange et al., 2006; Lorey et al., 2009; Ionta et al., 2012; Zimmermann et al., 2012). This modulation can be seen as an instance of the end-state comfort principle, according to which action plans are hierarchically organized around temporally distal goals and goal postures (Rosenbaum et al., 1995; Hommel, 2003; Grafton and Hamilton, 2007; Kilner et al., 2007). Here we test whether action perception also follows this principle, considering the relation between an observer's body posture and the action-goal posture.
Suggestive evidence for the general idea that the state of the observer's body influences action observation comes from a study showing that chronically deafferented patients are impaired in inferring the motoric expectations of an actor (Bosbach et al., 2005). This suggests that a lack of somatosensory information about one's own body influences the perception of others' actions. However, chronically deafferented patients might undergo substantial functional reorganization (Chen et al., 2002), and it remains unclear whether body posture influences action observation through cerebral regions involved in action planning and state estimation. Recently, Ambrosini et al. (2012) showed that having one's hands tied behind one's back impairs proactive eye movements during action observation. Others did not find any influence of body posture on action observation, either on behavior (Fischer, 2005) or on cerebral motor structures (Lorey et al., 2009), leaving it unclear whether, and at which level of the action hierarchy, the observer's body posture might influence action perception.
Here we assess whether and how the body posture of an observer influences action perception. Participants predicted the goal state of visually presented actions, while their cerebral activity was monitored with fMRI and their right arm was either pronated or supinated. The visually presented actions showed an actor grasping a bar with a pronated or supinated right arm, using either a rotation or a translation movement to move the bar. This procedure allowed us to disentangle the effects of participants' own arm posture on the perception of actions across different goal postures and biomechanical complexities of the observed actions. We expected that action perception would be facilitated when participants' body posture matches the action's goal posture, and that this modulatory effect would be supported by cerebral regions generating state estimates of one's own body using proprioceptive or visual information [i.e., portions of the intraparietal sulcus (IPS) and the extrastriate body area (EBA)] (Wolpert et al., 1998; Hommel, 2003; Pellijeff et al., 2006; Urgesi et al., 2007; Desmurget and Sirigu, 2009; Parkinson et al., 2010).
Materials and Methods
Participants.
Twenty-nine healthy, naive participants [17 female; age, 24.1 ± 3.9 (mean ± SD) years] participated for payment of €10/h or course credit, after giving informed consent according to institutional guidelines (Commissie Mensgebonden Onderzoek, region Arnhem-Nijmegen, The Netherlands). All participants were consistent right-handers and had normal or corrected-to-normal vision. Two participants were excluded from the analysis because of technical problems with the MR imaging system. Three participants were excluded because of poor behavioral performance (error rates and/or reaction times >2.5 SDs above the group mean). The remaining 24 participants (13 female; age, 24.3 ± 3.9 years) were included in the analyses.
Experimental paradigms.
The experiment consisted of three parts completed in a fixed order, spread over two sessions. A bar-grasping task (see below for task descriptions) was performed during the first session only. An action-prediction task was performed during both sessions. The first session, during which behavioral data were collected, took place in a dummy MR scanner identical in appearance to a real MR scanner. Several days later (average, 3.2 d), the action-prediction task was performed in a functional MR scanner.
Bar-grasping task.
The purpose of the bar-grasping task was to familiarize participants with the actions they would later observe in the prediction tasks. Participants were seated at a table with three cradles positioned next to each other, 5 cm apart. They were instructed to use a power grip to grasp the bar (length, 25 cm; diameter, 2.5 cm; one end black, one end white), positioned horizontally on the middle cradle, and place it on either the left or the right cradle according to instructions presented on a screen. Instructions specified both a direction (i.e., whether to place the bar on the left or right cradle) and a goal orientation of the bar (i.e., where the white and black ends of the bar should point).
Some actions required a translation of the bar from the middle cradle to the left or right cradle (16 trials). Other actions required an additional clockwise or counterclockwise rotation of the bar by 90° (16 trials) or 180° (16 trials). All actions were performed using the right hand, and participants were free to choose whether to use an overhand or underhand power grip when grasping the bar. Task duration was ∼15 min.
Action-prediction task.
Participants performed the action-prediction task both outside and inside the MR environment. First, they performed the task in a dummy MR scanner, where we collected behavioral data concerning their predictions on the observed actions. In the second session, the participants performed an adapted version of the task in the MR scanner, where we measured BOLD responses. Below we describe the task in general, followed by a description of the aspects that differed between the two versions of the task.
In the action-prediction tasks (Fig. 1), participants watched short videos of actions while they were asked to predict the goal state of the observed actions as quickly as possible. The stimulus videos lasted ∼2 s. In each video, an actor sitting at a table grasped and moved a bar with his right hand to one of the two cradles. Each video started with a static image of the actor in a rest position with his right hand on the table and his left hand out of view (below the table, on the actor's lap). After a variable delay (250–500 ms), the video started, showing the actor moving his right arm to grasp the bar with either an overhand or an underhand grip. Subsequently, the actor moved the bar to the left or right cradle, using either a rotation or translation movement. It has been shown that participants choose between different grip configurations depending on the action goal (Table 1) (Zimmermann et al., 2012). Namely, translation actions are more likely to be executed with an overhand grip; rotation actions to the left (actor's perspective) are more likely to be performed with an underhand grip of the bar; and rotation actions to the right are more likely to be performed with an overhand grip of the bar. We refer to these action preferences as low-frequency and high-frequency grip strategies. The set of videos used in this study displayed combinations of rest posture (overhand, underhand), grasp posture (overhand, underhand), initial bar orientation (black end on the left or on the right), movement direction (left, right), and action type (translation, rotation) in an equiprobable distribution, including both highly frequent and less frequent grip strategies. Time until the bar was grasped (∼800 ms) and total duration of the grasping movement (∼1600 ms) were standardized across trials. Videos were stopped when the goal was achieved (i.e., the actor's hand rested on the bar in its final configuration) and the last frame was shown until 2 s from video onset had elapsed.
Figure 1. Action prediction task. A, B, D, E, In each trial, participants were shown videos of an action (A, D: eight representative still frames) that involved either a bar translation (schematically illustrated in B) or a bar rotation (E). C, F, Participants were lying in a scanner while the spatial relation between the posture of their right hand and the start/goal posture of the observed action was manipulated. Participants used their left hand to indicate their prediction of the goal state of the observed action when required (100% of the trials during the behavioral session, 10% of the trials during the fMRI session).
Table 1. Frequencies of grip strategies
Participants were asked to predict the goal state of each observed action as quickly as possible. Goal state was defined as the final orientation of the bar on the cradle to which the actor moved the bar. Therefore, for each trial there were four possible goal states (black bar end "pointing" up, left, down, or right). Participants indicated their decision using one of four buttons on a button box held in their left hand. Each button was assigned to one final state, defined as the bar orientation on the target cradle (black end pointing up, left, down, or right), irrespective of the movement used to achieve that final state. The mapping between final states and buttons was constant throughout the experiment. The mapping was displayed during practice and during breaks between trials.
During the task, we manipulated the arm posture of each participant's right arm (Fig. 1). Participants could either have their arm in a prone posture (i.e., palm down), or in a supine posture (i.e., palm up), lying to the right side of their body on the scanner table. Posture was changed after every block of nine trials. The posture manipulation resulted in different patterns of congruency between participants' own arm posture and the observed arm posture(s) in the videos. During translation trials, participants' posture could either be “overall congruent” or “overall incongruent” with the observed action (because start posture and goal posture are the same for these actions). During rotation trials, the participant's posture could either be in a “goal-posture congruent” state or in a “goal-posture incongruent” state. After each arm posture change instruction, there was a short break (5 s) to allow for arm repositioning.
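To make the congruency scheme concrete, the following minimal MATLAB sketch (hypothetical helper and argument names; not part of the original analysis code) derives a trial's congruency label from the observer's arm posture and the observed grip and action type:

```matlab
function label = congruencyLabel(ownPosture, grip, actionType)
% Hypothetical helper: derive the trial's congruency label from the
% observer's arm posture ('prone'/'supine'), the observed grip
% ('overhand'/'underhand'), and the action type. An overhand grip implies
% a pronated arm at the grasp; a 180-degree rotation ends in the opposite
% forearm posture.
if strcmp(grip, 'overhand')
    graspPosture = 'prone';
else
    graspPosture = 'supine';
end
switch actionType
    case 'translation'                    % start posture == goal posture
        goalPosture = graspPosture;
        tag = 'overall';
    case 'rotation'                       % 180-degree rotation flips the forearm
        if strcmp(graspPosture, 'prone')
            goalPosture = 'supine';
        else
            goalPosture = 'prone';
        end
        tag = 'goal-posture';
    otherwise
        error('unknown action type');
end
if strcmp(ownPosture, goalPosture)
    label = [tag ' congruent'];
else
    label = [tag ' incongruent'];
end
end
```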
Participants engaged in a total of 432 trials. On average, ∼11% of trials were filler trials, in which the bar was placed vertically rather than horizontally on a cradle. These trials, in which the bar was rotated 90° (instead of 0° or 180° as in the experimental conditions), were introduced to increase the number of possible observed movements and reduce predictability. Filler trials were excluded from subsequent analyses. Of the remaining trials (N = 372), half were translation trials (N = 186) and the other half were rotation trials. In each group, half of the trials (N = 93) were goal-posture congruent (rotation trials) or overall congruent (translation trials), and half were goal-posture/overall incongruent. Sessions were divided into six blocks of 72 trials each, with self-paced rest breaks between blocks. Trials were presented in pseudorandom order, such that each block consisted of the same number of trials of each condition, and the same action was not presented twice in a row.
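The constraint that the same action is never presented twice in a row can be implemented by rejection sampling, since many valid orderings exist. A minimal MATLAB sketch, assuming 'actions' holds the condition labels of one block (hypothetical names):

```matlab
function order = shuffleNoRepeat(actions)
% Hypothetical helper: shuffle one block of condition labels (cell array
% of strings) until no label occurs on two consecutive trials.
n = numel(actions);
while true
    order = actions(randperm(n));         % candidate ordering
    ok = true;
    for t = 2:n
        if strcmp(order{t}, order{t-1})   % immediate repetition found
            ok = false;
            break;
        end
    end
    if ok
        return;                           % accept this ordering
    end
end
end
```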
The goal of the prediction task performed in the dummy scanner was to examine whether postural congruency affected decision speed. Therefore, in this session, participants were asked to respond as quickly as possible once they had inferred the action goal. The video was stopped when the participant pressed the button to indicate that they could predict the action goal. The intertrial interval (ITI) varied between 0.5 and 1 s. Before the task, participants practiced until they could correctly predict 8 of 10 consecutive trials with a reaction time <2 s. The behavioral session lasted ∼40 min.
During functional imaging, we were interested in how postural congruency affected neural responses during the prediction task, while avoiding motor preparation processes related to responding (i.e., button presses). Therefore, participants were probed to respond only on a small number of "catch" trials (10% of all trials), during which the stimulus video was replaced by a green exclamation mark at an unpredictable moment, between 1000 and 1500 ms after stimulus onset. Participants then had to indicate the likely goal of the observed action using one of the four buttons. These catch trials (as well as other trials on which participants mistakenly pressed a button) were modeled separately in the fMRI analysis. The ITI varied between 2 and 4 s. Before the fMRI session, participants practiced until they could correctly respond to 8 of 10 consecutive catch trials within 2 s. During the neuroimaging session, eye movements were measured using an MR-compatible infrared camera (MRI-LR, SensoMotoric Instruments). Muscle activity of participants' right forearms (approximately above musculus pronator teres and musculus supinator, to optimally detect pronosupination of the forearm) was measured using an MR-compatible EMG system (Brain Products) and silver/silver-chloride (Ag/AgCl) electrodes (Easycap). The fMRI session lasted ∼55 min.
EBA localizer task.
As detailed in the Introduction, we wanted to test for the presence of posture congruency effects in the EBA. To functionally localize the EBA, we used a set of previously validated stimuli (http://pages.bangor.ac.uk/∼pss811/page7/page7.html), consisting of 20 pictures of headless human bodies and 20 pictures of chairs. Stimuli were presented in an alternating blocked design (presentation time, 300 ms; interstimulus interval, 450 ms; 20 stimuli per block). Two stimuli of each block were presented twice in succession, and participants were instructed to detect these stimulus repetitions (1-back task) to ensure attention to the stimuli. To prevent low-level adaptation, the location of each stimulus on the screen was slightly shifted at random. The functional localizer took ∼10 min and was administered after the prediction task was completed.
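For illustration, a block of 20 presentations containing two immediate repetitions (the 1-back targets) could be generated as follows; this is a sketch with hypothetical names, not the original stimulus code:

```matlab
function seq = makeOneBackBlock(items)
% Hypothetical helper: build one localizer block of 20 presentations in
% which two randomly chosen stimuli repeat immediately (1-back targets).
% 'items' is a vector of stimulus indices for one category.
base = items(randperm(numel(items), 18));  % 18 distinct stimuli
rep  = sort(randperm(18, 2));              % serial positions to repeat
seq  = [];
for k = 1:18
    seq(end+1) = base(k);                  %#ok<AGROW>
    if any(k == rep)
        seq(end+1) = base(k);              % immediate repetition
    end
end                                        % -> 20 presentations in total
end
```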
Analysis of behavioral data.
We obtained the time required to predict the goal state of observed actions [prediction time (PT)] and the error rate from the button box responses. Trials with prediction times exceeding 2.5 SDs above a participant's condition mean were removed from the analysis (on average, 1.7% of trials). Mean PTs were computed from all remaining correct responses. Given the low error rate (7.5%), we did not analyze error trials.
PTs were defined as the time elapsed between the first video frame when the actor grasped the bar and the moment the participant pressed a button. We investigated the influence of three task-related factors on PT. The effect of action complexity was assessed by comparing PTs during translation actions with PTs during rotation actions. To probe the orthogonal effect of action frequency on performance, we compared PTs of actions performed with high-frequency and low-frequency grip strategies. Finally, we assessed the effect of postural congruency during translation and rotation actions on PTs. For translation actions, we compared PTs for translation trials with overall congruent and overall incongruent body posture. For rotation actions, we compared PTs for rotation trials where participants' own posture was either congruent or incongruent with the goal posture of the observed action.
We used two-tailed paired-sample t tests for all comparisons of behavioral data, with p values <0.05 considered significant.
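A minimal MATLAB sketch of the trimming and testing procedure described above (all variable names hypothetical; ttest requires the Statistics Toolbox):

```matlab
% Per participant and condition: drop PTs > 2.5 SD above the condition
% mean, then average the remaining correct trials.
% 'rt'      : vector of PTs for one condition/participant
% 'correct' : logical vector marking correct responses
keep   = rt <= mean(rt) + 2.5 * std(rt);
meanPT = mean(rt(keep & correct));

% Group level: two-tailed paired-sample t test across participants.
% 'pt' : participants x 2 matrix of condition-mean PTs
[~, p, ~, stats] = ttest(pt(:,1), pt(:,2));   % paired, two-tailed by default
fprintf('t(%d) = %.2f, p = %.3f\n', stats.df, stats.tstat, p);
```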
To assess performance during the fMRI version of the prediction task, we analyzed the error rate during catch trials as a function of viewing duration (i.e., the time before video playback was stopped and the catch trial signal was presented), in bins of 100 ms. Note that PTs cannot be calculated for the fMRI version of the prediction task, since the decision moment was imposed by the experimenter rather than by the participant.
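The binning can be expressed compactly; a sketch assuming per-trial vectors 'viewDur' (viewing duration in ms) and 'isError' (hypothetical names):

```matlab
% Bin catch trials by viewing duration (1000-1500 ms, five 100 ms bins)
% and compute the error rate within each bin.
edges   = 1000:100:1500;
bin     = discretize(viewDur, edges);              % bin index per trial
errRate = accumarray(bin, double(isError), ...
                     [numel(edges)-1, 1], @mean);  % mean error per bin
```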
Eye movement and EMG data.
To control for potential confounds related to cerebral effects of eye and muscle movements during the action-prediction task, regressors describing eye movements and EMG activity recorded during the fMRI session were included in the first-level fMRI analysis. For eye movements, we computed the trajectory length and the number of eye blinks for each MR volume. For EMG activity, we computed the root mean square (RMS) activity for each MR volume. These eye-movement and EMG time series were included as additional nuisance regressors in the first-level analysis of the imaging data (see below).
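A sketch of how such per-volume nuisance time series could be computed, assuming each eye and EMG sample has already been assigned to an MR volume (all names hypothetical):

```matlab
% 'gazeX','gazeY','emg'       : raw traces (column vectors)
% 'volIdxEye','volIdxEMG'     : MR volume index of each sample
% 'isBlink'                   : logical vector marking blink samples
nVol    = max(volIdxEye);
step    = sqrt(diff(gazeX).^2 + diff(gazeY).^2);             % gaze path per sample
pathLen = accumarray(volIdxEye(2:end), step, [nVol 1], @sum); % trajectory length
nBlinks = accumarray(volIdxEye(isBlink), 1, [nVol 1]);        % blinks per volume
emgRMS  = sqrt(accumarray(volIdxEMG, emg.^2, [nVol 1], @mean)); % RMS per volume
```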
The eye-movement recordings were also used to compare eye movements between conditions (see analysis of behavioral data) by segmenting the recordings into trials and time-locking each segment to video onset.
Image data acquisition.
We used a 3 T Trio MR-scanner (Siemens) with a 32-channel head coil for signal reception to acquire whole-brain T2*-weighted multiecho echo-planar images (TR, 2070 ms; TE(1), 9.4 ms; TE(2), 21.2 ms; TE(3), 33.0 ms; TE(4), 45.0 ms; voxel size, 3.5 × 3.5 × 3.0 mm; gap size, 0.5 mm) during all functional scans. For each participant, we collected ∼1400 volumes for the prediction task and 180 volumes for the EBA localizer. The first 30 volumes of each scan were used for echo weighting (see Imaging data analysis) and were discarded from the analysis; this also ensured T1 signal equilibration. Anatomical images were acquired with a T1-weighted MP-RAGE sequence (TR/TE, 2300/3.03 ms; voxel size, 1.0 × 1.0 × 1.0 mm) after the EBA localizer task.
The head of each participant was carefully constrained using cushions on both sides of the head. Participants were instructed to remain as still as possible during the experiment. For additional somatosensory feedback on head movements, the forehead of each participant was taped, with tape extending to both sides of the head coil. Data inspection showed that no head movements of participants ever exceeded 2 mm.
Imaging data analysis.
Imaging data were analyzed using MATLAB (MathWorks) and SPM8 (Wellcome Department of Cognitive Neurology). First, functional images were spatially realigned using a sinc interpolation algorithm that estimates rigid-body transformations (translations, rotations) minimizing head movement between the first echo of each image and the reference image (Friston et al., 1995). Next, the four echoes were combined into a single volume. For this, the first 30 volumes of each scan were used to estimate the echo combination that optimally captures the BOLD response over the brain (Poser et al., 2006). These weights were then applied to the entire time series. Subsequently, the time series of each voxel were temporally realigned to the acquisition of the first slice. Images were normalized to a standard EPI template centered in MNI space (Ashburner and Friston, 1999) using linear and nonlinear parameters, and resampled at an isotropic voxel size of 2 mm. The normalized images were smoothed with an isotropic 8 mm full-width-at-half-maximum Gaussian kernel. Anatomical images were spatially coregistered to the mean of the functional images and spatially normalized using the same transformation matrix applied to the functional images. The ensuing preprocessed fMRI time series were analyzed on a subject-by-subject basis using an event-related approach in the context of the general linear model.
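For illustration, the per-voxel echo weighting of Poser et al. (2006), as we read it, makes each echo's weight proportional to its TE times its temporal SNR; a sketch with hypothetical array names (not the original combination code):

```matlab
% 'pre'    : voxels x echoes x 30 pre-scan volumes
% 'echoes' : voxels x 4 data for one subsequent volume
% 'TE'     : 1 x 4 row vector, [9.4 21.2 33.0 45.0] ms
tSNR = mean(pre, 3) ./ std(pre, 0, 3);   % temporal SNR per voxel and echo
w    = tSNR .* TE;                        % weight ~ TE x tSNR (implicit expansion)
w    = w ./ sum(w, 2);                    % normalize weights to sum to 1 per voxel
vol  = sum(echoes .* w, 2);               % weighted combination of the four echoes
```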
For each trial type, square-wave functions were constructed with a duration corresponding to the stimulus duration and convolved with a canonical hemodynamic response function (hrf) and its temporal derivative (Friston et al., 1996). Additionally, the statistical model included 34 separate regressors of no interest, modeling catch trials and false alarms; residual head movement-related effects, captured by Volterra expansions of the six rigid-body motion parameters (Lund et al., 2005); and compartment signals from white matter, cerebrospinal fluid, and out-of-brain regions (Verhagen et al., 2008). The Volterra expansions consisted of the linear and quadratic effects of the six movement parameters for each volume and their temporal derivatives. Finally, to covary out any potential confounding effects of eye and muscle movements, hrf-convolved metrics of eye movements (trajectory length and number of eye blinks) and muscle activity were included as additional regressors of no interest.
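A minimal sketch of one task regressor and the motion expansion described above (spm_hrf is SPM's canonical HRF generator; 'nVol', 'onsetScan', 'durScan', and 'rp' are hypothetical names):

```matlab
TR  = 2.07;                                % repetition time (s)
hrf = spm_hrf(TR);                         % canonical HRF sampled at TR
box = zeros(nVol, 1);                      % boxcar over the stimulus duration
for i = 1:numel(onsetScan)                 % onsets/durations in scans
    box(onsetScan(i) : onsetScan(i) + durScan(i) - 1) = 1;
end
reg   = conv(box, hrf);
reg   = reg(1:nVol);                       % trim the convolution tail
regTD = [0; diff(reg)];                    % temporal derivative regressor

% Volterra-style motion expansion: linear + quadratic effects of the six
% realignment parameters ('rp', nVol x 6) and their temporal derivatives.
drp    = [zeros(1, 6); diff(rp)];
motion = [rp, rp.^2, drp, drp.^2];         % 24 motion regressors
```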
Parameter estimates for all regressors were obtained by maximum-likelihood estimation, using a temporal high-pass filter (cutoff, 128 s), modeling temporal autocorrelation as an AR(1) process. Linear contrasts pertaining to the main effects of the functional design were calculated based on parameter estimates of canonical hrfs.
For analysis of the experimental task, we assessed the same comparisons as in the behavioral prediction task, namely those related to action complexity, frequency, and postural congruency. Contrasts of the parameter estimates for these comparisons constituted the data for the second-stage analyses, which treated participants as a random effect (Friston et al., 1999). Unless otherwise specified, contrasts were thresholded at p < 0.05 after familywise error (FWE) correction for multiple comparisons at the voxel level. Anatomical details of significant clusters were obtained by superimposing the statistical parametric maps onto the structural images of the MNI template. Brodmann areas (BAs) were assigned based on the SPM anatomy toolbox (Eickhoff et al., 2005).
Apart from a whole-brain search for significant differences, we specifically focused on two predefined regions of interest (ROIs; spherical, radius 5 mm). The first ROI was the individually localized EBA (on the basis of the separate EBA localizer session), used to test whether action-prediction effects were present in this area, which is sensitive to the observation of body parts (Downing et al., 2001). The second ROI was a region in the IPS that has been found to be sensitive to body-posture manipulations during the planning of goal-directed actions; here we used previously published stereotactic coordinates (MNI: −22, −60, 58; Zimmermann et al., 2012) to extract activation differences for contrasts related to posture congruency.
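For illustration, a 5-mm-radius spherical ROI around an MNI coordinate can be constructed from an image's voxel-to-world mapping; a sketch assuming image dimensions 'dim' and a 4 × 4 affine 'M' (e.g., taken from the image header; names hypothetical):

```matlab
c = [-22; -60; 58];                                 % ROI center (mm, MNI space)
[i, j, k] = ndgrid(1:dim(1), 1:dim(2), 1:dim(3));   % all voxel indices
xyz = M * [i(:), j(:), k(:), ones(numel(i), 1)]';   % voxel -> mm coordinates
d   = sqrt(sum((xyz(1:3, :) - c).^2, 1));           % distance to ROI center
roi = reshape(d <= 5, dim);                         % logical 5-mm sphere mask
```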
Effective connectivity analysis.
After having identified that regions in parietal and dorsal premotor cortex and the left EBA are more strongly involved in predicting goals of low-frequency actions compared with high-frequency actions (see Results), we assessed whether there were changes in effective connectivity between EBA and parietal or premotor regions as a function of action frequency.
More specifically, we expected increased connectivity between EBA and parietal/premotor cortex during the prediction of unlikely (i.e., low-frequency compared with high-frequency) observed actions, under the hypothesis that EBA forms predictions about potential goal states during observation of another agent. Moreover, it has previously been shown that predictions about observed actions are influenced by one's own, previously executed actions (Cattaneo et al., 2011). Therefore, predictions may also be influenced by the likelihood with which the observed action would, in general, be chosen to reach a particular goal state. As evidence accumulates, predictions in EBA can be updated, and this updated information may be forwarded to parietal or precentral regions to inform the action plan. This would result in the hypothesized increase in connectivity.
To analyze changes in connectivity, we performed a psychophysiological interaction (PPI) analysis (Friston et al., 1997). PPI analysis models regionally specific responses in terms of the interaction between a psychological factor and the physiological activity of a specific (seed) brain region. Here, the analysis was set up to test for differences in connectivity (measured as the correlation strength between the activity of two areas) between left EBA and all remaining brain areas, depending on the grip strategy used in the observed video (low frequency or high frequency). To define activity in EBA, we used the peak location of the left EBA from the independent localizer task as a starting point. We drew a 5-mm-radius sphere around that voxel and extracted the first eigenvariate of the voxels in this sphere that showed a relative increase in BOLD signal during observation of low-frequency actions (first level, p < 0.05 uncorrected). First, a PPI analysis was performed for each subject. Then, contrasts of parameter estimates for the interaction term constituted the data for the second-stage PPI analysis, treating participants as a random effect. Finally, contrasts were corrected for multiple comparisons by applying FWE correction at the cluster level (p < 0.05) over the search volume (whole brain, IPS-ROI), on the basis of an intensity-based voxelwise threshold of p < 0.001 uncorrected.
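Conceptually, the PPI interaction term is the product of the seed time series and the mean-centered psychological vector. The following simplified sketch is formed at the BOLD level and omits the deconvolution to neural space that SPM's PPI machinery performs (all names hypothetical):

```matlab
% 'seed'     : first eigenvariate of the EBA sphere (nVol x 1)
% 'psych'    : +1 for low-frequency, -1 for high-frequency volumes
% 'targetTS' : time series of a target voxel/region
psychC = psych - mean(psych);                  % mean-center psychological vector
ppi    = seed .* psychC;                       % interaction term
X      = [ppi, seed, psychC, ones(nVol, 1)];   % PPI + main effects + constant
beta   = X \ targetTS;                         % least-squares fit; beta(1) is
                                               % the connectivity change of interest
```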
Results
In this section, we describe behavioral and neuroimaging results during the different tasks. Each set of results is structured along the three dimensions assessed in this study, namely action complexity (rotation trials vs translation trials), action frequency (high-frequency vs low-frequency actions), and observer's posture (congruent vs incongruent with actor's goal posture).
Behavioral results
Action complexity
As can be seen from Figure 2A, PTs for observed rotation actions were longer than those for observed translation actions [translation, 753 ± 124 ms (mean ± SD); rotation, 944 ± 152 ms; t(23) = 14.04, p < 0.001]. This finding indicates that, even when the timing of observed actions of different motoric complexity is comparable, it takes longer to predict the goal state of the more complex actions.
Figure 2. A, B, Action prediction times increase for biomechanically complex (A) and low-frequency actions (B). C, D, Prediction times increase when the observer's hand posture does not match the action goal posture during biomechanically complex (rotation trials, D), but not simple actions (translation trials, C). *p < 0.05; ***p < 0.001. n.s., Not significant.
Action strategy frequency
Within each action type, PTs differed depending on the frequency of the grip orientation used by actors when picking up the bar (Fig. 2B). For translation actions, participants were faster to predict pronated than supinated translation actions (pronated, 732 ± 24 ms; supinated, 774 ± 25 ms; t(23) = 7.97, p < 0.001). For rotation actions, PTs were faster for supinated rotations to the left (pronated, 971 ± 32 ms; supinated, 892 ± 29 ms; t(23) = 9.09, p < 0.001) and for pronated rotations to the right (pronated, 898 ± 31 ms; supinated, 1016 ± 32 ms; t(23) = 7.37, p < 0.001). This pattern of results is fully in line with the frequency of different action strategies (Table 1). Namely, the more frequently an action is executed (the more likely participants are to use a particular grasp orientation in a condition), the faster its goal state is predicted during action observation.
Effect of observer's body posture
Next we assessed the effect of one's own arm posture on predicting the goal state of the observed actions. When observing translation actions, there was no effect of the observer's arm posture on prediction times (congruent, 752 ± 25 ms; incongruent, 755 ± 24 ms; t(23) < 1, p > 0.10; Fig. 2C). For rotation actions, however, participants were faster in predicting action goals when their arm posture matched the goal posture of the observed action (goal-posture congruent, 936 ± 30 ms; goal-posture incongruent, 952 ± 30 ms; t(23) = 2.44, p = 0.022; Fig. 2D). That is, when participants observed a rotation action performed with a supinated grip and thus ending with a prone arm posture, prediction of the action goal was faster when the participant's own arm was also in a prone posture. Similarly, when observing a rotation action performed with a pronated grip, PTs were faster when the participant's arm was in a supine posture.
Behavioral performance during fMRI session
We analyzed performance (error rate) on catch trials during the fMRI session of the action-prediction task as a function of viewing duration (100 ms bins, with on average 12 trials per participant per bin). Performance improved with longer viewing durations (linear increase of performance across subsequent bins, β = 0.368; t(23) = 4.295; p < 0.001, R2 = 0.135), indicating that the longer participants could watch the action, the better they could predict it. In the first bin (1000–1100 ms), participants correctly predicted 75.7% of the actions; this increased to 89.7% in the last two bins (1300–1400 ms, 1400–1500 ms).
Neuroimaging results
Action complexity modulates activity in intraparietal and precentral regions
During observation of actions of higher motoric complexity (rotation trials, compared with translation trials) neural activity increased bilaterally in the IPS and the precentral gyrus, as well as the EBA (Fig. 3A, Table 2). These activity increases were localized in the superior parietal lobe [BA7; 40–60% probability (Eickhoff et al., 2005)] on the upper bank of the IPS, and extended ventrally into the inferior parietal lobe. There were also complexity-related activity increases in the frontal cortex, restricted to the dorsal premotor cortex (BA6; 20–50%) and ventral premotor cortex (BA44; 40–50%). Activity differences within the middle occipital gyrus overlapped with EBA: within the individually localized left EBA, activity was stronger for rotation trials compared with translation trials (t(23) = 4.58, p < 0.001).
Figure 3. A, B, Activation maps, illustrating areas that show stronger activation during observation of complex actions compared to observation of simple actions (A) and observation of low-frequency action strategies compared to observation of high-frequency action strategies (B). Both contrasts show stronger activations in the EBA, and posterior parietal and premotor cortices. p < 0.001, uncorrected for illustration purposes.
Table 2. Brain regions associated with increased activity during observation and prediction of rotation actions compared to translation actions
Action strategy frequency modulates activity in intraparietal and precentral regions
Observation of low-frequency actions (compared with high-frequency actions) increased activity in cortical regions partially overlapping those sensitive to action complexity (Fig. 3B; Table 3). These activity differences were observed in the left posterior parietal cortex (upper bank of IPS, BA7; 10–20%), the left and right dorsal premotor cortex (BA6; 20–40%), as well as the left and right middle occipital cortex. The latter regions overlapped with the individually localized EBA, where activity was stronger for low-frequency compared with high-frequency actions for translation (t(23) = 3.31, p = 0.003) as well as rotation actions (t(23) = 3.75, p = 0.001).
Table 3. Brain regions associated with increased activity during observation and prediction of low-frequency actions compared to high-frequency actions
Effect of body-posture congruency in IPS
We next assessed whether there were any activity differences related to the congruency of the participants' arm posture with the action-goal posture. Focusing on the a priori defined intraparietal ROI, we observed increased activity in the left IPS when participants' body posture was incongruent with the goal posture of the observed action, compared with trials in which the two postures were congruent (t(23) = 2.48, p = 0.021; Fig. 4). This region did not show an activity difference as a function of body-posture congruency during translation actions (t(23) = 0.03, p = 0.974), mirroring the behavioral results. A whole-brain search revealed no additional regions with activation differences associated with postural congruency, in either translation or rotation trials.
Figure 4. BOLD signal amplitude in a region of interest of the left IPS (indicated in the rendered brain image) during observation of translation (left bar) and rotation actions (right bar). BOLD signal increases when the observer's body posture does not match (is incongruent with) the actor's goal posture during rotation actions only. *p < 0.05. n.s., Not significant.
Effective connectivity between EBA and IPS is modulated by action frequency
If EBA is involved in action prediction, then its activity should modulate (or should be modulated by) processes occurring in intraparietal and precentral regions that support action perception. Using PPI analysis, which is designed to assess changes in effective connectivity between brain regions (Friston et al., 1997), we found that activity in the left IPS, at the same site as the above-mentioned effect of observer's body posture (IPS-ROI at MNI: −22, −60, 58), correlated with activity in left EBA as a function of action frequency. Namely, observing low-frequency actions increased the coupling between EBA and IPS (t(23) = 3.66, p = 0.046). There were no further differences in connectivity when searching over the whole brain.
Eye movements
To control for the possibility that the cerebral effects described above are due to differences in eye movements, we tested for between-condition differences in eye-movement trajectory length. There was no difference in trajectory length between eye movements corresponding to different trial types (i.e., rotation vs translation, posture effects, grip choice within trial types; all t < 1.50, all p > 0.10).
Discussion
This study investigated whether and how one's body posture influences one's observations of the actions of others and affects predictions of the goals of those actions. The results provide empirical support for a direct influence of the observer's own body posture on action observation, indicating that the prediction of an action goal is facilitated when the observer's body posture matches the action-goal posture. In neural terms, postural incongruency between the observer's body posture and the action-goal posture leads to increased activity within a region of the left IPS known to be implicated in generating state estimates of one's own body (Wolpert and Ghahramani, 2000; Pellijeff et al., 2006; Parkinson et al., 2010).
Behavioral effects
PTs were modulated by the biomechanical complexity of the observed action, the frequency of those actions (as assessed in an independent production task), and the spatial relationship between the observer's body posture and the actor's goal posture. In detail, it took observers longer to predict goals of actions when the actions' biomechanical complexity was higher, and it took them longer to predict goals of actions that they would make less frequently. Moreover, when the body posture of an observer matched the goal posture of the observed action, predicting the action goal required less time than when those postures did not match. These effects closely resemble the pattern of reaction times observed when participants planned actions of the same type as shown in the videos used in this study (Zimmermann et al., 2012).
Action-observation effects in parietal and precentral cortex
Posterior parietal and precentral regions were sensitive to the complexity and frequency of the observed actions. These brain regions showed a stronger response to complex actions compared with simple ones, and they also showed a stronger response to low-frequency actions compared with high-frequency ones. This activation pattern fits with earlier observations of planning-related activity (Zimmermann et al., 2012) and with studies showing brain regions that allow decoding of action intentions in object-directed actions (Gallivan et al., 2011).
Sensitivity to frequency and complexity of the observed actions, together with the known involvement of these regions in motor preparatory processes (Thoenissen et al., 2002) further supports the idea that planning and observation of actions engage overlapping brain regions. The increased parietal and precentral activity during the observation of low-frequency actions might reflect competition among multiple forward models (Oztop et al., 2005) or familiarity with the observed action (Calvo-Merino et al., 2005; Neal and Kilner, 2010). Future studies may want to use individual priors for different action strategies to test for effects of expertise and individual preferences. In particular, it seems plausible that individual differences in forward models (e.g., due to differences in exposure to particular motor programs) may have consequences for both perception and action. Incidentally, this may also explain the qualitative differences in action perception observed in deafferented patients (Bosbach et al., 2005).
A region within IPS was sensitive to the postural congruency between the observer's body posture and the goal posture of the observed actions. It has been shown earlier that the same region maintains a body-state estimate (Wolpert et al., 1998; Pellijeff et al., 2006; Parkinson et al., 2010), and that it is modulated by one's body posture during action production (Shenton et al., 2004; de Lange et al., 2006; Lorey et al., 2009; Ionta et al., 2012; Zimmermann et al., 2012). Modulatory effects of one's body posture during observation of others' actions in the same region within IPS suggest that it not only represents one's own estimated body states, but also the estimated goal states of others' actions. However, it is also possible that there are two classes of neurons in the same region, with some neurons representing one's own body state and other neurons representing the body state of others.
Action-observation effects in the EBA
Observing actions of higher complexity evoked stronger responses in the EBA. Given that the actor's hand during rotation actions is visible from both sides, these trials might provide more structural information about that body part and the tool being manipulated than the less complex translation trials. These features have been suggested to increase EBA activity (Downing et al., 2001), and the lateral occipitotemporal cortex is particularly responsive during the perception of hands (Bracci et al., 2010) and visually presented man-made tools (Bracci et al., 2012). However, these features cannot explain the larger EBA activity during observation of rotation actions when the action is executed infrequently, with structural body and tool information being matched between these conditions.
EBA, rather than having only perceptual functions, may represent desired goal postures for future actions during motor control, which can be used to guide the selection of an appropriate motor plan (van Nuenen et al., 2012; Zimmermann et al., 2012). If action observation makes use of the same processes that underlie action planning, EBA could provide a visual representation of the predicted goal state of the observed action, which can be used to guide action simulation. When a low-frequency action strategy is observed, the initial prediction may be inaccurate (since other actions/goal states are more likely) and updated once more evidence is available, thereby increasing overall brain activity. Because we perform many actions with our hands, and many actions involve tools, desired goal states for action production may be tool-specific (i.e., different tools may require specific grip strategies), and the same combined representations of tools and body parts may be used to infer the goals of observed actions. To infer an action goal, it is important not only to know how and where someone is moving, but also, to resolve ambiguity, to anticipate what can be done with the object(s) in the scene and to understand the action's context in general (Kilner et al., 2007).
The suggestion that parietal and precentral regions are engaged during action observation, guided by predicted goal states from EBA, is further supported by the finding that the functional connectivity between EBA and IPS is strengthened during observation of infrequent actions. Drawing from earlier explanations, the increase in connection strength may reflect the updating of information about the predicted goal state after the initially false representation within EBA is corrected based on additional evidence about the action's goal state.
Goal-state estimation and body posture
Ambrosini et al. (2012) recently showed that proactive eye movements are impaired when observers have their hands tied behind their back, demonstrating that the observer's body posture can modulate action perception. Here we extend this finding by showing that prediction of others' actions is facilitated by congruence of one's body posture with the goal state of the observed action. This finding is consistent with studies showing that simultaneous execution of congruent actions during action observation can assist perception of these actions (Hamilton et al., 2004; Miall et al., 2006). We assume that in these situations an estimated goal state of one's own action is congruent with a predicted goal state of the observed action. The characteristics of the goal-state effect observed in the current study may also explain why previous studies did not find an effect of observers' posture on action observation. For instance, in Lorey et al. (2009), the observed actions lacked a clear goal, and in Fischer (2005), the actions were unfamiliar to the observers (i.e., reaching a dot from a bent posture while sitting on a chair).
Conclusions
This study has shown that planning and perceiving actions rely on a common predictive mechanism that generates internal simulations of these actions. In both situations, predictions pertain to the goal state of the action, and they take into account the current state of the body. During planning, predicted goal states may be evaluated with respect to the task goal of the actor, to anticipate future states, adjust for movement errors and improve perception (Desmurget et al., 1999; Wolpert and Ghahramani, 2000; Voss et al., 2008). During observation, predicted goal states can be used to anticipate another's actions or help to understand the intentions of the observed agent (Kilner et al., 2007; Urgesi et al., 2010).
Overall, our results are in line with theories assuming a tight link between action observation and execution (Jeannerod, 2001; Oztop et al., 2005), and suggest that action observation (prediction) is organized around the prediction of goal postures, as appears to be the case during action planning (Rosenbaum et al., 1990, 2012; Graziano et al., 2002).
Footnotes
- Correspondence should be addressed to Marius Zimmermann, Donders Institute for Brain, Cognition and Behaviour, PO Box 9101, 6500 HB, Nijmegen, The Netherlands. m.zimmermann@donders.ru.nl