In both monkeys and humans, the observation of actions performed by others activates cortical motor areas. An unresolved question concerns the pathways through which motor areas receive visual information describing motor acts. Using functional magnetic resonance imaging (fMRI), we mapped the macaque brain regions activated during the observation of grasping actions, focusing on the superior temporal sulcus region (STS) and the posterior parietal lobe. Monkeys viewed either videos with only the grasping hand visible or videos with the whole actor visible. Observation of both types of grasping videos activated elongated regions in the depths of both lower and upper banks of STS, as well as parietal areas PFG and anterior intraparietal (AIP). The correlation of fMRI data with connectional data showed that visual action information, encoded in the STS, is forwarded to ventral premotor cortex (F5) along two distinct functional routes. One route connects the upper bank of the STS with area PFG, which projects, in turn, to the premotor area F5c. The other connects the anterior part of the lower bank of the STS with premotor areas F5a/p via AIP. Whereas the first functional route emphasizes the agent and may relay visual information to the parieto-frontal mirror circuit involved in understanding the agent's intentions, the second route emphasizes the object of the action and may aid in understanding motor acts with respect to their immediate goal.
The ability to understand actions performed by others is a fundamental aspect of social behavior. It is therefore no surprise that the topics of action observation and recognition have attracted a strong interest within the cognitive neurosciences. The work of Perrett and others (Bruce et al., 1981; Perrett et al., 1985, 1989, 1990; Oram and Perrett, 1994) showed an unequivocal role for the monkey anterior superior temporal sulcus (aSTS) in the visual analyses of actions performed by others. Neurons in the aSTS respond to a wide variety of biological movements, including walking, bending the torso, head turning, arm movements, and goal-directed hand actions (Perrett et al., 1989, 1990; Jellema et al., 2000).
The topic of action observation/recognition received increased attention after the discovery of mirror neurons in ventral premotor cortex of the monkey (area F5), a class of neurons discharging both when a monkey performs a goal-directed motor act and when it observes another individual performing the same or a similar motor act (di Pellegrino et al., 1992; Gallese et al., 1996; Rizzolatti et al., 1996). Recently, Nelissen et al. (2005), using functional magnetic resonance imaging (fMRI), discovered that the convexity of F5 (F5c) responded to the observation of actions performed by others provided the agent was visible in the video, whereas the parts located in the posterior bank of the inferior arcuate sulcus (F5a and F5p) were also activated when only the hand and forearm performing the action were visible.
A fundamental, still unsolved question concerns the brain areas and connections conveying the visual descriptions of the observed actions from the aSTS to the premotor cortex (F5). The available anatomical data indicate that there are no direct anatomical connections between these two regions (Matelli et al., 1986; Ghosh and Gattera, 1995). It is generally accepted that the inferior parietal lobe (IPL), which is connected with both the superior temporal sulcus region (STS) and F5, is the anatomical link between the two regions (Rizzolatti et al., 2001). This view is supported by single-neuron studies reporting mirror neurons in IPL (PFG) (Fogassi et al., 2005). Furthermore, recent imaging studies also point to a role of the anterior intraparietal area (AIP) in encoding hand grasping performed by others (Peeters et al., 2009; Rizzolatti and Sinigaglia, 2010).
The aim of the present study is threefold: first, to investigate which cortical areas of monkey STS and parietal cortex are involved in the visual analysis of the observed actions; second, to assess to what extent these visual activations depend on the presence of the agent in the action videos; third, to investigate the possible functional routes linking anterior STS with F5 by combining the results of the functional MRI experiments involving grasping observation with data from neural tracers injected into the parietal and frontal areas responsive to action observation.
The combination of functional and anatomical data allowed us to define two main functional routes by which the visual information about others' actions is projected from the STS, via the parietal cortex, to premotor area F5. One route uses PFG as an intermediate station, the other, area AIP.
Materials and Methods
The experiments were performed on five macaque monkeys (Macaca mulatta; 3–6 kg; 4–7 years of age), including four males (M3, M5, M6, and M15) and one female (M13). All animal care and experimental procedures met the national and European guidelines and were approved by the ethical committee of the Katholieke Universiteit Leuven Medical School. The details of the surgical procedures, training of monkeys, image acquisition, eye monitoring, and statistical analysis of monkey scans have been described previously (Vanduffel et al., 2001; Fize et al., 2003; Nelissen et al., 2005, 2006) and will be described here only briefly.
During the experiments, the monkeys sat in a sphinx position in a plastic monkey chair directly facing the screen. In the training and scanning sessions, they were required to maintain fixation within a 2 × 2° window centered on a red dot (0.35 × 0.35°) in the middle of the screen. Eye position was monitored at 50 or 120 Hz (in later experiments) via the pupil position and corneal reflection (ISCAN). During scanning, the fixation window was slightly elongated in the vertical direction to 3°, to accommodate an occasional artifact on the vertical eye trace induced by the scanning sequence. The monkeys were rewarded (with apple juice) for fixating the small red dot within the fixation window for long periods (several minutes), while stimuli were projected in the background. Before each scanning session, a contrast agent, MION (monocrystalline iron oxide nanoparticle) (or Sinerem), was injected into the femoral/saphenous vein (6–11 mg/kg).
Visual stimuli were projected from a Barco 6300 liquid crystal display projector (1024 × 768 pixels; 60 Hz) onto a screen 54 cm from the monkeys' eyes. Unless otherwise mentioned, all tests included a simple fixation condition, in which the fixation target was shown on an empty gray screen, as baseline.
In experiment 1, three monkeys (M3, M5, and M6) were scanned in a 1.5 T scanner and one animal (M15) in a 3 T scanner with custom-made eight-channel monkey Rx coils to achieve higher signal-to-noise ratios in the anterior STS. Stimuli (Fig. 1) consisted of video clips showing a hand (and forearm) grasping and picking up an object (“isolated hand” action) (supplemental Video S1, 13 × 16° size, available at www.jneurosci.org as supplemental material) and video clips showing a full view of a person performing the same actions (“acting person”; supplemental Video S2, 18 × 20° size, available at www.jneurosci.org as supplemental material). Four different isolated hand action video sequences were used: a male or female hand grasping and picking up a candy (precision grip) or a ball (whole hand grasp). These video sequences lasted 3.3 s and 11 randomly selected sequences were presented in a 36 s block. Six different acting person video sequences, lasting 6 s, were presented in random order in a block: a man or woman grasping and picking up an apple (whole-hand grasp), a piece of carrot, or a peanut (precision grip). The longer duration of the acting person videos was attributable to the larger number of static frames at the beginning and end of the videos. The duration of the actual hand movement period was similar in the two movies: 2.1 s on average in the hand action videos and 2.7 s in the acting person videos. Two types of control stimuli were used: (1) static single frames of the action videos, one from the middle of the video sequence when the hand is about to grasp the object and one from the end of the video sequence when the object has been picked up; (2) scrambled videos produced by phase scrambling each frame of the video sequences (supplemental Video S3, available at www.jneurosci.org as supplemental material). 
Static stimuli were refreshed every 3.3 s or every 6 s by showing a frame selected from one of the four isolated hand action videos or from one of the six acting person videos, respectively. The acting person videos differed from the static controls by the 2.7 s dynamic period, as shown by the strong differential activity in MT/V5 (see Results).
The fixation condition used a white or green background, matched to those in the acting person and hand action videos, respectively. In each case, five different runs with different orders of conditions were used. Within a run, the order remained constant, and conditions were repeated once. The stimulus conditions are the same as those used in the study by Nelissen et al. (2005). For two monkeys (M5, M6), we used the parietal and STS data obtained in this earlier study, while two additional monkeys (M3, M15) were scanned.
In experiment 2 (M6, M13), we performed an additional action control test that contrasted the responses to goal-directed (object present) and mimicked (no object present) isolated hand actions to either a translating hand (supplemental Video S4, available at www.jneurosci.org as supplemental material) or to a static single frame. The goal-directed hand action video clips were the same as those used in experiment 1. The mimicked hand action clips showed an isolated hand mimicking a grasping action (supplemental Video S5, available at www.jneurosci.org as supplemental material). The translation controls were introduced because many regions in the middle and posterior STS are involved in visual motion analysis (Zeki, 1974; Maunsell and Van Essen, 1983; Desimone and Ungerleider, 1986; Vanduffel et al., 2001; Nelissen et al., 2006). The action and static stimuli conditions were the same as those in experiment 3 of the study by Nelissen et al. (2005). The monkeys scanned in that experiment were different animals from those used in the present study.
In the analyses of both experiments, regions of interest (ROIs) were defined in the anterior portion of STS and in the IPS using the hand action test of experiment 2 of the study by Nelissen et al. (2005) as an independent action localizer test (monkeys: M1, M3, M5). This action localizer, also used in the study by Nelissen et al. (2006), consisted of videos showing isolated hand actions (3.3 s for 1 cycle; four different video sequences presented in random order) and a static (single frames from the middle of the videos shown for 36 s) and scrambled control.
Functional time series (runs) in experiments 1 and 2 and in the action localizer test consisted of gradient-echo echo-planar whole-brain images acquired on a 1.5 T Siemens Sonata with a surface coil positioned over the head [1.5 T; repetition time (TR), 2400 ms; echo time (TE), 27 ms; 32 sagittal slices, 2 mm isotropic voxels]. For one animal (M15), functional time series in experiment 1 consisted of gradient-echo echo-planar whole-brain images acquired on a 3 T Siemens TIM Trio with a custom-built eight-channel receive coil (TR, 2000 ms; TE, 17 ms; 30 horizontal slices; 1.5 mm isotropic voxels).
In the 1.5 T scan sessions of experiment 1, the isolated hand action videos (and controls) and the acting person videos (and controls) were presented in alternate runs. In addition, after six such runs, in which the monkey was passive, an active run was introduced in which the monkey had to detect the change in orientation of a small bar shown in the center of the screen while the action stimuli were presented in the background. The aim of this run was merely to enhance the alertness of the subject and these data were not analyzed. After this active run, another set of six passive runs was collected, with the cycle of six plus one run being repeated once or twice in a session. In the 3 T scan sessions of experiment 1 and in experiment 2, only passive runs were collected, with little difference in the results.
For each monkey, an anatomical (three-dimensional magnetization-prepared rapid-acquisition gradient echo) volume (1 × 1 × 1 mm voxels) was acquired under anesthesia in a separate session.
Volume-based data analysis.
Data were analyzed using SPM5 and Match software. Only runs in which the monkeys held fixation within the window for >85% of the time were analyzed. In these analyses, realignment parameters, as well as eye movement traces, were included as covariates of no interest to remove eye movements and brain motion artifacts. Spatial preprocessing consisted of realignment and rigid coregistration with a template anatomy [M12, corresponding to M1 in the study by Ekstrom et al. (2008)]. To compensate for echo-planar distortions in the images and for interindividual anatomical differences, the functional images were warped to the template anatomy using the nonrigid matching software, BrainMatch (Chef d'Hotel et al., 2002). The functional volumes were then resliced to 1 mm³ isotropic and smoothed with an isotropic Gaussian kernel (full width at half-maximum, 1.5 mm). Group analyses (fixed effects) were performed with an equal number of volumes per monkey, supplemented with single-subject analysis, and the level of significance was set at p < 0.05 corrected (familywise error) for multiple comparisons, unless stated otherwise.
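The run-inclusion criterion can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names are hypothetical, eye position is assumed to be expressed in degrees relative to the fixation point, and the default window (2° wide, 3° high) follows the scanning window described above.

```python
import numpy as np

def fixation_fraction(eye_x, eye_y, width_deg=2.0, height_deg=3.0):
    """Fraction of eye samples falling inside the fixation window.

    eye_x, eye_y: eye position in degrees relative to the fixation point
    (an assumption of this sketch).
    """
    x = np.asarray(eye_x, dtype=float)
    y = np.asarray(eye_y, dtype=float)
    inside = (np.abs(x) <= width_deg / 2) & (np.abs(y) <= height_deg / 2)
    return float(inside.mean())

def keep_run(eye_x, eye_y, min_fraction=0.85):
    """Keep a run only if fixation was held for more than min_fraction of samples."""
    return fixation_fraction(eye_x, eye_y) > min_fraction
```

A run in which 90% of the samples fall inside the window would be retained under this rule, whereas a run with 80% would be discarded.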
Sixteen different ROIs, 10 within the STS and 6 in the parietal cortex, were defined onto the anatomical template (M12) (Fig. 2).
The 10 STS ROIs (Fig. 2C,D) consisted of 6 motion-sensitive regions defined in the study by Nelissen et al. (2006), plus 4 additional regions defined by the action localizer test. The motion-sensitive regions are located in the caudal [middle temporal (MT)/V5, MT peripheral (MTp), dorsal part of medial superior temporal (MSTd), fundus of the superior temporal (FST)] and middle portion of the STS [lower superior temporal (LST) and middle superior temporal polysensory (STPm)]. These latter two regions are considered here part of the aSTS. The MTp region corresponds to the ventral part of medial superior temporal (MSTv) region of Nelissen et al. (2006). Here, the terminology of Kolster et al. (2010) is used, as a way of reconciling the motion-sensitive regions defined by Nelissen et al. (2006) with the retinotopic results of Kolster et al. (2009). Indeed, the region we refer to as MTp appears to be located more posteriorly than the MSTv described by Kolster et al. (2009), which shared a border with FST. The action localizer test yielded two separated activation sites in the anterior portion of the lower bank of the STS tentatively designated lower bank 1 (LB1) (6–10 mm anterior to interaural plane) and lower bank 2 (LB2) (11–15 mm anterior to interaural plane). These regions responded significantly to videos showing hand-grasping actions (compared with static and scrambled controls). The action localizer test did not yield such activation sites in the corresponding portion of the upper bank. To have an equally sensitive analysis of the two banks of the STS, we tentatively defined two upper bank regions, upper bank 1 (UB1) and upper bank 2 (UB2), at the same anterior–posterior level as LB1 and LB2. Both UB regions are part of anterior superior temporal polysensory (STPa) (Bruce et al., 1981; Oram and Perrett, 1994; Cusick, 1997; Jellema et al., 2000).
Six ROIs were defined in the parietal lobe (Fig. 2A,B,D), four in the inferior parietal lobule convexity (area PF, PFG, PG, and Opt), following Gregoriou et al. (2006), and two in the lateral bank of the intraparietal sulcus [AIP and anterior portion of lateral intraparietal (LIP)]. Area AIP was delineated on the basis of previous single-cell (Murata et al., 2000) and fMRI studies (Durand et al., 2007). The action localizer test also yielded a significant, active site in the lateral bank of the IPS (Fig. 2B), in the anterior portion of the IPS region active during the execution of visual saccades (Wardak et al., 2010), which was termed LIPa (anterior portion of LIP), following Durand et al. (2007).
Since the grasping actions in the videos were performed near the fovea and mostly within the right visual field, the ROI analysis was restricted to the left hemisphere. For number of voxels and proportion of visually responsive voxels in each ROI, see supplemental material (available at www.jneurosci.org).
Region-of-interest analysis was done using MarsBaR (version 0.41.1; http://marsbar.sourceforge.net). The significance threshold for the t tests was set at p < 0.05, one-tailed, Bonferroni corrected for the number of ROIs. A ROI was considered to be significantly activated only if the action observation condition activated the ROI more than any of the control conditions (including the fixation baseline) at p < 0.05, corrected, both in the group and at least two of three of the monkeys (i.e., two of three in experiment 1 and two of two in experiment 2).
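The decision rule for ROI activation can be sketched in Python as follows. This is an illustration only, since the actual analysis was run in MarsBaR. The function names, the degrees of freedom, and the choice of correcting over all 16 ROIs are assumptions of the sketch.

```python
from scipy import stats

def bonferroni_t_threshold(alpha, n_rois, dof):
    """One-tailed critical t value after Bonferroni correction over n_rois tests."""
    return float(stats.t.ppf(1.0 - alpha / n_rois, dof))

def roi_is_significant(group_ts, monkey_ts, threshold, min_monkeys=2):
    """Apply the group-plus-individuals criterion to one ROI.

    group_ts: t values of action versus each control condition (group analysis).
    monkey_ts: one list of such t values per monkey.
    The ROI passes only if the smallest t value exceeds the threshold in the
    group and in at least min_monkeys individual animals.
    """
    group_ok = min(group_ts) > threshold
    n_ok = sum(min(ts) > threshold for ts in monkey_ts)
    return group_ok and n_ok >= min_monkeys

# Hypothetical degrees of freedom; 16 ROIs assumes correction over all ROIs
thr = bonferroni_t_threshold(alpha=0.05, n_rois=16, dof=1000)
```

Taking the minimum t value across contrasts mirrors the requirement that the action condition exceed every control condition, including the fixation baseline.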
The correlation analysis for heterogeneity of ROIs used the method developed by Peelen et al. (2006). For each voxel of the ROI, the differential activity in one contrast (e.g., isolated hand action − fixation only) was plotted as a function of the differential activity in the second contrast (e.g., acting person − fixation only). These correlations were calculated using only the voxels that were activated in the subtraction defining the abscissa, e.g., acting person − fixation. Including both the deactivated and the activated voxels would have inflated these correlations.
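A minimal Python sketch of this voxelwise correlation is given below. It is not the implementation of Peelen et al. (2006): the function name is hypothetical, and "activated" is simplified here to a positive differential response in the contrast defining the abscissa, which is an assumption of the sketch.

```python
import numpy as np

def masked_contrast_correlation(x_contrast, y_contrast):
    """Pearson correlation across voxels of one ROI.

    x_contrast: per-voxel differential activity for the contrast on the abscissa.
    y_contrast: per-voxel differential activity for the other contrast.
    Only voxels with a positive x_contrast value enter the correlation.
    """
    x = np.asarray(x_contrast, dtype=float)
    y = np.asarray(y_contrast, dtype=float)
    mask = x > 0  # simplified definition of "activated" voxels
    if mask.sum() < 3:
        raise ValueError("too few activated voxels to compute a correlation")
    return float(np.corrcoef(x[mask], y[mask])[0, 1])
```

Restricting the mask to one sign of the abscissa contrast avoids the inflation that would arise from pooling activated and deactivated voxels.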
To minimize distortions caused by whole-hemisphere flattening, the procedure developed by Durand et al. (2007) was used to flatten the STS, IPS, and adjoining IPL convexity and finally the inferior ramus of the arcuate sulcus (IAS). The trajectory in the flat maps of the iso-anteroposterior (AP) levels corresponding to coronal sections depends on the overall shape of the sulcus. In the IPS (Fig. 2B), iso-AP levels display a “V” shape, reflecting the increase in the depth of the IPS at more caudal levels. In the anterior STS, iso-AP lines run almost perpendicular to the elongated flattened shape of the STS (Fig. 2C). Since the STS widens caudally, the posterior iso-AP lines there are strongly bent. Hence, MTp, for instance, might appear to be located completely posterior to MT/V5 on the STS flat map, whereas it is actually located at similar anterior–posterior level as MT/V5.
Correlation with connectional data.
To identify the possible pathways conveying action-related visual information from the STS to frontal lobe areas, functional MRI data were combined with data from nine representative macaque monkeys (five Macaca nemestrina, cases 14, 20, 26, 27, 30; three Macaca fascicularis, cases 13, 29, 36; and one Macaca fuscata, case 17) in which neural tracers were injected into areas PFG (cases 13l, 27r, 29r), AIP (cases 14r, 17r, 20l), 45B (cases 26l, 30r, 36l), and F5a (case 30r). All the cases of injections in areas PFG, AIP, and 45B have already been presented in previous connectional studies, to which the reader is also referred for details on surgical, histological, and data analysis procedures (Rozzi et al., 2006; Borra et al., 2008; Gerbella et al., 2010). Data from the tracer injection in F5a have been presented only in abstract form.
In cases 13l, 27r, and 29r, peroxidase-conjugated wheat germ agglutinin (WGA-HRP) (4%, one injection, 0.1 μl), cholera toxin B subunit, conjugated with Alexa 488 (CTBgreen) (1% in distilled water, two injections, 1 μl each) or Fast Blue (FB) (3%, one injection, 0.2 μl) were injected into PFG. In cases 14r, 17r, and 20l, microruby (MR) (10% phosphate buffer, 0.1 M, pH 7.4, two injections, 1 μl each), WGA-HRP (seven injections, 0.1 μl each) and diamidino yellow (DY) (2%, one injection, 0.2 μl) were injected into AIP using similar procedures. In cases 26l, 30r, and 36l, DY (one injection, 0.2 μl), FB (one injection, 0.2 μl), and FB (one injection, 0.2 μl) were injected into area 45B. Finally, in case 30r, in addition to the FB injection into area 45B, DY (one injection, 0.2 μl) was injected into F5a.
In all cases, the distribution of retrograde labeling was plotted in sections every 600 μm and analyzed qualitatively and quantitatively, as the number of labeled neurons found in a given cortical subdivision outside the injected area, in percentage of the overall labeling observed in the injected hemisphere. The distribution of the labeling observed after tracer injections in the same area was remarkably consistent across different cases and the quantitative analysis showed quite similar percentage distributions of the labeling in the different animals (Rozzi et al., 2006; Borra et al., 2008; Gerbella et al., 2010).
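The quantitative measure can be illustrated with a short Python sketch. The function name and the example counts are hypothetical, and the denominator is assumed to be the total number of labeled neurons counted outside the injected area in the injected hemisphere.

```python
def labeling_percentages(counts):
    """Convert labeled-neuron counts per cortical subdivision into percentages.

    counts: dict mapping subdivision name -> number of labeled neurons
    counted outside the injected area (assumption of this sketch).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labeled neurons")
    return {area: 100.0 * n / total for area, n in counts.items()}

# Hypothetical counts, for illustration only
example = labeling_percentages({"STS": 50, "IPS": 30, "frontal": 20})
```

Expressing labeling as a percentage rather than a raw count makes cases with different tracers and injection sizes comparable.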
For the purposes of the present study, the distributions of the retrograde labeling observed in the ipsilateral STS in all PFG, AIP, and 45B injection cases and in the ipsilateral IPS in case 30r (FB and DY) were reanalyzed as follows. The distributions were first visualized in two-dimensional reconstructions of the STS and IPS as described by Matelli et al. (1998). For purposes of comparison, all reconstructions of injected hemispheres were shown as left hemispheres. These two-dimensional reconstructions (flat maps) were then warped to the flattened left STS and IPS isomaps. For the STS warping, reference points were placed every 2 mm along the lower lip, fundus, and upper lip of the STS of both the input flat map (tracer injection) and the reference flat map (fMRI isomap). In each STS flat map, AP 0 was set at the level of the posterior commissure. For the IPS warping, reference points were placed every 2 mm along the lower lip, the middle of the lateral bank, and the fundus of the IPS of both the input flat map (tracer injection) and the reference flat map (fMRI isomap). The deformation of the input flat map to conform to the reference flat map was based on a linear interpolation of a triangular mesh, formed by the reference points and the four corner points of the image, using the Matlab “griddata” function (Watson, 1992), which is based on a Delaunay triangulation of the data.
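The warping step can be sketched in Python using scipy.interpolate.griddata, which, like the Matlab function of the same name, performs linear interpolation over a Delaunay triangulation of the data points. This is an analogous sketch rather than the code used in the study. The function name is hypothetical, and the reference points passed in are assumed to include the four image corners so that every labeled point lies inside the mesh.

```python
import numpy as np
from scipy.interpolate import griddata

def warp_points(src_refs, dst_refs, points):
    """Map points from an input flat map onto the reference flat map.

    src_refs: (n, 2) reference points in the input (tracer) flat map.
    dst_refs: (n, 2) matching reference points in the reference (fMRI) flat map.
    points:   (m, 2) coordinates of labeled neurons in the input flat map.
    Points outside the convex hull of src_refs are returned as NaN.
    """
    src_refs = np.asarray(src_refs, dtype=float)
    dst_refs = np.asarray(dst_refs, dtype=float)
    points = np.asarray(points, dtype=float)
    # Interpolate each target coordinate linearly within the triangles
    # of the Delaunay triangulation of the source reference points.
    x_new = griddata(src_refs, dst_refs[:, 0], points, method="linear")
    y_new = griddata(src_refs, dst_refs[:, 1], points, method="linear")
    return np.column_stack([x_new, y_new])
```

Because the interpolation is linear within each triangle, densely spaced reference points (here every 2 mm along the sulcal landmarks) keep the deformation locally smooth.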
To analyze the consistency of the labeling in a cortical region, the following procedure, a method inspired by that of Borra et al. (2008), was used. The procedure introduces a slight smoothing of the labeling to overcome individual differences. Each dot of label was replaced by a disc of 100% luminance with a 10 pixel radius (∼600 μm in the flat map, approximately the distance between sampled sections), and these luminance discs were added for the STS flat map of a given individual. The composite luminance distribution was thresholded just above 100%, to reject isolated labeled neurons in the connectivity map of a single animal. The overlap between maps from different subjects was calculated and color coded. The results did not depend much on the exact size of the disc, nor on the threshold as long as it exceeded 100%. The fMRI data were also smoothed in the flat maps, for the same reason, by replacing each local maximum with a disc of 20 pixels radius in the flat map. A union of the discs was computed for each animal and the overlap between individual activation patterns was calculated and color coded for each point of the flat map.
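The consistency analysis of the labeling can be sketched in Python as follows. The function names and the array layout (row, column pixel coordinates) are assumptions of the sketch. Luminance is expressed in units where one disc equals 1.0 (100%), so a threshold just above 1.0 retains only pixels covered by at least two discs.

```python
import numpy as np

def disc_density(points, shape, radius=10):
    """Sum of unit-luminance discs centered on each labeled point (row, col)."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    density = np.zeros(shape, dtype=float)
    for r, c in points:
        density += (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
    return density

def consistency_map(points_per_animal, shape, radius=10, threshold=1.0):
    """Number of animals whose thresholded labeling covers each pixel.

    A pixel counts for an animal only if its summed luminance exceeds the
    threshold, which rejects isolated labeled neurons.
    """
    masks = [disc_density(p, shape, radius) > threshold
             for p in points_per_animal]
    return np.sum(masks, axis=0)
```

The resulting integer map can then be color coded, with the maximum value marking regions labeled in all animals. The same counting step applies to the fMRI maps, where a union of discs (any coverage) is taken for each animal instead of a threshold above one disc.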
Cortical regions activated during action observation
The cortical regions activated by action observation are shown in Figure 3. Figure 3, B and D, illustrates the activation sites for grasping performed by the isolated hand (relative to static and scrambled controls), whereas Figure 3F shows those for grasping by the acting person. The network for hand action observation includes the STS, parietal, ventral premotor, and lateral prefrontal cortices.
The actions were presented mostly in the monkey's right visual field (see Materials and Methods). It is therefore difficult to draw any conclusion regarding the lateralization of the action observation network in the monkey. However, as shown in Figure 3, D and F, both STS and premotor cortex tended to respond bilaterally. The finding that the activation sites in the parietal cortex were more extensive in the left hemisphere than in the right (i.e., contralateral to the visual stimulation) is in agreement with previous results (Peeters et al., 2009).
Activation during grasping observation in the superior temporal sulcus region
The ROI analysis of the two experiments showed that 5 of the 10 STS ROIs were significantly more activated during the observation of grasping (isolated hand and acting person) relative to the controls: MT/V5, FST, LST, LB2, and STPm (Table 1; Fig. 4; supplemental Fig. S1, available at www.jneurosci.org as supplemental material). In experiment 1, the actions were contrasted with each of three control conditions (the two static and scrambled controls) and with the fixation baseline condition, whereas in experiment 2 they were contrasted with the static and translation controls and the fixation baseline. Note that experiment 2 shows that the five action-responsive ROIs are not merely motion-sensitive regions. Table 1 lists the minimum t value obtained in the different contrasts in both experiments. This minimum value exceeds the significance threshold in all five ROIs mentioned above.
In experiment 1, the responses of MT/V5 to the acting person videos were much stronger than to either static condition (Fig. 4). This differential activity was very similar to that obtained with the isolated hand videos, clearly indicating that the main difference between the acting person videos and the static conditions is the motion sequence capturing the action.
In experiment 2, we also presented video clips of an isolated hand mimicking a grasp, without an object present (supplemental Video S5, available at www.jneurosci.org as supplemental material). The activation elicited by the observation of hand action with and without objects was very similar in the STS ROIs, except for LB2, which exhibited a weak, nonsignificant reduction in response for the mimicking conditions (supplemental Fig. S1, available at www.jneurosci.org as supplemental material). LB2 is located near the region in which Barraclough et al. (2009) observed stronger neuronal responses to goal-directed actions. No significant interaction between goal-directed and mimicked actions (compared with either their translational or static controls) was obtained in LB2 or in any other STS ROI.
All ROIs tested were located in the depth of the STS (Fig. 2). To complement the ROI analysis, we plotted all the local maxima (at p < 0.001, uncorrected) of the four single-subject statistical parametric maps (SPMs) for the contrasts acting person versus the average of all three controls (Fig. 5A) and isolated hand action versus the average of all three controls (Fig. 5B). The MR responses to action observation are indeed concentrated in the deeper portions of both banks. Most of the local maxima form a long ribbon starting at the caudal end of the lower bank of the STS (near MT/V5) and extending forward all the way up to the tip of the STS. Comparatively few local maxima were located in the upper bank. Most STS regions yielded intermingled responses to both types of action videos.
Activation during grasping observation in the IPL and IPS
The ROI analysis (Table 2; Fig. 6; supplemental Fig. S1, available at www.jneurosci.org as supplemental material) revealed that one parietal ROI, AIP, was significantly activated, using the criteria described for the STS, in the two experiments, for both types of action videos. Area PFG, in which visual responses to hand action have been reported (Fogassi et al., 2005; Rozzi et al., 2008), was also significantly activated in all these conditions, with the exception of the isolated hand tests at 1.5 T in experiment 1 and the comparison with translation control in experiment 2. This stands in sharp contrast to the other three IPL ROIs that were never activated. Finally, LIPa was significantly activated in most tests, except the isolated hand action at 3 T.
In experiment 2 (supplemental Fig. S1, available at www.jneurosci.org as supplemental material), area AIP showed a significant interaction between isolated hand grasping objects and mimicked hand actions, relative to either their translation or static controls (t = 3.57 and t = 3.53, respectively; p < 0.05, corrected for number of ROIs). Hence, AIP responded more strongly to the observation of the goal-directed (object present) than the mimicking (object absent) hand actions and this effect was specific for the action condition.
Anatomical connections between STS and parietal cortex
To investigate how visual areas processing grasping actions are connected with the different sectors of F5, we examined the anatomical connections of the key parietal regions active during the observation of hand grasping, PFG and AIP (Rozzi et al., 2006; Borra et al., 2008), and compared them with the present fMRI data, which were acquired in different subjects. The optimal approach would be to use the same animals, which is not feasible. Training a monkey for awake fMRI experiments is a long and intensive process, and a well trained fMRI monkey can participate in several studies during an extended period of time. This renders it impractical to kill these valuable animals for the tracer studies. As an alternative, we tested several macaque monkeys in the fMRI and anatomical experiments and searched for activation and projection sites that remained consistent across the majority of these subjects.
Tracer injections into PFG (Fig. 7A) produced labeling concentrated in three sectors of the ipsilateral upper bank of the STS (Fig. 7B): area MSTd (Nelissen et al., 2006), STPm, and UB1. Note that the injections in all three hemispheres produced a similar labeling distribution, as indicated by the overlap of green, red, and blue pixels (Fig. 7B). This was confirmed by the consistency analysis (Fig. 8A) in which these three sectors appear as white. The region of overlap in STPm is small because the label in one animal (Fig. 7B, blue label) was located slightly more anterior to that in the two other animals (Fig. 7B, red and green label). This is not to say that no other STS region is connected with PFG, as a few other regions, located mainly in the upper bank, are also labeled in a single animal. Even lowering the criterion to consistent labeling in two animals (light gray) would not substantially alter our conclusions, although the STS regions connected with PFG would be somewhat larger.
Of these three labeled sectors, only STPm showed a consistent activation during grasping action observation, being activated in three of four animals (yellow pixels) for the observation of both types of hand actions (Fig. 8D,E). On the contrary, activation of MSTd during action observation was less consistent: only in one animal and for one type of video (Fig. 8D,E). This suggests that STPm is the STS sector involved in the relaying of information concerning grasping actions to PFG.
Injections into area AIP (Fig. 9A) yielded a more widespread ipsilateral STS labeling (Fig. 9B) than that observed after injection into PFG. In all three cases, labeling was found in three sectors of the STS: one sector at the level of MSTd, extending further caudally, one in the middle portion of the upper bank near the fundus (anterior to STPm), extending into the lower bank, and one rostrally in LB2 (Fig. 9B), extending more laterally. The consistency analysis (Fig. 8B), however, indicated that consistent labeling in all three subjects was only observed in LB2 and more laterally in the lower bank near the lip between +8 and +10.
Of these two regions, only one, LB2, was consistently activated by action observation. This was particularly true for the observation of isolated hand actions (Fig. 8D) to which LB2 responded significantly in all four animals. The sector lateral to LB2 and MSTd were hardly activated by action observation. Conversely, most regions consistently activated by action observation were not labeled after AIP injections (STPm, LB1/LST border, FST, and MT/V5).
Thus, considering the fMRI data, LB2 appears to be the major source supplying AIP with visual information about grasping actions. This is not to say that there are no other connections between STS and AIP, but those are unlikely to play a major role in conveying action observation signals. This functional segregation among the anatomical connections supports the notion of a specific functional route, by which we mean an anatomical connection that transmits a specific type of information, here visual information on observed grasping. Other functional routes are likely to link the STS with AIP. For example, the more lateral regions of the lower bank near LB2 belong to inferotemporal cortex and are close to region TEs in which Janssen et al. (2000) recorded higher-order disparity selective neurons. This lateral route may therefore have a different functional role, related to two-dimensional and three-dimensional shape (Borra et al., 2008). The same distinction between functional route and anatomical connection also applies to the STS–PFG connections described earlier. Indeed, MSTd houses neurons selective for translation as well as for optic flow components such as rotation or expansion (Tanaka et al., 1986; Lagae et al., 1994). Thus, the MSTd–PFG connection may be a functional route, different from the STPm–PFG action observation route, sending translation and optic flow signals to PFG (Fig. 8A).
Anatomical connections between STS, parietal and frontal cortex
Since we had found in earlier studies that a third frontal region, area 45B, was activated by action observation (Nelissen et al., 2005) and that this region is connected with both STS and IPS (Gerbella et al., 2010), we also investigated the connections of area 45B with our functional regions in both IPL and STS. Figure 10B presents the labeling pattern in the ipsilateral STS after injection into 45B (Fig. 10A). The strongest labeling was found in the fundus and lower bank, at approximately the location of the border between LST and LB1, extending laterally into TE toward the lip of the lower bank. This is confirmed by the consistency analysis (Fig. 8C), which indicates a large region of projection to 45B in LB1, including the borders with UB1 and LST, and extending laterally in the lower bank. In addition, a small consistent region seems also to be present anterior to LB2. Note that the labeling pattern after 45B injection appears in many respects complementary to that of the AIP and PFG injections, except for some overlap very laterally in the lower bank at +9 mm.
LB1 and a small region just anterior to LB2 were consistently activated by action observation, both isolated hand actions (four animals) (Fig. 8D) and acting persons (two animals) (Fig. 8E). The other STS regions consistently activated by action observation have little labeling after 45B injections (Fig. 8C), except perhaps the rostral border of FST. Thus, action observation information from STS can reach 45B, directly, through connections arising mainly from LB1 and the rostral part of LST, and possibly supplemented by rostral FST and the rostral tip of the lower bank.
Finally, Figure 11 shows the connections of area 45B and area F5a with the areas located in the ipsilateral IPS. The strongest labeling after injection into F5a is found in area AIP (Fig. 11B, red labeling), whereas after injection into area 45B such labeling is located in the more posterior, oculomotor regions of the lateral bank, including LIPa (Fig. 11B, green labeling). Thus, 45B is connected with STS and IPS regions involved in action observation, but these regions appear mostly segregated from those belonging to the STS–F5 routes.
Heterogeneity of STS and parietal ROIs responding to action observation
The visual responses of F5c and F5a during grasping observation differ, the former requiring the agent performing the action to be visible, the latter not (Nelissen et al., 2005). This result has been replicated in M15 at 3 T (supplemental Fig. S2, available at www.jneurosci.org as supplemental material). However, the STS areas from which the two action observation routes originate (STPm and LB2) responded to both types of grasping videos. Thus, the F5c responses might be difficult to explain unless the neurons providing its main visual input (in PFG) show a certain degree of distinction between responses to hand actions performed with and without the agent visible.
To investigate the heterogeneity of the visual responses in the ROIs conveying visual information about actions to F5, we quantified in each of the four subjects the degree of correlation between the voxel activation levels (percentage MR signal change) during the observation of an acting person and isolated hand action. The analysis (Fig. 12) showed that the STPm and LB2 voxels responding to acting person observation also responded to isolated hand action observation: for STPm the median R² equaled 0.78 and the correlation was significant in all four subjects, whereas for LB2 the median R² was 0.64 and the correlation was significant in three of the four subjects. Interestingly, for PFG the median R² equaled just 0.09 and the correlation between acting person and isolated hand actions was significant in only two subjects. In AIP, however, correlations were significant in all four subjects, with a median explained variance of 85%. Differences in the degree of correlation present in PFG compared with its input STPm and its neighbor AIP were significant in three of the four subjects (Fig. 12). Thus, although there seems to be relatively little difference in the neuronal selectivity for grasping observation with and without the agent visible in the STS regions conveying action information to F5 (LB2 and STPm), or in AIP, the dissociation between these two visual stimuli is quite large in PFG.
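As an illustration of this kind of voxel-wise analysis, the following minimal sketch computes the explained variance (R²) between two sets of voxel responses within one ROI and then takes the median across subjects. The function names, the choice of which condition serves as the x-axis, and the input layout are assumptions made for illustration only; the study's own implementation is not described here.

```python
from statistics import mean, median
from typing import List, Sequence, Tuple


def r_squared_and_slope(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (R^2, least-squares slope of y on x) for paired voxel responses.

    x, y: percent signal change of the same voxels in two conditions,
    e.g. isolated hand action (x) and acting person (y); this assignment
    is an assumption of the sketch.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired voxel values")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("responses must vary across voxels")
    return sxy ** 2 / (sxx * syy), sxy / sxx


def median_r_squared(per_subject: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Median R^2 across subjects, one (x, y) pair of voxel vectors per subject."""
    return median(r_squared_and_slope(x, y)[0] for x, y in per_subject)
```

A high median R² indicates that voxels responding strongly to one type of video also respond strongly to the other, whereas a low value indicates that the two stimulus types drive largely different voxels.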
The strong correlations between the levels of activity in the individual voxels of STPm and LB2 evoked by the acting person and the isolated hand actions indicate that despite the many low-level visual differences between the two types of movies, such as color or size, all voxels of these regions respond similarly. This finding strongly suggests that the responses in these STS regions are driven by the motion pattern portraying the action. To verify whether this is also true at the earliest level in the STS, we also computed the correlations for the MT/V5 ROI. The median R² value here was also high, 0.72 (range, 0.25–0.93; significant in all four subjects), indicating that even at this early level the responses to the action videos are driven predominantly by the motion pattern rather than other low-level visual features.
Additional evidence for the heterogeneity of PFG was provided by a correlation analysis of the action observation and translation responses. The PFG responses were not significantly stronger for the observation of hand actions compared with a translation control in experiment 2. Since PFG receives input both from STPm and MSTd and the latter region responds to translation (Duffy and Wurtz, 1991; Lagae et al., 1994), it is conceivable that these two types of responses belong to different sets of PFG voxels, reinstating a sensitivity of PFG for action observation at the voxel level. Consistent with this view, hand translation and action observation correlated far less in PFG (R² = 0.24) (supplemental Fig. S3, available at www.jneurosci.org as supplemental material) than in STPm (R² = 0.82), or in LB2 and AIP (R² = 0.57, R² = 0.42, respectively), where the hand translation responses are systematically weaker than the action observation responses (supplemental Figs. S1, S3, available at www.jneurosci.org as supplemental material). The difference in correlation between PFG and its input, STPm, was significant. This suggests that the voxels in PFG are more heterogeneous than those in their input region, STPm.
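The statistical test used to compare the correlations of two ROIs is not specified in this section. One common option, shown here purely as an assumed illustration, is Fisher's r-to-z transformation for two independent correlation coefficients. The function name `fisher_z_difference` is hypothetical, and the test treats voxels as independent samples, which is generally only an approximation for fMRI data.

```python
import math
from typing import Tuple


def fisher_z_difference(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Compare two independent Pearson correlations via Fisher's z.

    r1, r2: correlation coefficients (e.g. sqrt(R^2) for positive correlations).
    n1, n2: number of voxels contributing to each correlation.
    Returns (z statistic, two-sided p value under the normal approximation).
    """
    if not (-1.0 < r1 < 1.0 and -1.0 < r2 < 1.0):
        raise ValueError("correlations must lie strictly between -1 and 1")
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than three observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)         # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal tail
    return z, p
```

To apply it to values such as those reported above, one would pass the square roots of the R² values together with the voxel counts of the two ROIs; the voxel counts are not given in this section and would have to be supplied by the reader.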
The present study demonstrates that several regions in the STS are activated by action observation. They include, beyond MT/V5 and its FST satellite, STPm, LST, and more rostrally, LB2. Two of these regions, STPm and LB2, are the main starting points of two functional routes linking STS with F5: one via PFG and one through AIP. In the following sections, we discuss these three aspects: functional routes, STS activations, and parietal activations.
Functional routes conveying grasping-related visual information to the frontal lobe
Anatomical results indicate that area PFG receives consistent (across animals) connections from three STS regions. Of these three, only one, STPm, is systematically activated by action observation. Similarly, anatomical results consistently indicate that AIP receives input from LB2 and neighboring lower bank cortex, and from two small regions in the upper bank. Of these regions, only one, LB2, displays consistent action observation responses. Both PFG and AIP have extensive connections with F5 (Rozzi et al., 2006; Borra et al., 2008), but the functional profiles of F5c and F5a/F5p (Nelissen et al., 2005) (supplemental Fig. S2, available at www.jneurosci.org as supplemental material) appear to be compatible only with those of PFG and AIP, respectively. Hence, we propose that two main functional routes link STS with F5: one via PFG, and the other through AIP (Fig. 13, red and blue arrows). These functional routes are a subset of the anatomical connections that link the ventral premotor cortex, posterior parietal cortex, and STS. A third possible action observation pathway links STS with 45B (Fig. 13), suggesting that action observation signals can also reach the prefrontal cortex directly, perhaps for the purpose of oculomotor control (Flanagan and Johansson, 2003; Gerbella et al., 2010).
Although both STS–F5 functional routes convey visual grasping information, the PFG route appears to be more sensitive to the presence of the agent in the video. In contrast, the AIP route appears to be focused on the object, which is the goal of the action. Indeed, a strong interaction was observed in AIP depending on the presence of the object, and a weak, nonsignificant effect of object presence was also found in LB2. This main effect reached significance in M6 (supplemental Fig. S1, available at www.jneurosci.org as supplemental material), for whom a significant interaction was also observed in F5a. These functional differences suggest that the routes may serve different behavioral functions. The PFG route may be more important for extracting the intention behind the observed motor act. This view is supported by the single-cell studies (Fogassi et al., 2005; Bonini et al., 2010) revealing that PFG neuronal activity depends not only on the type of grasping observed, but also on how the grasped object will subsequently be used by the actor. This property fits with the concept of motor intention, that is, the reason behind a performed motor act (e.g., grasping to eat or place). The AIP route may make the immediate goal of the motor act more explicit, not only in terms of how the hand shaping relates to the intrinsic object properties but also in terms of different hand motor acts (grasping vs placing, dropping, etc.). Of course, these two functional routes are active simultaneously and exchange information by means of their reciprocal connections, providing an understanding of both the intention and the goal.
So far, we have considered only the forward projections along the two functional routes. However, cortical connections are reciprocal. This implies that signals from F5 and PFG or AIP neurons can be transmitted backward to the STS. We suggest that these backward connections are instrumental in binding the pragmatic meaning of actions, underpinned by the parieto-premotor activity, with their visual description in STS.
STS regions responsive to action observation
The cortex buried within the primate STS is known to be involved in the processing of observed actions (Allison et al., 2000; Puce and Perrett, 2003). Our data indicate that, anterior to MT/V5 and its satellites, three STS regions, STPm in the upper bank and LST and LB2 in the lower bank, are specifically activated by grasping observation. The predominance of action encoding in the lower bank, confirmed by the voxel-based analysis, is consistent with single-unit recordings in anterior STS. Although initial studies (Bruce et al., 1981; Oram and Perrett, 1994, 1996) emphasized the presence of neuronal responses to observed actions in the upper bank, several subsequent studies (Perrett et al., 1989; Barraclough et al., 2005, 2009; Vangeneugden et al., 2009; Singer and Sheinberg, 2010) have reported neurons responding to action observation also in the lower bank. Perrett et al. (1989) first documented neuronal responses specific for grasping in the lower bank. This finding was confirmed more recently by Barraclough et al. (2009), who reported that between AP levels +6 and +10 a 76% majority of neurons responding to hand actions were located in the lower bank, and by Singer and Sheinberg (2010), who recorded at levels +12 to +15. Our results indicate that the proportion of hand action neurons in the two banks depends on AP levels and stimulus type (acting person vs isolated hand actions).
Our results also indicate that the action observation activations are localized deep in the STS. These activation ribbons are reminiscent of the architectonic regions PGa and IPa of Seltzer and Pandya (1978). This resemblance, however, is only superficial, since the strips of action observation responsive cortex, especially in the lower bank, consist of several regions with distinct functional properties (Desimone and Ungerleider, 1986; Nelissen et al., 2006). In addition, the activation in the lower bank of aSTS clearly extends beyond PGa into area TE; hence, the action observation network includes part of the inferotemporal cortex.
There are marked visual differences between the two types of videos used in the present study; these include, among others, differences in size, duration, color, and texture. Yet the activities of individual voxels in the STS regions were strongly correlated for the two types of stimuli. In addition, the slope of the regression line was close to 1 in most regions, indicating similar mean levels of response to the two types of videos. This indicates that the responses to action observation in the STS mainly reflect the motion patterns portraying the action and are relatively little influenced by the low-level visual differences.
IPL regions responsive to action observation
Observation of hand actions activated three parietal regions: area PFG, AIP, and LIPa. The activation of PFG is in accord with single-neuron studies showing that, of the four IPL convexity areas, only PFG houses grasping mirror neurons (Fogassi et al., 2005; Rozzi et al., 2008). The present data also fit with the physiological studies in several other respects. First, in the 1.5 T data, most PFG voxels appeared to be visually responsive, but with increasing resolution at 3 T only a fraction of them (29%) proved visually responsive. Rozzi et al. (2008) observed that only 60% of the PFG neurons were visually responsive, but some of these neurons responded to stimuli not tested here. Furthermore, no PF voxel was visually responsive at 3 T. Similarly, Rozzi et al. (2008) observed <10% visual neurons in PF. Second, correlation analysis showed that the PFG voxels are visually heterogeneous, responding to at least three types of stimuli: acting person, isolated hand actions, and mere translation. Rozzi et al. (2008) described seven types of visual responses, two of which, the mirror and motion neurons, responded to stimuli matching those we used. Third, although the PFG activation during action observation reached significance in most tests, there were two exceptions (Table 2). This fits with the relatively small proportion (15%) of mirror neurons in PFG (Rozzi et al., 2008). Finally, the presence of correlation between responses to acting person and isolated hand actions in STPm and its absence in PFG imply that novel neuronal responses to action observation may arise in PFG.
In addition to PFG, our data show that area AIP is also active during the observation of hand-grasping actions, but, unlike PFG, the responses are not influenced by the visibility of the agent. According to Sakata et al. (1995), area AIP plays a fundamental role in visuomotor transformations for grasping. Note, however, that Sakata et al. (1995), in addition to neurons responding to the observation of objects (“object” type neurons), also described a set of neurons related to the sight of the hand moving toward the target (“nonobject” type neurons). Furthermore, recordings of single neurons in AIP have occasionally revealed the presence of grasping mirror neurons in this area (Rizzolatti et al., 2009) (S. Rozzi, P. F. Ferrari, L. Bonini, G. Rizzolatti, L. Fogassi, unpublished observations). Finally, action observation activated LIPa, the LIP sector where the central visual field is represented (Blatt et al., 1990; Fize et al., 2003) and fixation-related neurons have been described (Ben Hamed et al., 2001). Additional single-neuron recordings are needed to interpret this activation.
We have documented a double STS–F5 network using either PFG or AIP as parietal relay. Activity in this network may underlie the full perception of goal-directed actions, linking their goal and intention to their specific visual aspects.
This work was supported by Fonds voor Wetenschappelijk Onderzoek (FWO) Grant G151.04 (G.A.O.), Excellentie Financiering Katholieke Universiteit Leuven Grant EF 05/14 (G.A.O.), Interuniversity Attraction Pole Grant 6/29 (G.A.O.); FWO Grant G.0.593.09 and National Institutes of Health Grant R21 NS064432-01 (W.V., K.N.); Ministero dell' Università e della Ricerca (MUR) Grant PRIN 2006052343-002 (G.L.); and grants from Agenzia Spaziale Italiana and MUR (G.R.). The 3T scanner was purchased with a “Zware Apparatuur” 2004 grant from the region Flanders. Laboratoire Guerbet (Roissy, France) provided the contrast agent Sinerem. K.N. is a postdoctoral research fellow of the FWO. We thank A. Coeman, W. Depuydt, M. De Paep, C. Fransen, P. Kayenbergh, G. Meulemans, R. Peeters, S. Verstraeten, and Dr. H. Kolster for technical assistance.
Correspondence should be addressed to Koen Nelissen, Laboratorium voor Neuro- en Psychofysiologie, Katholieke Universiteit Leuven Medical School, Herestraat 49, bus 1021, 3000 Leuven, Belgium.