Abstract
Selecting suitable grasps on three-dimensional objects is a challenging visuomotor computation, which involves combining information about an object (e.g., its shape, size, and mass) with information about the actor's body (e.g., the optimal grasp aperture and hand posture for comfortable manipulation). Here, we used functional magnetic resonance imaging to investigate brain networks associated with these distinct aspects during grasp planning and execution. Human participants of either sex viewed and then executed preselected grasps on L-shaped objects made of wood and/or brass. By leveraging a computational approach that accurately predicts human grasp locations, we selected grasp points that disentangled the role of multiple grasp-relevant factors, that is, grasp axis, grasp size, and object mass. Representational Similarity Analysis revealed that grasp axis was encoded across dorsal-stream regions during grasp planning. Grasp size was first encoded in ventral stream areas during grasp planning, and then in premotor regions during grasp execution. Object mass was encoded in ventral stream and (pre)motor regions only during grasp execution. Premotor regions further encoded visual predictions of grasp comfort, whereas the ventral stream encoded grasp comfort during execution, suggesting its involvement in haptic evaluation. These shifts in neural representations thus capture the sensorimotor transformations that allow humans to grasp objects.
SIGNIFICANCE STATEMENT Grasping requires integrating object properties with constraints on hand and arm postures. Using a computational approach that accurately predicts human grasp locations by combining such constraints, we selected grasps on objects that disentangled the relative contributions of object mass, grasp size, and grasp axis during grasp planning and execution in a neuroimaging study. Our findings reveal a greater role of dorsal-stream visuomotor areas during grasp planning, and, surprisingly, increasing ventral stream engagement during execution. We propose that during planning, visuomotor representations initially encode grasp axis and size. Perceptual representations of object material properties become more relevant instead as the hand approaches the object and motor programs are refined with estimates of the grip forces required to successfully lift the object.
Introduction
Grasping is one of the most frequent and essential everyday actions performed by humans and other primates (Betti et al., 2021), yet planning effective grasps is computationally challenging. Successful grasping requires identifying object properties including shape, orientation, and mass, and considering how these interact with the capabilities of our hands (Fabbri et al., 2016; Maiello et al., 2019; Klein et al., 2020). Whether an object is large or small, heavy or light, determines how wide we open our hands to grasp it and how much force we apply to lift it (Johansson and Westling, 1988; Cesari and Newell, 1999). Such grasp-relevant object properties, including weight, mass distribution, and surface friction, can often be inferred visually before initiating actions (Fleming, 2017; Klein et al., 2021).
A recent computational model accurately predicts precision-grip grasp locations on 3D objects of varying shape and nonuniform mass (Klein et al., 2020). The model combines multiple constraints related to properties of the object and the effector, such as the torque associated with different grasps and the actor's natural grasp axis. However, it remains unclear which brain networks are involved in computing specific grasping constraints. Moreover, it is unknown whether all constraints are estimated during grasp planning (i.e., before action initiation; Gallivan et al., 2013b, 2019) or whether some aspects are computed during action execution, allowing the actor to refine grasp parameters online before or during contact with the object. Here, we ask how such information is combined to evaluate and then execute grasps. Although many previous studies have investigated the effects of individual attributes, during either grasp planning or execution, here we consider how multiple factors combine, and we compare both planning and execution.
Previous studies show that grasp-relevant representations are distributed across ventral and dorsal visual processing streams. Shape is represented throughout both streams (Sereno et al., 2002; Orban et al., 2006; Konen and Kastner, 2008; Orban, 2011), with dorsal representations emphasizing information required for grasp planning (Srivastava et al., 2009). For example, dorsomedial area V6A—located in human superior parieto-occipital cortex (SPOC)—is involved in selecting hand orientation given object shape (Fattori et al., 2004, 2009, 2010; Monaco et al., 2011). Visual representations of material properties—also crucial for grasping—have been identified predominantly in ventral regions such as lateral occipital cortex (LOC), the posterior fusiform sulcus (pFS), and parahippocampal place area (PPA; Cant and Goodale, 2011; Hiramatsu et al., 2011; Gallivan et al., 2014; Goda et al., 2014, 2016). Brain regions that transform these disparate visual representations into appropriate motor codes include anterior intraparietal sulcus (aIPS), ventral premotor cortex (PMv), dorsal premotor cortex (PMd), and primary motor cortex (M1). Primate neurophysiology suggests that PMv (primate area F5) encodes grip configuration (Murata et al., 1997; Raos et al., 2006; Theys et al., 2012), whereas PMd (primate area F2) encodes grip/wrist orientation (Raos et al., 2004). Both regions exhibit strong connections with aIPS, which could play a key role in linking visual representations—including those in ventral stream regions (Borra et al., 2008)—to motor commands sent to the hand through M1 (Murata et al., 2000; Janssen and Scherberger, 2015).
How information flows and is combined across this complex network of brain regions is far from understood. We therefore sought to identify cortical regions associated with distinct components of grasping and tested their relative importance during grasp planning and execution. To disentangle grasping constraints, we used our model (Maiello et al., 2020) to select grasps that placed different constraints in conflict. For example, a selected grasp could be nearly optimal in terms of the required hand axis but suboptimal in terms of grasp aperture. We then measured functional magnetic resonance imaging (fMRI) blood oxygenation level-dependent (BOLD) activity during planning and execution of these preselected grasps. Combining this model-guided approach with representational similarity analysis (RSA; Kriegeskorte, 2008) allowed us to tease apart the relative contributions of object mass, grasp size, and grasp axis at different stages of grasping.
Materials and Methods
Participants
Analyses used data from 21 participants (13 female; mean age 25.5 years; range 18–33 years) recruited from the University of Western Ontario. Data from two additional participants were excluded because of excessive head motion. All participants had normal or corrected-to-normal vision and were fully right-handed as measured by the Edinburgh Handedness Inventory. All participants gave informed consent before the experiment. The study was approved by the Health Sciences Research Ethics Board at the University of Western Ontario and followed the principles in the sixth revision of the Declaration of Helsinki (2008). Participants were instructed on how to perform the experimental task before entering the MRI room, yet remained naive with respect to the hypotheses of the study. Participants were financially compensated at a rate of $25 Canadian per hour.
Setup
A schematic of our setup is shown in Figure 1A. Participants lay supine inside the MRI scanner with their head placed in a head coil tilted by ∼30° to allow direct viewing of real stimulus objects placed in front of them. Below the head we positioned the bottom 20 channels of a 32-channel head coil, and we suspended a four-channel flex coil via a Loc-Line (Lockwood Products) over the forehead. A black wooden platform placed above the participant's hips enabled the presentation of real objects that participants were required to grasp, lift, and set back down using their right hand. The flat surface of the platform was tilted by ∼15° toward the participant to maximize comfort and visibility. Objects were placed on a black cardboard target ramp (Fig. 1A; ramp dimensions, 15 × 5 × 13 cm) on top of the platform to create a level surface that prevented objects from tipping over. The exact placement of the objects was adjusted so that all required movements were possible and comfortable. Between trials, the participant's right hand rested on a button at a start position on the lower right side of the table. The button monitored movement start and end times. Participants' upper right arm was strapped to their upper body and the MRI table using a hemicylindrical brace (Fig. 1A, brace not shown). This prevented shoulder and head movements, thus minimizing movement artifacts while enabling reach-to-grasp movements through elbow and wrist rotations. A small red LED fixation target was placed above the object, at a slightly closer depth, to control for eye movements. Participants were required to maintain fixation on this target at all times during scanning. An MR-compatible camera was positioned on the left side of the head coil to record the participant's actions. Videos of the runs were screened off-line, and trials containing errors were excluded from further analyses. A total of 22 error trials were excluded, 18 of which occurred in one run where the participant erroneously grasped the objects during the planning phase.
Two bright LEDs illuminated the workspace for the duration of the planning and execution phases of each trial; one was mounted on the head coil, and the other was taped to the ceiling of the bore. Another LED was taped to the outside of the bore and was only visible to the experimenter to cue the extraction and placement of the objects. The objects were kept on a table next to the MRI scanner, where three LEDs cued the experimenter as to which object to place inside the scanner. Participants wore MR-safe headphones to relay task instructions on every trial. The LEDs and headphones were controlled by a MATLAB script on a PC that interfaced with the MRI scanner. Triggers were received from the scanner at the start of every volume acquisition. All other lights in the MRI room were turned off, and any other potential light sources and windows were covered so that no other light could illuminate the participant's workspace.
Stimuli
Stimuli were three L-shaped objects of the same size, created from seven blocks (cubes, 2.5 cm side length). One object was constructed with seven cubes of beech wood (object weight, 67 g), whereas the other two were both constructed of four brass and three wooden cubes (object weight, 557 g). We performed pilot testing to ensure that the objects and their movements did not evoke artifacts related to the movement of masses within the scanner (Barry et al., 2010). Specifically, we placed a spherical MRI phantom (immobile mass) in the scanner and collected fMRI data while the experimenter placed and removed the objects as they would in the actual experiment. Functional time courses were carefully examined to ensure that no artifacts were observed (such as spikes or abrupt changes in signal at the time of action; Culham, 2006; Singhal et al., 2013). The two identical wood/brass objects were positioned in two different orientations, one with the brass arm pointing up (Fig. 1F, BrassUp), the other with the brass arm lying down (BrassDown). In a slow event-related fMRI design, on each trial, participants directly viewed, grasped, and lifted an object placed on a platform.
Task
Participants performed three distinct grasps per object with each grasp marked on the objects with colored stickers during the experiment. The colors were clearly distinguishable inside the scanner and served to cue participants about which grasp to perform. Participants were instructed to perform three-digit grasps with their right hand, by placing the thumb in opposition to index and middle fingers. This grasp was similar to the precision grip grasps used in our previous work (Maiello et al., 2019, 2020; Klein et al., 2020, 2021) but ensured participants could apply sufficient grip force to lift all objects to a height of ∼2 cm above the platform. Grasp contact locations for the index and thumb were selected to produce a set of uncorrelated—and thus linearly independent—representational dissimilarity matrices (RDMs) for the three grasp factors investigated, that is, grasp axis, grasp size, and object mass. Specifically, grasps could be rotated 45° either clockwise or counterclockwise around the vertical axis and could require small (2.5 cm) or large (7.5 cm) grip apertures. In pilot testing we further refined the positioning of the objects and grasps within the magnetic field of the MRI scanner to avoid forming eddy currents within the brass parts of the objects that could hinder participants from executing the grasps. The complete set of grasp conditions is shown in Figure 1C.
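To make this design logic concrete, the following MATLAB sketch constructs binary model RDMs for the three grasp factors and checks that they are pairwise uncorrelated. It is an illustration only: the per-condition factor labels below are hypothetical placeholders, whereas the actual assignment that yields uncorrelated RDMs is the one shown in Figure 1C.

```matlab
% Illustrative sketch (not the study's code): binary model RDMs for the
% three grasp factors over the nine conditions. Factor labels per
% condition are placeholders; the true assignment is shown in Figure 1C.
axisDeg = [ 45 -45  45 -45  45 -45 -45  45 -45];  % grasp axis (deg)
sizeCm  = [2.5 7.5 7.5 2.5 7.5 2.5 7.5 2.5 2.5];  % grip aperture (cm)
massG   = [ 67  67  67 557 557 557 557 557 557];  % object mass (g)

mkRDM = @(v) double(v(:) ~= v(:)');               % 0 = same, 1 = different
rdmAxis = mkRDM(axisDeg);
rdmSize = mkRDM(sizeCm);
rdmMass = mkRDM(massG);

ut = logical(triu(ones(9), 1));                   % upper triangle, no diagonal
corr([rdmAxis(ut) rdmSize(ut) rdmMass(ut)])       % off-diagonal entries are ~0
                                                  % for the Figure 1C assignment
```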
Experimental design and statistical analysis
fMRI experimental procedure
We employed a slow event-related fMRI design with trials spaced every 23–31 s. Participants underwent four experimental runs in which they performed each combination of three objects times three grasps twice per run (18 trials per run, 72 trials total) in a pseudorandom order to minimize trial order effects (van Polanen and Davare, 2015a; Maiello et al., 2018; van Polanen et al., 2020). The sequence of events occurring on each trial is schematized in Figure 1B. Before each trial, the experimenter was first cued on which object to place inside the scanner. The experimenter placed the object on the ramp. At trial onset, the illumination LEDs turned on, and the participant heard the instruction “plan” over the headphones, immediately followed by the auditory cue specifying which grasp to execute. The auditory cue was the word “blue,” “green,” or “red,” corresponding to the colored stickers marking the grasp locations on the objects. The duration of the planning phase of the task was randomly selected to be 6, 8, 10, or 12 s. During this time, the participant was required to hold still and mentally prepare to grasp the object at the cued location. Following previous research (Gallivan et al., 2014, 2016), we used a variable delay between cue and movement onset to distinguish sustained planning-related neural activity from the movement-execution response accompanying action initiation. It is important to note that we use the term “action planning” for a sustained action-planning and previewing phase in which participants are thinking about how to execute the movement and must thus access mental representations of the object and task. In this kind of delayed action task, previous work has demonstrated that dorsal stream areas plan and maintain action goals (Singhal et al., 2013). We specifically do not mean the purely feedforward movement planning that occurs only a few hundred milliseconds before movement initiation (Westwood and Goodale, 2003), because it is infeasible to investigate neural signals at this timescale through fMRI BOLD activity.
Once the planning phase ended, the word “Lift” was spoken through the headphones to cue the participant to execute the grasp. During the execution phase of the task, the participant had 7 s to reach, grasp, and lift the object straight up by ∼2 cm, place it back down on the target ramp, and return their hand to the start position. The illumination LEDs turned off, and the participant waited for a 10–12 s intertrial interval (ITI) for the next trial to begin. During the ITI the experimenter removed the object and placed the next one before the onset of the following trial. We note that we did not include a passive preview phase in our trial design because we have repeatedly shown in previous studies that action intentions cannot be decoded from neural activity recorded during passive stimulus preview (Gallivan et al., 2011, 2013a,b).
Participants were instructed about the task, familiarized themselves with the objects, and practiced the grasps outside the MRI room for ∼5 min before the experiment. Once participants were strapped into the setup, they practiced all grasps again, thus ensuring that they could comfortably grasp each object.
Grasp comfort ratings
At the end of the fMRI experiment, participants remained positioned in the scanner and performed a short rating task. Participants were asked to perform each of the nine grasp conditions one more time. For each grasp, participants verbally reported how comfortable the grasp was on a scale of 1–10 (1 being highly uncomfortable and 10 being highly comfortable). Verbal ratings were manually recorded by the experimenter.
Analyses
Data analyses were conducted using BrainVoyager 20.0 (BV20) and 21.4 software packages (Brain Innovation) as well as MATLAB version R2019b.
fMRI data acquisition
Imaging was performed using a Siemens 3T Magnetom Prisma Fit MRI scanner at the Robarts Research Institute at the University of Western Ontario. Functional MRI volumes were acquired using a T2*-weighted, single-shot, gradient-echo, echoplanar imaging acquisition sequence. Functional scanning parameters were time to repetition (TR) = 1000 ms, time to echo (TE) = 30 ms, field of view = 210 × 210 mm in plane, 48 axial 3 mm slices, voxel resolution = 3 mm isotropic, flip angle = 40°, and multiband factor = 4. Anatomical scans were acquired using a T1-weighted MPRAGE sequence with the following parameters: TR = 2300 ms; field of view = 248 × 256 mm in plane, 176 sagittal 1 mm slices; flip angle = 8°; 1 mm isotropic voxels.
fMRI data preprocessing
Brain imaging data were preprocessed using the BV20 Preprocessing Workflow. First, we performed Inhomogeneity Correction and extracted the brain from the skull. We then coregistered the functional images to the anatomic images and normalized anatomic and functional data to Montreal Neurological Institute (MNI) space. Functional scans underwent motion correction and high-pass temporal filtering (to remove frequencies below three cycles/run). No slice scan time correction and no spatial smoothing were applied.
General linear model
Data were further processed with a random-effects general linear model that included one predictor for each of the 18 conditions [three grasp locations times three objects times two phases (planning versus execution)] convolved with the default BrainVoyager two-gamma hemodynamic response function (Friston et al., 1998) and aligned to trial onset. As predictors of no interest, we included the six motion parameters (x, y, and z translations and rotations) resulting from the 3D motion correction.
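For illustration, one such condition predictor can be built as in the following MATLAB sketch; the two-gamma parameters are the commonly used canonical values (Friston et al., 1998), which we assume approximate BrainVoyager's defaults, and the run length and onset times are hypothetical.

```matlab
% Sketch: one GLM condition predictor via two-gamma HRF convolution.
TR  = 1;                                    % s, matching the acquisition
t   = (0:TR:30)';                           % HRF support (s)
hrf = gampdf(t, 6, 1) - gampdf(t, 16, 1)/6; % canonical two-gamma HRF
hrf = hrf / max(hrf);

nVols  = 600;                               % hypothetical run length (volumes)
onsets = [25 53 80];                        % hypothetical trial onsets (volumes)
boxcar = zeros(nVols, 1);
boxcar(onsets) = 1;                         % events aligned to trial onset
pred = conv(boxcar, hrf);
pred = pred(1:nVols);                       % one column of the design matrix
```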
Definition of regions of interest
We investigated a targeted range of regions of interest (ROIs). The locations of these ROIs are shown in Figure 1H. The criteria used to define the regions and their MNI coordinates are provided in Table 1. ROIs were selected from the literature as regions most likely specialized in the components of visually guided grasping investigated in our study. These included primary visual cortex (V1); areas LOC, pFS, and PPA within the ventral visual stream (occipitotemporal cortex); areas SPOC, aIPS, PMv, and PMd within the dorsal visual stream (occipitoparietal and premotor cortex); and M1/primary somatosensory cortex (S1).
Regions of interest and their peak x, y, and z coordinates in MNI space
V1 was included because it represents the first stage of cortical visual processing on which all subsequent visuomotor computations rely. Primary motor area M1 was included as the final stage of processing, where motor commands are generated and sent to the arm and hand. In our study, however, we refer to this ROI as M1/S1 because our volumetric data do not allow us to distinguish between the two banks of the central sulcus along which motor and somatosensory regions lie.
We next selected regions believed to perform the sensorimotor transformations that link visual inputs to motor outputs. The dorsal visual stream is thought to be predominantly specialized for visually guided actions, whereas the ventral stream mostly specializes in visual object recognition (Goodale and Milner, 1992; Culham et al., 2003; Cavina-Pratesi et al., 2007; Vaziri-Pashkam and Xu, 2017). Nevertheless, significant cross talk occurs between these streams (Budisavljevic et al., 2018), and visual representations of object material properties have been found predominantly in ventral regions. We therefore selected areas across both dorsal and ventral visual streams that would encode grasp axis, grasp size, and object mass.
We expected grasp axis could be encoded in dorsal stream regions SPOC (Fattori et al., 2004, 2009, 2010; Monaco et al., 2011), aIPS (Taubert et al., 2010), PMv (Murata et al., 1997; Raos et al., 2006; Theys et al., 2012), and PMd (Raos et al., 2004). We expected grasp size to be encoded in dorsal stream regions SPOC, aIPS (Monaco et al., 2015), PMd (Monaco et al., 2015), and PMv (Murata et al., 1997; Raos et al., 2006; Theys et al., 2012), and ventral stream region LOC (Monaco et al., 2015). We expected visual estimates of object mass to be encoded in ventral stream regions LOC, pFS, and PPA (Cant and Goodale, 2011; Hiramatsu et al., 2011; Gallivan et al., 2014; Goda et al., 2014, 2016). We further hypothesized that the network formed by aIPS, PMv, and PMd might play a role in linking ventral stream representations of object mass to the motor commands generated and sent to the hand through M1 (Murata et al., 2000; Borra et al., 2008; Davare et al., 2009, 2010, 2011; Janssen and Scherberger, 2015; van Polanen and Davare, 2015b; Schwettmann et al., 2019; Schmid et al., 2021).
It should be noted that we do not expect the set of ROIs investigated here to be the exhaustive set of regions involved in visually guided grasping. For example, subcortical regions are also likely to play a role (Nowak et al., 2007; Prodoehl et al., 2009; Cavina-Pratesi et al., 2018). However, cortical and subcortical structures require different imaging protocols (De Hollander et al., 2017; Miletić et al., 2020), and the small size and heterogeneity of subcortical structures also require different normalization, coregistration, and alignment techniques than those used in the cortex (Diedrichsen et al., 2010). Moreover, adding further ROIs would reduce statistical power when correcting for multiple comparisons. We thus chose to focus on a constrained set of cortical regions for which we had a priori hypotheses regarding their involvement in the aspects of visually guided grasping investigated here. Nevertheless, we hope that exploratory analyses on our open access data may guide future studies mapping out the distributed neural circuitry involved in visually guided grasping.
Figure 1H shows our selected ROIs as volumes within the Colin 27 template brain (https://nist.mni.mcgill.ca/colin-27-average-brain-2008/). To locate all left-hemisphere ROIs (except V1) in a standardized fashion, we searched the automated meta-analysis website https://neurosynth.org (Yarkoni et al., 2011) for key words (Table 1), which yielded volumetric statistical maps. Visual inspection of the maps allowed us to locate the ROIs we had preselected based on a combination of activation peaks, anatomic criteria, and expected location from the relevant literature. For example, aIPS was selected based on the hotspot for grasping nearest to the intersection of the intraparietal and postcentral sulci (Culham et al., 2003). Spherical ROIs of 15 mm diameter, centered on the peak voxel, were selected for all regions except V1. Because Neurosynth is based on a meta-analysis of published studies, the search term V1 would be biased toward the typical retinotopic locations used in the literature and likely skewed toward the foveal representation (whereas the objects and hand would have been viewed across a larger expanse within the lower visual field). As such, we defined left-hemisphere V1 using the Wang et al. (2015) atlas, which mapped retinotopic cortex out to ∼15° eccentricity from the fovea. Table 1 presents an overview of our ROI selection, listing all Neurosynth-extracted ROIs with their peak coordinates, search terms, and download dates. We also share our ROIs (in MNI space) in NIfTI format.
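A minimal sketch of how such a spherical mask can be constructed is shown below; the peak coordinate, volume dimensions, and voxel-to-MNI affine are hypothetical placeholders (the actual peaks are listed in Table 1).

```matlab
% Sketch: 15-mm-diameter spherical ROI mask around a peak coordinate.
peakMNI = [-42 -38 46];                       % hypothetical peak (MNI, mm)
radius  = 7.5;                                % mm (15 mm diameter)
dim     = [70 70 48];                         % hypothetical volume dimensions
vox2mm  = [3 0 0 -105; 0 3 0 -105; ...
           0 0 3 -72;  0 0 0 1];              % hypothetical voxel-to-MNI affine

[i, j, k] = ndgrid(1:dim(1), 1:dim(2), 1:dim(3));
xyz = vox2mm * [i(:) j(:) k(:) ones(numel(i), 1)]';   % voxel -> MNI mm
d   = sqrt(sum((xyz(1:3, :) - peakMNI').^2, 1));      % distance to peak
roiMask = reshape(d <= radius, dim);          % logical mask of the sphere
```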
Representational similarity analysis
The analysis of activation patterns within the selected ROIs was performed using multivoxel pattern analysis, specifically, RSA (Kriegeskorte, 2008; Kriegeskorte et al., 2008). An activation pattern corresponded to the set of normalized beta-weight estimates of the BOLD response of all voxels within a specific ROI for a specific condition. To construct RDMs for each ROI, we computed the dissimilarity between activation patterns for each pair of conditions. Dissimilarity was defined as 1 − r, where r was the Pearson correlation coefficient. RDMs were computed separately for the grasp planning and grasp execution phases. These neural RDMs were then correlated with model RDMs (Fig. 1D–F) to test whether neural representations encoded grasp axis, grasp size, and object mass. To estimate the maximum correlation values expected in each region given the between-participant variability, we computed the upper and lower bounds of the noise ceiling. The upper bound of the noise ceiling was computed as the average correlation of each participant's RDMs with the average RDM in each ROI. The lower bound of the noise ceiling was computed by correlating each participant's RDMs with the average of the other participants' RDMs. All correlations were performed between the upper triangular portions of the RDMs, excluding the diagonal. We then used one-tailed Wilcoxon signed rank tests to determine whether these correlations were significantly above zero within each ROI. We set statistical significance at p < 0.05 and applied false discovery rate (FDR) correction for multiple comparisons following Benjamini and Hochberg (1995).
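The core computations can be summarized in a few lines of MATLAB. This is a simplified sketch under assumed variable names, not our analysis scripts: betas{s} holds the nVoxels × 9 condition patterns of participant s for one ROI and phase, and rdmAxis is a model RDM as in the sketch above.

```matlab
% Sketch: first-order RSA and noise ceiling for one ROI and task phase.
nSubj = 21;
ut = logical(triu(ones(9), 1));                 % upper triangle, no diagonal
rdm = cell(nSubj, 1);
rhoAxis = zeros(nSubj, 1);
for s = 1:nSubj
    rdm{s} = 1 - corr(betas{s});                % neural RDM: 1 - Pearson r
    rhoAxis(s) = corr(rdm{s}(ut), rdmAxis(ut)); % correlation with model RDM
end
p = signrank(rhoAxis, 0, 'tail', 'right');      % one-tailed Wilcoxon signed rank

% Noise ceiling: upper bound correlates each participant's RDM with the
% grand-average RDM; lower bound with the leave-one-out average.
grand = mean(cat(3, rdm{:}), 3);
upperNC = zeros(nSubj, 1); lowerNC = zeros(nSubj, 1);
for s = 1:nSubj
    upperNC(s) = corr(rdm{s}(ut), grand(ut));
    loo = mean(cat(3, rdm{[1:s-1, s+1:nSubj]}), 3);
    lowerNC(s) = corr(rdm{s}(ut), loo(ut));
end
```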
To visualize the representational structure of the neural activity patterns within the grasp planning and grasp execution phases, we first averaged RDMs across participants in each ROI and task phase. We then correlated average RDMs across ROIs within each phase and used hierarchical clustering and multidimensional scaling to visualize representational similarities across brain regions. We also correlated average RDMs across ROIs and across planning and execution phases. Statistically significant correlations (p < 0.05 with Bonferroni correction) are also shown as topological connectivity plots (within-phase data) and as a Sankey diagram (between-phase data; see Fig. 3F).
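In sketch form (hypothetical variable names; avgRDM is a 1 × nROI cell array in which avgRDM{r} holds the participant-averaged 9 × 9 RDM of ROI r within one phase, and classical MDS stands in for whichever MDS variant is preferred):

```matlab
% Sketch: second-order similarity between ROIs within one task phase.
nROI = 9;
ut   = logical(triu(ones(9), 1));
vecs = cell2mat(cellfun(@(m) m(ut), avgRDM, 'UniformOutput', false));
R    = corr(vecs);                         % nROI x nROI inter-ROI correlations

D = 1 - R;                                 % correlation distance
D(1:nROI + 1:end) = 0;                     % force exact zeros on the diagonal
D = (D + D') / 2;                          % guard against floating-point noise
tree = linkage(squareform(D), 'average');  % hierarchical clustering
dendrogram(tree);
[Y, e] = cmdscale(D);                      % classical MDS
coords = Y(:, 1:2);                        % 2D embedding for plotting
```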
Grasp comfort ratings
Grasp comfort ratings were analyzed using simple t tests to assess whether ratings varied across different grasp axes, grasp sizes, or object masses. The difference between ratings for each pair of conditions was then used to create grasp comfort RDMs for each participant. Grasp comfort RDMs were correlated with model RDMs to further test how strongly grasp comfort corresponded to grasp axis, grasp size, and object mass. To search for brain regions that might encode grasp comfort, the average grasp comfort RDM was correlated with neural RDMs following the RSA procedure described above.
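As a sketch (hypothetical names: ratings is one participant's 9 × 1 vector of comfort scores, rdmMass a model RDM as above; we assume the absolute rating difference as the dissimilarity measure):

```matlab
% Sketch: grasp comfort RDM from one participant's ratings.
comfRDM = abs(ratings(:) - ratings(:)');    % |rating_i - rating_j|
ut = logical(triu(ones(9), 1));
rhoMass = corr(comfRDM(ut), rdmMass(ut));   % compare with object-mass model RDM
```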
Data availability
Data and analysis scripts are available from the Zenodo data repository (doi: 10.5281/zenodo.10055791).
Results
Participants in a 3T MRI scanner were presented with physical 3D objects on which predefined grasp locations were shown (Fig. 1A). On each trial, participants first planned how to grasp the objects (Fig. 1B, planning phase) and then executed the grasps (execution phase). We designed objects and grasp locations to produce a set of nine distinct conditions (Fig. 1C) that would differentiate three components of grasping—the grasp axis (i.e., orientation), the grasp size (i.e., the grip aperture), and object mass. By computing pairwise distances between all conditions for each of these grasp-relevant dimensions, we constructed one RDM for each component (Fig. 1D–F); these three model RDMs were mutually uncorrelated. In each brain ROI tested in the study (Fig. 1H), brain-activity patterns elicited by each condition were compared with each other via Pearson correlation to construct brain RDMs. Figure 1G shows one such RDM computed from brain region PMv for one example participant during the planning phase. In this participant, this area appeared to strongly encode grasp axis.
Study design. A, Participants in the MRI scanner were cued to grasp 3D objects at specific locations marked by colored stickers. B, Sequence of events for one example trial during which participants were instructed to grasp the object at the predefined location marked by different colored dots or arrows. Trials began by illuminating the workspace. Through earphones, participants heard the plan instruction, followed by an auditory cue (blue, green, or red) specifying which grasp to execute based on the colored stickers marking grasp locations on the objects. This initiated the planning phase of the trial. After a jittered delay interval (6–12 s), participants heard the lift command, instructing them to perform the required grasp. This initiated the execution phase of the trial in which participants had 7 s to execute the grasp and return their hand to the start position. The illumination of the workspace was then extinguished, and participants waited for the following trial to begin. C, Preselected grasps on stimulus objects of wood and brass produced nine distinct conditions designed to differentiate three components of grasping using RSA. D–F, RDMs for grasp axis, grasp size, and object mass. Colored cells represent condition pairs with zero dissimilarity, white cells represent maximum dissimilarity. G, An example RDM computed from fMRI BOLD activity patterns in region PMv of one participant during the planning phase. Note the strong similarity to the grasp axis RDM in D. H, Visualization of the selected ROIs within the Colin 27 template brain. All ROIs except V1 were built as spheres centered on coordinates recovered from https://neurosynth.org. V1 coordinates were taken from the Wang et al. (2015) atlas. Note that surface rendering is for presentation purposes only as data were analyzed in volumetric space, and no cortex-based alignment was performed.
How grasp-relevant neural representations develop across the grasp network
Figure 2A shows average neural RDMs computed throughout the network of visuomotor brain regions we investigated. ROIs were selected from the literature as regions most likely specialized in the components of visually guided grasping investigated in our study. We included V1 as the first stage of cortical visual processing. Areas LOC, pFS, and PPA within the ventral visual stream (occipitotemporal cortex) were included as they are known to process visual shape and material appearance (Cant and Goodale, 2011; Hiramatsu et al., 2011; Gallivan et al., 2014; Goda et al., 2014, 2016), and could thus be involved in estimating object mass. Areas SPOC, aIPS, PMv, and PMd within the dorsal visual stream (occipitoparietal and premotor cortex) were included as they are thought to transform visual estimates of shape and orientation into motor representations (Janssen and Scherberger, 2015). M1/S1 in the central sulcus was included as the final stage of cortical sensorimotor processing. The patterns of correlations between model and neural RDMs across participants and ROIs (Fig. 2B–G) reveal which information was encoded across these visuomotor regions during grasp planning and execution phases.
RSA results. A, Mean neural RDMs computed in the nine ROIs included in the study. For visualization purposes only, RDMs within each region are first averaged across participants and then normalized to the full range of the LUT. B–G, Correlations between model and neural RDMs in each brain ROI during planning (B, D, F, top) and execution phases (C, E, G, bottom). In bar graphs, gray-shaded regions represent the noise ceiling for each ROI. Bars indicate means, error bars indicate 95% bootstrapped confidence intervals. The same data are represented topographically as dots scaled proportionally to the mean correlation in each region. Bright colors represent significant positive correlations (p < 0.05 with FDR correction); correlations shown in dark colors are not statistically significant.
Grasp axis encoding in visuomotor regions during grasp planning
Figure 2, B and C, shows that neural representations in V1 and ventral region LOC were significantly correlated with grasp axis during both grasp planning and execution phases. In contrast, representations in ventral areas pFS and PPA were never significantly correlated with grasp axis. Further, grasp axis was significantly correlated with neural representations across all dorsal areas (SPOC, aIPS, PMv, PMd), as well as M1/S1, but only during grasp planning. Dorsal and motor areas thus robustly encoded the orientation of the hand when preparing to grasp objects, suggesting that the hand-wrist axis was among the first components of the action computed across these regions.
Grasp size was encoded across both visual streams during grasp planning and execution
During the planning phase (Fig. 2D), grasp size significantly correlated with neural representations in all ventral areas (LOC, pFS, PPA) and with representations in dorsal regions aIPS and PMd. During the execution phase (Fig. 2E), grasp size remained significantly correlated with neural representations in ventral areas LOC and PPA but not pFS. In the dorsal stream during the execution phase, grasp size remained significantly correlated with neural representations in PMd but not aIPS and became significantly correlated with representations in PMv. Neural representations in early visual area V1 were significantly correlated with grasp size only in the execution phase but not during planning. Thus, different ventral and dorsal areas encoded grasp size at different time points. These data suggest that ventral regions may have been initially involved in computing grasp size and might have relayed this information (e.g., through aIPS) to the premotor regions tasked with generating the motor codes to adjust the distance between fingertips during the execution phase. It is perhaps surprising that neural representations in M1/S1 were never significantly correlated with grasp size, given the well-established role of these regions in sensorimotor processing and motor control. These patterns may, however, align with findings from Monaco et al. (2015) suggesting that M1/S1 is insensitive to object size, and could relate to previous work proposing that grip formation emerges from independently controlling the movements of the digits rather than the size of the grip aperture (Smeets and Brenner, 1999, 2001; Smeets et al., 2019).
Object mass was encoded across dorsal and ventral streams and in motor areas but only during grasp execution
During the planning phase (Fig. 2F), none of the investigated ROIs exhibited any activity that was significantly correlated with object mass. Conversely, during the execution phase (Fig. 2G), object mass significantly correlated with representations in ventral areas pFS and PPA, dorsal areas aIPS and PMd, and sensorimotor area M1/S1. Object mass was thus encoded in the later stages of grasping. One possible interpretation is that this occurred when the hand was approaching the object and was preparing to apply appropriate forces at the fingertips. Alternatively, it could be because of sensory feedback about slippage once the object was lifted.
Representational similarities within the grasp network
We took the RDMs generated for each of the nine ROIs (Fig. 2) and correlated them with one another to reveal inter-ROI similarity relationships. Figure 3 summarizes the resulting second-order similarity relationships, both within and between planning and execution phases.
We find that neural representations were significantly correlated across many selected ROIs during both grasp planning (Fig. 3A) and execution (Fig. 3C). Of particular note is that during the planning phase, dorsal regions tended to correlate more strongly with one another, while during the execution phase, ventral regions showed more correlated representations. This is revealed by visualizing the inter-ROI similarities arranged topographically within a schematic brain (Fig. 3B,D), with the darkness of connecting lines between ROIs proportional to the correlations between their corresponding RDMs.
The representational structure of grasping. A, Matrix showing correlations of data RDMs between regions during the planning phase. White asterisks represent significant correlations (p < 0.05 with Bonferroni correction). B, The same data in A are shown through hierarchical clustering and 2D multidimensional scaling, and significant correlations are shown topographically. C, D, As in A and B, but for the execution phase. E, Correlations between ROIs across planning and execution phases. F, Sankey diagram depicting significant correlations from E.
During planning (Fig. 3B), the strongest correlations were among M1/S1, PMd, and aIPS; between V1 and SPOC; and to a lesser extent between SPOC and M1/S1. The structure of these representational similarities is also shown in the multidimensional scaling plot, where a gradient of information can be visualized from V1 through dorsal regions SPOC and aIPS toward motor regions PMd and M1/S1. In the execution phase (Fig. 3D) the similarities among brain regions formed two main clusters. One cluster of visual regions was formed by V1, SPOC, and LOC. The second cluster comprised aIPS, premotor areas PMv and PMd, and M1/S1. Hierarchical clustering, multidimensional scaling, and topographical plots all highlight how these two clusters appeared to share representational content predominantly through ventral stream regions pFS and PPA.
Shared representations across planning and execution phases
Neural representation patterns were also partly correlated across grasp planning and execution phases (Fig. 3E,F). Notably, aIPS representations during the planning phase were significantly correlated with representational patterns in ventral (PPA), dorsal (SPOC, PMd), and sensorimotor (M1/S1) regions during the execution phase. This suggests that aIPS may play a key role in linking grasp planning to execution. Further, neural representation patterns in nearly all ROIs (except PMv) during the planning phase were correlated with representations in V1 during the execution phase, and representations in pFS, SPOC, PMd, and M1/S1 during action planning were correlated with LOC representations during action execution. We speculate that this might reflect mental simulation, prediction, and feedback mechanisms at play (see below, Discussion).
Grasp comfort
We previously demonstrated that humans can visually assess which grasp is best among competing options and can refine these judgments by executing competing grasps (Maiello et al., 2020). These visual predictions and haptic evaluations of grasp comfort were well captured by our multifactorial model (Klein et al., 2020), suggesting they may play a role in grasp selection. We thus wondered whether we could identify, within the grasp network investigated here, brain regions that encoded visual predictions and haptic evaluations of grasp comfort. To this end, once an imaging session was completed, we asked participants (while still lying in the scanner) to execute each of the nine grasps once more and rate how comfortable each felt on a scale of 1–10. Comfort ratings were consistent across participants (Fig. 4A). Comfort was slightly modulated by grasp axis (Fig. 4B; t(20) = 3.3, p = 0.0037) and was not modulated by grasp size (Fig. 4C; t(20) = 0.89, p = 0.39). The factor that most affected grasp comfort was object mass, with heavy objects being consistently rated as less comfortable than light objects (Fig. 4D; t(20) = 8.1, p < 0.001). This was also evident when we computed RDMs from comfort ratings (Fig. 4E) and found these were significantly correlated with the model RDM for object mass (p < 0.001) but not with RDMs for grasp axis (p = 0.54) or grasp size (p = 0.83; Fig. 4F).
Grasp comfort. A, Average grasp comfort ratings for each grasp condition in the fMRI experiment. B–D, Grasp comfort ratings averaged across (B) grasp axis, (C) grasp size, and (D) object mass. E, Average RDM computed from participant comfort ratings. F, Correlations between grasp comfort and model RDMs. G, H, Correlations between grasp comfort and neural RDMs in each brain ROI during planning (G, top) and execution phases (H, bottom). In bar graphs, gray-shaded regions represent the noise ceiling for each ROI. Bright blue bars represent significant positive correlations (p < 0.05 with FDR correction); correlations shown in dark blue are not statistically significant. The same data are represented topographically as dots scaled proportionally to the mean correlation in each region. Bars indicate means, error bars indicate 95% bootstrapped confidence intervals; **p < 0.01, ***p < 0.001.
Neural representations of grasp comfort were present during both grasp planning and execution phases
To identify brain regions that encoded grasp comfort, we next correlated neural RDMs with the average RDM derived from participant comfort ratings. Neural representations in premotor regions PMv and PMd were significantly correlated with grasp comfort during grasp planning (Fig. 4G). During the execution phase, in contrast, grasp comfort correlated with neural representations in ventral stream region PPA (Fig. 4H). This suggests that premotor regions encoded the visually predicted comfort of planned grasps (which in our conditions was primarily related to object mass). Area PPA instead encoded comfort during the execution phase and might thus be involved in the haptic evaluation of grasp comfort, or in some other representation of material properties that correlates with comfort.
Discussion
Our results show that different regions within the two visual streams represent distinct determinants of grasping, including grasp axis, grasp size, and object mass; moreover, the coding of these attributes differed between grasp planning and execution. Most regions represented multiple factors at different stages. For example, aIPS activity correlated with both grasp axis and size during planning and with object mass during execution. We found that grasp axis, which is adjusted at the very beginning of reach-to-grasp movements (Cuijpers et al., 2004), was predominantly encoded across dorsal regions during planning. Grasp size, which is adjusted throughout reach-to-grasp movements (Cuijpers et al., 2004), was encoded in different sets of ventral and dorsal regions during grasp planning and execution. Object mass, which gains relevance when applying forces at the fingertips on hand-object contact (Johansson and Westling, 1988; Johansson and Flanagan, 2009), was instead encoded across ventral, dorsal, and motor regions during grasp execution.
Shift from dorsal to ventral stream regions between planning and execution
In the broadest terms, our analyses revealed an overall shift—in terms of representational similarity—from dorsal sensory and motor regions during the planning phase (Fig. 3A,B) to more ventral regions during execution (Fig. 3C,D). During planning, the most similar representations were between V1 and SPOC, SPOC and M1/S1, and among M1/S1, PMd, and aIPS, tracing an arc along the dorsal stream to frontal motor areas. SPOC is associated with representations of grasp axis (Monaco et al., 2011), as is parieto-occipital area V6A in the macaque, which together with V6 is thought to be the macaque homolog of human SPOC (Fattori et al., 2004, 2009, 2010; Pitzalis et al., 2013). The SPOC complex serves as a key node in the dorsal visual stream involved in the early stages of reach-to-grasp movements (Rizzolatti and Matelli, 2003). It is thus tempting to speculate that our findings represent the progressive transformation of grasp-relevant sensory representations of an object into explicit motor plans along the dorsal processing hierarchy. In contrast, along the ventral stream, individual ROIs (V1, LOC, PPA, pFS) shared similar representations with dorsal sensorimotor areas (particularly aIPS, M1/S1, and PMd) but only weak or no correlation with one another (or with PMv). During planning there was no visual motion to drive common responses, and it seems reasonable to assume that different ROIs extracted distinct aspects of the stimulus, leading to these rather weak correlations.
During action execution, the picture changed dramatically. Representations in the dorsal stream became more independent of one another. Notably, the high similarity between SPOC representations and the more frontal motor regions (M1/S1, aIPS, PMd, and PMv) almost disappeared, to be replaced with a stronger correlation with ventral shape-perception area LOC. At the same time, representational correlations among visual regions V1, LOC, PPA, and pFS, as well as their correlations with PMv, increased. This may partly be because of the salient visual consequences of the participant's own actions providing a common source of variance across regions. It is interesting to speculate that the overall shift from similar dorsal to similar ventral representations reflects a shift from the extraction of action-relevant visual information during planning to monitoring object properties to assess the need for corrections during action execution.
One of the more striking findings from the representational similarity analysis (Fig. 3E,F) is that activity in V1 during execution correlated with representations in a range of higher-order visual and sensorimotor areas during the planning phase. (This is visible as the column of dark values below V1 in Fig. 3E and as the large and dense pattern of connections toward V1 in the Sankey plot in Fig. 3F.)
We speculate that the shift in representations between planning and execution might reflect a role of mental simulation in grasp planning and subsequent comparison with the sensory evidence during execution. During the planning phase, participants may be using visual information to compute and compare forward models of potential grip choices (Wolpert and Flanagan, 2001; Cisek and Kalaska, 2010) and possibly mentally simulating potential grasps (Jeannerod, 1995; Jeannerod and Decety, 1995). These simulations could be used to generate motor plans and sensory predictions. Sensory predictions could then be compared with visual, tactile, and proprioceptive inputs during the grasping phase to facilitate online movement corrections and evaluate the success of the generated motor plan (Desmurget and Grafton, 2000; Wolpert and Ghahramani, 2000; Wolpert et al., 2011). This possibility is supported by previous work showing that planned actions can be decoded from activity in V1 and LOC before movement onset (Gallivan et al., 2013b, 2019; Gutteling et al., 2015; Monaco et al., 2020) and that V1 and LOC are rerecruited when performing delayed actions toward remembered objects (Singhal et al., 2013).
Effects of grasp comfort
Grasp comfort was moderately correlated with object mass (r ∼ 0.3) but not grasp axis nor grasp size, suggesting that other factors also affected comfort (perhaps even more so than usual because of the movement constraints in the scanner). Grasp comfort was significantly correlated with PPA activation during execution, perhaps related to a role for PPA in also coding object mass during execution. More interestingly, activation patterns in premotor cortex (PMv and PMd) were correlated with grasp comfort during planning, although no regions significantly represented object mass during planning. These results corroborate earlier results implicating premotor cortex in grip selection based on orientation (Martin et al., 2011; Wood et al., 2017) and extend the findings to a broader range of factors and to multivariate representations.
Limitations and future directions
One notable finding of our study is that object mass is encoded in sensorimotor regions during action execution. This is understandable, as information about object mass is required to modulate grip and lift forces. However, we have previously demonstrated that mass and mass distribution also play an important role in selecting where to grasp an object (Klein et al., 2020). It is thus reasonable to expect processing of object material and mass also during planning, which we did not observe. However, in our study, grasps were preselected. As a result, participants did not need to process the material properties of an object to select appropriate grasp locations. To investigate the role of visual material representations in grasp selection, future research could use our computational framework (Klein et al., 2020; Maiello et al., 2020) to identify objects that produce distinct grasp patterns rather than constraining participants to predefined grasp locations. Conditions that require visual processing of object material properties to select appropriate grasp locations would then reveal whether the same or different sensorimotor regions process object mass during grasp planning and execution. However, such designs would require disentangling activity related to representing shape per se from activity related to grasp selection and execution.
One factor known to be important for grasp selection and execution is grip torque, that is, the tendency of an object to rotate under gravity when grasped away from its center of mass (Goodale et al., 1994; Lederman and Wing, 2003; Eastough and Edwards, 2007; Lukos et al., 2007; Paulun et al., 2016). Although torque is directly related to object mass, it is possible to select different grasps on the same object that produce substantially different torques (Maiello et al., 2020). As grasps with high torque require greater fingertip forces to keep an object level, humans tend to avoid such high-torque grasps (Klein et al., 2020). We originally designed our stimuli in the hope of dissociating torque from object mass. Unfortunately, in pilot testing we observed that certain object and grip configurations in the magnetic field of the MRI scanner produced eddy currents in the brass portions of our stimuli. These currents caused unexpected magnetic forces to act on the stimuli, which in turn altered the fingertip forces required to grasp and manipulate the objects. To avoid the occurrence of such eddy currents in our experiment, we decided to forgo conditions differentiating the effects of object mass from those of grip torque. By using nonconductive materials, our approach could in future work be extended to test whether grasp-relevant torque computations occur in the same visuomotor regions responsible for estimating object material and shape. Although previous studies have investigated material and shape largely independently, one intriguing question for future research is how material and shape are combined to assess the distribution of materials and the consequences of mass distribution on torque and grip selection.
Conclusions
Together, our results extend previous behavioral and modeling findings about how participants select optimal grasps based on myriad constraints (Klein et al., 2020) to reveal the neural underpinnings of this process. Our results show that distinct factors—grip orientation, grip size, and object mass—are each represented differently across the grasp network. Moreover, these representations change between grasp planning and execution. Representations during planning rely relatively more heavily on the dorsal visual stream, whereas those during execution rely relatively more heavily on the ventral visual stream. Although surprising, this shift can be explained by a transition from grip selection during planning to the monitoring of sensory feedback during grasp execution.
Footnotes
This work was supported by Deutsche Forschungsgemeinschaft Grants IRTG-1901 and SFB-TRR-135, HORIZON EUROPE European Research Council Consolidator Grant ERC-2015-CoG-682859 (R.W.F.), Natural Sciences and Engineering Research Council of Canada Grants 4931-2014 and RGPIN-2016-04748 (J.C.C.), and a Canada First Research Excellence Fund BrainsCAN Grant to the University of Western Ontario. We thank Mel Goodale for discussions when designing the study.
The authors declare no competing financial interests.
Correspondence should be addressed to Guido Maiello at guido_maiello@yahoo.it.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.