Abstract
Current understanding of the neural processes underlying human grasping suggests that grasp computations involve gradients of higher to lower level representations and, relatedly, visual-to-motor processes. However, it is unclear whether these processes evolve in a strictly canonical manner from higher to intermediate to lower levels, given that this knowledge relies importantly on functional imaging, which lacks temporal resolution. To examine grasping in fine temporal detail, here we used multivariate EEG analysis. We asked participants to grasp objects while controlling the time at which crucial elements of grasp programs were specified. We first specified the orientation with which participants should grasp objects, and only after a delay did we instruct participants which effector to use for the grasp, either the right or the left hand. We also asked participants to grasp with both hands because bimanual and left-hand grasping share intermediate-level grasp representations. We observed that grasp programs evolved in a canonical manner from visual representations, which were independent of effectors, to motor representations that distinguished between effectors. However, we found that intermediate representations that only partially distinguished between effectors arose after representations that distinguished among all effector types. Our results show that grasp computations do not proceed in a strictly hierarchically canonical fashion, highlighting the importance of the fine temporal resolution of EEG for a comprehensive understanding of human grasp control.
SIGNIFICANCE STATEMENT A long-standing assumption about grasp computations is that grasp representations progress from higher to lower level control in a regular, or canonical, fashion. Here, we combined EEG and multivariate pattern analysis to characterize the temporal dynamics of grasp representations while participants viewed objects and were subsequently cued to execute a unimanual or bimanual grasp. Interrogation of the temporal dynamics revealed that lower level effector representations emerged before intermediate levels of grasp representations, suggesting a partially noncanonical progression from higher to lower and then to intermediate-level grasp control.
Introduction
The human brain computes sensorimotor representations and processes in multiple stages of sensorimotor transformations (Flash and Hogan, 1985) to interact with the world. For example, to grasp an object, the brain extracts sensory information about the object and the intended effector to generate motor programs that then guide the movements of the grasping hand. To this end the brain relies on lateral and medial pathways within a parietofrontal network (Fattori et al., 2009, 2017; Gallivan and Culham, 2015; Janssen and Scherberger, 2015) that computes grasp-relevant visuomotor features (Tunik et al., 2005; Davare et al., 2006; Baumann et al., 2009; Cavina-Pratesi et al., 2010; Fabbri et al., 2016; Schaffelhofer and Scherberger, 2016).
Two (partially related) properties of the grasp network are suggestive of the multistage dynamics of the sensorimotor transformations that it performs: hierarchical organization and visual-to-motor gradients. The grasp network is hierarchically organized: at higher (or upstream) levels, areas represent grasp features regardless of effector; at lower levels, areas encode grasp features more specifically for the right or left hand (Davare et al., 2006; Gallivan et al., 2013; Turella et al., 2016, 2020; Michaels and Scherberger, 2018). Furthermore, some areas share neural resources at intermediate levels, downstream from effector-independent representations but upstream from the control of individual effectors. For example, right parietal areas, more specialized for grasping with the left than with the dominant right hand, also control bimanual grasps (Le et al., 2014, 2017). Other examples of intermediate-level sensorimotor control can be found in Kadmon Harpaz et al. (2014) and Turella et al. (2020).
A concept related to hierarchical organization is the visual-to-motor gradient (Gallivan and Culham, 2015; Janssen and Scherberger, 2015): parietal regions are more associated with representations of visual object properties (upstream of motor control and effector representations), whereas premotor and motor regions are more associated with downstream motor representations for movement planning and execution (Fabbri et al., 2016; Schaffelhofer and Scherberger, 2016), operating in a relatively modular fashion (Michaels et al., 2020).
Because of the hierarchical structure of the frontoparietal grasp network and its sensory-to-motor organization, it appears obvious to assume that information flows from sensory to motor stages and from higher to lower levels of planning and executing grasp movements. Yet, empirical evidence remains limited. Functional magnetic resonance imaging (fMRI), the dominant method for studying human brain function, is handicapped by its coarse temporal resolution, although fMRI paradigms that slow down grasp processes offer some, albeit limited, insight into timing (Monaco et al., 2011; Gallivan et al., 2011, 2013; Ariani et al., 2018).
To unravel the time course of sensorimotor grasp processes in detail, two studies have begun to use magnetoencephalography (MEG) or electroencephalography (EEG) together with multivariate decoding techniques. Turella et al. (2016) identified action planning processes ∼750 ms after object presentation and upstream from effector-related processes. Guo et al. (2019) revealed the time course of object shape and grasp orientation representations during grasp planning and execution.
Here, we used multivariate EEG analysis to test whether the sensorimotor processes underlying grasping evolve in a strictly canonical fashion from higher to lower levels. To tease apart higher, intermediate, and lower levels of computation, we mapped neural representations that were effector independent, partially differentiated between effectors, or completely differentiated between effectors. Specifically, we extracted representations of grasp orientation to map grasp programs before and after effector specification. Participants knew which grasp orientation to perform while viewing objects but learned only later whether to grasp the object with the left or right hand or whether to grasp bimanually. We included bimanual grasps to identify intermediate-level action planning because bimanual and left-hand grasping are computationally and physiologically similar to one another but different from right-hand grasping (Le et al., 2014, 2017, 2019). We found that effector-independent visual representations emerged before motor representations. Surprisingly, however, representations that partially differentiated between effectors emerged after representations that completely differentiated between effectors, thereby revealing a noncanonical progression from higher to lower and then to intermediate-level grasp control.
Materials and Methods
Participants
Fifteen participants (eight females; median age, 20 years; range, 18–35 years) from the University of Toronto community gave their written informed consent to participate in the experiment in exchange for cash payments. Because we were not aware of a power analysis suited to the analysis that we intended to conduct here (i.e., representational dissimilarity analysis; see below), we determined our sample size a priori based on similar EEG or MEG studies using multivariate methods. First, in the main experiment of a study from our lab (Guo et al., 2019), we had found that using support vector machines to analyze 6 h of data for each of 15 participants yielded a significant grasping effect, including a significant effect for 15 of 15 individual participants for one of the independent variables (Nemrodov et al., 2016, 2018, used similar sample sizes in two support vector machine studies on face perception). In addition, a literature search for related articles that conducted representational dissimilarity analysis on MEG/EEG data produced an average sample size of 13.1 (standard deviation = 5.2; range = 5–20; Cichy et al., 2014; Cichy and Pantazis, 2017; Kaneshiro et al., 2015; Kassraian-Fard et al., 2016; Kietzmann et al., 2019; Mohsenzadeh et al., 2019; Wardle et al., 2016; Sburlea et al., 2021).
The participants included in the present study were all right-handed (Oldfield, 1971) and had normal or corrected-to-normal vision. All procedures were approved by the Human Participants Review Subcommittee of the University of Toronto and conformed to the ethical standards in the Declaration of Helsinki.
Procedures and apparatus
Participants were seated at a table in a light-sealed room, waiting to make visually guided grasp movements. Before each block of trials, an experimenter sitting beside them told them whether grasps should be clockwise (CW) or counterclockwise (CCW). Also, the experimenter ran two or three practice trials with them in which they waited for a high- or low-pitched tone signaling which effector to use. For left-hand and bimanual (LB) blocks, one pitch (high or low in different blocks) signaled grasps with the left hand using the index finger and the thumb, and the other pitch signaled grasps with both hands using the index fingers of both hands and the middle fingers for support; for right-hand and bimanual (RB) blocks, the pitch of the tone cued grasps with the right hand or with both hands; and for left-hand and right-hand (LR) blocks, it cued grasps with the left or right hand. We tested only two types of grasping in any block to avoid confusion; learning to associate three different pitches with three grasp types, as well as relearning the association after each block of trials, would have been too difficult for participants. The contingency between pitch and effector changed randomly from block to block to ensure that our analyses did not falsely include representations of auditory frequencies when decoding effector representations.
The participants then placed their left and right hands on a button box so that each index finger blocked a beam of infrared light. Earplugs and an opaque shutter glass screen (Smart Glass Technologies) ensured that the participants could not hear or see how the experimenter prepared each trial. To prepare a trial, the experimenter turned on a set of LEDs that illuminated a black-clad grasp space with a slanted platform and a square-shaped peg in the middle. On the peg the experimenter mounted objects, always with the same position and orientation, 43 cm away from the participant with the surface of the object tilted toward the participant's line of sight.
All objects were made from 2 cm thick wooden blocks and were either shaped like a pillow with four concave edges or like a flower with four convex edges (Fig. 1A). All objects measured 6 cm across opposing edges, thus with identical grip sizes, and they were painted middle gray on the sides. The top surfaces were covered either with a grid or checkerboard texture. All combinations of textures and shapes were equally likely to occur. However, only object shapes were relevant for grasping. Texture was irrelevant for the task and merely helped create a greater variety of objects to better engage the attention of the participants.
Methods. A, Objects used in the experiment. Note that the pillow and flower shapes provide equal grip sizes. B, Combination of grasp orientation and effector. C, Timeline of a trial. The auditory Go cue (high- or low-pitched tone) informed participants of the effector to use for the grasp.
Next, the experimenter pressed a key to start the trial (Fig. 1C). The LEDs switched off, and the shutter glass screen turned transparent. After 750–1250 ms, the LEDs turned on again to illuminate the object for the participant to see for a Preview time of 500–1000 ms. Then the pitch of the auditory Go signal (loud enough to be heard through the earplugs) instructed participants which hand or hands to use to grasp the object.
Once participants moved their hand or hands, the infrared beams on the button box were unblocked, marking the time of movement onset (note that only at that time did the participant's hand or hands come into view through the shutter glass). As the participants reached to grasp the object, their fingers crossed a curtain of infrared beams created by two 15-cm-tall pillars located 40 cm apart from one another and directly in front of the object. The participant's hand or hands crossing the beam defined the end of the reach-to-grasp movement (i.e., immediately before the participant touched the object). The trial ended with the participant picking up the object and placing it on the table near the experimenter. Trials with incorrect grasps or dropped objects were marked as invalid before the start of the next trial. In total there were 40 trials in one block (2 shapes × 2 textures × 2 effectors × 5 repetitions in random order) and 42 blocks (2 orientations × 3 effector combinations, i.e., LB, RB, or LR × 7 repetitions in random order) across two 3 h sessions conducted on different days; in addition, there was one practice block at the beginning of the experiment. Breaks were provided between blocks as requested by the participants.
Data acquisition and preprocessing
EEG data were recorded using a 64-electrode BioSemi ActiveTwo recording system, digitized at a rate of 512 Hz with 24-bit analog-to-digital conversion. The electrodes were arranged according to the International 10/20 System. The electrode offset was kept below 40 mV.
EEG preprocessing was performed offline in MATLAB using the EEGLAB Toolbox (Delorme and Makeig, 2004) and ERPLAB Toolbox (Lopez-Calderon and Luck, 2014). Signals from each block were bandpass filtered (noncausal Butterworth impulse response function, 12 dB/octave roll-off) with half-amplitude cutoffs at 0.1 and 40 Hz and downsampled to 256 Hz (to improve statistical power). Noisy electrodes (correlation with nearby electrodes < 0.6) were interpolated (on average, 0.78 electrodes per subject), and all electrodes were rereferenced to the average of all electrodes. Next, independent component analysis (ICA) was performed, and ICLabel (Pion-Tonachini et al., 2019) was used to help identify and remove components associated with blinks, eye movements, muscle activity, and channel noise. The ICA-corrected data were segmented relative to the onset of Preview (−100 to 500 ms) and Go (−100 to 800 ms). In addition, invalid trials and epochs containing abnormal reaction times (<100 ms or >1000 ms) or incorrect grasps were removed. As a result, an average of 9.03% of trials per subject was removed from further analyses.
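To make the filtering step concrete, the following is a minimal Python sketch of an equivalent zero-phase Butterworth bandpass (the actual preprocessing used MATLAB/ERPLAB; the array layout, function name, and filter order below are illustrative assumptions):

```python
from scipy.signal import butter, filtfilt

def bandpass(eeg, srate=512.0, low=0.1, high=40.0, order=2):
    """eeg: (n_channels, n_samples) array; returns zero-phase filtered data."""
    nyq = srate / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    # filtfilt runs the filter forward and backward: noncausal, zero phase shift;
    # a second-order Butterworth corresponds to a 12 dB/octave roll-off
    return filtfilt(b, a, eeg, axis=-1)
```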
As an additional preprocessing procedure, we conducted multivariate noise normalization on EEG patterns separately for Preview and Go. The procedure is recommended for multivariate pattern analysis of electrophysiological signals to avoid any individual electrode value overly influencing dissimilarity calculations (Guggenmos et al., 2018). To this end we computed covariance matrices based on electrode activation patterns of all trials for each condition and time point separately. The obtained covariance matrices were subsequently averaged across time points and then across conditions. The averaged covariance matrix was then inverted and multiplied with EEG patterns at each epoch and time point.
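A minimal sketch of this normalization step, assuming epochs arranged as trials × electrodes × time points (the shrinkage covariance estimator and the use of the inverse matrix square root as the whitening operator follow Guggenmos et al., 2018, and are implementation choices rather than details stated above):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power
from sklearn.covariance import LedoitWolf

def noise_normalize(epochs, labels):
    """epochs: (n_trials, n_channels, n_times); labels: (n_trials,) condition ids."""
    n_trials, n_ch, n_t = epochs.shape
    covs = []
    for c in np.unique(labels):
        x = epochs[labels == c]
        # one covariance per time point, then average across time points
        covs.append(np.mean([LedoitWolf().fit(x[:, :, t]).covariance_
                             for t in range(n_t)], axis=0))
    sigma = np.mean(covs, axis=0)               # then average across conditions
    w = fractional_matrix_power(sigma, -0.5)    # whitening (inverse square root)
    return np.einsum('ij,njt->nit', w, epochs)  # apply to every epoch and time point
```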
Next, the preprocessed epochs were averaged into ERP traces to increase the signal-to-noise ratio of spatiotemporal patterns (Grootswagers et al., 2017). Specifically, up to five epochs within a given block that corresponded to the same condition (i.e., shape, texture, effector, and grasp orientation) were averaged together, resulting in 14 separate ERP traces for each of the 24 conditions (2 orientations × 3 effectors × 4 objects) for Preview and Go, respectively.
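For illustration, a sketch of this averaging step under the same assumed array layout (the grouping variables are hypothetical names):

```python
import numpy as np

def average_pseudotrials(epochs, labels, blocks):
    """Average same-condition epochs within each block into single ERP traces."""
    traces, trace_labels = [], []
    for b in np.unique(blocks):
        for c in np.unique(labels):
            sel = (blocks == b) & (labels == c)
            if sel.any():
                traces.append(epochs[sel].mean(axis=0))  # up to five epochs
                trace_labels.append(c)
    return np.stack(traces), np.array(trace_labels)
```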
Representational dissimilarity analysis
Multivariate pattern analysis of ERP traces was conducted using representational dissimilarity analysis (RDA; Kriegeskorte et al., 2008). RDA captures the representational structure among conditions based on dissimilarities in EEG patterns. The obtained structure can then be compared with the expected representational structure derived according to the shared characteristics (i.e., shape, texture, effector, orientation) among conditions to test for the presence of certain representations. A benefit of RDA over decoding approaches (Guo et al., 2019) is the ability to statistically remove influences of artifacts (e.g., eye movements) from the representational structure (see below, Main effect RDM models).
RDA was performed at each time point using spatial features from all electrodes to assess the time course of representations. We used a cross-validated Euclidean distance to index dissimilarity between each pair of conditions. Here, cross-validation was performed by averaging the estimated distance across folds of the data,

$$d_{jk}(t) = \frac{1}{M}\sum_{m=1}^{M}\left(\mathbf{x}_j^{(m)}(t) - \mathbf{x}_k^{(m)}(t)\right)^{\top}\left(\mathbf{x}_j^{(\sim m)}(t) - \mathbf{x}_k^{(\sim m)}(t)\right), \quad (1)$$

where $\mathbf{x}_j^{(m)}(t)$ is the noise-normalized electrode pattern of condition $j$ averaged within fold $m$ at time point $t$, $\mathbf{x}_j^{(\sim m)}(t)$ is the corresponding pattern averaged across the remaining folds, and $M$ is the number of folds. Cross-validation ensured that the resulting Euclidean distance was unbiased; that is, the expected value of the distance is zero if two patterns are not statistically different from one another, and greater than zero otherwise (Nili et al., 2014; Walther et al., 2016; Guggenmos et al., 2018). Note that, because of the nature of cross-validation, the estimated Euclidean distance, unlike conventional distance measures, can take on negative values in individual estimates.
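A sketch of the estimator in Equation 1 for one pair of conditions at one time point (the fold structure is an assumption; with 14 ERP traces per condition, seven folds of two traces each would be one natural choice):

```python
import numpy as np

def cv_euclidean(x_j, x_k, n_folds=7):
    """x_j, x_k: (n_traces, n_channels) noise-normalized patterns at one time point."""
    folds = np.array_split(np.arange(len(x_j)), n_folds)
    d = 0.0
    for m, test in enumerate(folds):
        train = np.concatenate([f for i, f in enumerate(folds) if i != m])
        delta_test = x_j[test].mean(0) - x_k[test].mean(0)     # fold m
        delta_train = x_j[train].mean(0) - x_k[train].mean(0)  # remaining folds
        d += delta_test @ delta_train  # unbiased: expectation zero under the null
    return d / n_folds
```

Because the two difference estimates come from independent subsets of the data, noise terms cancel in expectation, which is why individual estimates can dip below zero.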
RDA produced a 24 × 24 representational dissimilarity matrix (RDM) at each time point during Preview and Go (Fig. 3, RDMs obtained at selected time points). Aside from RDMs obtained from ERPs, we also obtained a set of eye movement RDMs that reflect the degree to which conditions differed from one another based on eye movement artifacts. That is, we applied the same procedure to the eye movement artifacts identified by the independent component analysis during preprocessing.
RDM models. A, Main effect models. B, Similar representation models of effectors (e.g., the left = bimanual model or, more precisely, the left = bimanual ≠ right model). Note that a complete similarity model (left = bimanual = right) would mean that all cells are similar and therefore would not be suitable for RDA. Instead, to identify effector-independent representations, we used the grasp orientation model as a proxy. The diagonal elements in the model RDMs are excluded because the corresponding elements in the RDMs derived from ERPs are always zero; excluding them avoids inflating the β weights from multiple regression.
Main effect RDM models
To identify the dissimilarity information contained in the RDMs of the ERPs, they were compared with model matrices (Fig. 2) designed to test for the presence of specific representations. To test for visual representations, the shape model took into account that flower and pillow shapes are represented differently regardless of texture and visuomotor properties, and the texture model assumed that grid and checkerboard textures are represented differently regardless of shape and visuomotor properties. To test for visuomotor and motor representations, the grasp orientation model assumed that CW and CCW grasps are represented differently regardless of effector and visual properties, and the hand model assumed that left, right, and bimanual grasps are represented differently regardless of orientation and visual properties. The four models, together with the eye movement RDMs, then simultaneously entered a multiple regression to serve as predictors of the RDMs of the ERPs at each time point. This way, we obtained β weights that reflected the unique contribution of each model while partialing out the other models as well as the artifactual contribution of eye movements. The resulting β weights were compared against zero using one-sample t tests.
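A sketch of this regression step for a single time point (the least-squares call and the vectorization of off-diagonal cells are implementation assumptions):

```python
import numpy as np

def rdm_regression(data_rdm, model_rdms):
    """data_rdm: (24, 24); model_rdms: list of (24, 24) predictors incl. eye RDM."""
    mask = ~np.eye(24, dtype=bool)              # exclude the diagonal cells
    y = data_rdm[mask]
    X = np.column_stack([np.ones(mask.sum())] +
                        [m[mask] for m in model_rdms])  # intercept + models
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas[1:]                            # one β weight per model
```

Running this at every time point for every participant yields the β time courses that were then tested against zero across participants.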
Effector-dependent RDM models
The models above tested for the presence of single representations independent of other features, analogous to testing for main effects in ANOVAs. Additionally (and similar to linear contrasts of ANOVAs), we tested for effector-dependent processes using models constructed based on a priori hypotheses. Specifically, we hypothesized that left-handed and bimanual grasps might be similarly represented given their similarities in neural (Le et al., 2014, 2017) and computational processes (Le and Niemeier, 2013a, b; Le et al., 2019). This hypothesis resulted in a model that assumes that grasp representations for left-handed and bimanual grasps are the same, whereas grasp representations for right-hand grasping are different (Fig. 2B). For completeness, we also tested the possibility that other effector pairs might share similar grasp representations, and so a second model took into account more similar representations between right-handed and bimanual grasps, and finally a third model took into account more similar representations between left- and right-handed grasps. Significant time courses of these special models would indicate that some aspects of grasp representations depend on shared neural processes between effectors. We tested each special model in a separate multiple regression using the special model together with the effector model as well as eye movement RDMs as regressors (shape and texture models were statistically entirely orthogonal and therefore not included).
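To illustrate how the main effect and similarity models relate, here is a sketch that constructs them from the 24 condition labels (the ordering of conditions and the 0/1 dissimilarity coding are assumptions about scaling only):

```python
import numpy as np

# 24 conditions = 2 orientations x 3 effectors x 4 objects (assumed ordering)
orientations = np.repeat([0, 1], 12)             # CW vs CCW
effectors = np.tile(np.repeat([0, 1, 2], 4), 2)  # 0 = left, 1 = right, 2 = bimanual

def model_rdm(labels):
    """Cells are 1 where conditions differ on the feature, 0 where they match."""
    return (labels[:, None] != labels[None, :]).astype(float)

orientation_model = model_rdm(orientations)                   # CW != CCW
hand_model = model_rdm(effectors)                             # L != R != B
lb_model = model_rdm(np.where(effectors == 2, 0, effectors))  # L = B != R
rb_model = model_rdm(np.where(effectors == 2, 1, effectors))  # R = B != L
lr_model = model_rdm(np.where(effectors == 1, 0, effectors))  # L = R != B
```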
Electrode informativeness
To assess the informativeness of electrodes for the RDA of ERP patterns, we performed a searchlight analysis across electrodes. Specifically, we defined a 50 mm radius neighborhood around each electrode and conducted separate RDAs on spatiotemporal features obtained from each electrode neighborhood across 100 ms time bins. The selected radius sufficiently captures the nearest surrounding electrodes. The cross-validation procedure here followed Equation 1 with the exception that x now reflects spatiotemporal features (∼8 electrodes in a neighborhood × 40 time points) rather than entirely spatial features. The resulting RDMs from this searchlight analysis were compared with all models previously described using the same multiple regression approach.
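A sketch of the neighborhood construction and feature extraction for the searchlight (electrode coordinates in mm and the 100 ms bin indices are assumed inputs):

```python
import numpy as np

def electrode_neighborhoods(positions, radius=50.0):
    """positions: (n_channels, 3) electrode coordinates in mm."""
    dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    return [np.flatnonzero(dist[e] <= radius) for e in range(len(positions))]

def searchlight_features(traces, neighbors, t0, t1):
    """traces: (n_traces, n_channels, n_times) -> (n_traces, n_features).
    Spatiotemporal features: ~8 neighborhood electrodes x the bin's time points."""
    return traces[:, neighbors, t0:t1].reshape(len(traces), -1)
```

The RDA itself then proceeds exactly as in the time-resolved analysis, but on these pooled spatiotemporal features.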
Statistics
For all tests conducted on EEG data, statistical significance was assessed using a nonparametric, cluster-based approach to determine clusters of time points (or electrodes for searchlight analyses) in which there was a significant effect at the group level (Nichols and Holmes, 2002). For time-resolved analyses, we defined clusters as consecutive time points that exceeded a statistical threshold (cluster-defining threshold) defined as the 95th percentile of the distribution of t values at each time point obtained using sign permutation tests computed 10,000 times (equivalent to p < 0.05, one tailed). Significant temporal clusters were then defined as cluster sizes that are equal to or greater than the 95th percentile of maximum cluster sizes across all permutations (equivalent to p < 0.05, one tailed). In addition, we calculated 95% confidence intervals for the onset times of the first significant cluster of the observed effects. This was accomplished by bootstrapping across participants (i.e., by selecting datasets randomly with replacement) 10,000 times and conducting the same data analysis including the permutation tests. For searchlight analyses (conducted across electrodes), cluster-based correction was conducted on each 100 ms time window separately on spatial clusters defined as nearby electrodes within a 50 mm radius.
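A condensed sketch of the cluster-based sign permutation procedure for one β-weight time course (one tailed, as described above; the run-length helper is illustrative):

```python
import numpy as np
from scipy import stats

def max_run(mask):
    """Length of the longest run of True values."""
    best = cur = 0
    for v in mask:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best

def cluster_test(betas, n_perm=10000, seed=0):
    """betas: (n_subjects, n_times) β weights; returns mask of significant times."""
    rng = np.random.default_rng(seed)
    n_sub, n_t = betas.shape
    t_obs = stats.ttest_1samp(betas, 0, axis=0).statistic
    t_null = np.empty((n_perm, n_t))
    for p in range(n_perm):  # random sign flips across participants
        flipped = betas * rng.choice([-1.0, 1.0], size=(n_sub, 1))
        t_null[p] = stats.ttest_1samp(flipped, 0, axis=0).statistic
    thresh = np.percentile(t_null, 95, axis=0)  # cluster-defining threshold
    crit = np.percentile([max_run(t > thresh) for t in t_null], 95)  # cluster size
    sig, out, i = t_obs > thresh, np.zeros(n_t, dtype=bool), 0
    while i < n_t:  # keep observed clusters at least as large as the criterion
        if not sig[i]:
            i += 1
            continue
        j = i
        while j < n_t and sig[j]:
            j += 1
        if j - i >= crit:
            out[i:j] = True
        i = j
    return out
```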
Statistical tests for behavioral data (reaction time and movement time) were performed using four-way repeated-measures ANOVAs (Effector × Grasp Orientation × Shape × Texture) and adjusted for sphericity violations using the Greenhouse–Geisser (GHG) correction when needed. Additional post hoc analyses were conducted using repeated-measures t tests and corrected for multiple comparisons using the false discovery rate (Benjamini and Hochberg, 1995).
Results
Behavioral results
Average reaction time (RT; defined as the time between Go onset and movement onset) was 456 ms (SD = 89 ms). RTs submitted to a four-way repeated-measures ANOVA (Effector × Grasp Orientation × Shape × Texture) yielded a main effect of Effector (F(1.782,32) = 19.654, p < 0.001).
Average movement time (MT; defined as the time between movement onset and movement end) was 249 ms (SD = 44 ms). The four-way repeated-measures ANOVA of MTs showed a main effect of Effector (F(1.968,32) = 21.230, p < 0.001).
Time course of visual and visuomotor representations and electrode informativeness
The middle rows in Figure 3 show the group-averaged 24 × 24 RDMs obtained at four different sample time points relative to Preview and Go onset, respectively. These RDMs captured dissimilarity information, as can be visualized with multidimensional scaling (MDS; top and bottom rows). MDS was applied to the group-averaged RDMs, and the top three dimensions accounting for the most variance in the data are plotted as 3D plots of abstract representational space. Visual inspection of the plots shows that data during Preview are organized primarily based on shape and orientation (the organization based on orientation is more subtle, with filled symbols for clockwise orientation plotted somewhat higher in the plots than nonfilled symbols for counterclockwise orientation), whereas during Go, data are organized according to orientation and effector.
Group-averaged RDMs (second and third rows) and visualization by MDS (top and bottom rows) from four selected time points aligned to Preview and Go. Note that texture conditions were not graphed in the MDS plots given the weak effect of texture.
To quantify the extent to which the obtained RDMs captured the representation of visual object and of motor features, we conducted multiple regression on the obtained RDMs at each time point for each participant using the four RDM models (shape, texture, orientation, and effector). This allowed us to examine the unique contribution of each model to explaining the obtained RDMs. Including all four RDM models within the same analysis was not necessary because they were statistically independent of one another; the purpose was to obtain the same data format as in subsequent analyses that included statistically dependent RDMs.
Beta weights for the shape model reached significance 85 ms after object onset (Fig. 4A, top row). The weights peaked at 140 ms, dropping to lower levels thereafter. Beta weights for shape aligned to the (shape-irrelevant) Go signal remained around zero (Fig. 4B).
Representations of shape, texture, grasp orientation, and hand. A, B, Time course of representations aligned to the onset of (A) Preview and (B) Go. Shaded envelopes around the curve indicate ±1 SEM. Shaded areas under the curve indicate time points that were significant (cluster-based sign permutation test with cluster-defining and cluster-size thresholds of p < 0.05). Horizontal colored error bars mark 95% bootstrapped confidence intervals of representation onset during Preview for shape (80 ms, 105 ms) and grasp orientation (70 ms, 155 ms), and during Go for grasp orientation (25 ms, 195 ms) and hand (105 ms, 145 ms). Note that during Go, shape representations reached significance from 550 to 670 ms but were excluded from the bootstrapping analysis given that the time period occurred during movement. C, Electrode informativeness for representations. Open circles indicate significant electrodes (cluster-based sign permutation test with cluster-defining and cluster-size thresholds of p < 0.05). Note that in A and B, β-weight curves are differently scaled along the y-axis.
Beta weights for texture did not reach significance (Fig. 4A,B, second row) as expected (Guo et al., 2019), and so texture models were not included in any subsequent analyses.
Grasp orientation representations formed during Preview (80–450 ms) and thus before effector specification (Fig. 4A, third row). During Go, β weights for grasp orientation became significant after 115 ms, with a brief interruption between 265 and 275 ms. Note that a small cluster of β weights (∼50–100 ms) was slightly too brief to reach significance but contributed to the wide confidence interval (25–195 ms).
Hand representations rose rapidly 115 ms after the Go signal and maintained significance afterward (Fig. 4B, fourth row; hand representations were absent before effector specification during Preview, as expected; Fig. 4A).
To explore the spatial profiles of shape, grasp orientation, and hand representations, we applied RDA to neighborhoods of electrodes (50 mm around each electrode, ∼8 electrodes) for temporal bins of 100 ms. Shape information during Preview peaked at posterior electrodes (Fig. 4C, first row) starting early (0–100 ms), then extended to nearly all electrodes in the 100–200 ms bin, and gradually declined thereafter. During Go, no electrodes reached significance, consistent with the results from time-resolved RDA.
Grasp orientation representations during Preview peaked at posterior and central electrodes and became gradually more prominent with a maximum at 200–300 ms (Fig. 4C, second row). By contrast, during Go, grasp orientation information mostly came from parietal, central, and temporal electrodes, with relatively less involvement of occipital electrodes compared with Preview during 0–200 ms and 300–500 ms.
Hand representations during Preview showed a small effect at right frontal electrodes during the 400–500 ms bin, perhaps reflecting a previous-hand effect (because each block of trials tested only two effectors, there was a 50% chance that the same hand was used in two consecutive trials). Hand representations after Go especially involved a peak at central electrodes from 100 ms on but also included posterior electrodes (Fig. 4C, third row). After 200 ms they involved nearly all electrodes.
In sum, RDA allowed us to extract information about the time course and the informativeness of electrodes with regard to representations of grasp actions. Next, we used two strategies to map in detail how neural programs for grasp actions unfold. First, to identify abstract or high-level (visual) processes, we looked for representations of grasp orientation before effector specification, to then show that these representations re-emerged after effector specification yet earlier than effector representations. Second, to identify intermediate-level processes, we searched for action representations that partially differentiated between effectors. That is, we looked for representations shared by left-hand and bimanual grasping, finding them to arise after representations that distinguished among all effectors.
Effector-independent representations of grasp orientation before and after effector specification
As illustrated in Figure 5, we conducted a series of analyses to demonstrate (1) that Preview representations of grasp orientation reflected effector-independent grasp processes (Fig. 5A–C) and (2) that these representations re-emerged during Go (Fig. 5D). Regarding the former, we first tested whether Preview grasp orientation representations were truly effector independent. That is, even before the Go signal specified the effector, participants might have generated motor plans based on the effector they had used during the respective previous trial. Behavioral evidence for such motor priming was reflected in slower reaction times when the effector changed from one trial to the next than when it remained the same (different effector, 474 ms; same effector, 443 ms; t(14) = 5.058; p < 0.001; d = 1.306). Therefore, we calculated ERPs sorting together trials that shared the same previous-trial effector condition (regardless of which effector would be used in the current trial) and reran RDA. We observed significant representations of the previous-trial effector from ∼400 ms to the end of Preview (Fig. 5A). This shows that motor priming did influence brain processes before effector specification but well after the representations of grasp orientation had formed. Thus, Preview orientation representations between 80 and 400 ms did not depend on effectors.
Effector independence of grasp orientation representations during Preview and Go. A, Representations of the primed hand. Shaded area under the curve indicates significant time periods. B, Time course of grasp orientation representations during Preview. Red curve shows results baselined to −200 to −100 ms superimposed onto the original data (baseline, −100 to 0 ms). C, Representations of grasp orientation before (black lines) and after (colored lines) jittering the temporal alignment. Shaded areas between the black and colored lines indicate time periods of significant differences. Open circles on the scalp plots indicate significant changes in electrode involvement during periods of significant differences. C1, Preview data. C2, Go data. D, Dynamics of grasp orientation representations during D1 Preview, D2 Go, and D3 across the two events as reflected by temporal generalization of RDA. Representational dissimilarity matrices were first computed using ERP patterns across two time points (e.g., Preview at 100 ms vs Go at 150 ms) and were subsequently submitted to multiple regression to test for grasp orientation representations. Note that in D1 and D2 significant β weights yield symmetrical patterns because if a representation, say, 300 ms after object onset is similar to a representation 120 ms after object onset or Go, then the reverse must be true too. In contrast, in D3 the pattern of significant β values is asymmetrical because a representation 300 ms after object onset might be similar to a representation 120 ms after Go, but the reverse is not necessarily true; Preview representations at 120 ms do not necessarily have to be similar to Go representations at 300 ms. Blue dashed lines and brackets illustrate how data in C and D correspond to one another in time. All statistics in A–D were computed using cluster-based sign permutation tests with cluster-defining and cluster-size thresholds of p < 0.05.
Second, we tested whether Preview orientation representations emerged 80 ms after object onset because they were triggered by the object or because of an artifact of our choice of baseline (−100 to 0 ms), which would have eliminated any information right before object onset. However, we found that with an earlier baseline (−200 to −100 ms) orientation representations exhibited essentially the same trajectory (Fig. 5B). Crucially, this shows that the grasp orientation representations studied here formed based on visual and visuomotor processes rather than on the verbal instructions, given at the start of each block, to grasp clockwise or counterclockwise (participants obviously followed the verbal instructions, but the respective representations were invisible to our ERP data analysis).
Third, to further show that orientation representations reflected visual processes, we tested whether these representations were sensitive to the precise timing of object onset. (This also illustrates how important correct temporal alignment is for RDA of ERPs; e.g., in Fig. 4A Preview shape representations persist after 500 ms, but in Fig. 4B, with the same data aligned to Go, shape representations disappear.) We added random temporal jitter (±250 ms) to the time of visual object onset of individual trials to recalculate ERPs and rerun RDA. We found that orientation representations were significantly reduced between 200 and 400 ms (Fig. 5C1). Also, occipital electrode involvement largely disappeared during that time. This suggests that grasp orientation representations during Preview relied on processes that were tightly linked to visual object onset.
Curiously, orientation representations during Go showed a similar sensitivity to the timing of the auditory signal. Orientation representations, as well as electrode involvement, during two clusters between 50 and 200 ms were also significantly reduced (Fig. 5C2). This shows that orientation representations during Go were triggered by the auditory Go signal even though that signal carried no information relevant for grasp orientation. A possible reason is that the sound served as an impulse that pinged visual orientation representations reflexively (Wolff et al., 2017). Alternatively, the Go signal specifying effectors might have caused visual preparatory grasp computations to repeat. If so, orientation representations during Preview and Go should be similar. Indeed, this is what we found in the next section.
To demonstrate that Preview orientation representations re-emerged during Go, we used cross-temporal and cross-event generalization analysis (King and Dehaene, 2014). That is, we computed RDMs using ERP patterns from different time points during Preview and Go and then used multiple regression, in the same way as before, to test for the presence of grasp orientation representations. This analysis produced time-by-time matrices in which the diagonals reflect time-specific representations, the same as the results discussed above, and the off-diagonals reflect generalizability of representations from one time point to another. During Preview (Fig. 5D1), the analysis revealed a chain of consecutive representations ∼70–110 ms (significant weights along the diagonal) followed by a mix of sustained and reactivating representations (weights form a roughly square-shaped cluster with some armlike patterns; see King and Dehaene, 2014, for a discussion of the different activation patterns). During Go (Fig. 5D2), a first cluster from ∼0–300 ms showed a similar mix of sustained activation and reactivation. A second cluster (∼300–800 ms) mainly showed sustained activation coinciding with movement execution. Crucially, comparing Preview and Go through cross-event generalization (Fig. 5D3) revealed similarities in orientation representations. That is, Go representations from 50 to 120 ms generalized to ∼150–500 ms during Preview, and Go representations from ∼150 to 300 ms generalized to ∼150–300 ms during Preview. Note that these times of generalization correspond well with the times of jitter-sensitive representations in Figure 5C1 and C2. Hence, Preview orientation representations sensitive to visual object onset transiently reactivated during Go. Note also that these representations preceded motor representations of the hand (Fig. 4B).
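The cross-temporal and cross-event variant can be sketched by computing the RDM from pattern pairs at two different time points before running the same model regression (this reuses the cv_euclidean sketch above; pairing one event's patterns at t1 with the other event's patterns at t2 is our illustrative reading of the procedure):

```python
import numpy as np

def cross_time_rdm(patterns_a, patterns_b, labels, t1, t2):
    """patterns_*: (n_traces, n_channels, n_times), e.g., for Preview and Go."""
    conds = np.unique(labels)
    rdm = np.zeros((len(conds), len(conds)))
    for j, cj in enumerate(conds):
        for k, ck in enumerate(conds):
            rdm[j, k] = cv_euclidean(patterns_a[labels == cj, :, t1],
                                     patterns_b[labels == ck, :, t2])
    return rdm  # regress onto model RDMs as before to fill one cell of Fig. 5D
```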
Similar representation models of effectors
Our second strategy to map unfolding grasp actions in detail sought to identify intermediate-level effector representations that partially differentiated between effectors. To this end, we moved away from RDAs investigating main effect-like representations of visual and motor processes (Fig. 2A). Instead, we tested a special model (analogous to an ANOVA linear contrast) that assumed grasp processes to be similar for left-handed and bimanual grasps, as shown previously (Le et al., 2014, 2017, 2019), but dissimilar for right-handed grasps (L = B ≠ R model, or LB model for short; Fig. 2B). For completeness, we also tested models that assumed similar grasp representations between right-handed and bimanual grasps (RB model) as well as between left- and right-handed grasps (LR model; RB and LR models were not predicted by previous studies; nevertheless, their existence was not ruled out either).
As shown in Figure 6, similar effector representations arose after the Go signal that specified the effector to be used, as expected. LB representations rose rapidly and maintained significance from 165 ms onward (Fig. 6A, top row). Analyzing electrode informativeness showed that mainly central electrodes, with a bias for the right scalp, were involved starting 100 ms after Go (Fig. 6B). These results are consistent with previous behavioral and transcranial magnetic stimulation studies on bimanual grasping (Le et al., 2014, 2017, 2019).
Similar effector representations dependent on shared neural processes for LB grasps, RB grasps, and LR grasps. A, Time course of representations aligned to the onset of Preview and Go. Shaded envelopes around the curve indicate ±1 SEM. Shaded areas under the curve indicate time points that were significant (cluster-based sign permutation test with cluster-defining and cluster-size thresholds of p < 0.05). Error bars indicate 95% bootstrapped confidence intervals of representation onset during Go for LB (130 ms, 175 ms) and RB (95 ms, 230 ms) representations of grasp orientation. Inset, The two plots above the LB model in the top row present MDS results to visualize how effector representations separate. The MDS plot on the left (130 ms) shows that first all effectors separate from one another: right-hand (red), left-hand (green), and bimanual (blue) grasping. The MDS plot on the right (350 ms) illustrates that later a similarity of left-hand and bimanual grasping emerges. Note that evidence for the RB model did not emerge within the first three dimensions of the MDS analysis. The LR model attained negative β weights after Go (light green) and zero weights when tested as the sole predictor (dark green), indicating that it served as a suppressor variable. B, Electrode informativeness for representations. Open circles indicate significant electrodes (cluster-based sign permutation test with cluster-defining and cluster-size thresholds of p < 0.05). C, Time delay between similar effector representations and hand representations (Fig. 4B, bottom). Top row, LB model. Bottom row, RB model. Shaded areas mark time delays down to the fifth percentile of the 10,000 bootstrapped comparisons.
Interestingly, RB representations also reached significance, although with smaller effect sizes (150–290 ms; Fig. 6A, second row; the β weight curve for RB representations is somewhat similar to the LB curve probably because both curves reflect visuomotor processes unfolding in time in a similar way). Electrode involvement for RB representations was less pronounced and mostly came from the left hemisphere with peaks around central electrodes after 200 ms.
Furthermore, LR representations obtained negative β weights (Fig. 6A, third row, light green curve) because the regression analysis turned the LR model into a suppressor variable to filter out irrelevant variability (i.e., a separate regression using the LR model as the only predictor variable yielded no significant β values; Fig. 6A, third row, dark green curve).
Finally and crucially, we were interested in the timing of the LB model relative to the representations of individual effectors (Fig. 4B, bottom row). If sensorimotor control of grasps evolved in a strictly canonical fashion from higher to intermediate and then lower level representations, then LB representations should be computed before effector representations (Fig. 4B, bottom row). Instead, we found that LB representations arose later (Fig. 6A, MDS plots), with a significant median delay of 47 ms relative to individual effector representations (Fig. 6C). To statistically test this delay, we subtracted the bootstrapped onset times of effector representations from the bootstrapped onset times of LB representations to obtain one-tailed 95% confidence intervals (CI lower bound, 15 ms). Likewise, RB representations arose with a median delay of 40 ms after effector representations (bootstrapped CI lower bound, 4 ms). When we realigned ERPs to movement onset, we observed similar trends of time differences (hand vs LB, 82 ms; CI lower bound, −65 ms; CI lower bound at α = 0.1, 12 ms, p = 0.095; hand vs RB, 110 ms; CI lower bound, 35 ms). Finally, it should be noted that these time differences are, if anything, conservative estimates because statistically the hand model for individual effector representations (Fig. 2A) had a slightly poorer signal-to-noise ratio than the effector similarity models (Fig. 2B) because of more uneven numbers of similar versus dissimilar cells (∼1/3 vs 2/3 compared with ∼5/9 vs 4/9). In contrast, a systematic difference in computation times (e.g., left-hand grasping might take longer to compute) had no effect on time differences because all models included all effector conditions.
Contribution of eye movement artifacts
Eye movements can differ systematically across grasp conditions (Brouwer et al., 2009), can contaminate EEG signals, and can thus contribute to multivariate analyses (Quax et al., 2019). To address this, we partialed out eye movement artifacts captured by ICA in our multiple regression analyses of RDMs. However, it is still possible that residual eye movements not captured by ICA contaminated neural-based ERPs. To test this possibility, we reran RDA using only frontal electrodes (Fp1, Fpz, Fp2, AF7, AF3, AFz, AF4, AF8). However, β weights showed only spurious significance for shape during Preview (Fig. 7, top row), for effector representations during Go (Fig. 7, fourth row), and for RB representations during Go (Fig. 7, sixth row). These results are also consistent with our observation that frontal electrodes carried little information in our main analyses (Figs. 4C, 6B). In sum, our analyses show that eye movement artifacts cannot sufficiently account for the results of our multivariate analyses.
RDA using frontal-most electrodes (Fp1, Fpz, Fp2, AF7, AF3, AFz, AF4, AF8). Shaded envelopes around the curve indicate ±1 SEM. Shaded areas under the curve indicate time points that were significant (cluster-based sign permutation test with cluster-defining and cluster-size thresholds of p < 0.05).
Discussion
We investigated the temporal evolution of grasp programs using multivariate analysis of ERPs recorded from human participants. As a proxy of grasp computations, we used representations of grasp orientation, relating them in time to the development of high-level, effector-independent visual object information and lower level effector representations. In addition, to identify intermediate levels of grasp programming we included a bimanual grasping task. The results provide novel insights into the hierarchical structure underlying the control of human precision grasps. Notably, they suggest that grasp programs evolve in a partially noncanonical manner.
We studied grasp programs during a preview and a movement execution phase. During Preview, grasp programs emerged even though effectors were not yet specified. This is consistent with previous findings that effector-independent representations in posterior parietal and dorsal premotor cortex distinguish between grasp and reach plans (Gallivan et al., 2013; Turella et al., 2016, 2020). Extending these findings, here we show that effector-independent grasp programs already incorporate fine-grained information about grasp orientation. Of further significance, our EEG data allow us to determine that grasp programs emerge 80 ms after object onset, similar to when shape representations arose in the present study and to when orientation representations arose in a previous study where the effector was known ahead of time (Guo et al., 2019). This shows that grasp programs, with or without delayed effector specification, commence immediately after objects become visible, suggesting that early grasp programs reflect visual processes that are modulated by action intentions. In support of this idea, here we show that the grasp programs did not reflect the verbal instructions about grasp orientations that we gave at the start of each experimental block, because orientation representations did not form before object onset. Also, orientation representations during Preview did not form based on motor priming. Instead, they were temporally yoked to visual object onset, just like shape representations, especially at occipital and parietal electrodes. Our results suggest that early grasp orientation representations reflect high-level grasp programs that rely on object-based visual processes, regardless of whether effectors are specified or not.
The visual nature of orientation representations is consistent with the fact that visually guided grasping requires vision-based grasp point computations (Blake, 1992) and causes the attentional spotlight to split into two regions near the grasp points (Schiegg et al., 2003). Further, the involvement of occipital and parieto-occipital electrodes in orientation representations suggests that the underlying processes recruited occipitotemporal and occipitoparietal areas, all of which play a role in extracting grasp-relevant object information and action selection (Astafiev et al., 2004; Rice et al., 2007; Monaco et al., 2014; Fabbri et al., 2016) or abstract action representation (Tucciarelli et al., 2015). Furthermore, visual representations of grasp orientation might reflect sensory predictions during grasp planning associated with contact points (Flanagan et al., 2006). In sum, the results highlight the time course and dynamics of visual grasp-goal representations that are modulated by action intentions independent of effector-related processes.
Grasp orientation representations re-emerged after the auditory Go signal announced the hand (or hands) with which to grasp, yielding two key observations. First, orientation representations were temporally yoked to the auditory signal between 50 and 200 ms. At about the same time (50–300 ms after Go), orientation representations resembled those during Preview (150–300 ms after object onset; Fig. 5D3), possibly because the Go signal caused visual preparatory responses to repeat. Similar forms of reactivation of visual processes have been observed in fMRI studies where delayed grasping in darkness is associated with reactivation of the object area lateral occipital cortex (Singhal et al., 2013; Monaco et al., 2017). Interestingly, however, in the present study shape representations, as opposed to vision-based orientation representations, did not reactivate. This suggests that object-based visual processes reactivated selectively for object features that were relevant for grasp planning, consistent with the known selectivity of visuomotor control of grasps (Ganel and Goodale, 2003). Second, effector-independent visual orientation representations re-emerged during Go before the Go signal was converted into lower level representations of individual effectors, as expected from a canonical progression from higher to lower level visuomotor computations.
However, we found that grasp computations did not unfold in a strictly canonical fashion when we looked at hierarchies in more detail, mapping intermediate-level grasp control. Here, we defined intermediate-level computations as grasp representations that had partially, yet not fully, incorporated effector choices. We expected partially shared grasp representations because previous studies have shown that left-hand and bimanual (but not right hand) grasping overlap in computational and neural resources in the right hemisphere (Le and Niemeier, 2013a, b; Le et al., 2014, 2017, 2019). Indeed, we found that a model that assumed left-hand and bimanual grasps to be the same produced significant representations and that these representations relied on right-lateralized parietal, central, and frontocentral electrodes, indicating the involvement of the right frontoparietal grasp network as expected (Le et al., 2014, 2017). Also, the model produced β weights that were more prominent than those of a model assuming right-hand and bimanual grasping to be similar. These partially effector-dependent grasp representations reflected processes not entirely abstract from effector choice. For example, effector choice might have routed grasp point computations for left-hand and bimanual grasps to shared neural resources in the right hemisphere, such as the right anterior intraparietal sulcus (Le et al., 2014). In addition, right-hand and bimanual grasps might have shared neural resources in the left hemisphere. At any rate, right-hand, left-hand, and bimanual grasping implicate different patterns of muscle activation and thus partially shared grasp representations must have reflected intermediate control processes upstream from control circuits for arm and finger movements.
Therefore, given that partially shared grasp representations mark a level of motor control where left-hand grasping is different from right-hand grasping but not yet different from bimanual grasping, do these intermediate LB model representations arise before downstream effector representations distinguish between all three effector choices? Intriguingly, we found that this was not the case. Partially shared grasp representations emerged not before but 47 ms after effector representations formed.
Perhaps we found intermediate representations to be delayed because representations of individual effectors included the processes necessary to map the pitch of the Go signal onto a given effector. However, aligning ERPs to movement onset should have filtered out stimulus-response mapping, only revealing effector representations related to action preparation. Even so, downstream effector representations did not arise after shared representations but earlier.
Perhaps LB model representations did not reflect an intermediate stage of processing but a specialized system for planning atypical or awkward actions. It has been shown that left-hand grasping is a less common or more awkward action than right-hand grasping. For example, unlike right-hand grasping, left-hand and awkward (and, thus, little practiced) right-hand grasps are susceptible to size-contrast illusions (Gonzalez et al., 2008). So are bimanual grasps like left-hand grasps in being uncommon or awkward? This appears to be unlikely. Bimanual actions are not uncommon; they are frequently used in daily life (Kilbreath and Heard, 2005). Of course, it could be argued that bimanual precision grasps (as tested here) are less common than bimanual grasping with the whole hand. Nevertheless, previous research suggests that bimanual precision grasps are about as proficient as right-hand precision grasps; for example, bimanual grip apertures are as proficiently scaled to object size as apertures for right-hand grasping (compare Le and Niemeier, 2013a, with Le and Niemeier, 2014). Furthermore, a recent study directly showed that shared computations between left-hand and bimanual grasping were not due to the actions being awkward or unusual (Le et al., 2019).
Perhaps the LB model did not flag an intermediate processing stage but, rather, brain activity that is less lateralized than individual effector representations. However, not all aspects of shared left-hand and bimanual activity are strictly bilateral. For example, magnetic stimulation studies have shown that stimulation of right, but not left, parietal regions disrupted visuomotor transformations for bimanual grasping, just as for left-hand grasping (Le et al., 2014, 2017). Furthermore, not all downstream effector-specific representations are necessarily lateralized; at least executing bimanual actions requires activation of primary motor cortex in both hemispheres.
In conclusion, studying the time course of the neural processes underlying the visuomotor control of grasping, the present study offers novel insights into the temporal structure of visual-to-motor transformations underlying grasp computations. We show that effector-independent grasp representations start as object-based visual processes followed by visuomotor and motor processes. However, partially shared grasp representations and, thus, intermediate levels of control emerge after lower level effector-related motor representations. Our results strongly suggest that grasp control does not necessarily evolve in a canonical fashion, thereby highlighting the need for methods like EEG or MEG and their fine temporal resolution to attain a comprehensive understanding of human sensorimotor control.
Footnotes
This work was supported in part by grants from the Natural Sciences and Engineering Research Council of Canada.
The authors declare no competing financial interests.
Correspondence should be addressed to Matthias Niemeier at m.niemeier{at}utoronto.ca