Abstract
The frontoparietal networks underlying grasping movements have been extensively studied, especially using fMRI. Accordingly, whereas much is known about their cortical locus, much less is known about the temporal dynamics of visuomotor transformations. Here, we show that multivariate EEG analysis allows for detailed insights into the time course of visual and visuomotor computations of precision grasps. Male and female human participants first previewed one of several objects and, upon its reappearance, reached to grasp it with the thumb and index finger along one of its two symmetry axes. Object shape classifiers reached transient accuracies of 70% at ∼105 ms, especially based on scalp sites over visual cortex, dropping to lower levels thereafter. Grasp orientation classifiers relied on a system of occipital-to-frontal electrodes. Their accuracy rose concurrently with shape classification but ramped up more gradually, and the slope of the classification curve predicted individual reaction times. Further, cross-temporal generalization revealed that dynamic shape representation involved early and late neural generators that reactivated one another. In contrast, grasp computations involved a chain of generators attaining a sustained state about 100 ms before movement onset. Our results reveal the progression of visual and visuomotor representations over the course of planning and executing grasp movements.
SIGNIFICANCE STATEMENT Grasping an object requires the brain to perform visual-to-motor transformations of the object's properties. Although much of the neuroanatomic basis of visuomotor transformations has been uncovered, little is known about its time course. Here, we orthogonally manipulated object visual characteristics and grasp orientation, and used multivariate EEG analysis to reveal that visual and visuomotor computations follow similar time courses but display different properties and dynamics.
Introduction
We move our hands to grasp objects with seeming ease. Yet, the neural and computational mechanisms underlying reach-to-grasp actions are complex (Fagg and Arbib, 1998) and mature late in development (Kuhtz-Buschbeck et al., 1998; Schneiberg et al., 2002). To plan a grasp, the brain needs to form visual representations of an object with regard to its shape, weight, and other intrinsic characteristics, and to transform these representations into motor plans that guide hand movements (Jeannerod et al., 1995). Such visuomotor transformations involve parieto-frontal networks composed of dorsolateral and dorsomedial streams (Castiello, 2005; Gallivan et al., 2013a,b; Cavina-Pratesi et al., 2018) within which object characteristics are integrated with grasp parameters (Culham et al., 2003; Tunik et al., 2005; Davare et al., 2007; Baumann et al., 2009; Monaco et al., 2011; Theys et al., 2012; Verhagen et al., 2012; Fabbri et al., 2016; Schaffelhofer and Scherberger, 2016).
Relative to the neuroanatomic basis of visuomotor transformations, little is known about their time course. Several fMRI studies have used delayed-movement paradigms in which movements are planned and withheld for several seconds. Incorporating pattern classification, these studies demonstrated that upcoming grasp-related parameters (e.g., grip type and effector choice) can be decoded from parieto-frontal brain areas during planning phases (Gallivan et al., 2011a,b, 2013a,b), and that these representations generalize to movement execution phases (Ariani et al., 2018), thereby revealing a rough timeline of the unfolding sensorimotor events.
Much higher temporal resolution, though, is attained in EEG and MEG studies. One line of this research has applied univariate methods to inspect the visuomotor processes underlying the planning and execution of grasp types (Zaepffel and Brochier, 2012; De Sanctis et al., 2013), the representation of goals (van Schie and Bekkering, 2007; Westerholz et al., 2013), or differences between movements and movement observations (Virji-Babul et al., 2010). Another study measured the effects of object perturbations on electrophysiological microstates (Tunik et al., 2008). These studies were limited to rather coarse experimental manipulations (Koester et al., 2016).
However, fine-grained details can be extracted from M/EEG signals. Multivariate approaches reach accuracies comparable to those of fMRI data (Nemrodov et al., 2019). For example, pattern classification has identified a 300 ms window of premotor activity where beta band oscillations convey effector-independent representations of grasp vs reach plans (Turella et al., 2016). Further, brain-computer interfaces can decode the kinematics of grasp movements (Agashe et al., 2015, 2016; Jochumsen et al., 2016; Schwarz et al., 2018) even from single trials (Iturrate et al., 2018). However, to date, no study has mapped the timing of visuomotor transformations of human grasp movements.
Here, we used multivariate pattern classification of electrophysiological signals to characterize the dynamics of visuomotor computations underlying the parameter specifications of precision grasps. We identified the timeline of visual object and grasp orientation representations. Participants previewed and then grasped objects with different (grasp-relevant) shapes and (grasp-irrelevant) textures. Orthogonal to these properties, participants made grasps with clockwise or counterclockwise orientation, a key component of grasp computations (Tunik et al., 2005; Baumann et al., 2009; Fattori et al., 2010; Monaco et al., 2011). We predicted that shape classification accuracy should rapidly increase within the first 100 ms of object presentation. However, the time course of grasp orientation representations is unknown, as is its onset relative to visual processes and to motor execution. In addition, we extended pattern classification of shape and grasp parameter representations to a cross-temporal and cross-event generalization approach (King and Dehaene, 2014) to elucidate the dynamics of neural representations and their processing stages. Finally, we combined independent component analysis and pattern classification to localize the sources of shape and grasp orientation information, respectively.
Materials and Methods
Participants.
Thirty-five students from the University of Toronto community gave their written and informed consent to participate in two EEG experiments and one EMG experiment in exchange for payment or course credit. Fifteen students (8 females; median age: 23, range: 20–37) participated in the main EEG experiment, 10 (6 females; median age: 19, range: 18–24) in the control EEG experiment, and 10 (5 females; median age: 24, range: 20–38) in the EMG experiment. All participants were right-handed (Oldfield, 1971) and had normal or corrected-to-normal vision. All procedures were approved by the Human Participants Review Subcommittee of the University of Toronto and conformed to the ethical standards laid down in the Declaration of Helsinki.
Procedures and apparatus.
Participants sat in a light-sealed room at a table with an experimenter on the left side (Fig. 1A). The right hand of each participant was placed on a button box, where the index finger and the thumb each blocked a beam of infrared light, signaling the presence of their hand. The participants wore earplugs, and they looked at an opaque shutter glass screen (Smart Glass Technology) so that they could neither hear nor see how the experimenter prepared each trial behind the screen. For the purpose of setting up each trial, the experimenter viewed instructions on a monitor on the side and turned on a set of LEDs to light up a black-clad grasp space where the experimenter mounted objects on a platform. The platform was slanted with a square-shaped peg in the middle on which objects were placed always with the same position and orientation, 43 cm away from the participant with the surface of the object tilted toward the participant's line of sight.
All objects were cut out of 2-cm-thick blocks of wood and were either shaped like a “pillow” with four concave edges, or like a “flower” with four convex edges (Fig. 1B). Flowers had smaller surface areas than pillows. However, all objects measured 6 cm across opposing edges, thus affording identical grip sizes (also see Discussion). We chose shape as a visual object property that is relevant for grasping. All objects were painted middle gray on their sides, and their top surfaces were covered with equal amounts of white, red, green, and blue to form one of two kinds of textures. One texture had a white rectangular grid structure with squares of the other colors filling its gaps, and the second texture showed a checkerboard pattern of squares of all colors. All combinations of textures and shapes were equally likely to occur. We chose texture as a visual object property that was irrelevant for grasping. Furthermore, texture helped to create a slightly greater variety of objects to better engage the attention of the participants.
Once the object was set up, the experimenter pressed a key that switched off the LEDs and, in darkness, set the shutter glass screen to transparent. Between 750 and 1250 ms later, the LED lights turned on, illuminating the object for the participant to see for the first, "Preview" time of 200 ms, and then turned off for 1–2 s (for the timeline, see Fig. 1C). The purpose of this period in darkness was to observe whether grasp representations in darkness are different from those with continuous visual information. It ended with the lights turning on for the second, "Go" time. Participants then started to move their right hand, as sensed by the button box and marked as movement onset (note that only at that point did the participant's hand come into view through the shutter glass). They reached under the shutter glass screen and grasped the object with the index finger and the thumb, either with a clockwise (CW) or a counterclockwise (CCW) orientation, as monitored by the experimenter. The end of the reach-to-grasp movement was defined as the time when the hand crossed a curtain of infrared beams mounted between two 15-cm-tall pillars positioned 40 cm apart, immediately in front of the object (this measure slightly underestimated movement end as usually defined, e.g., based on a movement velocity criterion in studies where hand tracking data are available; Le et al., 2014). Finally, the participants picked up the object, placed it on the table near the experimenter, and returned their hand to the button box while the experimenter pressed a key for any invalid trials (i.e., incorrect grasp orientations or dropped objects) before the next trial.
Each block contained 60 trials (2 shapes × 2 textures × 15 repetitions in random order) during which participants always grasped the objects with the same CW or CCW orientation as per instructions given at the start of the block. Both instructions were given in random order for 5 blocks each, amounting to a test session of about 3 h. Two such sessions were conducted on two different days for each participant. Breaks were provided in between blocks as requested by the participants.
Control EEG experiment.
The control experiment was motivated by the unexpectedly poor texture classification performance in the main experiment. It was intended to test whether texture would become classifiable if made relevant for the task. To this end, the experimental protocol was identical to the main experiment, except that we merged the independent variables Texture and Grasp Orientation. That is, we instructed participants to associate each of the two textures with one of the two grasp orientations (6 randomly selected participants paired the grid texture with CW and the checkerboard texture with CCW grasps; the other participants learned the opposite pairing). This way, the textures were behaviorally relevant, in contrast to the first experiment. Also, grasp orientations were performed in a random order rather than in blocks (given that orientation was now linked to texture). Participants completed 10 blocks of 60 trials each, and so the entire experiment was completed in a single 3 h session.
EMG experiment.
The EMG experiment followed the same experimental protocol as the main experiment. Participants completed 10 blocks of 60 trials each, and the entire experiment was completed in a single 3 h session.
EEG acquisition and preprocessing.
EEG data were recorded using a 64-electrode BioSemi ActiveTwo recording system, digitized at a rate of 512 Hz with 24-bit A/D conversion. The electrodes were arranged according to the International 10/20 System. The electrode offset was kept <40 mV.
EEG preprocessing was performed offline in MATLAB using the EEGLAB Toolbox (Delorme and Makeig, 2004) and the ERPLAB Toolbox (Lopez-Calderon and Luck, 2014). Signals from each block were band-pass filtered (noncausal Butterworth impulse response function, 12 dB/oct roll-off) with half-amplitude cutoffs at 0.1 and 40 Hz (to reduce drifts in the data and the impact of electrical noise, respectively). Noisy electrodes (correlation with nearby electrodes <0.6) were interpolated (mean of 1.75 electrodes per subject), and all electrodes were rereferenced to the average of all electrodes. Next, independent component analysis (ICA) was performed to identify and remove components that were associated with blinks (Jung et al., 2000) and eye movements (Chaumon et al., 2015; Drisdelle et al., 2017). The ICA-corrected data were segmented relative to the onset of Preview (−100 to 800 ms) and Go Signal (Go; −100 to 800 ms). In addition, invalid trials and epochs containing abnormal reaction times (<100 ms or >1000 ms) were removed. As a result, an average of 2.43% (range: 0.58%–3.39%) of trials from each subject were removed from further analyses.
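For illustration, the preprocessing pipeline might be sketched as follows in MNE-Python (the actual analyses were implemented in MATLAB with EEGLAB/ERPLAB; the file name, event codes, and excluded component indices below are hypothetical placeholders, and MNE's default FIR filter only approximates the Butterworth filter used here):

```python
import mne

# Load a BioSemi recording and band-pass filter (0.1-40 Hz half-amplitude cutoffs)
raw = mne.io.read_raw_bdf("sub01_session1.bdf", preload=True)  # hypothetical file
raw.filter(l_freq=0.1, h_freq=40.0)
raw.set_eeg_reference("average")  # rereference to the average of all electrodes

# Identify and remove blink/eye movement components with ICA
ica = mne.preprocessing.ICA(n_components=0.99, random_state=0)
ica.fit(raw)
ica.exclude = [0, 1]          # ocular component indices, chosen per subject
ica.apply(raw)

# Segment relative to Preview and Go onsets (-100 to 800 ms)
events = mne.find_events(raw)
epochs = mne.Epochs(raw, events, event_id={"Preview": 1, "Go": 2},
                    tmin=-0.1, tmax=0.8, baseline=(None, 0), preload=True)
```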
Pattern classification of ERP signals across time.
Epochs were averaged into ERP traces to increase the signal-to-noise ratio (SNR) of spatiotemporal patterns (Grootswagers et al., 2017). Specifically, up to 15 epochs within a given block that corresponded to the same condition (i.e., shape, texture, or grasp orientation) were averaged together. Because each block contained 30 epochs for a given shape or texture, 2 separate ERP traces were obtained for a given shape or texture from each block. This procedure resulted in 20 separate ERP traces per condition for Preview and Go, respectively. The traces were then z-scored across time and electrodes. Outliers (i.e., values exceeding 3 SDs from the mean) were thresholded at ±3. We chose this winsorization approach (for similar approaches, see Nemrodov et al., 2016, 2018) to curb the impact of outliers on SVM-based pattern classification while keeping the number of features constant. Alternative forms of outlier treatment would have been replacing outliers with mean values or removing them. The latter approach bears a certain risk of discarding more information than necessary because it requires the removal of outliers across all training and testing samples to ensure that the number of features is consistent. Nevertheless, for the current data, we found no differences between the three approaches; for example, for shape classification using all temporal features at each electrode separately, we found electrode involvement to be highly correlated, r ≥ 0.915. Further, all ERP traces were rescaled to a range between 0 and 1.
Next, to increase the robustness of the pattern analyses, ERP traces were divided into temporal windows of 5 consecutive bins (5 bins × 1.95 ms ≈ 10 ms). For each window, data from all 64 electrodes were concatenated to create 320 features. These features were constructed for the purpose of pattern classification across time, window by window.
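The normalization and feature construction steps might be sketched in Python as follows (a minimal sketch under assumed array shapes; the original code was written in MATLAB and the function name is illustrative):

```python
import numpy as np

def make_features(traces, bins_per_window=5):
    """traces: (n_traces, 64, n_timepoints) averaged ERP traces."""
    # z-score each trace across time and electrodes
    z = (traces - traces.mean(axis=(1, 2), keepdims=True)) \
        / traces.std(axis=(1, 2), keepdims=True)
    z = np.clip(z, -3, 3)  # winsorize outliers at +/-3 SD
    # rescale each trace to the range [0, 1]
    lo = z.min(axis=(1, 2), keepdims=True)
    hi = z.max(axis=(1, 2), keepdims=True)
    z = (z - lo) / (hi - lo)
    # carve into ~10 ms windows (5 bins of ~1.95 ms each) and concatenate
    # all 64 electrodes: each window yields 64 x 5 = 320 features
    n_win = z.shape[2] // bins_per_window
    w = z[:, :, :n_win * bins_per_window].reshape(
        z.shape[0], 64, n_win, bins_per_window)
    return w.transpose(0, 2, 1, 3).reshape(z.shape[0], n_win, -1)
```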
Pairwise discrimination of shape, texture, and grasp orientation was conducted using linear support vector machine (SVM; c = 1; LibSVM 3.22, Chang and Lin, 2011) and leave-one-out cross-validation. That is, 19 of 20 pairs of observations were used for training while 1 pair was used for testing. Further, to disentangle visual and motor representations, classification of visual object characteristics (i.e., shape and texture) was conducted across different grasp orientations (i.e., the classifier was trained on one grasp orientation and tested on the other), and classification of grasp orientation was conducted across different visual object characteristics.
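For example, the cross-decoding of shape across grasp orientations could be sketched as follows (scikit-learn's SVC wraps LIBSVM; for brevity, this sketch uses a single train/test split across orientations in place of the full leave-one-pair-out procedure, and all variable names are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

def crossdecode_shape(X_cw, y_cw, X_ccw, y_ccw):
    """Train shape classifiers on traces from clockwise grasps and test them
    on counterclockwise grasps (and vice versa), so that successful decoding
    cannot rest on grasp-related signals."""
    accs = []
    for X_tr, y_tr, X_te, y_te in [(X_cw, y_cw, X_ccw, y_ccw),
                                   (X_ccw, y_ccw, X_cw, y_cw)]:
        clf = SVC(kernel="linear", C=1).fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
    return np.mean(accs)  # chance level is 0.5 for pairwise discrimination
```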
Cross-temporal generalization.
To gauge the degree to which representations of visual and motor information remained stable across time and across events, we used a temporal generalization approach (Meyers et al., 2008; Crowe et al., 2010; for review, see King and Dehaene, 2014). Specifically, we tested the classifier trained at a given 10 ms time window on other time windows within the same event (e.g., Preview) as well as the other event (e.g., Go). If a classifier generalizes from one time window to another, then the underlying representations are similar for these two time windows. If, however, a classifier does not generalize, then this indicates that the underlying representation is different.
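A minimal sketch of this analysis, assuming windowed feature matrices as constructed above:

```python
import numpy as np
from sklearn.svm import SVC

def temporal_generalization(X_train, y_train, X_test, y_test):
    """X_*: (n_traces, n_windows, n_features). Train at each window and test
    at every window; X_test may come from the same or the other event."""
    n_windows = X_train.shape[1]
    acc = np.zeros((n_windows, n_windows))
    for t_tr in range(n_windows):
        clf = SVC(kernel="linear", C=1).fit(X_train[:, t_tr, :], y_train)
        for t_te in range(n_windows):
            acc[t_tr, t_te] = clf.score(X_test[:, t_te, :], y_test)
    return acc  # off-diagonal accuracy indicates representations shared across time
```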
Pattern classification of ERP signals across electrodes.
To assess the informativeness of each electrode for the classification of ERP signals, we also classified temporal patterns separately for each electrode. Specifically, for each electrode we concatenated the signals across the entire event (from 0 to 800 ms) as features for SVM classification. This was conducted separately for the events Preview and Go.
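A sketch of this per-electrode analysis (array shapes assumed as before):

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def electrode_informativeness(X, y):
    """X: (n_traces, n_electrodes, n_timepoints); the full time course of one
    electrode serves as the feature vector for classification."""
    return [cross_val_score(SVC(kernel="linear", C=1), X[:, e, :], y).mean()
            for e in range(X.shape[1])]
```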
Source localization of ERP signals.
Despite the limitations of source localization of EEG signals due to electrode distance from sources and volume conduction effects (Nunez, 1981; Makeig et al., 1996; Yao and Dewald, 2005; Verhellen and Boon, 2007), we sought to estimate the approximate brain regions responsible for the dynamics of shape and grasp orientation representations. Specifically, we conducted source analysis (Brainstorm toolbox; Tadel et al., 2011) on statistically independent ERPs that carried either shape or grasp orientation information. To this end, we decomposed Preview and Go ERP traces into independent components (ICs; 64 per person and event) and projected each IC onto electrode space. Subsequently, we cross-decoded shape and grasp orientation using SVM classifiers on concatenated data from all time points (except for the 100 ms baseline). Next, we labeled "shape" and "grasp orientation" ICs (a) depending on which classification accuracy was higher, and (b) provided that accuracy was >50%. The labeled ICs were then reprojected together to the electrode space to create "shape" and "grasp orientation" traces. Finally, the traces were averaged and entered into source analysis for each subject separately using the ICBM152 brain model template (Mazziotta et al., 1995), which we aligned with the average EEG electrode positions using Brainstorm's iterative closest point algorithm and visual inspection (fiducial points: nasion, left ear, and right ear; Tadel et al., 2011). We estimated noise covariance matrices using the 100 ms baseline before stimulus presentation for Preview and Go. Subsequently, we computed 5001 brain sources using standardized low-resolution brain electromagnetic tomography (sLORETA; Pascual-Marqui, 2002) with dipole orientations constrained normal to the cortex.
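The IC-labeling logic might be sketched as follows (decode() stands in for the SVM cross-decoding described above; this is a schematic of the selection rule, not the authors' code, and assumes each list ends up non-empty):

```python
import numpy as np

def label_ics(ic_projections, shape_labels, grasp_labels, decode):
    """ic_projections: back-projected ERPs, one per independent component."""
    shape_ics, grasp_ics = [], []
    for ic in ic_projections:
        acc_shape = decode(ic, shape_labels)   # cross-decode shape from this IC
        acc_grasp = decode(ic, grasp_labels)   # cross-decode grasp orientation
        if max(acc_shape, acc_grasp) > 0.5:    # keep only informative ICs
            (shape_ics if acc_shape > acc_grasp else grasp_ics).append(ic)
    # reproject each labeled set jointly to electrode space by summation
    return np.sum(shape_ics, axis=0), np.sum(grasp_ics, axis=0)
```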
EMG acquisition and analysis.
Surface EMG signals were recorded from four muscles (superior trapezius, anterior deltoid, brachioradialis, and first dorsal interosseous) at 2000 Hz using a BioSemi ActiveTwo recording system. Signals were rectified and then high-pass filtered (noncausal Butterworth impulse response function, fourth order) with a cutoff frequency of 2 Hz. Next, we computed a root mean square (RMS) value of the signal at each time-point using a 10 ms gliding window. Finally, signals were down-sampled to 512 Hz and segmented relative to the onset of Preview (−100 to 800 ms) and Go (−100 to 800 ms).
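A sketch of the envelope computation (at 2000 Hz, the 10 ms gliding window spans 20 samples; the function name is illustrative):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def emg_envelope(emg, fs=2000, win_ms=10):
    """emg: 1-D voltage trace of one muscle."""
    rect = np.abs(emg)                                # rectify the raw signal
    b, a = butter(4, 2 / (fs / 2), btype="highpass")  # 4th-order Butterworth, 2 Hz
    filt = filtfilt(b, a, rect)                       # noncausal (zero-phase) filter
    win = int(fs * win_ms / 1000)
    mean_sq = np.convolve(filt**2, np.ones(win) / win, mode="same")
    return np.sqrt(mean_sq)                           # gliding-window RMS
```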
Classifications of EMG patterns across time followed the same procedure as in the main EEG experiment (see Materials and Methods, Pattern classification of ERP signals across time). To investigate the contribution of each muscle group to EMG classifications, we additionally conducted decoding on each muscle separately using temporal patterns across 100 ms time windows from −100 to 800 ms.
Experimental design and statistical analysis.
All meaningful trials (see "EEG acquisition and preprocessing" in the Materials and Methods) were used to decode shape, texture, and grasp orientation. Classifications were performed for each subject separately. Significance of decoding accuracy in time-resolved classification analyses and in those conducted at each electrode was assessed using two-tailed one-sample t tests at the group level, correcting for multiple comparisons using the false discovery rate (FDR; Benjamini and Hochberg, 1995). In addition, we conducted permutation tests at the single-subject level for time-resolved classifications in the main EEG experiment. This was done by comparing each subject's classification accuracy at any given time window against a null distribution of accuracies obtained by classifying data with shuffled condition labels 10,000 times. In the cross-temporal generalization analysis, we assessed significance of classification accuracy using nonparametric sign-permutation tests (10,000 iterations) with cluster-defining and cluster-size thresholds of p < 0.05 (Nichols and Holmes, 2002). Finally, in the source localization analysis, we determined voxels that differed significantly from baseline using paired-samples t tests and FDR (q < 0.05).
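The single-subject permutation test can be sketched as follows (classify() stands in for the time-resolved SVM pipeline; the small-sample correction in the p-value is a common convention, not necessarily the authors' exact formula):

```python
import numpy as np

def permutation_pvalue(X, y, classify, n_perm=10000, seed=0):
    """Compare observed accuracy against a null distribution obtained by
    reshuffling condition labels n_perm times."""
    rng = np.random.default_rng(seed)
    observed = classify(X, y)
    null = np.array([classify(X, rng.permutation(y)) for _ in range(n_perm)])
    return (np.sum(null >= observed) + 1) / (n_perm + 1)
```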
Results
Behavioral results
Average reaction time (RT; defined as the time between Go onset and movement onset) was 311 ms (SD = 38 ms), and average movement time (MT; defined as the time between movement onset and movement end) was 286 ms (SD = 48 ms). RTs submitted to a three-way repeated-measures ANOVA (grasp orientation × shape × texture) yielded no significant effects (F < 3.84, p > 0.070). The analysis of MTs showed an interaction between grasp orientation and shape (F(1,14) = 20.14, p = 0.001). Post hoc analyses revealed that participants reached for the flower shape faster when they grasped counterclockwise, but more slowly when they grasped clockwise (mean difference between differences in MT for shape in each grasp orientation: 6 ms; t(14) = 4.487, p < 0.001). No other comparison was significant (F < 3.976, p > 0.066).
Classification time course of shape, texture, and grasp orientation
We applied SVM pattern classification to ERP amplitudes separately at each 10 ms time window to examine the time course of discrimination of the target object's visual characteristics (i.e., shape and texture) and grasp orientation. Importantly, we cross-decoded shape and texture; that is, we trained and tested their classifiers with different grasp orientations. Likewise, classifiers for grasp orientation were trained and tested with different shapes, as well as with different textures (the two resulting curves were then averaged given their similarity). Cross-decoding allowed us to decode the representation of visual characteristics regardless of grasp orientation, and vice versa.
The cross-decoded classification curve for shape rose rapidly after object onset (Fig. 2A, top). During Preview it reached significance 85 ms after object presentation and peaked at 135 ms (i.e., around the time of the P1), followed by a slow decay of information during darkness that remained significant until 600 ms, with two additional transient significant intervals later on (630–650 ms; 740–760 ms; 15 of the 15 individual datasets showed significant shape classification). During Go, shape classification yielded a similar curve with a peak at 115 ms and several significant intervals (85–290 ms; 330–380 ms; 430–500 ms; 525–555 ms; 585–595 ms; 610–680 ms; 730–790 ms; 15 of the 15 individual datasets showed significant shape classification). Further, we realigned the Go data to individual Movement onsets and recomputed classification accuracy. In comparison (Fig. 2B, top), this produced a shape classification curve that was significantly reduced between 105 and 145 ms, with a peak that was noticeably smaller and later (∼180 ms). The curve nevertheless reached significance (145–165 ms; 175–270 ms; 300–310 ms; 340–350 ms; 370–380 ms; 415–455 ms; 475–555 ms; 575–630 ms; 650–680 ms; 710–800 ms). This shows that the processes underlying visual shape representations are in part yoked to the visual onset of the object, whereas other visual representations are time-locked to voluntary behavior and therefore appear to partake directly in the visuomotor transformations of grasp planning.
Classification of texture did not reach significance (Fig. 2A, middle). Further, making texture task-relevant did not improve texture classification performance. That is, the control experiment used texture to cue participants as to how to grasp objects. Participants were able to follow the cues (error rate: 0.6%) with similar timing as before (mean RT = 315 ms, SD = 65 ms, t(23) = 0.177, p = 0.861; mean MT = 253 ms, SD = 41 ms, t(23) = 1.756, p = 0.092; no reliable influence of the factors Shape or Texture/Grasp Orientation on either variable: F < 1.109, p > 0.320). Classification of texture/grasp orientation yielded a curve that reached significance during Go (Fig. 2C, bottom). However, the improved classification came from confounding texture and grasp orientation in the control experiment; thus, classifiers relied on two sources of information. To demonstrate this, we estimated confounded classification performance for the main experiment by assuming that texture and grasp orientation information contributed additively to SVM texture/grasp orientation classification:

Gtg = Gt + Gg,     (1)

where Gtg is the gain of classification accuracy for texture confounded with grasp orientation, and Gt and Gg represent the gains from texture and grasp orientation information, respectively. Further, because fewer observations were available to train the classifiers in the control experiment due to the shorter test time, we recomputed classifications in our main experiment using only observations from the first test session, thus making the number of observations equal across the two experiments. The predicted curves for grasp orientation/texture classification showed similar performance as the observed curves in the control experiment. Thus, our data provided no evidence that making texture relevant to the task boosted its classification. Due to its poor classification performance, texture was not considered in subsequent analyses.
Although cross-decoding of grasp orientation filtered out visual shape and texture information, its classification curves rose in a manner yoked to each of the two visual presentations of the object (Fig. 2A, bottom). During Preview, classification reached significance 85 ms after object presentation and remained relatively stable. After the Go signal, classification of grasp orientation became significant at 75 ms and continued to increase thereafter. Thus, grasp orientation decoding was timed remarkably similarly to shape decoding, although the onsets of significance of individual shape vs grasp orientation classification curves (using permutations with 10,000 iterations; see Materials and Methods) produced no correlations (Preview: r = 0.156, p = 0.578; Go: r = 0.077, p = 0.795), which suggests two largely parallel processes. Furthermore, grasp orientation decoding during Go still showed a very similar decoding curve even when we realigned the data to Movement onset and chose a different baseline (Fig. 2B, bottom).
In sum, classification curves for visual and motor aspects of the task revealed processes that commenced at nearly the same time but followed different timelines. Shape decoding was sensitive to the temporal alignment with the presentation of the object, indicating that information about visual processes up to approximately 145 ms was importantly reflected in the dynamics of the visual ERPs. In contrast, grasp orientation decoding showed curves that ramped up regardless of visual or motor alignment, consistent with a visuomotor representation, or a motor representation that depends on input from visual processes.
Correlating classification of grasp orientation with reaction time
To further investigate the relationship between the electrophysiological representation of grasp orientation and motor behavior, we extracted two variables from grasp orientation classification during Go: we quantified the rate at which classification rose as the slope of a straight line fitted to the classification accuracies, and we determined the time at which classification accuracy reached significance within each participant (see Materials and Methods). Further, we calculated correlations with reaction times and bootstrapped them with 10,000 iterations to obtain 95% confidence intervals (Efron and Tibshirani, 1993).
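A sketch of the bootstrap procedure, resampling participants with replacement (the function name is illustrative):

```python
import numpy as np

def bootstrap_corr_ci(x, y, n_boot=10000, seed=0):
    """x: per-participant classification slopes; y: reaction times."""
    rng = np.random.default_rng(seed)
    n = len(x)
    rs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)            # resample participants
        rs[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    return np.percentile(rs, [2.5, 97.5])      # 95% confidence interval
```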
Reaction times correlated significantly with classification slopes (r = −0.412, confidence interval excluded zero: [−0.734 −0.056]; Fig. 3A). Thus, faster build-up of grasp orientation information was associated with faster reaction times. In addition, there was a trend for a correlation with times of significant classification (r = 0.317, confidence interval: [−0.131 0.645]; Fig. 3B).
Cross-temporal generalization
Temporal classification as applied so far examined whether classifiers, trained on EEG data from a certain time point, were capable of decoding other data from the same time. We next extended this time-specific approach using cross-temporal and cross-event generalization analysis. That is, we trained and tested classifiers on any combination of times from Preview and Go (Fig. 4). To illustrate what kind of information this analysis approach can reveal (King and Dehaene, 2014), three possible outcomes are shown schematically in Figure 4A. In the first, cross-temporal generalization analysis yields a classification pattern along the diagonal; in other words, classifiers only decode data from the time that they were trained on. The diagonal pattern indicates that classification is based on information that travels across a chain of transient neural representations, or generators, that activate sequentially. Second, accurate classification could spread above and below the diagonal if representations are more sustained. For example, a single area, or a single set of areas, providing classification information continuously would yield an approximately square-shaped classification pattern. A third kind of pattern would include significant classification along the diagonal as well as within blobs at symmetrical points above and below the diagonal. Such a pattern would reflect a chain of transient representations where later generators reactivate earlier ones (King and Dehaene, 2014).
Cross-temporal generalization analysis of shape revealed a reactivation pattern. That is, classifiers trained on Preview data between 400 and 700 ms were able to decode ERPs between 100 and 200 ms and vice versa (Fig. 4B). This indicates that early generators reactivated after 400 ms, although in a more sustained manner. A similar reactivation pattern was found for Go data (Fig. 4C), and once again, for cross-condition generalization when classifiers trained on Preview data were applied to Go data and vice versa (Fig. 4D). This indicates that shape decoding was based on similar representations during Preview and Go, despite several differences between the two conditions, that is, a dark phase during Preview but not Go, and possible motor signals (or visual information) from shape-specific hand movements during Go but not Preview. In other words, artifacts due to darkness or impending hand movement cannot explain reactivation given the similarity of Figure 4B–D. Note further that the similarity across events (also see Fiehler et al., 2011) does not contradict claims that visual processes for action during darkness and light are systematically different from one another (Westwood and Goodale, 2003) as our analysis does not quantify dissimilarities.
Cross-temporal generalization analysis of grasp orientation yielded a wide generalization pattern for Preview that was less structured (Fig. 4E). Crucially, however, grasp orientation during Go produced a chain pattern for the first ∼200 ms after object presentation. Subsequently, and ∼100 ms before movement onset, it expanded into a sustained pattern (Fig. 4F). Surprisingly, cross-condition generalization between Preview and Go yielded a similar chain and sustained pattern, indicating that grasp preparatory processes during Preview resembled those during Go (Fig. 4G).
Finally, we aimed to confirm statistically that the reactivation pattern for shape and the chain-then-sustained pattern for grasp orientation were indeed systematically different from one another at intermediate times. Accordingly, we subtracted the shape generalization matrices from the grasp orientation generalization matrices and then tested the average difference of classification accuracies during the testing times between 100 and 200 ms across all training times (Preview: t(14) = 2.702, p = 0.017; Go: t(14) = 2.622, p = 0.020).
Informativeness of electrodes for classification
To test the degree to which electrodes contributed to shape and grasp orientation classification, we applied classification analyses separately to each electrode. This purely temporal analysis revealed that all electrodes, except T8, classified shape at significant levels, with peaks at occipital and parietal sites during Preview (Fig. 5). During Go, shape classification peaked near electrode sites O1, Oz, O2, PO3, POz, and PO4 (other significant electrodes included all but Fp1, Fpz, Fp2, AF3, F1, Fz, FC1, TP7, and CP5). For the classification of grasp orientation, significant levels were scattered around frontal, parietal, and occipital sites during Preview. During Go, stronger involvement came from electrodes on the left side of the scalp, with peaks around C1, CP1, and CP3 (other significant electrodes included all but Fp2, F1, C4, P6, P7, P8, and PO8). Purely temporal data at the level of individual electrodes were less informative for grasp orientation than for shape, whereas our spatiotemporal analyses (Figs. 2, 4) yielded more similar classification performance (given that peak shape classification was rather transient). This difference suggests that the informativeness of grasp orientation importantly depended on the topography of electrodes.
Source localization of shape information
To identify the approximate location of the brain regions underlying the dynamics of shape representations, we conducted source localization on statistically independent ERP traces that carried shape, but no grasp orientation, information (average number of ICs selected for ERP traces: 22.7; mean classification accuracy of shape during Preview: 72%, t(14) = 8.909, p < 0.0001; grasp orientation: 51%, t(14) = 0.506, p = 0.621; mean classification accuracy of shape during Go: 78%, t(14) = 12.240, p < 0.0001; grasp orientation: 60%, t(14) = 6.437, p < 0.0001). Note that ICA was less successful in removing grasp orientation information during Go, and so we will mainly comment on the Preview period. Figure 6A shows cortical sources during Preview (paired t tests, p < 0.05, FDR corrected for the number of voxels; see Fig. 6B for similar results for the Go period), where t-values reflect the probabilistic contribution of a given voxel/dipole to the "shape ERPs" (i.e., the t-values approximately reflect brain activity rather than quality of shape information). We selected three 50 ms windows from Preview where temporal generalization analysis revealed the most robust reactivation pattern: an early window of informativeness (85–135 ms), an intermediate time window before shape reactivation (200–250 ms), and a later window of reactivation (400–450 ms).
Early contributions to shape classification involved extrastriate, posterior parietal, ventral premotor cortex, and posterior insula in the left hemisphere. Right-sided activity stretched from the temporoparietal junction (TPJ) into the superior temporal sulcus and included posterior insula. In addition, there were medial contributions such as from the parieto-occipital sulcus and the precuneus.
For the intermediate time window contributions came from the parieto-occipital junction, and extrastriate cortex in the left hemisphere. Right hemisphere activity was confined to the parieto-occipital junction and a small frontal area near the supplemental motor area.
The later time window of reactivation yielded, once again, bilateral contributions from the parieto-occipital junction, and medial regions around the parieto-occipital sulcus, and there was activation in left extrastriate cortex, and left ventral precentral gyrus, as well as in the right TPJ and right posterior insula. In addition, there was right inferior temporal and prefrontal activity.
Source localization of grasp orientation information
To estimate brain sources contributing to the dynamics of grasp orientation representations, we conducted source localization on ERP traces that carried grasp orientation, but no shape, information (average number of ICs selected for ERP traces: 19.7; mean classification accuracy of grasp orientation during Preview: 65%, t(14) = 9.719, p < 0.0001; shape: 44%, t(14) = −2.137, p = 0.051; mean classification accuracy of grasp orientation during Go: 69%, t(14) = 9.624, p < 0.0001; shape: 50%, t(14) = 0.115, p = 0.910). Figure 7, A and B, shows localization results of grasp orientation information for the Preview and Go periods, respectively. We mainly focus on the Go period, where temporal generalization yielded the most robust pattern. As for shape localization, we selected a window of early informativeness (85–135 ms) and an intermediate time window that captured a phase of sustained grasp orientation representation before movement onset (200–250 ms). We also inspected a later phase of sustained representation during movement execution (400–450 ms), with the caveat of possible movement artifacts.
Our results suggested relatively wide networks of brain areas underlying “grasp orientation ERPs” (consistent with our previous observations that classification performance of grasp orientation using electrode topography was good, see Figs. 2, 4, whereas classification at individual electrodes was relatively poor, see Fig. 5). In detail, early left extrastriate contributions extended into inferior temporal cortex. Also, there was activity in left superior and intraparietal cortex and in ventral premotor and prefrontal regions as well as different parts of the insula. The right hemisphere yielded similar insular activity. Further, activity stretched from right extrastriate into inferior parietal and temporal cortex along the superior temporal sulcus, and there was activity along the right superior frontal sulcus. In addition, medial contributions came from the anterior cingulate.
The intermediate time window saw an expansion of earlier sources in both hemispheres, such as in superior parieto-occipital and in lateral inferior parietal cortex. Crucially, activity in the intra- and superior parietal cortex became more prominent, especially in the left hemisphere, and reached into a region around the precuneus. Finally, the later time window showed a general decline in sources.
Potential eye movement confounds
Although we made significant efforts to remove eye movement artifacts from our data (see Materials and Methods), some of these artifacts may have survived preprocessing. So, it is possible that leftover eye movement artifacts contributed to the decoding analyses (Quax et al., 2019), given that eye movements could have differed between conditions. Specifically, during grasps a first saccade typically aims for the center of mass of an object, but a second saccade targets one of its grasp points (Brouwer et al., 2009), thereby revealing the intended grasp orientation. Furthermore, similar eye movement patterns (Mostert et al., 2018) during Preview and Go could have inflated representational similarity (Fig. 4) between the two conditions. To estimate whether residual eye movement artifacts in the EEG data could support decoding of shape and grasp orientation, we first inspected data that were mainly caused by eye movements. That is, we extracted those independent components that preprocessing had identified as eye movement artifacts (mean number of components = 2.49, SD = 1.63) and conducted classifications on the respective, mostly "eye-based" ERP patterns in the frontal electrodes (Fp1, Fpz, Fp2, AF7, AF3, AFz, AF4, and AF8). Results in Figure 8A show that eye-based ERP decoding of shape and grasp orientation was at chance level during Preview, indicating that eye movements did not contribute to our Preview classification analyses. This also rules out that representational similarities between Preview and Go were inflated because participants made similar eye movements both times. Nevertheless, during Go, eye-based ERPs were capable of decoding grasp orientation during several intervals: 95–125 ms; 195–230 ms; 370–380 ms; 400–440 ms; 455–525 ms; 545–565 ms; 680–790 ms. Given this, we next investigated whether eye movement artifacts also contaminated our "neural-based" ERPs to the effect that the artifacts would explain decoding of grasp orientation. If such a contribution existed, then decoding results from eye- and neural-based ERPs should be similar. We tested this in two ways. First, we compared the time courses of eye- and neural-based decoding. That is, we correlated across time the eye- and neural-based decoding accuracies during Preview and Go within each subject, and we tested the Fisher's z-transformed correlations against 0 at the group level (see the sketch below). This resulted in no significant correlations (t < 1.491; p > 0.146). Second, we correlated across participants the average decoding accuracies during times when eye-based decoding was significant with the average neural-based decoding accuracies from the same time periods. The resulting correlation was 0.298 (p = 0.281). Finally, we reasoned that eye-based ERPs might be statistically entirely different from the eye movement artifacts that had potentially contaminated our neural-based ERPs. To test this possibility, we decoded shape and grasp orientation information from neural-based ERPs using only frontal electrodes. However, decoding only showed spurious significance for grasp orientation during Go (Fig. 8B), which is also consistent with our observation that frontal electrodes carried little information about shape or grasp orientation (Fig. 5). In sum, our analyses show that eye movement artifacts cannot sufficiently account for our classification results.
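The first of these two tests might look as follows in outline (per-subject decoding accuracies per time window assumed as inputs; the function name is illustrative):

```python
import numpy as np
from scipy.stats import ttest_1samp

def compare_timecourses(eye_acc, neural_acc):
    """eye_acc, neural_acc: (n_subjects, n_windows) decoding accuracies."""
    r = np.array([np.corrcoef(e, n)[0, 1]
                  for e, n in zip(eye_acc, neural_acc)])  # per-subject correlation
    z = np.arctanh(r)                                     # Fisher z transform
    return ttest_1samp(z, 0.0)                            # group test against zero
```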
Potential EMG confounds
As another potential source of confounds, muscle activity near the head (e.g., trapezius) during movement could have been picked up by EEG sensors, and, as a result, contributed to the classifications of grasp orientation during Go. Further, it is possible that upon previewing the object participants made anticipatory movements resulting in muscle activities during the delay period. Therefore, we conducted an additional EMG experiment to investigate the time course of muscle activity as well as their involvement in the classification of grasp orientation.
Figure 9 displays the results from the EMG experiment. EMG activity during Preview was minimal (Fig. 9A) and not informative (Fig. 9C). Thus, we can rule out that muscle activity contributed to EEG grasp orientation decoding during Preview. Moreover, the similarity of Preview representations to those during Go (Fig. 4) suggests that decoding during Go did not rely on EMG alone either. Nevertheless, we did observe EMG activity during Go, as expected. Muscles engaged sequentially from trapezius (before 100 ms) to anterior deltoid (∼100 ms), to brachioradialis (∼150 ms), and then to dorsal interosseous (∼250 ms; Fig. 9B). Decoding of grasp orientation using all muscle activities across 10 ms time bins during Go resulted in above-chance classifications in several intervals: 105–115 ms, 155–210 ms, 230–250 ms, and 270–800 ms (Fig. 9D). However, muscle-specific decoding of grasp orientation (Fig. 9F) showed that EMG was informative only for muscles located far from the EEG cap, which thus had minimal impact on EEG patterns: activity from the anterior deltoid was informative during 100–200 ms and from 300 ms onward, and the dorsal interosseous became informative from 200 ms onward, with the brachioradialis showing trends during 200–300 ms, 400–500 ms, and 600–800 ms. In contrast, the superior trapezius merely showed a trend during 300–400 ms. In sum, neither during Preview nor during Go was there any evidence that EMG artifacts contributed significantly to EEG classification results.
Discussion
We cross-decoded ERP signals to investigate the temporal dynamics of visual and motor computations underlying visually guided precision grasps during movement planning and execution. Signals emanating from visual and parieto-frontal networks allowed for robust decoding of visual and motor information. Shape information yielded decoding curves that peaked shortly after object presentation and decayed slowly thereafter. Grasp orientation information became significant at similar times as shape representations but ramped up more gradually. Our data allow for novel insights into the computations underlying the control of human precision grasps, separating object- and grasp-related processes.
To identify purely object-related representations of shapes, we trained shape classifiers on data collected from one grasp orientation condition and tested them on the other. Thus, we largely removed shape representations that might have been embedded in grasp plans. Grasp-independent information about shape was sensitive to how we aligned the ERPs in time. Only for data aligned to object onset did classification peak shortly after, indicating that early sources of shape information came from dynamic visual representations. However, sustained representations of shape continued to carry classifiable information, reflected in a fundamentally different curve that peaked ∼50 ms later for data realigned to movement onset. This shows that movement preparations came with, and might have caused, the activation of shape representations. Interestingly, this is consistent with an observation from temporal generalization analysis where later shape representations at the time of motor execution reactivated early shape representations.
This reactivation indicates that shape information in visual cortex is recruited around the time of action. Thus, it might parallel fMRI studies (Singhal et al., 2013; Monaco et al., 2017) where visual cortex activity reemerges during grasp execution and is functionally relevant for grasping (Cohen et al., 2009). However, such fMRI paradigms found reactivation for grasping in darkness, and not for cancelled grasping (Monaco et al., 2017). In contrast, we found reactivation first in darkness, when no grasping was required and, then, again, around the time of grasping but with lights on. Although these discrepancies could arise from different measures of reactivation (similar informativeness vs BOLD signal) and different time scales for EEG and fMRI, it is possible that EEG reactivation is a passive visual event (also see Cichy et al., 2014; Wen et al., 2019). More research is required to confirm whether sensorimotor control triggers not only shape representations as found in ERPs aligned to movement onset, but also reactivation as revealed with temporal generalization.
Temporal generalization does suggest that shape classification was little influenced by information about object size or weight. Although pillow-shaped objects were larger and heavier than flowers, these differences would have mattered more or less at different times: early on, during Preview, object weight should have mattered less than during movement execution, and during execution apparent size should have mattered less because visuomotor grasp control is governed by physical grip size (Haffenden et al., 2001; Ganel and Goodale, 2003), which was identical for all objects. Essentially, classifiers picking up on size/weight differences should not generalize from Preview to Go, and around movement onset they should have produced little evidence for reactivation of early shape representations. In sum, apparent size or weight likely played a minor role in shape classification.
To estimate the origin of shape classification signals, we used ICA and SVMs to generate ERP traces that carried shape, but no grasp orientation, information and submitted them to source localization. Although ERP source localization is spatially limited, we can tentatively say that early shape classification relied on information originating from earlier visual and posterior parietal regions. Medial parietal sources suggest contributions of the superior parieto-occipital complex (SPOC) or anterior precuneus (Gallivan et al., 2011b). We also found evidence for early shape contributions from ventral premotor cortex, which, in monkeys, has been shown to receive visual object information from the anterior intraparietal area (Schaffelhofer and Scherberger, 2016). Here, we found the anterior intraparietal sulcus (aIPS) to convey no shape information, inconsistent with previous studies where aIPS did represent shape (Murata et al., 1996; Króliczak et al., 2008; Schaffelhofer and Scherberger, 2016). As one possible explanation, shape representations in aIPS might be so intricately linked with representations of object or grasp orientation (Murata et al., 2000; Tunik et al., 2005; Króliczak et al., 2008; Michaels and Scherberger, 2018) that they do not withstand cross-decoding. Alternatively, perhaps the signal-to-noise ratio in aIPS was limited, or source localization was inaccurate. In contrast, right TPJ conveyed shape and orientation information, with shape activation perhaps reflecting an involvement of stimulus-driven attentional processes in shape perception (Corbetta and Shulman, 2002). The early network of sources faded during intermediate times and reactivated later on.
Next, we identified representations purely related to grasp orientation. We trained grasp orientation classifiers on data collected from one shape (and texture) condition and tested them on the other, thereby removing orientation information sourced from visual object representations. This seemingly vision-independent information yielded classification curves that rose shortly after visual stimulus onset with similar timing as shape classification. Further, orientation curves predicted behavior, and they remained largely unchanged for ERPs aligned to movement onset. This indicates that grasp orientation representations stemmed from visuomotor processes that depended on object presentation without representing the objects themselves, suggesting that the sight of the object triggered grasp-relevant computations. For instance, grasping requires computing grasp points on the objects together with the hand movements to aim for the points (Blake et al., 1993).
To tease apart the phases of grasp computations, we turned to temporal generalization, which revealed that early grasp computations passed through a chain of representations, perhaps from visual to motor stages. Indeed, early contributions originated from mainly visual regions. Signals especially in the left hemisphere came from occipitotemporal cortex together with left ventral premotor regions, all of which contribute to action selection (Astafiev et al., 2004; Turella et al., 2016; Ariani et al., 2018), consistent with their involvement in biological motion perception, and in hand and tool use representations (Gallivan et al., 2013a,b; Brandi et al., 2014; Gallivan and Culham, 2015; Lingnau and Downing, 2015). Thus, we argue that visual representations of the upcoming hand movements and future hand posture, and of tool use, might have formed as integral components of sensorimotor control, that is, as high-level components of an "inverse model" within which motor control converts sensory representations of an action goal into motor commands (Wolpert and Kawato, 1998). Alternatively, occipitotemporal cortex recruitment could have reflected the activation of abstract action representations (Tucciarelli et al., 2015). Additionally, early grasp orientation sources, especially in the right hemisphere, included occipitoparietal areas and stretched into the superior temporal sulcus, and so perhaps encoded the interactive aspects of the task of passing the object on to the experimenter (Blanke et al., 2005; Carter and Huettel, 2013), although this speculation requires experimental confirmation.
At intermediate times, ∼140 ms before movement onset, grasp orientations attained more sustained representations, prominently involving similar occipitotemporal and occipitoparietal regions as earlier. Moreover, left intra- and superior parietal regions, the precuneus, as well as ventral premotor cortex activated, as expected for a reach-to-grasp task (Gallivan et al., 2011b, 2013a,b). Among these brain regions, the precuneus in general and, more specifically, SPOC or area V6A is of particular interest because it is known as a key region for grasp orientation in neurophysiological (Fattori et al., 2004, 2009, 2010), human fMRI (Monaco et al., 2011), and patient research (Wood et al., 2017).
Finally, later times after movement onset showed similar though diminished networks of grasp orientation information. Our EMG experiment suggests that these results were not influenced by EEG picking up muscle activity, since neck muscle activity carried no information about grasp orientation. Nevertheless, it is still possible that arm movements caused head movements and, thus, small shifts in electrodes that added artifacts to the EEG signal, making their interpretation difficult, especially given the sensitivity of our multivariate approach.
Despite the demonstrated capability of classification analysis to extract shape and grasp orientation information from EEG signals, we were unable to decode texture-related information, even when texture was behaviorally relevant. The difficulty with decoding texture representations stands in contrast to other research: fMRI experiments have identified texture sensitivity, for example in the collateral sulcus (Cant and Goodale, 2011), perhaps because those studies used more natural and perceptually distinct textures, or because fMRI is better suited than EEG to detect texture sensitivity for anatomical reasons. This is not to say that EEG signals contain no visual surface information at all. Surface features of faces are well suited to reconstruct facial appearance from EEG signals (Nemrodov et al., 2019), probably highlighting the extraordinary aptitude of the visual system to process face information.
In sum, our results demonstrate that ERP patterns provide a rich source of information about the neural mechanisms underlying the visuomotor computations of grasp planning and execution. Despite our orthogonal approach of cross-decoding object and action features, the resulting representations reveal intertwined visual and motor processes. Shape representations depend on motor processes, evoking ideas of reafferent neural mechanisms. Grasp orientations emerge from stages that seem to involve visual representations. Our work offers novel insights into the temporal structure of human visuomotor control.
Footnotes
This work was supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).
The authors declare no competing financial interests.
Correspondence should be addressed to Matthias Niemeier at niemeier{at}utsc.utoronto.ca