How and where in the human brain high-level sensorimotor processes such as intentions and decisions are coded remain important yet essentially unanswered questions. This is in part because, to date, decoding intended actions from brain signals has been primarily constrained to invasive neural recordings in nonhuman primates. Here we demonstrate, using functional MRI (fMRI) pattern recognition techniques, that we can also decode movement intentions from human brain signals, specifically object-directed grasp and reach movements, moments before their initiation. Subjects performed an event-related delayed movement task toward a single centrally located object (consisting of a small cube attached atop a larger cube). For each trial, after visual presentation of the object, one of three hand movements was instructed: grasp the top cube, grasp the bottom cube, or reach to touch the side of the object (without preshaping the hand). We found that, despite an absence of fMRI signal amplitude differences between the planned movements, the spatial activity patterns in multiple parietal and premotor brain areas accurately predicted upcoming grasp and reach movements. Furthermore, the patterns of activity in a subset of these areas additionally predicted which of the two cubes was to be grasped. These findings offer new insights into the detailed movement information contained in human preparatory brain activity and advance our present understanding of sensorimotor planning processes through a unique description of parieto-frontal regions according to the specific types of hand movements they can predict.
Significant developments in understanding the neural underpinnings of highly cognitive and abstract processes such as intentions and decision-making have predominantly come from neurophysiological investigations in nonhuman primates. Principal among these has been the ability to predict or decode upcoming sensorimotor behaviors (such as movements of the arm or eyes) based on changes in parieto-frontal neural activity that precede movement onset (Andersen and Buneo, 2002; Gold and Shadlen, 2007; Andersen and Cui, 2009; Cisek and Kalaska, 2010). To date, the ability to predict goal-directed movements based on intention-related cortical signals has almost entirely been constrained to invasive neural recordings in nonhuman primates. Recently, however, advances in neuroimaging using pattern classification, a multivariate statistical technique used to discriminate classes of stimuli by assessing differences in the elicited spatial patterns of functional magnetic resonance imaging (fMRI) signals, have made it possible to probe the cognitive contents of the human mind with a level of sensitivity previously unavailable. Indeed, pattern classification has provided a wealth of knowledge within the domain of sensory-perceptual processing, showing that visual stimuli being viewed (Haxby et al., 2001; Kamitani and Tong, 2005), imagined (Stokes et al., 2009), or remembered (Harrison and Tong, 2009) and that categories of presented auditory stimuli (Formisano et al., 2008) can be accurately decoded from the spatial pattern of signals in visual and auditory cortex, respectively.
Few pattern classification experiments to date, however, have examined the primary purpose of perceptual processing: the planning of complex object-directed actions. Given the rather poor understanding of the human sensorimotor planning processes that guide target-directed behavior, the goals of this experiment were twofold. The first goal was to examine whether object-directed grasp and reach actions with the hand can be decoded from intention-related activity recorded before movement execution, as has only been shown previously with neural recording studies in monkeys (Andersen and Buneo, 2002). The second goal, pending success of the first, was to determine whether different parieto-frontal brain areas can be characterized according to the types of planned movements they can decode. For instance, we questioned whether plan-related activity in interconnected reach-related areas, such as superior parietal cortex, middle intraparietal sulcus (IPS), and dorsal premotor (PMd) cortex (Andersen and Cui, 2009), can predict an upcoming reach movement. Similarly, we questioned whether preparatory signals in interconnected hand-related areas, such as posterior (pIPS) and anterior (aIPS) IPS and ventral premotor (PMv) cortex (Rizzolatti and Matelli, 2003; Grafton, 2010), can predict upcoming grasp movements and moreover even discriminate different precision grasps. More revealingly, we wondered whether the increased sensitivity of decoding approaches would enable us to predict an upcoming movement from brain regions not previously implicated in coding particular hand actions. 
Given that conventional fMRI analyses in humans have shown widespread, highly overlapping, and essentially undifferentiated activations for different movements (Culham et al., 2006), combined with mounting evidence that standard fMRI methods may ignore the neural information contained in distributed activity patterns (Harrison and Tong, 2009), we expected that our pattern classification approach might offer a new understanding of how various parieto-frontal brain regions contribute to the planning of goal-directed hand actions.
Materials and Methods
To address these two main questions, we measured activity across the whole brain using fMRI while human subjects performed a delayed object-directed movement task. The task required three different hand actions to be performed on a target object comprising a small block attached atop a larger block (Fig. 1). These actions included grasping the top (GT), grasping the bottom (GB), or touching the side (touch) of the target object (Fig. 1B). This delayed movement task allowed us to separate, in time, the transient neural activity associated with visual responses (preview phase) and movement execution (execute phase) responses from the more sustained plan-related responses that evolve before the movement (plan phase) (Andersen and Buneo, 2002; Beurze et al., 2007) (Fig. 1C,D). This experimental design permits a direct investigation of whether pattern classifiers implemented during the planning phase of an action in a given brain area can decode (1) upcoming grasp versus touch actions (GT vs touch and GB vs touch), two general types of hand movements requiring slight differences in wrist orientation and hand preshaping, and (2) upcoming grasp movements from each other (GT vs GB), performed on different blocks, requiring far more subtle differences in size and location. Emphasis on decoding during the planning phase of actions has the added advantage of using activity patterns uncontaminated by the subject's limb movement. Importantly, given this task design, in which all actions are performed on a centrally located object that never changes position from trial-to-trial, any movement decoding during planning would be independent of simple retinotopic and general attention-related differences across trial types.
First, to localize the common brain areas among individuals in which to perform pattern analyses, we searched for regions at the group level preferentially involved in movement planning. To do this, we contrasted activity elicited by the planning of a hand action (i.e., after movement instruction) versus the transient activity elicited by visual presentation of the object before the instruction (plan > preview). We reasoned that, compared with the activity elicited when the object was illuminated and the subject was unaware of the action to be performed (preview phase), areas involved in movement planning should show heightened responses once movement instruction information has been given (plan phase), although the object was visible in both phases. This rationale provides a logical extension of recent studies that examined areas involved in planning to temporally spaced instructions about target location, effector to be used, and grasp type in fMRI movement tasks (Beurze et al., 2007, 2009; Chapman et al., 2011). This group contrast allowed us to define 14 well-documented action-related regions-of-interest (ROIs) as well as three sensory-related ROIs that could then be reliably identified in single subjects with the same contrast. In each subject, we then iteratively trained and tested pattern classifiers in each predefined ROI to determine whether, before movement, its preparatory spatial activity patterns were predictive of the hand movement to be performed.
Nine right-handed volunteers participated in this study (five males; mean age, 26.2 years) and were recruited from the University of Western Ontario (London, ON, Canada). One subject was excluded as a result of head motion beyond 1 mm translation or 1° rotation in their experimental runs (see below, MRI acquisition and preprocessing). Informed consent was obtained in accordance with procedures approved by the University of Western Ontario Health Sciences Research Ethics Board.
Setup and apparatus.
Each subject's workspace within the MRI scanner consisted of a black platform placed over their waist and tilted away from the horizontal at an angle (∼10–15°) that maximized comfort and target visibility. To facilitate direct viewing of the workspace, we also tilted the head coil (∼20°) and used foam cushions to give an approximate overall head tilt of 30° from supine (Fig. 1A). Participants performed actions with the right hand and had the right upper arm braced such that arm movement was limited to the elbow and wrist, creating an arc of reachability [movements of the upper arm have been shown to cause perturbations in the magnetic field and induce artifacts in the participant's data (Culham, 2004)]. The target object was made up of a smaller cube atop a larger cube (bottom block, 5 × 5 × 5 cm; top block, 2.5 × 2.5 × 1.5 cm) and was secured to the workspace at a location along the arc of reachability for the right hand, at the point corresponding to the participant's sagittal midline. The exact placement of the object on the platform was adjusted to match each participant's arm length such that all required movements were comfortable. During the experiment, the object was illuminated from the front by a bright yellow light emitting diode (LED) attached to flexible plastic stalks (Loc-Line; Lockwood Products). To control for eye movements, a small green fixation LED was placed immediately above the target object (fixation). Subjects were asked to always foveate the fixation LED during functional scans. Experimental timing and lighting were controlled with in-house software created with Matlab (MathWorks).
For each trial, the subjects were required to perform one of three actions on the object: GT, using a precision grip with the thumb and index finger placed on opposing surfaces of the cube; GB, using the same grip; or manually touch the side of the object with the knuckles (transport the hand to the object without hand preshaping). Importantly, for each trial, the graspable object never changed its centrally located position.
Experiment design and timing.
To isolate the visuomotor planning response from the visual and motor execution responses, we used a slow event-related planning paradigm with 32 s trials, each consisting of three distinct phases: preview, plan, and execute (Fig. 1C). We adapted this paradigm from one of our previous studies (Chapman et al., 2011) as well as previous work with eye movements and working memory that has successfully parsed delay activity from the transient responses after the onset of visual input and movement execution (Curtis and D'Esposito, 2003; Curtis et al., 2004, 2005). Each trial was preceded by a period in which participants were in complete darkness except for the fixation LED on which they maintained their gaze. The trial began with the preview phase and illumination of the workspace and centrally located object. After 6 s of the preview phase, a voice auditory cue (0.5 s; one of “top,” “bottom,” or “touch”) was given to the subject and instructed the corresponding upcoming movement, marking the onset of the plan phase. Although participants had visual information of the object to be acted on during the preview phase, only in the plan phase did they know which action was to be performed, thus providing all the information necessary to prepare the upcoming movement. Critically, the visual information during the preview and plan phases was constant for all trials; only the planned movements changed. After 10 s of the plan phase, a 0.5 s auditory beep cue instructed participants to immediately execute the planned action (for a duration of ∼2 s), initiating the execute phase of the trial. Two seconds after the beginning of this go cue, the illuminator was extinguished, providing 14 s of darkness/fixation that allowed the blood oxygenation level-dependent (BOLD) response to return to baseline before the next trial [intertrial interval (ITI) phase].
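The trial timing can be checked with a short sketch that maps each phase onto functional volumes. It assumes that the first volume of a trial begins at preview onset and that volumes are numbered from 1; the names `PHASES` and `phase_volumes` are illustrative and not part of the original analysis code.

```python
TR_S = 2.0  # repetition time of the functional scans, in seconds

# Phase durations in seconds, in trial order, as stated in the text.
PHASES = [("preview", 6.0), ("plan", 10.0), ("execute", 2.0), ("iti", 14.0)]


def phase_volumes(phases=PHASES, tr=TR_S):
    """Map each phase to the 1-indexed volumes acquired during it.

    Assumption: volume 1 starts exactly at trial (preview) onset.
    """
    volumes, start = {}, 0.0
    for name, duration in phases:
        first = int(start / tr) + 1      # first volume of this phase
        count = int(duration / tr)       # number of volumes in this phase
        volumes[name] = list(range(first, first + count))
        start += duration
    return volumes


volumes = phase_volumes()
trial_length_s = sum(duration for _, duration in PHASES)
```

Under these assumptions the plan phase spans volumes 4 to 8 of each 16-volume trial, so its final two volumes are volumes 7 and 8.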
Other than the execution phase of each action, throughout the other phases of the trial (preview, plan, and ITI) the hand was to remain still and in a relaxed ‘home’ position on the platform to the right of the object. The three trial types, with six repetitions per condition (18 trials total per run), were pseudorandomized within a run and balanced across all runs so that each trial type was preceded and followed equally often by every other trial type across the entire experiment.
During the anatomical scan (collected at the beginning of every experiment) and before entering the scanner, brief practice sessions were conducted (equivalent to the length of one experimental functional run) to familiarize participants with the paradigm, especially the delay timing, which required performing the cued action only at the beep (go) cue. A testing session for one participant included setup time (∼45 min), eight or nine functional runs, and one anatomical scan and lasted ∼2.5–3 h. We did not conduct eye tracking during the scan session because there are no MR-compatible eye trackers that can monitor gaze in the head-tilted configuration (because of occlusion from the eyelids). Nevertheless, multiple behavioral control experiments from our laboratory have shown that subjects can maintain fixation well under experimental testing.
MRI acquisition and preprocessing.
Imaging was performed on a 3 tesla Siemens TIM MAGNETOM Trio MRI scanner. The T1-weighted anatomical image was collected using an ADNI MPRAGE sequence [repetition time (TR), 2300 ms; echo time (TE), 2.98 ms; field of view, 192 × 240 × 256 mm; matrix size, 192 × 240 × 256; flip angle, 9°; 1 mm isotropic voxels]. Functional MRI volumes were collected using a T2*-weighted single-shot gradient-echo echo-planar imaging acquisition sequence [TR, 2000 ms; slice thickness, 3 mm; in-plane resolution, 3 × 3 mm; TE, 30 ms; field of view, 240 × 240 mm; matrix size, 80 × 80; flip angle, 90°; acceleration factor (integrated parallel acquisition technologies, or IPAT) of 2 with generalized autocalibrating partially parallel acquisitions (GRAPPA) reconstruction]. We used a combination of parallel imaging coils to achieve a good signal/noise ratio and to enable direct viewing without mirrors or occlusion. We tilted (∼20°) the posterior half of the 12-channel receive-only head coil (six channels) and suspended a four-channel receive-only flex coil over the anterosuperior part of the head. Each volume comprised 34 contiguous (no gap) oblique slices acquired at a ∼30° caudal tilt with respect to the anterior-to-posterior commissure (ACPC) line, providing near whole-brain coverage.
Preprocessing included slice scan-time correction, 3D motion correction (such that each volume was aligned to the volume of the functional scan closest in time to the anatomical scan), and high-pass temporal filtering (three cycles per run). We also performed functional-to-anatomical coregistration such that the axial plane of functional and anatomical scans passed through the ACPC space, which was then transformed into Talairach space (Talairach and Tournoux, 1988). Other than inadvertent smoothing arising from the sinc interpolation for all transformations, no additional spatial smoothing was performed. Talairach-transformed data were used only for group voxelwise analyses to define the planning-related ROIs common across all subjects. These same areas were then defined anatomically within each subject's ACPC data. We decided to define ROIs within the ACPC data in this way because multivoxel classification analysis discriminates spatial patterns across voxels and the additional spatial interpolations inherent in normalization may hinder such analyses. Indeed, pattern classification using the Talairach-transformed data of three subjects showed that decoding accuracies were on average ∼1–2% lower during both plan and execute time phases than in the corresponding subjects' ACPC data. Using the ACPC data also had the advantage that each region of interest could be reliably identified in single subjects regardless of variations in slice planes, a particular problem given the sulcal variability of parietal cortex. The cortical surface from one subject was reconstructed from a high-resolution anatomical image, a procedure that included segmenting the gray and white matter and inflating the surface at the boundary between them. This inflated cortical surface was used to display group activation for figure presentation (Fig. 2).
For each participant, functional data from each session were screened for motion and/or magnet artifacts by examining the time course movies and the motion plots created with the motion correction algorithms. Any runs that exceeded 1 mm translation or 1° rotation within the run were discarded, leading to the removal of all runs from one subject and one run from another subject. Action performance was examined offline from videos recorded using an MR-compatible infrared-sensitive camera that was optimally positioned to record the participant's movements during functional runs (MRC Systems). No errors were observed, likely because, by the time they actually performed the required movements in the scanner, subjects were well trained in the delay task. All preprocessing and analyses were performed using Brain Voyager QX version 2.12 (Brain Innovation).
Regions of interest.
To localize specific planning-related areas in which to implement pattern recognition analyses, we used a general linear model (GLM) group random-effects (RFX) voxelwise analysis (on the Talairach-transformed data). Predictors were defined at the onset of the preview, plan, and execute periods for each individual trial with a value of 1 for (1) three volumes during the preview phase, (2) five volumes during the plan phase, and (3) one volume for the execute phase and 0 for the remainder of the trial period (ITI). Each of these predictors was then convolved using a Boynton hemodynamic response function (Boynton et al., 1996). Data were processed using a percentage signal change transformation.
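The predictor construction can be illustrated for a single trial as follows. This is a minimal sketch rather than the BrainVoyager implementation: it assumes a single-gamma response of the form described by Boynton et al. (1996), with illustrative parameter values (n = 3, tau = 1.25 s, delay = 2.5 s) that are not given in the text, and it samples the response once per volume. The function names are hypothetical.

```python
import math

TR_S = 2.0    # repetition time in seconds
N_VOLS = 16   # volumes per 32 s trial


def boxcar(first_vol, n_vols, total=N_VOLS):
    """Return 1.0 for n_vols volumes starting at 1-indexed first_vol, else 0.0."""
    return [1.0 if first_vol <= v < first_vol + n_vols else 0.0
            for v in range(1, total + 1)]


def gamma_hrf(t, n=3, tau=1.25, delay=2.5):
    """Single-gamma response at time t (s); n, tau, delay are assumed values."""
    s = t - delay
    if s <= 0:
        return 0.0
    return (s / tau) ** (n - 1) * math.exp(-s / tau) / (tau * math.factorial(n - 1))


def convolve(signal, kernel):
    """Discrete convolution truncated to the signal length.

    Requires len(kernel) >= len(signal).
    """
    return [sum(signal[j] * kernel[i - j] for j in range(i + 1))
            for i in range(len(signal))]


hrf = [gamma_hrf(k * TR_S) for k in range(N_VOLS)]

# Three, five, and one volume(s) for preview, plan, and execute, as in the text.
predictors = {
    "preview": convolve(boxcar(1, 3), hrf),
    "plan": convolve(boxcar(4, 5), hrf),
    "execute": convolve(boxcar(9, 1), hrf),
}
```

Because the plan boxcar is both later and longer than the preview boxcar, its convolved predictor peaks later in the trial, which is what allows the GLM to separate the two phases.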
Using the GLM, we contrasted activity for movement planning versus the simple visual response to object presentation [plan > preview: (GT plan + GB plan + touch plan) vs (GT preview + GB preview + touch preview)]. This plan > preview statistical map of all positively active voxels (RFX, t(7) = 3.5, p < 0.01) was then used to define 17 ROIs [foci of activity selected within a (15 mm)³ cube centered on a particular anatomical landmark; only clusters of voxels larger than 297 mm³ were used (minimum cluster size estimated by 1000 Monte Carlo simulations of p < 0.05 corrected, implemented in the cluster threshold plug-in for BVQX)], which could then be localized in single subjects. Fourteen of these ROIs (across parietal, motor, and premotor cortex) were selected based on their well-documented and highly reliable coactivations across several movement-related tasks and paradigms (Andersen and Buneo, 2002; Chouinard and Paus, 2006; Culham et al., 2006; Filimon et al., 2009; Cisek and Kalaska, 2010; Filimon, 2010; Grafton, 2010) and the other three ROIs (somatosensory cortex and left and right Heschl's gyrus) were selected as regions known to respond to transient stimuli (i.e., sensory and auditory events) and often activated in experimental contexts but not expected to necessarily participate in sustained movement planning or intention-related processes (i.e., to serve as sensory control regions). Importantly, all these ROIs are easily defined according to anatomical landmarks (sulci and gyri) and functional activations in each individual subject's ACPC data (see below, ROI selection). Critically, given the contrast used to select these 17 areas (i.e., plan > preview), their activity is not biased to show any plan-related pattern differences between any of the experimental conditions (for confirmation of this fact, see the univariate analyses in Fig. 5).
Voxels submitted for pattern classification analysis were selected from the plan > preview GLM contrast on single-subject ACPC data and based on all activity within a (15 mm)³ cube centered on defined anatomical landmarks for each of the 17 ROIs (for details, see below, ROI selection). We chose (15 mm)³ cubes for our ROI sizes not only to allow for the inclusion of numerous functional voxels for pattern classification (an important consideration) but also to ensure that adjacent ROIs did not overlap. These ROIs were selected at a threshold of t = 3, p < 0.003, from an overlay of each subject's activation map (cluster threshold corrected, p < 0.05, so that only voxels passing a minimum cluster size were selected; average minimum cluster size across subjects, 110 mm³). All univariate statistical tests used the Greenhouse–Geisser correction and, for post hoc tests (two-tailed paired t tests), we used a threshold of p < 0.05 (see Fig. 5). Only significant results are reported.
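The voxel selection rule can be sketched as a simple filter over voxel coordinates and t values. This is an illustrative simplification that omits the cluster-size correction; the function name, the data layout, and the toy voxel values are hypothetical.

```python
def select_roi_voxels(voxels, center_mm, t_min=3.0, edge_mm=15.0):
    """Keep voxels inside the cube around center_mm whose t value exceeds t_min.

    voxels: iterable of ((x, y, z) in mm, t_value) pairs.
    Cluster-size thresholding is not modeled here.
    """
    half = edge_mm / 2.0
    return [
        (xyz, t)
        for xyz, t in voxels
        if t > t_min and all(abs(c - c0) <= half for c, c0 in zip(xyz, center_mm))
    ]


# Toy plan > preview t values at four voxel centers (hypothetical data).
toy_voxels = [
    ((0.0, 0.0, 0.0), 4.2),   # inside the cube, above threshold
    ((6.0, 0.0, -3.0), 3.1),  # inside the cube, above threshold
    ((9.0, 0.0, 0.0), 5.0),   # outside the cube
    ((3.0, 3.0, 3.0), 2.0),   # inside the cube, below threshold
]
selected = select_roi_voxels(toy_voxels, center_mm=(0.0, 0.0, 0.0))
```

With 3 mm isotropic functional voxels, a (15 mm)³ cube can hold at most 5 × 5 × 5 = 125 voxels, which bounds the number of features available to each classifier.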
The following ROIs were chosen: left and right superior parieto-occipital cortex (SPOC), defined by selecting voxels located medially and directly anterior to the parieto-occipital sulcus on the left and right (Gallivan et al., 2009); left anterior precuneus (L-aPCu), defined by selecting voxels farther anterior and superior to the L-SPOC ROI, near the transverse parietal sulcus (in most subjects, this activity was located medially, within the same sagittal plane as SPOC, but in a few subjects, this activity was located slightly more laterally) (Filimon et al., 2009); left pIPS (L-pIPS), defined by selecting activity at the caudal end of the IPS (Beurze et al., 2009); left middle IPS (L-midIPS), defined by selecting voxels approximately halfway up the length of the IPS, on the medial bank (Calton et al., 2002), near a characteristic “knob” landmark that we observed consistently within each subject; left region located posterior to L-aIPS (L-post aIPS), defined by selecting voxels just posterior to the junction of the IPS and post-central sulcus (PCS), on the medial bank of the IPS (Culham, 2004); left aIPS (L-aIPS), defined by selecting voxels directly at the junction of the IPS and PCS (Culham et al., 2003); left supramarginal gyrus (L-SMG), defined by selecting voxels on the supramarginal gyrus (SMG), lateral to the anterior segment of the IPS (Lewis, 2006); left motor cortex, defined by selecting voxels around the left “hand knob” landmark in the central sulcus (CS) (Yousry et al., 1997); left PMd (L-PMd), defined by selecting voxels at the junction of the precentral sulcus (PreCS) and superior frontal sulcus (SFS) (Picard and Strick, 2001); left precentral gyrus, defined by selecting voxels lateral to the junction of the PreCS and SFS, encompassing the precentral gyrus and posterior edge of the PreCS; left PMv (L-PMv), defined by selecting voxels slightly inferior and posterior to the junction of the inferior frontal sulcus (IFS) and PreCS (Tomassini et al., 2007); left presupplementary motor area (L-PreSMA), defined by selecting bilateral voxels (although mostly left-lateralized) superior to the middle/anterior segment of the cingulate sulcus, anterior to the plane of the anterior commissure and more anterior and inferior than those selected for the left supplementary motor area (Picard and Strick, 2001); left supplementary motor area (L-SMA), defined by selecting voxels bilaterally (although mostly left-lateralized) adjacent and anterior to the medial end of the CS and posterior to the plane of the anterior commissure (Picard and Strick, 2001); left somatosensory cortex (L-SS cortex), defined by selecting voxels medial and anterior to the aIPS, encompassing the postcentral gyrus and PCS; and left (L-HG) and right (R-HG) Heschl's gyri, defined by selecting voxels halfway up along the superior temporal sulcus (STS), on the superior temporal gyrus (between the insular cortex and outer-lateral edge of the superior temporal gyrus) (Meyer et al., 2010). See Table 1 for details about ROI coordinates and sizes and Figure 2 for anatomical locations on one representative subject's brain.
Non-brain control regions.
To demonstrate classifier performance outside of our plan network ROIs, we defined two additional control ROIs in which no BOLD signal was expected and thus no reliable classification performance should be possible. To select these ROIs in individual subjects, we further reduced our statistical threshold (after specifying the plan > preview network within each subject) to t = 0, p = 1 and selected all positive activation within a (15 mm)³ cube centered on two consistent points: (1) within each subject's right ventricle and (2) just outside the skull, near right visual cortex in the ACPC plane.
Multivoxel pattern analysis.
Multivoxel pattern analysis (MVPA) was performed with a combination of in-house software (using Matlab) and the Princeton MVPA Toolbox for Matlab (http://code.google.com/p/princeton-mvpa-toolbox/) using a support vector machine (SVM) binary classifier (libSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvm/). The SVM model used a linear kernel function and a constant cost parameter, C = 1 [congruent with many other fMRI studies (LaConte et al., 2003; Mitchell et al., 2003; Mourão-Miranda et al., 2005; Haynes et al., 2007; Pessoa and Padmala, 2007)] to compute the hyperplane that best separated the trial responses.
Voxel pattern preparation.
For each voxel within a region and each trial, we extracted the average percentage signal change activation corresponding to the 4 s time windows specified by each of the gray shaded bars in Figures 1D and 3 (i.e., the activity elicited by each distinct phase of the trial: plan, preview, and execute) and entered these as data points for pattern classification. Beyond allowing us to characterize which types of movements within an area could be accurately decoded, this time-specific approach also allowed us to investigate when predictive information pertaining to a particular movement was available (i.e., within the preview, plan, or execute phase).
The baseline window was defined as the average of volumes 1 and 2 with respect to the start of the trial (which avoids contamination from the previous trial and in which response amplitude differences do not exist). For the preview phase time points, we extracted the mean of volumes 3–4, time points that correspond to the peak of the visual transient response (Fig. 3). For the execute phase time points, we extracted the average of volumes 11–12, which correspond to the peak of the transient movement response, after the subject's action (Fig. 3). Last, for the plan phase, the time points of critical interest for decoding subject's intentions, we extracted the average of volumes 7–8 (the final two volumes of the plan phase), corresponding to the sustained activity of a planning response (Fig. 3). After the extraction of each trial's percentage signal change, these values were z-scored across the run, for each voxel within an ROI.
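The window extraction and normalization steps can be sketched for one voxel as follows. The volume windows are those given above; the use of the population standard deviation for z-scoring, the function names, and the toy values are assumptions made for illustration.

```python
from statistics import mean, pstdev

# 1-indexed (first, last) volumes of each phase window, as given in the text.
WINDOWS = {"preview": (3, 4), "plan": (7, 8), "execute": (11, 12)}


def window_mean(trial_timecourse, phase):
    """Average a trial's percent-signal-change values over a phase window."""
    first, last = WINDOWS[phase]
    return mean(trial_timecourse[first - 1:last])


def zscore(values):
    """Z-score one voxel's per-trial values across a run.

    Assumption: population standard deviation; the values must not all be equal.
    """
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


# Toy 16-volume trial for one voxel: the value at volume v is simply v.
trial_timecourse = [float(v) for v in range(1, 17)]
plan_value = window_mean(trial_timecourse, "plan")

# Toy per-trial plan values for one voxel across a run, then z-scored.
run_values = zscore([1.0, 2.0, 3.0, 6.0])
```

Each voxel therefore contributes one z-scored value per trial and phase, and the vector of these values across the voxels of an ROI forms the pattern submitted to the classifier.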
Our reasoning for using the average of volumes 7–8 (the final two volumes of the plan phase) for pattern classification is simple: planning is not a transient but a sustained process. Whereas simple visual or motor execution responses typically show transient neural activity (Andersen et al., 1997; Andersen and Buneo, 2002), in which the hemodynamic response function peaks approximately 6 s after the event and then falls, planning responses generally remain high for the duration of the intended movement (Curtis and D'Esposito, 2003; Curtis et al., 2004; Chapman et al., 2011). With this rationale, we reasoned that, if pattern differences were to arise during movement planning, they would more likely occur during the sustained planning response after the hemodynamic response had reached its peak. For these reasons, we selected the final two volumes of the plan phase to serve as our data points of interest: a critical two-volume window in which the hemodynamic response had already plateaued, any non-plan-related transient responses associated with the auditory cue would be diminishing, and, most importantly, a time point before the subject initiated any movement.
For each subject and for each of the 17 plan-related ROIs, nine different binary SVM classifiers were estimated for MVPA (i.e., for each of the preview, plan, and execute phases and each pairwise comparison, GT vs GB, GT vs touch, and GB vs touch). We used “leave-eight-trials-out” cross-validation to test the accuracy of the binary SVM classifiers, meaning that eight trials from each condition (i.e., 16 trials total) were reserved for testing the classifier and the remaining trials were used to train the classifier (i.e., 40 or 46 remaining trials per condition, depending on whether the subject participated in eight or nine experimental runs, respectively). Although a full cross-validation is not feasible with a leave-eight-trials-out design because of the ∼10⁸ possible iterations, a minimum of 1002 train/test iterations were performed for each classification.
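A single train/test split of this kind can be sketched as follows. The sketch assumes simple random sampling without replacement within each condition; the trial identifiers, function name, and fixed seed are illustrative, and the classifier itself is not modeled.

```python
import math
import random


def leave_n_out_split(trials_a, trials_b, n_test=8, rng=None):
    """Reserve n_test random trials per condition for testing; train on the rest."""
    rng = rng or random.Random()
    test_a = rng.sample(trials_a, n_test)
    test_b = rng.sample(trials_b, n_test)
    train_a = [t for t in trials_a if t not in test_a]
    train_b = [t for t in trials_b if t not in test_b]
    return (train_a, train_b), (test_a, test_b)


# Hypothetical trial identifiers for a subject with eight runs
# (6 repetitions x 8 runs = 48 trials per condition).
gt_trials = [("GT", i) for i in range(48)]
gb_trials = [("GB", i) for i in range(48)]

(train_gt, train_gb), (test_gt, test_gb) = leave_n_out_split(
    gt_trials, gb_trials, rng=random.Random(0)
)

# Number of ways to choose eight test trials from the 48 trials of ONE condition.
n_choices_one_condition = math.comb(48, 8)
```

The binomial count for a single condition already amounts to several hundred million possible test sets, which illustrates why an exhaustive cross-validation is impractical and a fixed number of random iterations is used instead.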
A critical assumption underlying single-trial classification analysis is that each individual trial and condition type, beyond being randomly selected for each iteration, be equally represented for classifier training and testing across the total number of iterations. To meet this assumption, given differences in the total number of trials for each subject, the number of iterations was increased for some subjects. For instance, for subjects with eight runs, 1002 iterations were used (each trial was used exactly 167 times to test the classifier, because 1002 × 8/48 = 167). For subjects with nine runs, a perfect solution was not achievable. For these subjects, the number of iterations was increased to 1026 to ensure that each trial was used 152 ± 1 times to test the classifier. This high number of train-and-test iterations produces a precise estimate and highly representative sample of true classification accuracies (this method showed a test–retest reliability within ±0.5%; based on multiple simulations of 1026 iterations conducted in three subjects). Given the noise inherent in single trials and the fact that each trial for training could be randomly selected from any point throughout the experiment, single-trial classification provides a highly conservative but robust measure of decoding accuracies. Moreover, much of the motivation of this present work is to determine the feasibility of predicting specific motor intentions from single fMRI trials, which could then be applied to human movement-impaired patient populations.
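The iteration counts can be checked by counting how often a trial falls in the held-out set: 1002 × 8/48 = 167 and 1026 × 8/54 = 152. The sketch below also shows one way to obtain perfectly balanced test sets when the number of trials per condition is a multiple of eight, by cutting repeated random orderings into consecutive chunks. This scheduling scheme is an assumption made for illustration and is not necessarily the procedure used in the study; the function names are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction


def mean_test_uses(iterations, n_trials, n_test=8):
    """Average number of times a trial is held out for testing."""
    return Fraction(iterations * n_test, n_trials)


def balanced_test_sets(n_trials, n_passes, n_test=8, seed=0):
    """Yield test sets by cutting shuffled orderings into chunks of n_test.

    Every pass uses each trial exactly once, so balance is exact, but this
    only works when n_trials is a multiple of n_test.
    """
    if n_trials % n_test:
        raise ValueError("n_trials must be a multiple of n_test")
    rng = random.Random(seed)
    for _ in range(n_passes):
        order = list(range(n_trials))
        rng.shuffle(order)
        for start in range(0, n_trials, n_test):
            yield order[start:start + n_test]


# Eight runs: 48 trials per condition, 6 test sets per pass, 167 passes.
test_sets = list(balanced_test_sets(48, 167))
uses = Counter(trial for test_set in test_sets for trial in test_set)

eight_run_mean = mean_test_uses(1002, 48)
nine_run_mean = mean_test_uses(1026, 54)
```

With 54 trials per condition, the trials cannot be divided evenly into groups of eight, which is consistent with the statement that an exact solution was unavailable for subjects with nine runs.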
Decoding accuracies were computed separately for each subject, as an average across iterations. The average across subjects for each ROI is shown in Figure 4. To assess the statistical significance of decoding accuracies, we performed one-sample t tests across subjects in each of the ROIs to test whether the decoding accuracy for each pairwise discrimination was significantly higher than 50% chance (Fig. 4, black asterisks, two-tailed tests) (Chen et al., 2011).
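The group-level test can be sketched as a one-sample t statistic against 50% chance. The accuracies below are made-up values for eight hypothetical subjects and are not results from the study. The critical value of 2.365 is the standard two-tailed 5% threshold for 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0=0.5):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # sample SD with n - 1 denominator
    return (mean(values) - mu0) / standard_error, n - 1


# Hypothetical per-subject decoding accuracies for one ROI and one comparison.
toy_accuracies = [0.55, 0.58, 0.52, 0.61, 0.57, 0.54, 0.59, 0.56]

t_stat, dof = one_sample_t(toy_accuracies)

T_CRIT_DF7_TWO_TAILED_05 = 2.365  # tabulated critical value for df = 7
is_significant = abs(t_stat) > T_CRIT_DF7_TWO_TAILED_05
```

An exact p value would require the cumulative t distribution, which is omitted here to keep the sketch within the standard library.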
SVMs are designed for classifying differences between two stimuli and LibSVM (the SVMs implemented here) uses the so-called “one-against-one method” for each pairwise discrimination. Often the pairwise results are combined to produce multiclass discriminations (Hsu and Lin, 2002) (i.e., distinguish among more than two stimuli). For this particular experiment, however, looking at the individual pairwise discriminations was valuable because it could specifically determine what particular type(s) of planned movements were decoded within each brain area. For instance, a brain region showing a decoding pattern of grasps versus touches (GT vs touch and GB vs touch, but not GT vs GB movements), an interesting theoretical finding here, would be very difficult to assess with multiclass discrimination approaches.
In addition to the t test, we separately assessed statistical significance with nonparametric randomization tests (Golland and Fischl, 2003; Etzel et al., 2008; Smith and Muckli, 2010; Chen et al., 2011), which also determined that the chance distribution of decoding accuracies was approximately normal and had a mean ∼50%. For each subject, after classifier training (and testing) with the true trial identities, we also performed 100 random permutations of the test trial identities before testing the classifier. That is, to empirically test the statistical significance of our findings with the true data labels, we examined how a model trained on true data labels would perform when tested on randomized trial labels. For each of the 100 permuted groupings of test labels, we ran the same cross-validation analysis procedure 1002/1026 times (depending on the number of runs per subject). As in the standard analysis, we averaged across the 1002/1026 cross-validation iterations to generate 100 mean accuracies. We then used these 100 random mean accuracies plus the real mean accuracy (the one correct labeling) in each subject to estimate the statistical significance of our group mean accuracy for the eight subjects (see below paragraph for additional details on how exactly this was accomplished). This procedure was performed for each ROI and pairwise discrimination separately.
The data plotted in Figure 4 represent the average across each subject's mean accuracy, calculated from the cross-validation procedure. Thus, the empirical statistical significance of this “true” group mean accuracy is equal to the probability that the true group mean accuracy lies outside a population of random group mean accuracies (Chen et al., 2011). This population was generated from 1000 random group accuracies in which each sample was the average, n = 8, of randomly drawn accuracies from each subject's 101 permuted test labels. The percentile of the true group mean accuracy was then determined from its place in a rank ordering of the permuted population accuracies [thus, the peak percentile of significance (p < 0.001) is limited by the number of samples producing the randomized probability distribution]. The important result of these randomization tests is that brain areas showing significant decoding with one-sample parametric t tests (vs 50%) also show significant decoding (at p < 0.001) with empirical nonparametric tests (data not shown). In fact, the nonparametric randomization tests generally yielded much higher levels of significance than the parametric t tests (a finding also noted by Smith and Muckli, 2010; Chen et al., 2011). This may indicate that the t test group analysis (as shown in Fig. 4) is a rather conservative estimate of significant decoding accuracies. Near identical results were produced when trial identities were shuffled before classifier training (data not shown).
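The group-level randomization procedure can be sketched as follows. The per-subject accuracies are simulated here (permuted-label accuracies drawn from a normal distribution around 50%), so the numbers are placeholders rather than results, and the function names are hypothetical. The sketch also assumes a one-sided comparison, counting only random group means at or above the true group mean.

```python
import random
import statistics

def random_group_means(subject_accuracies, n_samples=1000, seed=0):
    """Each sample averages one randomly drawn accuracy per subject."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.choice(accs) for accs in subject_accuracies])
        for _ in range(n_samples)
    ]

def empirical_p_value(true_mean, null_means):
    """Fraction of random group means at or above the true group mean."""
    n_at_least = sum(1 for m in null_means if m >= true_mean)
    return n_at_least / len(null_means)

# Simulated data: 8 subjects, each with 100 permuted-label accuracies
# plus the one accuracy from the correct labeling (101 values in total).
sim = random.Random(1)
true_accs = [0.58, 0.61, 0.55, 0.60, 0.57, 0.59, 0.62, 0.56]
subject_accuracies = [
    [sim.gauss(0.5, 0.03) for _ in range(100)] + [acc] for acc in true_accs
]

null_means = random_group_means(subject_accuracies)
p_value = empirical_p_value(statistics.fmean(true_accs), null_means)
```

Because the null population holds 1000 samples, the smallest nonzero value this estimate can take is 1/1000, which is why the reported significance cannot be resolved more finely than that.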
We also performed a within-trial test for the significance of our decoding accuracies by examining whether classification accuracies found during the plan and execute phases of the trial were significantly higher than the decoding accuracies found during the preceding preview phase. In other words, we wanted to assess whether significant pattern classifications observed in the plan and execute phases could unequivocally be attributed to movement intentions (and executions) rather than simple visual pattern differences that begin to arise during object presentation when subjects had no previous knowledge which action they would be performing (preview phase). To do this, we ran paired t tests in each ROI to determine whether the decoding accuracies discriminating between trial types during the plan and execute phase were significantly higher than the preview phase decoding accuracies occurring earlier within the same trials (Fig. 4, two-tailed tests, red asterisks). For all parametric tests, we additionally verified that the mean accuracies across subjects were in accordance with an underlying normal distribution by performing Lilliefors tests.
The voxel patterns within several of the plan-related ROIs enabled the accurate decoding of grasp versus touch comparisons (GT vs touch and GB vs touch) and, in some cases, all three comparisons (also GT vs GB) with respect to 50% chance (for the corresponding plan-related decoding accuracies, see Fig. 4). For instance, pattern classification in all of the following ROIs successfully decoded movement plans for the grasp versus touch conditions (GT vs touch and GB vs touch): L-SPOC, L-aPCu, L-midIPS, L-aIPS, L-SMA, and L-PreSMA. Given that we found overlapping and indistinguishable response amplitudes for the three different movement types in all of these areas for each of the different time phases (preview, plan, and execute) (Fig. 3), this decoding result suggests that each of these regions differentially contributes to both grasp and reach planning (instead of coding one action vs the other) but, importantly, not toward the planning of the two different grasp movements. Instead, the decoding of movement plans for precision grasps on the different-sized objects (as well as differentiation of grasp vs touch actions: GT vs GB and GT vs touch and GB vs touch) was constrained to a different set of ROIs: L-pIPS, L-post aIPS, L-motor cortex, L-precentral gyrus, L-PMd, and L-PMv (Fig. 4). This pattern of results across these parieto-frontal areas suggests that regions can be functionally classified according to whether the resident preparatory signals are predictive of upcoming grasp versus reach movements or, in addition, different precision grasps (for instance, for a color coding of the ROIs depending on the types of movements they can predict, see Fig. 2).
Note that decoding accuracies were based on single-trial classifications and, as such, demonstrate that the spatial voxel patterns generated during movement planning (and used for classifier training) were robust and consistent enough across the full experiment (all eight to nine experimental runs) to allow for successful prediction.
As anticipated, the three sensory control areas (L-SS cortex, L-HG, and R-HG) showed no significant decoding during planning, highlighting the fact that predictive information can be specifically localized to particular nodes of the parieto-frontal network. This is particularly intuitive in the case of somatosensory cortex: it should not be expected to decode anything until the mechanoreceptors of the hand are stimulated at movement onset (Fig. 4). Likewise, L-HG and R-HG are primary auditory structures and thus are not expected to carry sustained plan-related predictive information. Null results should always be interpreted with caution in pattern classification [because they may reflect limitations in the classification algorithms rather than the data (Pereira and Botvinick, 2011)]; nevertheless, the absence of decoding during planning in these areas is certainly consistent with expectations.
Importantly, our results also show that plan-related decoding can only be attributed to the intention to perform a specific movement, because we find no significant decoding above 50% chance in the preceding preview phase (i.e., when movement-planning information was unavailable). Moreover, when we additionally tested whether above-chance decoding during planning was also significantly higher than the within-trial decoding found during the preceding preview phase, we found this to be the case in every region (Fig. 4, red asterisks). Critically, accurate classification only reflects the spatial response pattern profiles of different planned movement types and not the overall fMRI signal amplitudes within each ROI. When we averaged the trial responses across all voxels and subjects in each ROI (as done in conventional fMRI ROI analyses), we found no significant differences for the three different hand movements, in any phase of the trial (preview, plan, or execute) (see trial time courses in Fig. 3 and a univariate analysis of signal amplitudes for the same time windows as those extracted for MVPA in Fig. 5 for confirmation of this fact). As an additional type I error control for our classification accuracies, we ran the same pattern discrimination analysis on two noncortical ROIs outside of our plan-related network in which accurate classification should not be possible: the right ventricle and outside the brain. As expected, pattern classification revealed no significant decoding in these two areas for any phase of the trial (Fig. 6).
In addition to using the spatial voxel activity patterns to predict upcoming hand movements, we performed a voxel weight analysis for each ROI (for example, see Kamitani and Tong, 2005) to directly determine whether any structured spatial relationship of voxel activity according to the action being planned could be found (data not shown). To do this, for each iteration of the cross-validation procedure (1002 or 1026 iterations, depending on the number of runs per subject), a different SVM discriminant function was defined based on the subset of trials included for training. We calculated the voxel weights for each function and then averaged across all iterations to produce a set of mean voxel weights; this procedure was repeated for each pairwise comparison, ROI, and subject (note that the weight of each voxel provides a measure of its relationship with the class label as learned by the classifier; in this case, GT, GB, or touch planned actions) (for details, see Pereira and Botvinick, 2011). Both across and within subjects for each ROI and pairwise comparison, we found little structured relationship of voxel weights according to the action being planned. For instance, no correspondence was found between the GT versus touch and GB versus touch spatial arrangement of voxel weights in each ROI, despite the two grasp actions being highly similar and the two touch actions being exactly the same. We did, however, notice that, within individual ROIs, despite the inconsistency of voxel weight patterns across subjects and across pairwise comparisons, voxels that discriminated one planned movement versus another tended to cluster. That is, voxels coding for one particular movement (reflected by the positive or negative direction of the weight) tended to lie adjacent to one another within the ROI, although these sub-ROI clusters were not necessarily consistent between comparisons.
Although caution should be applied to interpreting the magnitude of the voxel weights assigned by any classifier (Pereira and Botvinick, 2011), this general result is to be expected based on the structure of the surrounding vasculature and spatial resolution of the BOLD response (Logothetis and Wandell, 2004), further reinforcing the notion that spatial voxel patterns directly reflect underlying physiological changes. Furthermore, and more generally, the findings from this voxel weight analysis are highly consistent with expectations from monkey neurophysiology. The neural organization of macaque parieto-frontal cortex is highly distributed and multiplexed, with neurons containing different sensorimotor frames of reference and separate response properties (e.g., for effector or location) residing in close anatomical proximity (Snyder et al., 1997; Andersen and Buneo, 2002; Calton et al., 2002; Andersen and Cui, 2009; Chang and Snyder, 2010). As such, combined with the fact that we are able to accurately predict upcoming hand actions from the trained pattern classifiers, the primarily unstructured arrangement of voxel weights appears to have a well-documented anatomical basis.
Additional univariate analyses
Although not shown, we also performed a univariate contrast of [(GT execute + GB execute) vs 2 * (touch execute)] to define left aIPS, a brain area frequently reported in studies from our laboratory, consistently shown to be preferentially involved in grasping actions (Culham et al., 2003; Gallivan et al., 2009; Cavina-Pratesi et al., 2010). We localized this region in six of eight subjects (t = 2.4, p < 0.05, in four subjects these clusters did not survive cluster threshold correction), allowing a direct comparison of its general anatomical location with the left aIPS regions we defined in single subjects for pattern analyses according to the contrast of plan > preview (which instead shows no univariate differences between grasp vs touch trials during the execute phase) (Fig. 5). We found a good degree of overlap between the (plan > preview)-defined left aIPS and the left aIPS defined by a contrast of [(GT execute + GB execute) vs 2 * (touch execute)], with the latter aIPS being much smaller in size. For instance, in the six subjects who showed activity in aIPS with the [(GT execute + GB execute) vs 2 * (touch execute)] contrast, we found that this area shared the following percentage of its total voxels with the larger aIPS area defined by the plan > preview contrast: subject 1 (13.7%; size of grasps vs touch defined aIPS, 66 voxels; size of plan > preview defined aIPS, 100 voxels), subject 2 (25%; size of grasps vs touch defined aIPS, 4 voxels; size of plan > preview defined aIPS, 97 voxels), subject 3 (75%; size of grasps vs touch defined aIPS, 4 voxels; size of plan > preview defined aIPS, 62 voxels), subject 4 (33.3%; size of grasps vs touch defined aIPS, 6 voxels; size of plan > preview defined aIPS, 59 voxels), subject 5 (33.3%; size of grasps vs touch defined aIPS, 3 voxels; size of plan > preview defined aIPS, 92 voxels), and subject 6 (75%; size of grasps vs touch defined aIPS, 4 voxels; size of plan > preview defined aIPS, 84 voxels). 
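The overlap percentages above express the share of the smaller, contrast-defined aIPS that falls inside the larger plan > preview aIPS. A minimal sketch of that computation, treating each ROI as a set of voxel coordinates, is given below; the coordinates are made up for illustration and the function name is hypothetical.

```python
def percent_shared(small_roi, large_roi):
    """Percentage of small_roi voxels that also belong to large_roi."""
    small, large = set(small_roi), set(large_roi)
    if not small:
        raise ValueError("small_roi contains no voxels")
    return 100.0 * len(small & large) / len(small)

# Hypothetical voxel coordinates (x, y, z): a 4-voxel grasp-vs-touch
# cluster, three of whose voxels lie inside a larger plan > preview ROI.
grasp_vs_touch_aips = [(10, 20, 30), (10, 21, 30), (11, 20, 30), (15, 25, 35)]
plan_vs_preview_aips = [(x, y, 30) for x in range(8, 14) for y in range(18, 24)]

overlap = percent_shared(grasp_vs_touch_aips, plan_vs_preview_aips)
```

Here three of the four small-ROI voxels are shared, so `overlap` is 75.0, the same proportion reported for subjects 3 and 6.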
The discrepancy in the size and signal amplitude differences between these two regions can be easily explained as a difference of contrasts: specifically, a directed search for grasps > touches reveals a much smaller subset of aIPS voxels, with each individual voxel showing the specified effect. In comparison, the anatomically defined aIPS for the more general contrast of plan > preview (used for pattern analyses here) additionally selects for voxels outside the range of this smaller voxel subset, and, thus, when averaging the response amplitudes across this larger cluster size (as shown in Figs. 3, 5), we effectively diminish the influence of the contribution of each individual voxel on the overall ROI signal. It is worth mentioning that, in addition to finding small left aIPS activations in six subjects with the univariate contrast of [(GT execute + GB execute) vs 2 * (touch execute)], small clusters of voxels in three other areas (left pIPS, left motor cortex, and left PMd) were also reliably coactivated; these are areas revealed here to decode all three planned movements with pattern classification analyses. Apart from distinguishing univariate and multivariate approaches (for additional explanations and examples, see Mur et al., 2009; Pereira et al., 2009; Raizada and Kriegeskorte, 2010), these findings, more than anything, highlight the additional plan-related information contained in voxel spatial patterns.
For the first time, fMRI signal decoding is used to unravel predictive neural signals underlying the planning and implementation of real object-directed hand actions in humans. We show that this predictive information is not revealed in preparatory response amplitudes but in the spatial pattern profiles of voxels. This finding may explain why previous characterizations of plan-related activity in parieto-frontal networks from traditional fMRI subtraction methods have been primarily met with mixed degrees of success. From a theoretical perspective, these results provide new insights into the different roles played by various regions within the human parieto-frontal network, results that add to our previous understanding of the predictive movement information contained in parietal preparatory responses (Andersen and Buneo, 2002; Cisek and Kalaska, 2010) and advance previous notions of motor and premotor contributions to movement planning (Tanné-Gariépy et al., 2002; Filimon, 2010).
Decoding in parietal cortex
A particularly notable finding from this study is that preparatory activity along the dorsomedial circuit (L-SPOC, L-aPCu, and L-midIPS) decodes planned grasp versus touch movements. Although these areas are well known to be involved in the planning and execution of reaching movements in both humans and monkeys (Andersen and Buneo, 2002; Culham et al., 2006; Beurze et al., 2007), there has been remarkably little evidence to suggest their particular involvement during grasp planning. To our knowledge, the only evidence to date in support of this notion comes from neural recordings in monkeys showing that parieto-occipital neurons, in addition to being sensitive to reach direction, are also sensitive to grip/wrist orientation and grip type (Fattori et al., 2009, 2010). Based on our similar findings in SPOC, it now seems clear that fMRI pattern analysis in humans can provide a new tool for capturing neural representations previously detected only with invasive electrode recordings in monkeys. Moreover, our present results advance these previous findings by showing for the first time that motor plans requiring hand preshaping or precise object-directed interactions extend farther anteriorly into both the precuneus and midIPS.
The pIPS in the human and macaque monkey appears to serve a variety of visuomotor and attention-related functions: it is involved in the orienting of visual selection and attention (Szczepanski et al., 2010), encodes the 3D visual features of objects for hand actions (Sakata et al., 1998), and integrates both target and effector-specific information for movements (Beurze et al., 2009). pIPS preparatory activity in our task may primarily reflect the combined coding of all these properties given that differences in finger precision, hand orientation, and attention to 3D object shape are required across the three hand movements. Because attention is often directed toward a target location before movement, these particular findings might provide additional evidence for the integration of visuomotor and attention-related processes within common brain areas during movement planning (Moore and Fallah, 2001; Baldauf and Deubel, 2010).
Area aIPS in both the human and monkey shows selective activity for the execution of grasping movements (Murata et al., 2000; Culham et al., 2003). Here we show that both aIPS and an immediately posterior division, post-aIPS, are selective for the planning of grasp versus reach movements. Moreover, aIPS decodes between similar grasps on objects of different sizes during execution, whereas post-aIPS performs such discriminations during planning. These results are consistent with the object size tuning expected from macaque anterior intraparietal area (AIP) (Murata et al., 2000) and provide additional support for a homology between AIP and human aIPS. Importantly, the distinction here between the two human divisions of aIPS provides evidence for a gradient of grasp-related function, with an anterior division perhaps more related to somatosensory feedback (Culham, 2004) and the online control of grip force (Ehrsson et al., 2003) and a posterior division more related to visual object features (Culham, 2004; Durand et al., 2007) and object–action associations (Valyear et al., 2007). In fact, these functionally distinct regions may correspond to anatomically distinct regions defined by cytoarchitectonics (Choi et al., 2006).
Decoding in motor and premotor cortex
Although motor cortex, traditionally speaking, is predominantly engaged near the moment of movement execution and presumed, at least compared with the higher-level cognitive processing observed in parietal and premotor cortex, to be a relatively lower-level motor output structure [i.e., given its direct connections with corticospinal neurons (Chouinard and Paus, 2006) and that much of its activity can be explained in simple muscle control terms (Todorov, 2000)], such descriptions likely only partially capture some of its complexity. For instance, microstimulation of motor cortex structures can produce a complex array of ecologically relevant movements [e.g., grasping, feeding, etc. (Graziano, 2006)], and recent evidence also suggests that its outputs reflect whether an action goal is present or not (Cattaneo et al., 2009). The fact that we can decode each particular hand movement from the preparatory responses in motor cortex several moments before action execution might additionally speak to a more prominent role in movement planning processes. Alternatively, it might reflect the fact that higher-level signals from other regions must often pass through motor cortex before going to spinal cord.
In addition to motor cortex, areas in premotor cortex have direct anatomical connections (albeit weaker) to spinal cord (Chouinard and Paus, 2006) but, importantly, are also highly interconnected with frontal, parietal, and motor cortical regions (Andersen and Cui, 2009), making them ideally situated to receive, influence, and communicate high-level cognitive movement-related information. Beyond forming a critical node in the visuomotor planning network, recent evidence proposes that different premotor areas (e.g., PMd and PMv) may have dissociable processes. For instance, experiments in both humans and monkeys appear to suggest that PMv is more involved in hand preshaping and grip-specific responses (distal components), whereas PMd is more involved in power-grip or reach-related hand movements (proximal components) (Tanné-Gariépy et al., 2002; Davare et al., 2006). These findings are consistent with the suggestion that PMv and PMd form the anterior components of dissociable parieto-frontal networks involved in visuomotor control, with the dorsolateral circuit—involving connections from pIPS to AIP and then to PMv—thought to be specialized for grasping, and the dorsomedial circuit—involving connections between V6A/aPCu to midIPS and then to PMd—thought to be specialized for reaching (for review, see Rizzolatti and Matelli, 2003; Grafton, 2010). Given that most of these previous distinctions are based on characterizations of activity evoked during the movement itself, the accurate decoding of different planned hand movements shown here provides a significant additional dimension to such descriptions. Indeed, although our finding that PMv can discriminate different upcoming movements with the hand (grasps and reaches) may be congruent with this parallel-pathway view, the same finding in PMd (more traditionally implicated in reach planning) seems essentially incompatible. 
There are several reasons, however, to suspect that PMd, as shown here, may also be involved in grasp-related movement planning. For instance, both PMd and PMv contain distinct hand digit representations (Dum and Strick, 2005), PMd activity is modulated during object grasping (Raos et al., 2004), by grasp-relevant object properties (Grol et al., 2007; Verhagen et al., 2008) and the grip force scaling required (Hendrix et al., 2009), and multiunit responses in PMd (as well as PMv) are highly predictive of the current reach and grasp movement (Stark and Abeles, 2007). Furthermore, previous work from our laboratory has found differences in PMd between grasping and reaching during the execution phase of the movement (Culham et al., 2003; Cavina-Pratesi et al., 2010). Our current findings with fMRI in humans add to an emerging view that simple grasp (distal) versus reach (proximal) descriptions cannot directly account for the preparatory responses in PMd and PMv and that significant coordination between the two regions is a requirement for complex object-directed behavior.
Here we have demonstrated that MVPA can decode surprisingly subtle distinctions between actions across a larger network of areas than would be expected from past human neuroimaging research. Based on nonhuman primate neurophysiology, one might expect decoding of more pronounced differences between trials (such as the effector used or the target location acted on). Here, however, effector and object location remained constant, yet we found decoding of slight differences in the planning of actions: in several areas, we were able to discriminate upcoming grasp versus reach hand movements, and, in a subset of these areas—even more surprisingly—we could additionally discriminate upcoming precision grasps on objects of subtly different sizes. These findings suggest that neural implants within several of the reported predictive regions may eventually enable the reconstruction of highly specific planned actions in movement-impaired human patient populations. A critical consideration for cognitive neural prosthetics is the optimal positioning of electrode arrays to capture the appropriate intention-related signals (Andersen et al., 2010). Here, we highlight a number of promising candidate regions that can be further explored in nonhuman primates to not only further assist their development but also expand our understanding of intention-related signals related to complex sensorimotor behaviors.
This work was supported by Canadian Institutes of Health Research Operating Grant MOP84293 (J.C.C.). We are grateful to Fraser Smith for assistance with multivoxel pattern analysis methods, Mark Daley for helpful discussions, and Fraser Smith and Cristiana Cavina-Pratesi for their valuable comments on this manuscript.
Correspondence should be addressed to Jason P. Gallivan, Centre for Brain and Mind, Natural Sciences Centre, University of Western Ontario, London, ON N6A 5B7, Canada.