Abstract
Many daily activities rely on the ability to produce meaningful sequences of movements. Motor sequences can be learned in an effector-specific fashion (such that benefits of training are restricted to the trained hand) or an effector-independent manner (meaning that learning also facilitates performance with the untrained hand). Effector-independent knowledge can be represented in extrinsic/world-centered or in intrinsic/body-centered coordinates. Here, we used functional magnetic resonance imaging (fMRI) and multivoxel pattern analysis to determine the distribution of intrinsic and extrinsic finger sequence representations across the human neocortex. Participants practiced four sequences with one hand for 4 d, and then performed these sequences during fMRI with both left and right hand. Between hands, these sequences were equivalent in extrinsic or intrinsic space, or were unrelated. In dorsal premotor cortex (PMd), we found that sequence-specific activity patterns correlated higher for extrinsic than for unrelated pairs, providing evidence for an extrinsic sequence representation. In contrast, primary sensory and motor cortices showed effector-independent representations in intrinsic space, with considerable overlap of the two reference frames in caudal PMd. These results suggest that effector-independent representations exist not only in world-centered, but also in body-centered coordinates, and that PMd may be involved in transforming sequential knowledge between the two. Moreover, although effector-independent sequence representations were found bilaterally, they were stronger in the hemisphere contralateral to the trained hand. This indicates that intermanual transfer relies on motor memories that are laid down during training in both hemispheres, but preferentially draws upon sequential knowledge represented in the trained hemisphere.
- coordinate transformations
- intermanual transfer
- motor sequences
- multivoxel pattern analysis
- skill learning
Introduction
Many motor skills, for instance, playing a musical instrument or writing, demand the production of long sequences of movements. Behavioral evidence indicates that motor sequences are not encoded at a single level of the motor hierarchy, but rather at various stages in the translation from abstract action goals to muscle commands (Keele et al., 1995; Hikosaka et al., 1999). Take the example of learning a novel piano tune: the motor system can acquire a representation in terms of musical notes or key positions (i.e., in “extrinsic” or environmental coordinates), or in terms of the necessary muscle commands (i.e., in “intrinsic” or body-centered coordinates; Fig. 1a).
Figure 1. Effector-independent representations. A, Hypothetical motor hierarchy: a stimulus (e.g., notes when playing the piano, or numbers, as in this experiment) is translated into an extrinsic representation of the keys that need to be pressed and subsequently into an intrinsic representation of the muscle commands for each hand. In the traditional conceptualization (Hikosaka et al., 2002), intrinsic representations are specific to the effector (hand) used. B, Alternative architecture: even the intrinsic sequence representation is still partly shared across hands in a mirror-symmetric fashion. C, Experimental design: during scanning, participants performed four sequences with the left and right hand. Each sequence of a given hand corresponded to one sequence on the other hand in extrinsic coordinates (blue, same numbers on the screen), and to one sequence in intrinsic coordinates (red, same sequences of muscle commands). D, Across hands, there were 16 possible pairs of sequences, of which four were extrinsic, four intrinsic, and eight unrelated pairs.
The reference frame of a sequence representation determines whether and how the skill generalizes to the contralateral effector (for review, see Shea et al., 2011). Extrinsic sequence representations are, by definition, effector-independent: learning a sequence with one hand improves performance of the same sequence in extrinsic space with the other hand (Grafton et al., 2002; Kovacs et al., 2009; Boutin et al., 2012). Conversely, it has been hypothesized that sequence representations in intrinsic coordinates are effector-specific (Fig. 1a; Hikosaka et al., 2002), and that learning in this coordinate frame only benefits trained hand performance (Karni et al., 1995). There is, however, also some evidence that learning transfers to sequences that demand the mirror-symmetric pattern of muscle activity (Bapi et al., 2000; Korman et al., 2003; Panzer et al., 2009; Gruetzmacher et al., 2011). This suggests that effector-independent representations may also exist in intrinsic coordinates (Fig. 1b).
Here, we ascertained whether brain regions exhibit effector-independent sequence representations in intrinsic space, and if so, how these differ from extrinsic representations. Using multivoxel pattern analysis (MVPA), we recently demonstrated that different movement sequences, all matched for kinematic parameters, elicited classifiably different activity patterns in motor/premotor areas (Wiestler and Diedrichsen, 2013). Classification relied on a unique spatial activity pattern for each sequence, rather than on differences in temporal profiles, indicating that single voxels developed preferential tuning for certain sequences. Furthermore, the sequence-specific component of these activity patterns increased with training. We use this technical innovation to test whether motor regions contain an extrinsic or intrinsic effector-independent representation.
Participants practiced four different five-finger sequences with one hand, and subsequently performed these with either hand during fMRI. Each trained sequence had an extrinsic match on the contralateral hand (involved the same spatial keyboard positions), an intrinsic match (involved the same sequence of fingers), and was unrelated to the two remaining sequences (Fig. 1c). In regions with an extrinsic, effector-independent representation, the four extrinsic sequence pairs (Fig. 1d) should exhibit more similar activity patterns than any of the unrelated pairs. In regions with intrinsic coding, the four intrinsic pairs should evoke similar activity patterns. In regions with effector-dependent representations, any sequence pair should be equally dissimilar. This paradigm enabled us to map the architecture of sequence representations across the cortex, test for the existence of effector-independent representations in a body-centered reference frame, and investigate how training on either hand influenced these representations.
Materials and Methods
Participants.
Fourteen healthy, right-handed participants (7 male, 7 female; average age 21.86 years, SD = 2.74; average Edinburgh Handedness Inventory score 89.64, SD = 7.46) volunteered for the experiment. All procedures were approved by the University College London Research Ethics Committee. Exclusion criteria were identical to those used in a previous independent study (Waters-Metenier et al., 2014). The 14 subjects served as a sham control group in a larger study with 28 additional participants who received bihemispheric transcranial direct current stimulation (tDCS) during training. All results reported in this paper are, if not otherwise noted, based on the 14 sham participants only. However, all main findings replicate in the full set of 42 participants, and we occasionally report statistics on the full group, if the results in the sham group were of marginal significance. Comparisons between sham and tDCS groups will be reported in a separate paper.
Apparatus.
Sequences were executed on a keyboard comprising 10 piano-style keys. These keys could not be depressed, but were equipped with force transducers (FSG-15N1A, Sensing and Control, Honeywell; dynamic range, 0–25 N) that measured the force exerted by each finger with an update interval of 5 ms. The device was made MRI-compatible by using shielded cables and inserting a low-pass filter where the cable penetrated the wall of the shielded scanner room; it has been described in detail previously (Wiestler et al., 2011).
Procedure: behavioral training.
The sequence task required participants to press each finger in a predefined order, which was represented as numeric characters on a computer screen. Each trial started with the presentation of an imperative cue (for 2.7 s) that instructed each participant which sequence to execute, followed by three (or, during pre- and post-test, 4) executions of the same sequence. The display consisted of a string of five numbers within a box, which indicated, from left to right, the keys that had to be pressed. Because we wanted to distinguish a representation in intrinsic coordinates from a representation of the visual stimulus, the cue was presented in extrinsic coordinates. Specifically, “1” referred to the left-most key (left little finger or right thumb), whereas “5” referred to the right-most key (left thumb or right little finger). Thus, extrinsic, but not intrinsic, sequence pairs shared the same numbers on the screen during the cue phase. Two small (0.53 × 0.53 cm) colored boxes flanking the sequence instructed participants which hand to use; the hand on the side of the green box was required to execute the sequence while the one on the side of the red box remained still and resting on the keyboard.
Each of the three (or 4) sequence executions was triggered separately with five white asterisks, which served as the “go” signal. The objective of the task was to perform the five presses as fast as possible while keeping errors to a minimum. Fingers had to be pressed in the correct sequence with a force of at least 2.5 N, whereas all other fingers had to rest on the keyboard with a force <2.2 N. After each correct press, the corresponding asterisk in the sequence turned green, but when participants pressed an incorrect key, the corresponding asterisk turned red. Additionally, asterisks turned yellow for correct presses that exceeded the upper force limit (8.9 N). Execution time (ET) was measured as the duration between the onset of the first press and the release of the last press, and error rate was defined as the percentage of sequences that contained one or more incorrect finger presses. Throughout the behavioral training, we encouraged a constant error rate by instructing participants to speed up if error rate was lower than 20% and slow down if it was higher. For data analysis, we calculated the median ET for each run, sequence, and hand over all (correct and incorrect) trials and then averaged these results across sequences and participants. To penalize runs in which participants made a larger number of errors, we replaced the ET for incorrect trials with the maximum ET of that run and sequence, which effectively increased the median ET by an amount related to the error rate.
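The error penalty on the median ET can be illustrated with a minimal sketch. The function name `penalized_median_et` and the example values are hypothetical, and the sketch assumes that the maximum is taken over all executions of one run and sequence (correct and incorrect alike), which the text does not state explicitly.

```python
from statistics import median


def penalized_median_et(execution_times, is_correct):
    """Median ET after replacing each incorrect execution's ET with the maximum ET.

    Both arguments cover the executions of one run and one sequence.
    Assumption: the maximum is taken over all executions, correct or not.
    """
    if len(execution_times) != len(is_correct):
        raise ValueError("need one correctness flag per execution time")
    worst = max(execution_times)
    penalized = [et if ok else worst for et, ok in zip(execution_times, is_correct)]
    return median(penalized)


# Hypothetical execution times (seconds); the fourth execution contained an error.
ets = [1.2, 1.0, 1.4, 1.1, 1.3, 0.9]
flags = [True, True, True, False, True, True]

plain = median(ets)
penalized = penalized_median_et(ets, flags)
print(f"plain median: {plain:.2f} s, penalized median: {penalized:.2f} s")
```

Because every error moves one value to the top of the distribution, the penalized median can only stay the same or increase as the number of errors grows.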
After each sequence execution, participants were shown brief feedback (0.8 s) as follows: one green asterisk (equivalent to 1 point) indicated that the sequence was correct; three green asterisks (= 3 points) meant that the sequence was correct and executed with ≥20% faster ET than the average in the previous run; one blue asterisk specified that the sequence was executed with 20% slower ET than the average of the previous run (= 0 points); and one red asterisk signified that one or more errors were made in the sequence (= −1 point). Participants received a financial bonus according to their final point score.
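The point scheme can be summarized in a short sketch. The function name `feedback_points` is hypothetical, and the thresholds are an interpretation: "≥20% faster" is read as an ET of at most 80% of the previous run's average, and "20% slower" as an ET of at least 120% of that average.

```python
def feedback_points(et, is_correct, previous_run_mean_et):
    """Return the points awarded for one sequence execution.

    Assumed thresholds: fast means et <= 0.8 * previous mean,
    slow means et >= 1.2 * previous mean.
    """
    if not is_correct:
        return -1  # one red asterisk: at least one wrong press
    if et <= 0.8 * previous_run_mean_et:
        return 3   # three green asterisks: correct and fast
    if et >= 1.2 * previous_run_mean_et:
        return 0   # one blue asterisk: correct but slow
    return 1       # one green asterisk: correct


# Hypothetical executions (ET in seconds, correctness) against a previous-run mean of 1.0 s.
executions = [(0.7, True), (1.0, True), (1.3, True), (0.9, False)]
total = sum(feedback_points(et, ok, 1.0) for et, ok in executions)
print("total points:", total)
```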
All sequences consisted of a different ordering of the same five fingers. We excluded any sequence that contained a run of more than three adjacent fingers. From the remaining candidate sequences, we selected 12 sequences of matched difficulty, based on pilot experimentation (Wiestler and Diedrichsen, 2013). These sequences were divided into three training sets that each consisted of four sequences, two “original” unrelated sequences (e.g., A and B) and their spatially mirror-reversed counterparts (A′ and B′). Thus, left hand sequence $A_L$ was identical to $A_R$ in extrinsic space (i.e., the same relative spatial positions on the keyboard) and to $A'_R$ in intrinsic space (i.e., the same fingers; Fig. 1c). Within each hand and set, all sequences were made maximally different from each other by avoiding sequences that shared any common transitions between two fingers.
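The relationship between the sequences of the two hands can be made concrete with a small sketch. The sequences `A` and `B` below are hypothetical examples, not the sequences used in the study, and the helper names are illustrative. Keys are numbered 1 to 5 from left to right for each hand, so that key 1 is the little finger of the left hand but the thumb of the right hand.

```python
from collections import Counter
from itertools import product

N_KEYS = 5


def mirror(seq):
    """Spatially mirror-reverse a key sequence (key 1 <-> key 5, and so on)."""
    return tuple(N_KEYS + 1 - key for key in seq)


def fingers(seq, hand):
    """Translate keys into finger numbers (1 = thumb, 5 = little finger)."""
    return mirror(seq) if hand == "L" else tuple(seq)


def relation(left_seq, right_seq):
    """Classify a left-hand/right-hand pair of key sequences."""
    if left_seq == right_seq:
        return "extrinsic"   # same key positions
    if fingers(left_seq, "L") == fingers(right_seq, "R"):
        return "intrinsic"   # same fingers in the same order
    return "unrelated"


# Hypothetical "original" sequences and their mirror-reversed counterparts.
A = (1, 3, 5, 2, 4)
B = (3, 2, 1, 5, 4)
training_set = [A, B, mirror(A), mirror(B)]

counts = Counter(relation(left, right) for left, right in product(training_set, repeat=2))
print(dict(counts))
```

With any two originals that differ from each other and from their own mirror images, the 16 across-hand pairs split into four extrinsic, four intrinsic, and eight unrelated pairs.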
The experiment started with a short practice run with four easy sequences to familiarize participants with the task. During the pretest, participants performed the full set of 12 sequences (4 to-be-trained and 8 untrained) with both left and right hands. Each hand performed two trials per sequence (with 4 executions per trial). Pre- and post-tests consisted of eight runs with 12 trials each. Within the first four runs, the order of sequences and hands was randomly permuted, and the order was reversed in the second half to counterbalance possible learning effects. Subjects were then assigned to one of two cohorts: one practiced with the left hand, the other with the right hand. Training lasted for 4 d, during which subjects practiced the four sequences of one of the three possible sequence sets. The set assignment was counterbalanced across the two cohorts. Each training session lasted approximately one hour, during which participants performed 128 trials (with 384 sequence executions), divided into 16 runs with two trials per sequence each. The behavioral experiment ended with a separate session for the post-test, which was conducted in exactly the same way as the pretest.
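The training volume follows directly from the session structure, as the following arithmetic sketch shows. The constant names are illustrative, and the total in the final line assumes that all four training days followed the same session structure.

```python
# Session structure as described in the text.
RUNS_PER_SESSION = 16
SEQUENCES = 4
TRIALS_PER_SEQUENCE_PER_RUN = 2
EXECUTIONS_PER_TRIAL = 3
TRAINING_DAYS = 4

trials_per_session = RUNS_PER_SESSION * SEQUENCES * TRIALS_PER_SEQUENCE_PER_RUN
executions_per_session = trials_per_session * EXECUTIONS_PER_TRIAL
# Assumes the four training days were identical in structure.
executions_per_sequence_total = executions_per_session // SEQUENCES * TRAINING_DAYS

print(trials_per_session, executions_per_session, executions_per_sequence_total)
```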
Procedure: imaging.
One day after the post-test, participants underwent functional magnetic resonance imaging. Functional images were acquired using a 3T Siemens Trio MRI scanner with a 32-channel head coil. We used a 2D echo-planar sequence with a TR of 2.72 s, eight runs, 159 volumes per run, 32 interleaved slices with 2.7 mm thickness, 3 mm gap, and 2.3 × 2.3 mm² in-plane resolution. The images were acquired in an oblique orientation, with a ∼45° tilt angle from the AC-PC line. This permitted coverage of the motor regions on the dorsal surface of the cerebral cortex, as well as the superior part of the cerebellum. The slice prescription excluded the inferior prefrontal, interior, and anterior temporal lobes. To correct for distortions due to field inhomogeneities, we also acquired a B0 field-map (Hutton et al., 2002). To reconstruct the cortical surface, we acquired an anatomical image using a 3D MPRAGE sequence with 1 mm isotropic resolution.
During fMRI, participants performed the four trained sequences with either the right or left hand; the eight untrained sequences were not imaged. Each of the eight imaging runs consisted of 24 randomly ordered trials (3 per trial type, 4 sequences × 2 hands, with 3 sequence executions per trial, yielding 72 total executions per run). Each trial consisted of a cueing phase (2.7 s = 1 TR) and three sequence executions, triggered 3.6 s apart, and therefore lasted 13.5 s (5 TRs). Participants were instructed to produce the sequence with an ET of ∼1.3 s as accurately as possible. This speed was selected because it was the fastest speed that most subjects could achieve with both trained and untrained hands. Each sequence execution had to be completed within 2.8 s to allow for a 0.8 s feedback phase. No extra feedback was given for fast performance or hard presses, and “too slow” feedback was only shown when ET exceeded 1700 ms. Otherwise, cues and feedback were identical to those presented during behavioral training. Baseline BOLD activation was measured during 8 randomly interspersed rest phases of 13.5 s during which participants were instructed to fixate on a central asterisk presented on the screen and to avoid movement. To monitor for mirror activity on the nonmoving hand, participants were required to keep all 10 fingers on the keyboard and to produce a small baseline force of ∼0.5 N at all times.
Basic data analysis.
Data analysis was performed using SPM8 (http://www.fil.ion.ucl.ac.uk/spm/) and custom-written MATLAB (MathWorks) routines. The first three TRs of each functional run were excluded to allow the functional imaging signal to approach equilibrium. The remaining 156 images were adjusted for the sequence of slice acquisition, and subsequently corrected for field inhomogeneities and head motion (Hutton et al., 2002). The data were high-pass filtered to remove slowly varying trends with a cutoff frequency of 1/128 Hz and coregistered to the individual anatomical scan. No smoothing or normalization to a group template was implemented during preprocessing.
The data were analyzed using a general linear model to obtain estimates of how much each voxel was activated by each of the 8 trial types (4 sequences × 2 hands) in each of the 8 runs. The regressors in the design matrix consisted of boxcar functions that assumed the value of 1 while the respective trial type was executed, and zero otherwise. Each regressor, therefore, averaged activation across the three sequence executions within each trial and across the three occurrences of each trial type within each run. The boxcar functions were then convolved with an individual estimate of the hemodynamic response function. The model of the hemodynamic function was composed of two Gamma functions: the first modeled the activation and the second the post-stimulus undershoot. Each component had a free parameter for the delay to peak, dispersion, and onset (see spm_hrf.m in SPM8). These parameters were estimated for each subject by optimizing the proportion of variance that the model could explain of the time-series of voxels in the primary motor cortex (bilaterally). These estimates were then applied to the whole brain. For HRF estimation, we treated all sequences of the left and all sequences of the right hand as one trial type; therefore, this procedure did not bias any subsequent analysis that concerned differences between sequences. The 64 estimates for the regression coefficients (8 runs × 4 sequences × 2 hands) were used in the subsequent multivariate analyses.
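How a boxcar regressor is turned into a predicted BOLD time course can be sketched as follows. This is a minimal illustration, not the SPM8 implementation: the function names are hypothetical, the two-Gamma parameters (peak delay 6 s, undershoot delay 16 s, ratio 6, unit dispersion) are fixed values chosen to resemble a typical canonical shape rather than the per-subject fitted values, and the trial onset in the example is made up.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Density of a Gamma distribution; zero for non-positive times."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(t, peak_delay=6.0, undershoot_delay=16.0, ratio=6.0):
    """Activation Gamma minus a scaled undershoot Gamma (assumed parameter values)."""
    return gamma_pdf(t, peak_delay) - gamma_pdf(t, undershoot_delay) / ratio


def boxcar(onsets, duration, n_samples, dt):
    """1 while a trial of this type is executed, 0 otherwise."""
    box = [0.0] * n_samples
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_samples)):
            box[i] = 1.0
    return box


def convolve(signal, kernel, dt):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, value in enumerate(signal):
        if value == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j >= len(out):
                break
            out[i + j] += value * k * dt
    return out


def regressor(onsets, duration, n_scans, tr=2.72, dt=0.1):
    """Boxcar convolved with the HRF, sampled once per TR."""
    n_samples = int(round(n_scans * tr / dt))
    kernel = [double_gamma_hrf(i * dt) for i in range(int(round(32.0 / dt)))]
    high_res = convolve(boxcar(onsets, duration, n_samples, dt), kernel, dt)
    return [high_res[min(int(round(s * tr / dt)), n_samples - 1)] for s in range(n_scans)]


# Hypothetical single trial: execution phase starts at 10 s and lasts 3 x 3.6 s.
reg = regressor(onsets=[10.0], duration=10.8, n_scans=40)
peak_time = max((i * 0.1 for i in range(320)), key=double_gamma_hrf)
print(f"HRF peak at {peak_time:.1f} s, regressor maximum {max(reg):.2f}")
```

In a full design matrix, one such column would be built per trial type and run, and the regression coefficients for these columns are the activation estimates that enter the multivariate analyses.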
Classification analysis.
To test whether different sequences led to discernibly different local activity patterns, we used linear discriminant analysis (Duda et al., 2001). Classification was performed for each participant and each hand separately. The input data consisted of four (sequences) by eight (runs) activation estimates for a set of P voxels, selected by the surface-based searchlight or region of interest (ROI) approach (see below). For each run and hand, we subtracted the mean activity of each voxel averaged over the four sequences. The data from seven runs were used to estimate the mean activation vector for each sequence, and the average P × P within-sequence covariance matrix (Wiestler et al., 2011). The activation vectors from the remaining eighth run were then classified by assigning them to the class with the highest likelihood, assuming that each sequence pattern came from a multivariate normal distribution with a separate mean but identical covariance. We repeated this procedure eight times, each time leaving out a different run, thereby obtaining overall cross-validated classification accuracy. If the area showed reliable differences between activation patterns for the four sequences, then classification accuracy should be above chance (25% correct). For between-subject analyses, accuracy values were transformed to z-values assuming a binomial distribution (Pereira et al., 2009).
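The leave-one-run-out procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the analysis code: the function names are hypothetical, the data are synthetic, a small ridge term is added to the covariance matrix to keep it invertible (the regularization actually used is not specified here), and the z-transform uses the normal approximation to the binomial distribution.

```python
import math
import random


def solve(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def remove_run_mean(data):
    """Subtract, per run and voxel, the mean over the sequences of that run."""
    out = []
    for run in data:
        n_vox = len(run[0])
        mean = [sum(p[v] for p in run) / len(run) for v in range(n_vox)]
        out.append([[p[v] - mean[v] for v in range(n_vox)] for p in run])
    return out


def lda_accuracy(data, ridge=1e-3):
    """Cross-validated accuracy; data[run][sequence] is a list of voxel values."""
    data = remove_run_mean(data)
    n_runs, n_seq, n_vox = len(data), len(data[0]), len(data[0][0])
    correct = 0
    for test_run in range(n_runs):
        train = [data[r] for r in range(n_runs) if r != test_run]
        means = [[sum(run[s][v] for run in train) / len(train) for v in range(n_vox)]
                 for s in range(n_seq)]
        # Pooled within-sequence covariance plus an assumed ridge on the diagonal.
        cov = [[ridge if i == j else 0.0 for j in range(n_vox)] for i in range(n_vox)]
        dof = len(train) * n_seq - n_seq
        for run in train:
            for s in range(n_seq):
                res = [run[s][v] - means[s][v] for v in range(n_vox)]
                for i in range(n_vox):
                    for j in range(n_vox):
                        cov[i][j] += res[i] * res[j] / dof
        for s in range(n_seq):
            # Equal covariance: highest likelihood = smallest Mahalanobis distance.
            dists = []
            for m in means:
                diff = [data[test_run][s][v] - m[v] for v in range(n_vox)]
                dists.append(sum(d * w for d, w in zip(diff, solve(cov, diff))))
            correct += dists.index(min(dists)) == s
    return correct / (n_runs * n_seq)


def accuracy_to_z(accuracy, n_tests, chance=0.25):
    """Normal approximation to the binomial (an assumed form of the transform)."""
    return (accuracy - chance) / math.sqrt(chance * (1 - chance) / n_tests)


# Synthetic data: 8 runs x 4 sequences x 8 voxels, each sequence driving two voxels.
random.seed(1)
data = []
for _ in range(8):
    offset = [random.gauss(0, 1) for _ in range(8)]  # run-specific pattern
    data.append([[(1.0 if v % 4 == s else 0.0) + offset[v] + random.gauss(0, 0.3)
                  for v in range(8)] for s in range(4)])
acc = lda_accuracy(data)
print(f"accuracy {acc:.2f}, z = {accuracy_to_z(acc, n_tests=32):.2f}")
```

Removing the run mean before classification matters here because the run-specific offset is larger than the sequence-specific signal in this toy example.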
Above-chance classification accuracy of fMRI data is potentially attributable to behavioral confounds (Todd et al., 2013). For instance, one could observe above-chance classification accuracy if the four sequences were performed with different speeds and the BOLD signal in a region reflected the speed of movement. Therefore, we calculated the average execution time, error rate and peak force averaged over the fingers for each sequence and imaging run. We then used these values, instead of the voxelwise activation estimates, in the classification analysis and correlated those resultant accuracy values across participants with those obtained from activation data.
Pattern-component modeling.
To test for existence of effector-independent representations, we correlated activity patterns of the left and the right hand. Raw correlations, however, are highly susceptible to the level of noise and common activation patterns. We therefore decomposed activity patterns using a pattern-component modeling approach (Diedrichsen et al., 2011), allowing us to calculate the proportion of the informative activity patterns that was shared between the two hands. Specifically, the activity pattern ($y$, a $P \times 1$ vector) for the $i$-th hand (L vs R), $j$-th sequence (1–4) on the $k$-th run (1–8), was modeled as follows:
$$y_{i,j,k} = \mathrm{hand}_i + \mathrm{seq}_{i,j} + \mathrm{run}_k + \mathrm{noise}_{i,j,k}$$
The pattern component that was shared by all sequences executed with one hand ($\mathrm{hand}_i$) was assumed to have different variances across voxels for the left and right hands, $\mathrm{var}(\mathrm{hand}_L)$ and $\mathrm{var}(\mathrm{hand}_R)$, and a shared covariance $\mathrm{cov}_H = \mathrm{cov}(\mathrm{hand}_L, \mathrm{hand}_R)$. The sequence-specific component ($\mathrm{seq}_{i,j}$) was assumed to have the same variance for all sequences of a given hand, but different variances across hands, yielding two estimates: $\mathrm{var}(\mathrm{seq}_L)$ and $\mathrm{var}(\mathrm{seq}_R)$. Sequence pairs in an extrinsic reference frame also shared a covariance term, i.e., $\mathrm{cov}(\mathrm{seq}_{L,1}, \mathrm{seq}_{R,1}) = \mathrm{cov}_E$, with a corresponding covariance term for each intrinsic pair ($\mathrm{cov}_I$). The eight run-specific components were modeled to have the same variance and to be uncorrelated across runs. Finally, the noise component (with variance $\sigma_\varepsilon^2$) was assumed to be uncorrelated across trials and not correlated with any of the other components. The key concept in pattern component modeling is that each of the components is considered a randomly distributed variable across voxels. Thus, rather than estimating each component directly, as would be done if treating each as a fixed factor, the approach directly estimates the (co-)variances of the components across a group of voxels.
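Within this framework, the raw correlation coefficients between different patterns reflect a specific combination of variance and covariance terms. For example, the raw correlation between two unrelated sequences is given by the following:
$$r_U^{\mathrm{raw}} = \frac{\mathrm{cov}_H}{\sqrt{\left(\mathrm{var}(\mathrm{hand}_L) + \mathrm{var}(\mathrm{seq}_L) + v\right)\left(\mathrm{var}(\mathrm{hand}_R) + \mathrm{var}(\mathrm{seq}_R) + v\right)}}$$
where $v$ denotes the variance contributed by the run and noise components. Whereas the raw correlation between two sequences that are the same in extrinsic space is as follows:
$$r_E^{\mathrm{raw}} = \frac{\mathrm{cov}_H + \mathrm{cov}_E}{\sqrt{\left(\mathrm{var}(\mathrm{hand}_L) + \mathrm{var}(\mathrm{seq}_L) + v\right)\left(\mathrm{var}(\mathrm{hand}_R) + \mathrm{var}(\mathrm{seq}_R) + v\right)}}$$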
Thus, we can see that this raw correlation not only increases with $\mathrm{cov}_E$, but also with $\mathrm{cov}_H$ (i.e., hand-specific covariance). Furthermore, $r_E^{\mathrm{raw}}$ and $r_U^{\mathrm{raw}}$ (and also their difference) will vary with changing levels of hand-specific variances and noise (term $v$). However, none of these changes in raw correlation have anything to do with the amount of shared information. Hence, the size of raw correlation coefficients is difficult to interpret. Pattern component modeling allows us to calculate correlation coefficients that are not influenced by noise and the common activation components, simply by using the direct variance estimates from the model:
$$r_E = \frac{\mathrm{cov}_E}{\sqrt{\mathrm{var}(\mathrm{seq}_L)\,\mathrm{var}(\mathrm{seq}_R)}}$$
This correlation coefficient, therefore, reflects the degree to which sequential information was shared between sequences on both hands that matched in extrinsic coordinates (and an equivalent correlation coefficient was calculated for intrinsic pairs). Because this correlation estimate could become unstable when the denominator tended to zero, we regularized it by setting the variance estimate of each hand for this calculation to 0.5% of the noise if it fell below this limit.
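The contrast between raw and model-based correlations, and the regularization of the denominator, can be sketched numerically. The function names and all numbers are hypothetical. The sketch assumes that the model-based coefficient is the shared covariance divided by the geometric mean of the two sequence variances, that "0.5% of the noise" refers to the noise variance, and that the raw correlation has the shared hand and sequence covariances in the numerator and the total variance of each pattern (with `v` standing for all run- and noise-related variance) in the denominator.

```python
import math


def model_correlation(cov_shared, var_seq_left, var_seq_right, noise_var,
                      floor_fraction=0.005):
    """Shared covariance over the geometric mean of the sequence variances.

    Each variance is floored at floor_fraction * noise_var to keep the
    denominator away from zero (assumed reading of the regularization).
    """
    floor = floor_fraction * noise_var
    var_left = max(var_seq_left, floor)
    var_right = max(var_seq_right, floor)
    return cov_shared / math.sqrt(var_left * var_right)


def raw_correlation(cov_hand, cov_shared, var_hand_left, var_seq_left,
                    var_hand_right, var_seq_right, v):
    """Assumed form of the raw across-hand correlation under the component model."""
    total_left = var_hand_left + var_seq_left + v
    total_right = var_hand_right + var_seq_right + v
    return (cov_hand + cov_shared) / math.sqrt(total_left * total_right)


# Hypothetical variance components; only the noise-related term v changes.
components = dict(cov_hand=0.4, cov_shared=0.1, var_hand_left=1.0, var_seq_left=0.2,
                  var_hand_right=1.0, var_seq_right=0.2)
raw_low_noise = raw_correlation(v=0.5, **components)
raw_high_noise = raw_correlation(v=2.0, **components)
model_based = model_correlation(0.1, 0.2, 0.2, noise_var=2.0)

print(f"raw r: {raw_low_noise:.3f} (low noise) vs {raw_high_noise:.3f} (high noise)")
print(f"model-based r: {model_based:.3f}")
```

In this toy example the raw correlation drops as noise increases although the shared sequence information is unchanged, whereas the model-based coefficient depends only on the sequence-specific terms.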
Surface-based analysis.
To visualize the distribution of sequence representations across the cortical surface, we used FreeSurfer (Dale, 1999). This program permits the extraction of the white-matter gray-matter surface and pial surface from the anatomical image. After the surfaces were obtained, they were inflated to a sphere and morphed to fit to a group template based on the sulcal depth and local surface curvature information (Fischl et al., 1999). All hemispheres were then resampled onto a regular grid containing 163,842 vertices. Left and right hemispheres were morphed to the same mirror-symmetric template, allowing us to easily mirror functional maps for analyses that were combined across hands.
Multivariate analyses (both classification and pattern-component modeling) were performed using a surface-based searchlight (Oosterhof et al., 2011). For each vertex, this method defined a sphere on the cortical surface and selected all voxels between pial and white-gray surfaces. The radius of the sphere was adjusted such that exactly 160 voxels were contained in each searchlight, resulting in an average searchlight radius of 11.1 mm. Multivariate analysis was conducted on the selected group of voxels, and the integrated result was assigned to the center node. By covering all possible vertices, a full surface map of information content could be constructed.
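The idea of fixing the voxel count and letting the radius vary can be illustrated with a simplified sketch. It is not the surface-based method itself: the function name `fixed_size_searchlight` is hypothetical, straight-line distance in a toy voxel grid stands in for distance measured on the cortical surface, and no restriction to voxels between the pial and white-gray surfaces is applied.

```python
import math
from itertools import product


def fixed_size_searchlight(center, voxel_coords, n_voxels):
    """Return the indices of the n_voxels nearest voxels and the resulting radius.

    Simplification: Euclidean distance replaces distance on the cortical surface.
    """
    if n_voxels > len(voxel_coords):
        raise ValueError("not enough voxels for the requested searchlight size")
    order = sorted(range(len(voxel_coords)),
                   key=lambda i: math.dist(center, voxel_coords[i]))
    chosen = order[:n_voxels]
    radius = math.dist(center, voxel_coords[chosen[-1]])
    return chosen, radius


# Toy 5 x 5 x 5 grid with unit spacing (hypothetical units).
grid = list(product(range(5), repeat=3))
indices, radius = fixed_size_searchlight((2, 2, 2), grid, n_voxels=7)
print(f"{len(indices)} voxels selected, radius {radius:.1f}")
```

Fixing the count rather than the radius keeps the number of features identical across searchlights, so classification accuracies are comparable between cortical locations.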
Statistical tests on the surface were conducted using an uncorrected threshold of t(13) = 3.01, p < 0.005, and family-wise error was controlled by calculating the critical size of the largest superthreshold cluster that would be expected by chance, using Gaussian field theory as implemented in the fmristat package (Worsley et al., 1996). Results were displayed using the 3D-visualization software Caret (Van Essen et al., 2001).
ROI.
We defined six anatomical regions of interest symmetrically in both hemispheres. These ROIs were identical to those used in our previous work (Wiestler and Diedrichsen, 2013). We based regions on a cyto-architectonic atlas aligned to the FreeSurfer atlas surface (Fischl et al., 2008). The hand region of primary motor cortex (M1) was defined as Brodmann area (BA) 4, 2.5 cm above and below the hand knob (Yousry et al., 1997). Primary somatosensory cortex (S1) was defined by BA 2, 3, and 1, again 2.5 cm above and below the hand knob. PMd was defined as the lateral aspect of BA 6, superior to the middle frontal gyrus. The supplementary motor areas (SMA/pre-SMA) comprised the medial aspect of BA 6. Finally, the posterior superior parietal area was subdivided into an ROI including all areas medial to the fundus of the intraparietal sulcus (IPS) and the regions of the occipitoparietal junction (OPJ). All regions were defined on the symmetric group template and then projected into the individual data space via the individual surface.
To analyze ROI data, we submitted the data from each of the six ROIs to a repeated-measures ANOVA with the factors “hemisphere” (left vs right) and “hand” (contralateral vs ipsilateral). All tests were Bonferroni-corrected for the number of ROIs tested (critical p = 0.05/6). Unless otherwise reported, we used all voxels present in each ROI. For the analysis of training effects, however, we restricted all multivariate analyses to the 220 most activated voxels of each ROI. For this selection, the activity was averaged across sequences and across hands. Although this procedure restricted the analysis to the most functionally involved subregion in each ROI, it was not sensitive to any difference between sequences and hence did not bias the multivariate measures on which we drew inferences.
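The voxel selection and the corrected significance threshold can be sketched as follows. The function name `most_activated_voxels` and the toy values are hypothetical; the point of the sketch is that the selection uses only the activation averaged over all sequences and both hands, so it carries no information about differences between sequences.

```python
N_ROIS = 6
BONFERRONI_ALPHA = 0.05 / N_ROIS  # critical p for each ROI test


def most_activated_voxels(patterns, n_select):
    """Indices of the n_select voxels with the highest mean activation.

    patterns: one activation pattern (list over voxels) per condition,
    covering all sequences of both hands, so that the average is blind
    to sequence identity.
    """
    n_vox = len(patterns[0])
    mean_activation = [sum(p[v] for p in patterns) / len(patterns) for v in range(n_vox)]
    ranked = sorted(range(n_vox), key=lambda v: mean_activation[v], reverse=True)
    return sorted(ranked[:n_select])


# Toy example: two conditions, five voxels, keep the three most activated voxels.
toy_patterns = [
    [0.2, 1.5, 0.1, 0.9, 1.1],
    [0.4, 1.3, 0.0, 1.1, 0.7],
]
selected = most_activated_voxels(toy_patterns, n_select=3)
print(selected, round(BONFERRONI_ALPHA, 4))
```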
Results
Behavioral results
Participants underwent training for 4 d with either left or right hand. During pre- and post-test, they were tested on four trained sequences, as well as eight untrained sequences. ET (Fig. 2), the time from first finger press to last finger release, was reduced for trained sequences by 1130 ms (±181 ms) from pre- to post-test, without significant changes in error rate (t(13) = −0.249, p = 0.807). For untrained sequences, participants showed a 628 ms (±159 ms) improvement. This reduction in ET was significantly smaller than that for the trained sequences, as indicated by a significant day (pretest vs posttest) × sequence type (trained vs untrained) interaction (F(1,12) = 35.142, p = 0.0001). Thus, a substantial part of the learned skill was sequence-specific.
Figure 2. Average execution times during pretest, 4 d of training and post-test. Participants were trained on four sequences with either the left (green) or right (purple) hand. At pre- and post-test, they were tested with both hands on the four trained (circles), as well as eight untrained (triangles) sequences. The fMRI experiment occurred after the post-test, during which participants performed only the four trained sequences with either left or right hand with matched execution times. Error bars indicate between-subject SEM.
To investigate the degree to which the acquired skill was effector-independent, participants were also tested on the untrained hand during pre- and post-test. Each untrained hand sequence was the same as one of the trained sequences in extrinsic space (same numbers on the screen), and the same as a different trained sequence in intrinsic space (same sequences of muscle commands). Therefore, for behavioral analysis, our design did not allow the distinction between extrinsic and intrinsic transfer, as performance improvements could result from both. However, by also testing the untrained hand on a set of eight untrained sequences, we could assess performance benefit caused by the combination of extrinsic and intrinsic intermanual transfer.
We observed a substantial decrease in ET for the untrained sequences performed with the untrained hand. Although some of this reduction may indicate learning of general task parameters, most of this drop can be explained by the fact that the repeated (pre- and post-) testing of the untrained hand induced learning (Waters-Metenier et al., 2014). Importantly, however, the reduction in ET on the untrained hand was larger for trained than untrained sequences: the day × sequence type interaction was significant (F(1,12) = 12.221, p = 0.0044). This effect did not interact with the factor training cohort (F(1,12) = 0.107, p = 0.7496). Thus, left and right hand-trained cohorts showed similar sequence-specific transfer to the untrained hand. In summary, our behavioral results indicate that training led to the development of both effector-dependent and effector-independent sequence representations, with the latter promoting the performance of the untrained hand.
Overlap of activation and representation for the left and right hands
To determine the locus of these effector-independent representations, we combined data from left and right hand-trained participants by averaging left and right hemispheres, therefore also combining data from the trained and untrained hand. In the last section of the Results, we consider how the side of training influenced neural sequence representations.
One may hypothesize that an effector-independent representation should be activated by both the left and the right hand. The average percentage BOLD signal compared with rest is shown in Figure 3a. In ipsilateral M1 and S1, we observed suppression of the BOLD signal below resting baseline, whereas all other regions showed clear evidence for bilateral activity. In general, however, activity was larger for movements of the contralateral hand. To test individual areas, we analyzed the data for six symmetrically defined ROIs using a hemisphere (left vs right) × hand (ipsilateral vs contralateral) repeated-measures ANOVA. For all regions, the effect of hand was significant (all F(1,13) > 13.435, p < 0.0029; Fig. 4a). Consistent with previous results (Kim et al., 1993; Verstynen et al., 2005), we also noted that ipsilateral activity was higher during left hand movements in the left hemisphere than for right hand movements in the right hemisphere. This effect was significant for M1 (t(13) = 2.695, p = 0.018), and for S1 (t(13) = 3.527, p = 0.004), even after correcting for the number of ROIs tested.
Figure 3. Overlap of average activity and classification accuracy for left and right hand sequences. Data are averaged across the two training cohorts. Results are shown on an inflated representation of the human neocortex. The fundus of the central sulcus (CS), postcentral sulcus (PoCS), IPS, and superior frontal sulcus (SFS) are represented by dotted lines. A, Percentage signal change relative to baseline, averaged over all participants and sequences. Red areas indicate BOLD signal increase and blue areas designate BOLD signal reduction relative to rest. B, Average accuracy of a linear classifier to distinguish between the four sequences of the left and between four sequences of the right hand.
ROI analysis. A, Percentage signal change for left and right hand sequences and for the left and right hemispheres: ** indicates a significant hand × hemisphere interaction, p < 0.0086. B, Classification accuracy using random subsets of 160 voxels from each ROI. C, Classification accuracy for sequences as a function of the activation level during ipsilateral hand performance in M1 (solid line) and S1 (dashed line).
An effector-independent representation should not only be activated for both hands, but also should represent sequential aspects of left and right hand movements. The existence of a sequence representation can be manifested as slightly different activation patterns for different sequences (Wiestler and Diedrichsen, 2013). To test for significant pattern differences, we used a classification approach (see Materials and Methods). As can be seen in Figure 3b, sequences could be successfully classified from a set of motor-related areas, including sensory-motor cortex, premotor and supplementary motor cortex, and superior parietal cortex. These regions were significant for both left and right hands when correcting for multiple comparisons across the cortex (p < 0.05, uncorrected threshold p = 0.001, cluster threshold 141 mm²). In all six ROIs (Fig. 4b) classification accuracy was significantly better than chance (all F(1,13) > 12.807, p < 0.0034).
However, above-chance classification accuracy could also reflect subtle behavioral differences between sequences, rather than a real sequential representation. Although we endeavored to match the four sequences of each hand in terms of execution time, error rate and peak forces, we cannot exclude the possibility that the classifier picked up subtle behavioral differences between the sequences in each participant. Although the differences between the four sequences (as measured by the between-sequence SD; Table 1) were modest, we could classify sequences based on these behavioral variables alone, with classification accuracy for all variables together reaching 44% for the untrained hand (classification accuracy; Table 1). However, there was no significant correlation between the classification accuracy based on behavioral variables and the classification accuracy based on neural activity patterns, neither in our original 14 participants (r < 0.393, p > 0.164, for all 6 ROIs), nor in the full set of all 42 participants (r < 0.165, p > 0.298). Moreover, because each sequence execution was relatively fast, and because we averaged the BOLD signal changes over three repeats of the same sequence (10.88 s), it is unlikely that the classifier picked up on differences in the temporal activation profile (Wiestler and Diedrichsen, 2013). Thus, our results can indeed be taken to reveal different spatial patterns of activation (i.e., representations) in a range of motor/premotor areas for the four sequences of each hand.
Behavioral performance during fMRI acquisition
From Figure 3b, it is also apparent that sequence representations for the two hands overlap greatly. In the posterior parietal regions and SMA, no difference in the strength of representation between the ipsilateral and contralateral hand was found. Thus, the sequence representation in these regions appeared to be equally strong for both hands. In contrast, M1 (F(1,13) = 16.22, p = 0.0014), S1 (F(1,13) = 21.88, p = 0.0004), and PMd (F(1,13) = 9.96, p = 0.008) exhibited higher accuracies for the contralateral compared with the ipsilateral hand. However, even these regions clearly encoded ipsilateral sequences (all t(13) > 4.465, p < 0.001). Thus, even primary sensory and motor cortex showed a significant ipsilateral representation.
The existence of ipsilateral representations in M1 and S1 appears to contradict the finding that these regions were, on average, only activated by movements of the contralateral hand. To test whether representations of ipsilateral sequences were restricted to the subset of voxels activated by ipsilateral movements, we split the voxels in each M1 and S1 ROI into three equal groups depending on their level of ipsilateral hand activation (Fig. 4c). We then repeated the classification analysis within each of these sets of voxels separately. An ANOVA revealed that classification accuracy was higher for activated voxels (F(2,26) = 7.740, p = 0.0023) and that this effect did not differ between regions (F(2,26) = 0.091, p = 0.91). Importantly, however, even the deactivated sets of voxels exhibited above-chance classification accuracy in S1 (t(13) = 7.168, p = 7.28 × 10⁻⁶) and M1 (t(13) = 3.423, p = 0.0045). Thus, even regions of primary sensory and motor cortices that were deactivated relative to rest showed a representation of a sequence executed with the ipsilateral hand. These findings are parallel to our observation of encoding of single finger movements (Diedrichsen et al., 2013), and indicate that ipsilateral sensory-motor cortex, below a global suppression, exhibits patterns of relative activation and deactivation that reflect the fine-grained details of the ipsilateral movement.
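The voxel split described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name `split_by_activation` and the example activation values are hypothetical, and the classification step that would then be run within each group is omitted.

```python
import numpy as np

def split_by_activation(ipsi_activation, n_groups=3):
    """Return voxel indices in n_groups (near-)equal groups, ordered from
    the most deactivated to the most activated voxels during ipsilateral
    hand movement."""
    order = np.argsort(ipsi_activation)      # ascending activation
    return np.array_split(order, n_groups)   # low, middle, high groups

# Made-up ipsilateral activation values for six voxels (percent signal change)
activation = np.array([0.4, -0.3, 0.1, -0.6, 0.0, 0.7])
low, mid, high = split_by_activation(activation)
# low  -> voxels 3 and 1 (most deactivated)
# high -> voxels 0 and 5 (most activated)
```

Each index group would then select the columns of the activity-pattern matrix on which the classifier is trained and tested separately.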
Coordinate frame of effector-independent representations
Our previous analysis showed sequence encoding for both hands across the hierarchy of cortical motor areas, even in primary sensory and motor regions. For these representations to be truly effector-independent, however, the region should be in a similar neural state when the same sequence is performed with the left or right hand (Gallivan et al., 2013). Otherwise, sequential representations acquired with one hand could not benefit the production of the same sequence with the other. Therefore, the activation pattern when the left hand performs sequence A should be more similar to the activation pattern when the right hand performs the same sequence, relative to when the right hand performs sequence B.
This correspondence analysis also allowed us to determine the coordinate frame of the representation. Left and right hand sequences can correspond to each other either in extrinsic or intrinsic coordinates (Fig. 1c). Therefore, if a region represented sequences for both hands in extrinsic coordinates, the patterns for the four extrinsic pairs (e.g., AL and AR; Fig. 1d) should be more correlated than for unrelated pairs (e.g., AL and BR). Conversely, if the region represented the sequence in intrinsic coordinates, the patterns for intrinsic pairs (e.g., AL and A′R) should correlate more with each other than those of unrelated pairs.
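The logic of this comparison can be illustrated with a short sketch on synthetic data. It assumes four sequences per hand labeled A, A′, B, and B′, with the extrinsic partner of a left hand sequence being the right hand sequence with the same label and the intrinsic partner being its primed or unprimed counterpart; this labeling is inferred from the examples in the text, and the function name and simulated patterns are hypothetical.

```python
import numpy as np

# Assumed sequence order: A, A', B, B'. The intrinsic partner of each
# left hand sequence is taken to be its primed/unprimed counterpart.
INTRINSIC_PARTNER = [1, 0, 3, 2]

def pair_type_correlations(left, right):
    """Average between-hand pattern correlation for each pair type.

    left, right: arrays of shape (4, n_voxels), one row per sequence.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = left.shape[0]
    # Block of the full correlation matrix: rows = left, columns = right
    r = np.corrcoef(left, right)[:n, n:]
    unrelated = np.ones((n, n), dtype=bool)
    for i in range(n):
        unrelated[i, i] = False                      # extrinsic pair
        unrelated[i, INTRINSIC_PARTNER[i]] = False   # intrinsic pair
    return {
        "extrinsic": float(np.mean([r[i, i] for i in range(n)])),
        "intrinsic": float(np.mean([r[i, INTRINSIC_PARTNER[i]] for i in range(n)])),
        "unrelated": float(r[unrelated].mean()),
    }

# Synthetic example: right hand patterns copy the intrinsic partner plus noise
rng = np.random.default_rng(0)
left = rng.standard_normal((4, 500))
right = left[INTRINSIC_PARTNER] + 0.1 * rng.standard_normal((4, 500))
result = pair_type_correlations(left, right)
```

In this simulated case the intrinsic pairs correlate strongly, whereas the extrinsic and unrelated pairs stay near zero; a region coding in extrinsic coordinates would show the opposite pattern.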
Raw correlations are, however, hard to interpret, as they are influenced by noise level (which decreases correlations), and by the amount of common activation pattern (which increases correlations, but decreases differences between correlations). Therefore, we used pattern-component modeling (Diedrichsen et al., 2011) to decompose the patterns in each region into general (hand) and sequence-specific components (seq; Fig. 5a; see Materials and Methods).
Correlation analysis reveals intrinsic and extrinsic sequence representations. A, Using a pattern component model, the variance of activation across voxels was decomposed into a component that was common to all sequences of one hand (handL/R), a component that was specific to each individual sequence (seqL/R), and a noise component. Left and right hand patterns shared a common activation (covH). The sequence-specific patterns could share a component that was common in extrinsic (covE) or intrinsic (covI) space. B, Cross-section through the motor-related ROIs, running from dorsal PMd to medial OPJ, averaged across both hemispheres. The strength of sequence-specific patterns (seqL/R) is shown for the contralateral (solid line) and ipsilateral (dashed line) hands, relative to the noise estimate. The covariance of the pattern component in extrinsic space (covE, blue) and intrinsic space (covI, red) are shown. C, Map of correlation of the sequence-specific pattern components in extrinsic space (blue) and in intrinsic space (red), thresholded at r > 0.15. The cross-section displayed in B is indicated by a white dotted line on the left hemisphere. D, Corrected correlation coefficients (see text) computed for each ROI in extrinsic (blue) and intrinsic (red) space for the left (dark blue/red) and right (light blue/red) hemisphere. Asterisks indicate correlations that are significantly larger than zero; **p < 0.0086, *p < 0.05.
Figure 5b shows the estimate of the sequence-specific components in a cross-section of the cortical surface (indicated as a white dotted line on the left hemisphere in Fig. 5c, but averaged over hemispheres), running from rostral PMd to the caudal end of the OPJ (Culham and Valyear, 2006). The sequence-specific component for the contralateral and ipsilateral sequence showed a distribution similar to what was obtained for classification accuracy maps (Fig. 3b): rostral PMd, IPS, and OPJ represented a sequence from either hand equally well, whereas caudal PMd, M1, and S1 exhibited better encoding of the contralateral hand sequence.
Importantly, we could now estimate the correlation between the two sequence-specific components; i.e., what proportion of the informative, sequence-specific pattern was shared between the two hands. Because the baseline correlation between unrelated left and right-hand sequence is captured by the general (i.e., not sequence specific) covariance covH, these terms encapsulate the increased similarity between sequences that match in extrinsic (covE) and intrinsic (covI) space. The estimated covariances were then normalized by the strength of the sequence-specific components for the left and right hand, effectively calculating a corrected correlation coefficient (see Materials and Methods).
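In symbols, the normalization can be sketched as follows, assuming that "strength" denotes the estimated variance of the sequence-specific component for each hand. The symbols r_E, r_I, and σ² are introduced here for illustration only; the exact estimator is described in Materials and Methods.

```latex
r_E = \frac{\mathrm{cov}_E}{\sqrt{\sigma^2_{\mathrm{seq}_L}\,\sigma^2_{\mathrm{seq}_R}}},
\qquad
r_I = \frac{\mathrm{cov}_I}{\sqrt{\sigma^2_{\mathrm{seq}_L}\,\sigma^2_{\mathrm{seq}_R}}}
```

Because the noise variance does not enter the denominator, a value near 1 would indicate that the sequence-specific patterns of the two hands are essentially identical in the respective reference frame.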
In PMd (Fig. 5b,c), pairs of sequences that were matched in extrinsic space clearly correlated higher with each other than unrelated pairs of sequences. We assessed the statistical significance using both ROI and surface-based analysis. In the ROI analysis (Fig. 5d), PMd showed extrinsic correlations that were significantly different from zero (F(1,13) = 9.906, p = 0.0065). This was also clear in the surface-based analysis, in which the two largest significant clusters were located in left and right PMd (Table 2). Collectively, these findings demonstrate that PMd comprises an effector-independent sequence representation in extrinsic space. The other two clusters with significant extrinsic correlations were located in PMv and the mouth area of M1, and in the rostral cingulate zone, an area associated with movement preparation (Table 2; Picard and Strick, 2001). Representations in both these regions are possibly related to subvocal rehearsal of the number string.
Surface-based analysis of extrinsic and intrinsic correlations, corrected for the baseline correlation between unrelated sequences (see Materials and Methods)
In contrast, we found evidence for a common representation in intrinsic coordinates in M1 and S1 (Romei et al., 2009; Orban de Xivry et al., 2011), where the intrinsic correlations were significantly larger than zero (S1: F(1,13) = 12.846, p = 0.0033; M1: F(1,13) = 14.499, p = 0.0022). In the surface-based analysis (Table 2), we found a significant cluster in right M1 and S1. A similar cluster in the left hemisphere failed to reach significance. This was likely due to lack of statistical power, as in the ROI analysis, no difference between hemispheres was found (both F(1,13) < 1.257, p > 0.283). Therefore, S1 and M1 showed similar patterns of activity during execution of mirror-symmetric sequences with either hand. Corrected intrinsic correlations were, on average, r = 0.33 (±0.09). Because these correlations were corrected for noise, they would have been close to 1, if the informative part of the patterns for the left hand and right hand sequences had been identical. Therefore, our results also imply that a substantial part of sequence-specific encoding in these areas was effector-dependent.
Before concluding that the motor system has effector-independent sequence representations in intrinsic coordinates, we needed to consider the alternative hypothesis that participants involuntarily mirrored the sequence with their passive hand during imaging. Because mirror-movements in normal individuals are usually subthreshold (Cincotta and Ziemann, 2008), we used a sensitive technique to measure the strength of mirroring (Armatas et al., 1994): we required participants to preactivate the muscles of the passive hand by exerting gentle pressure onto the keyboard with all 10 fingers at all times. The average recorded force on the passive fingers was 0.55 N, and we observed small fluctuations with a SD of 0.027 N (±0.0025 N) around this mean. Although the majority of these fluctuations was random, the force pattern on the two hands correlated with r = 0.08 (±0.015) across fingers and time points. Thus, although we could reveal a slight tendency for mirror activity at the muscle level, it was much weaker than the correlation between intrinsic pairs observed in the neocortex. Furthermore, the size of the peripheral mirror-correlation was not related to the strength of the cortical mirror correlation (r = −0.086, p = 0.585), such that even the subset of participants that did not show any evidence of mirroring exhibited clear intrinsic correlation in primary sensory and motor cortex.
In PMd, we found evidence for an overlap of coordinate systems: significantly positive correlations were observed for sequences that matched in extrinsic space (F(1,13) = 9.906, p = 0.0065) and in intrinsic space (F(1,13) = 10.543, p = 0.0064). The coexistence of two coordinate frames (Cisek et al., 2003) in PMd was not an artifact of the within-hand correlation: within each hand, related pairs of sequences (e.g., AR and A′R) did not correlate more with each other than unrelated pairs (AR and BR; t(13) = 0.597, p = 0.561). Thus, although previous results (Cisek et al., 2003; Gallivan et al., 2013) have shown that PMd contains effector-independent representations of actions, we show here for the first time that this occurs simultaneously in two different coordinate systems.
Similarly, the parietal cortex appeared to have common coding in intrinsic, and possibly also extrinsic, coordinates. Both IPS and OPJ showed a tendency for intrinsic encoding (nonsignificant if measured against the Bonferroni-corrected value of p = 0.05/6; F(1,13) = 5.759, p = 0.0321 and F(1,13) = 4.919, p = 0.045). The surface-based analysis showed a significant cluster of intrinsic correlation in the right IPS (Table 2). The larger sample that also included the 28 tDCS participants, however, confirmed intrinsic encoding in the IPS (F(1,41) = 44.626, p < 0.0001), as well as intrinsic (F(1,41) = 29.65, p < 0.0001) and extrinsic (F(1,41) = 15.075, p = 0.0004) encoding in the OPJ.
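The Bonferroni comparison quoted above amounts to the following arithmetic; the variable names are chosen here for illustration, and the p values are those reported for the original 14 participants.

```python
alpha = 0.05
n_rois = 6
threshold = alpha / n_rois   # Bonferroni-corrected criterion, about 0.0083

# Intrinsic-encoding p values reported above for the original sample
p_values = {"IPS": 0.0321, "OPJ": 0.045}
passes = {roi: p < threshold for roi, p in p_values.items()}
# Neither ROI falls below the corrected criterion in the original sample
```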
No common encoding for the two hands was found in SMA/pre-SMA for intrinsic (F(1,13) = 0.013, p = 0.911) or extrinsic encoding (F(1,13) = 0.035, p = 0.854). However, classification accuracies were substantially lower here than in the lateral premotor regions, leading to reduced power to detect sequence-specific correlations.
Influence of training side
Does the location of effector-independent representation depend on how the skill was acquired? We hypothesized that the hemisphere contralateral to the trained hand would obtain a stronger representation of the sequence, which could then be accessed by the untrained hand. To test this idea, we averaged the estimated variance (strength) of the sequence-specific pattern component over both hands and compared them between the “trained” hemisphere (contralateral to trained hand) and the “untrained” hemisphere (ipsilateral to trained hand).
We discovered that (averaged over contra- and ipsilateral hands) the left hand-trained cohort had a better sequence-related representation in right PMd, whereas the right hand-trained cohort had a better representation in left PMd (Fig. 6, top row; F(1,13) = 12.06, p = 0.0041). This contralateral bias was confirmed in the larger sample of 42 participants for M1 (F(1,41) = 9.237, p = 0.0041), and PMd (F(1,41) = 18.532, p = 0.0001). The correlation in intrinsic coordinates exhibited similar training-dependent lateralization (Fig. 6, bottom row). The intrinsic correlation tended to be larger in the trained compared with the untrained hemisphere in PMd (F(1,13) = 6.032, p = 0.0289) and in IPS (F(1,13) = 6.895, p = 0.0210). No such differences were found for the extrinsic correlation.
Sequence representation is stronger in the hemisphere contralateral to the trained hand. A, Strength of sequence-specific pattern components in the hemisphere contralateral to the trained hand (trained hemisphere) compared with the hemisphere ipsilateral to the trained hand (untrained hemisphere). Variance is expressed as a percentage of the noise component. B, Size of intrinsic correlation. All results are averaged over hands and training cohorts. Significant differences indicated by: **p < 0.0086, *p < 0.05.
In sum, the voxel activity patterns in the hemisphere contralateral to the trained hand showed, regardless of the hand that executed the sequence, higher sequence-specific variance. This suggests that the effector-independent representation was laid down preferentially in the hemisphere that was active during training, and was subsequently called upon when executing the sequence with the untrained hand.
Discussion
For any goal-directed action, the motor system translates extrinsically defined goals into muscle coordinates. When playing the piano, the extrinsic goal is defined by the sequence of notes and the corresponding spatial locations of the keys, and the intrinsic representation consists of the complex pattern of muscle activity needed to produce the desired tune. Although extrinsic representations are by definition effector-independent, the required muscle activity depends on whether the tune is played with the left or right hand. Therefore, intrinsic representations have been hypothesized to be effector-dependent (Fig. 1a; Hikosaka et al., 2002). In this view, only learned extrinsic representations could lead to performance improvements on the untrained hand.
We used MVPA to visualize extrinsic and intrinsic sequence representations in the human neocortex for the first time. We achieved this by correlating activity patterns for left and right hand sequences that were the same in either extrinsic or intrinsic reference frames. We found evidence for extrinsic sequence encoding especially in PMd. This is consistent with studies of reaching movements, which have indicated that neurons in premotor cortices are tuned more clearly for the spatial direction of movement than intrinsic variables, such as joint angles or forces (Crammond and Kalaska, 1989, 1996, 2000; Cisek et al., 2003).
Although we have clearly demonstrated the existence of extrinsic representations, our experiment, unfortunately, does not reveal their exact nature. One obvious candidate for extrinsic coding is the sequence of spatial positions on the keyboard (Keele et al., 1995), or a more abstract stimulus–response code (Wise and Murray, 2000). However, patterns in PMd may have also represented the imperative cue (the string of digits). Although we presented the digits only very briefly during a short announcement phase and not during sequence execution, we also observed above-chance classification accuracy in extra-striate areas, such that this possibility cannot be fully excluded. Furthermore, PMd may have also represented the sequence in terms of a subvocal phonological code (Hartwigsen et al., 2013), although given their functional specialization this is a more likely explanation for the significant extrinsic correlations in PMv and rostral cingulate zone (Picard and Strick, 2001).
We also found widespread effector-independent activation patterns that were coded in intrinsic coordinates; i.e., activity patterns that were similar for two mirror-symmetric sequences. This widespread mirroring is surprising, as intrinsic representations are commonly thought to be effector-dependent and not shared across the two limbs (Hikosaka et al., 2002). These mirrored representations were even found in ipsilateral primary sensory and motor cortices, which exhibited reduced BOLD signal relative to rest. This is consistent with previous findings, which showed similar mirrored representations for single finger movements (Diedrichsen et al., 2013). Because we carefully monitored the forces produced by the ipsilateral hand, however, we can be relatively confident that these patterns did not rely on overt mirror activity.
PMd exhibited a gradual transition between coding in extrinsic and intrinsic coordinate frames. Although the overlap may partly reflect the limited spatial resolution of fMRI and the multivariate searchlight analysis, this finding is consistent with the observation of a mixture of intrinsic and extrinsic reference frames in premotor cortex during arm movements (Wu and Hatsopoulos, 2007). This mixture makes PMd a probable substrate for the coordinate transformation from spatial goals to joint movements.
A similar mixture of extrinsic and intrinsic codes was also observed in OPJ. Along the IPS, however, intrinsic correlations dominated; a slightly surprising result given the functional importance of these regions for movement planning and control of attention in spatial coordinates (Bisley and Goldberg, 2010). This raises the possibility that some of the intrinsic correlations were not due to coding in a muscle-centered reference frame, but due to a mirror-symmetric spatial encoding of external locations.
In contrast, no evidence for shared sequence representations was found in SMA. Although activity patterns here reflected both left and right hand sequences, we did not find a significant correspondence between these patterns in either intrinsic or extrinsic coordinates. This appears to contradict findings that disruption of SMA reduces intermanual transfer (Perez et al., 2007a). The failure to find strong correlates of intermanual representations in SMA may partly be due to a power issue, as overall classification accuracy was substantially lower here than for lateral motor areas (Wiestler and Diedrichsen, 2013). This may indicate that representations in SMA are organized spatially on a finer grain than those in dorsal premotor cortex, making them less amenable to detection using fMRI.
What is the functional relevance of these effector-independent representations? One of their advantages is that motor skills learned with one hand can also be executed with the other hand. Indeed, our sequence-learning task showed a substantial amount of intermanual transfer (Korman et al., 2003; Panzer et al., 2009; Gruetzmacher et al., 2011). A skill like playing the piano would clearly benefit from transfer in extrinsic coordinates, such that the same tune can be played with either hand. Other skills, such as grating cheese or swinging a baseball bat, involve objects that are mirror symmetric, and hence would benefit from transfer in intrinsic coordinates. Our data show that the motor system has effector-independent representations in both extrinsic and intrinsic coordinates, which could support transfer in either reference frame (Dizio and Lackner, 1995; Criscimagna-Hemminger et al., 2003; Wang and Sainburg, 2004; Ahmed et al., 2008; White and Diedrichsen, 2008).
However, what can our data reveal about the mechanism through which intermanual transfer occurs? Theories of intermanual transfer can be divided into two classes (Lee et al., 2010): “bilateral activation models” propose that unilateral motor training activates the contralateral cortex, but also spreads to the ipsilateral hemisphere, and hence also causes learning in motor areas that subserve the untrained hand. Contrastingly, “bilateral access” models state that learning occurs mostly in the hemisphere contralateral to the trained hand. These representations are then called upon when the untrained, ipsilateral hand performs the task (Parlow and Dewey, 1991). Our data are consistent with both views. However, our findings also suggest that an artificial dichotomy between bilateral activation and bilateral access models may not necessarily be helpful.
It is clear from our representational analysis that during the execution of unilateral movement, activation patterns in the ipsilateral hemisphere reflect specific features of the on-going movement (Diedrichsen et al., 2013). This is even the case when the sequence is performed with the trained hand, such that these activation patterns are unlikely to reflect bilateral access. Furthermore, fMRI studies of sequence learning have shown that, during unilateral training, changes in secondary motor areas can be observed bilaterally (Hardwick et al., 2013; Wiestler and Diedrichsen, 2013). Neurophysiological measures, such as short intracortical inhibition, are reduced bilaterally (Perez et al., 2007b; Camus et al., 2009), possibly indicating reduction of synaptic efficiency in GABAergic interneurons. Finally, rTMS disruption of ipsilateral M1 during or immediately after training reduces the amount of intermanual transfer (Perez et al., 2007a; Romei et al., 2009; Lee et al., 2010). These data suggest that movement-specific activation patterns in the ipsilateral hemisphere do induce some learning, which then may support the execution of the same sequence with the other hand.
Our data, however, also provide some indication that sequence representations are preferentially laid down in the hemisphere contralateral to the trained hand, and are subsequently accessed by the untrained hand through callosal communication (Parlow and Dewey, 1991). The measured sequence representations (averaged over trained and untrained hands) were stronger in the hemisphere contralateral to the trained hand. This finding was mostly driven by common representations in an intrinsic reference frame; that is, sequence-specific representations that were activated during the execution of the mirror-reversed sequence with the untrained hand.
The ubiquity of such shared representations across the motor hierarchy, both in intrinsic and extrinsic coordinates, indicates that the distinction between these two models of transfer may ultimately not be illuminating. It is possible that the use of the word “transfer” as a verb may have misled many of us to view intermanual transfer as a process distinct from unimanual sequence learning or sequence production. Under this assumption, it would then indeed be meaningful to ask whether this transfer “occurs” during encoding or during retrieval of the motor memory.
The widespread nature of effector-independent representations, as uncovered here, suggests an alternative view: rather than being conceived as an additional process, intermanual transfer should be considered an emergent property of a highly bilaterally organized motor system. In this view, transfer does not occur during encoding or retrieval; indeed, it does not occur at all. Rather, it is a natural consequence of motor areas that are in a similar activation state when the same sequence is produced with the left or right hand. Our results provide the first neural evidence that such representations not only exist at a relatively abstract level that encodes sequences in an extrinsic reference frame (Hikosaka et al., 2002), but also in a movement-related intrinsic reference frame in primary sensory-motor areas.
Footnotes
This work was supported by grants from the Wellcome Trust (094874/Z/10/Z) and James McDonnell Foundation, both to J.D., and a Doctor of Philosophy studentship from the Brain Research Trust (6CHB) to S.W.-M. The Wellcome Trust Centre for Neuroimaging is supported by core funding from the Wellcome Trust (091593/Z/10/Z).
The authors declare no competing financial interests.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
- Correspondence should be addressed to Dr Jörn Diedrichsen, Institute of Cognitive Neuroscience, University College London, Alexandra House, 17 Queen Square, London WC1N 3AR, UK. j.diedrichsen{at}ucl.ac.uk
This article is freely available online through the J Neurosci Author Open Choice option.