Abstract
As a sequence of movements is learned, serially ordered actions get bound together into sets to reduce computational complexity during planning and execution. Here, we investigated how actions become naturally bound over the course of learning and how this learning affects cortical representations of individual actions. Across 5 weeks of practice, neurologically healthy human subjects learned either a complex 32-item sequence of finger movements (trained group, n = 9; 3 female) or randomly ordered actions (control group, n = 9; 3 female). Over the course of practice, responses during sequence production in the trained group became temporally correlated, consistent with responses being bound together under a common command. These behavioral changes, however, did not coincide with plasticity in the multivariate representations of individual finger movements, assessed using fMRI, at any level of the cortical motor hierarchy. This suggests that the representations of individual actions remain stable, even as the execution of those same actions becomes bound together in the context of producing a well-learned sequence.
Significance Statement
Extended practice on motor sequences results in highly stereotyped movement patterns that bind successive movements together. This binding is critical for skilled motor performance, yet it is not currently understood how it is achieved in the brain. We examined how binding altered the patterns of activity associated with individual movements that make up the sequence. We found that fine finger control during sequence production involved correlated activity throughout multiple motor regions; however, we found no evidence for plasticity of the representations of elementary movements. This suggests that binding is associated with plasticity at a more abstract level of the motor hierarchy.
Introduction
Being able to combine simple movements into coordinated sets of actions is critical to many everyday skills, such as typing on the computer or driving a manual transmission car (Lashley, 1951). Over the course of evolution, the brain has solved this sequencing problem multiple times, resulting in many interacting algorithms that facilitate the consolidation of complex skills (for review, see Beukema and Verstynen, 2018). One of these algorithms is the process of set building, also called chunking or binding (Verwey, 1996). Binding serial actions into sets improves computational efficiency during the production of complex actions by representing multiple movements under a single selection command (Ramkumar et al., 2016).
To illustrate this process, consider the graphical model presented in Figure 1. On each trial, the manual response to a visual cue occurs through a hierarchical system of perception, selection (e.g., key), and motor planning (e.g., finger movement), which are all represented as latent states with their own independent sources of noise. In this example, the serial order of cues across trials follows a deterministic sequential order. Before training (Fig. 1A), each response is selected and planned independently of the other responses. Once the order of cues is learned (Fig. 1B), the brain can consolidate the selection process such that a set of motor plans is represented under a single selection state. This selection state is triggered by the presentation of the first stimulus in the series, after which subsequent motor commands are cued by the internal state, rather than by the visual cues. This produces faster responses to items within a set, as well as a correlation in responses within bound sets due to their shared upstream command (Fig. 1C; Verstynen et al., 2012; Acuna et al., 2014; Lynch et al., 2017).
Some forms of nonsequential motor learning rely on the reorganization of movement representations in motor networks (Nudo et al., 1996), suggesting that action binding during sequence learning could alter internal motor representations of individual movements themselves; however, this effect has been largely unexplored. By examining neural representational patterns, previous work has shown that the structure of individual fingers in primary motor cortex is organized according to their coarticulation during natural hand movements (Ejaz et al., 2015), suggesting a degree of plasticity of the cortical representations of individual digits (Merzenich et al., 1984). Indeed, artificial manipulations of pairwise finger correlations alter the distance between finger representations in primary somatosensory cortex (Kolasinski et al., 2016), although representations of individual fingers can persist in the cortex even decades after amputation (Kikkert et al., 2016), suggesting some degree of rigidity in sensory areas (Makin and Bensmaia, 2017). Thus, it remains unclear whether elementary sensory or motor representations are plastic and subject to changes over time.
If individual actions are bound under a common motor command, then the internal representations of those actions, at some level of the motor hierarchy, should change with learning. One possibility is that if two movements are executed repeatedly in a sequence, then the activation of one finger movement may preactivate the following movement. In the extreme, this model makes the prediction that two fingers that are regularly paired together will become enslaved together over time, thereby reducing behavioral flexibility (Lashley, 1951). This, however, is not typically observed. It is therefore more likely that the process of binding alters the representation of contextually cued actions in upstream regions linked to more abstract response selection (Diedrichsen and Kornysheva, 2015), which would predict observing altered representations in higher premotor areas (e.g., premotor and parietal regions). Wherever this binding process happens, the multivariate activity pattern for the two bound movements should become more similar in that region (Fig. 1D).
Here we tested this hypothesis using a combination of behavioral analysis and event-related fMRI. Binding was measured behaviorally by looking at the naturalistic emergence of correlations between successive behavioral responses after training on a unimanual 32-item sequence. Population-level representations of visually cued single finger movements in the cortex were measured using multivariate analysis of fMRI data both before and after 5 weeks of training on the complex sequence. If the simple binding hypothesis is correct, then the distances between cortical representations of individual actions that are bound should be reduced after prolonged practice at the motor sequence task.
Materials and Methods
Participants.
Eighteen right-handed participants (6 females, mean age: 26 years) were recruited locally from Carnegie Mellon University (CMU) and the University of Pittsburgh. Two authors (P.B. and T.D.V.) were included in the sample. All participants provided informed consent and were financially compensated for their time. All experimental protocols were approved by the institutional review board at CMU.
Experimental design and statistical analysis.
Participants were trained for 25 nonconsecutive days on a variant of the serial reaction time task (Nissen and Bullemer, 1987). Participants were instructed to train for at least 5 d a week but could choose to take time off at their discretion, although for no more than 2 d and not in the days leading up to the scan. All experimental procedures were performed on a laptop running Ubuntu 14.04. At the beginning of each training session, participants were instructed to place their right hand over the “h” (index finger), “j” (middle finger), “k” (ring finger), and “l” (pinky) keys. Each trial consisted of a presentation of one of four unique fractal cues appearing on a black background. Each cue was uniquely mapped to one of four keys on the keyboard (Fig. 2A). The trial ended either when the participant executed a response or once a maximum response window expired, depending on which event happened first. A description of the adaptive response window is presented in the next paragraph. After a trial termination, the next cue was presented after a 250 ms intertrial interval. Each trial block consisted of 256 trials and was followed by a rest period in which the mean response time (RT) and accuracy for that block were provided to the participant. On each training day, participants completed 1792 trials separated into seven trial blocks. RT was calculated as the delay between stimulus presentation and a key press. Stimulus presentation and recording were controlled with custom-written software in Python using the open source Psychopy package (Peirce, 2007). The software used for training is available on GitHub (Beukema, 2019).
Before the first session, subjects were assigned to either a trained group (n = 9; 3 female) or a control group (n = 9; 3 female). For participants in the trained group, trial blocks were separated into two types: blocks of pseudorandomly ordered cues (random; blocks 1, 2, and 6) or blocks of deterministically ordered cues following an embedded 32-element sequence (sequence; blocks 3, 4, 5, and 7). Figure 2B shows the blockwise structure for a single subject in the trained group. Trials during the random blocks were constrained such that repeated presentations of the same cue were excluded. This was done so that random trial blocks would appear more similar to the sequence trial blocks. The 32-element sequence presented on sequence blocks consisted of the following key presses: 3-4-2-3-1-4-2-1-3-4-3-4-1-3-4-2-1-2-4-2-3-1-2-1-2-4-3-1-3-1-2-4 (using the following mapping: 1, index finger; 2, middle finger; 3, ring finger; and 4, little finger). Each sequence block began in a random position of the sequence. For the first two blocks, the response threshold for each trial was set to 1000 ms. To encourage faster responses, the response window of blocks 3–5 was adaptively controlled such that the response window on one trial block was the mean ±1 SD of the RTs from the previous trial block. If that value fell below 200 ms or if the accuracy on the preceding block was <75%, then the threshold was reset to 1000 ms. The threshold was removed for the final probe blocks (6 and 7) so that participants could move as quickly as they chose. For the control group, the procedure was nearly identical to the trained group, with the exception that all seven blocks consisted of pseudorandomly ordered trials; that is, there was no exposure to sequence blocks.
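The adaptive response window rule can be illustrated with a minimal sketch. The text does not specify which side of "mean ±1 SD" serves as the cutoff, nor whether a sample or population SD was used; the sketch below assumes the upper bound (mean + 1 SD) and the sample SD. The function name `next_response_window` and the constants are hypothetical and are not taken from the authors' training software.

```python
import statistics

DEFAULT_WINDOW_MS = 1000.0  # fixed window used in blocks 1 and 2
MIN_WINDOW_MS = 200.0       # windows below this value trigger a reset
MIN_ACCURACY = 0.75         # accuracy below this value triggers a reset


def next_response_window(prev_rts_ms, prev_accuracy):
    """Return the response window (ms) for the next adaptive block.

    Assumption: the window is the mean + 1 sample SD of the previous
    block's RTs; the source only states "mean ±1 SD".
    """
    candidate = statistics.mean(prev_rts_ms) + statistics.stdev(prev_rts_ms)
    if candidate < MIN_WINDOW_MS or prev_accuracy < MIN_ACCURACY:
        return DEFAULT_WINDOW_MS
    return float(candidate)
```

Under these assumptions, a block with RTs of 300, 400, and 500 ms and 90% accuracy would yield a 500 ms window for the next block, whereas the same RTs with 50% accuracy would reset the window to 1000 ms.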
Analysis of training data.
Data analysis was conducted with custom Python code available on GitHub (https://github.com/CoAxLab/binding_manuscript), along with source data to generate all manuscript figures. All behavioral analysis during training focused on responses during the last two trial blocks (probe blocks) when no adaptive response window was applied: random and sequence conditions for the trained group and random and random conditions for the control group. Differences in RT and accuracy (percentage correct responses) were measured as the difference in the means between the last two blocks normalized by the SD of values in trial block 6; that is, z-scored difference in performance (Verstynen et al., 2012). In the trained group, this reflected the sequence-specific change in performance on each day. Because three subjects completed 24 of 25 d of training, average group visualizations are presented for day 24 to evaluate the same state of learning for all subjects.
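As an illustration of the z-scored performance difference, the following sketch computes the difference in mean RT between the two probe blocks and normalizes it by the SD of block 6. The sign convention (block 7 minus block 6, so that negative values indicate faster responses in the final block) and the use of the sample SD are assumptions; the function name `probe_block_z` is hypothetical.

```python
import numpy as np


def probe_block_z(rt_block6, rt_block7):
    """Difference in mean RT between probe blocks, in units of block-6 SD.

    Assumptions: difference is block 7 minus block 6, and the SD is the
    sample SD (ddof=1) of block 6.
    """
    rt_block6 = np.asarray(rt_block6, dtype=float)
    rt_block7 = np.asarray(rt_block7, dtype=float)
    return (rt_block7.mean() - rt_block6.mean()) / rt_block6.std(ddof=1)
```

The same function applies to accuracy if per-trial correctness values are supplied in place of RTs.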
Binding was measured by computing the autocorrelation of the series of RTs within each probe trial block. The first 32 trials were excluded to remove the exponential decay as it distorts the autocorrelation analysis (Verstynen et al., 2012). The linear trend was then removed by regression and the residuals were used to calculate the autocorrelation function for lags 1 through 31, following the same procedure as described previously (Verstynen et al., 2012; Lynch et al., 2017).
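The autocorrelation procedure can be sketched as follows: drop the first 32 trials, remove the linear trend by regression, and compute the autocorrelation of the residuals for lags 1 through 31. This is a minimal illustration, not the authors' code; the function name `rt_autocorrelation` and the normalization by the lag-0 sum of squares are assumptions.

```python
import numpy as np


def rt_autocorrelation(rts, n_skip=32, max_lag=31):
    """Autocorrelation of detrended RTs for lags 1..max_lag."""
    y = np.asarray(rts, dtype=float)[n_skip:]  # drop the initial trials
    x = np.arange(y.size)
    slope, intercept = np.polyfit(x, y, 1)     # linear trend
    resid = y - (slope * x + intercept)        # detrended RTs
    resid = resid - resid.mean()
    denom = np.dot(resid, resid)               # lag-0 sum of squares
    return np.array([
        np.dot(resid[:-lag], resid[lag:]) / denom
        for lag in range(1, max_lag + 1)
    ])
```

Applied to one 256-trial probe block, this returns 31 values, the first of which is the correlation between temporally adjacent responses.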
Positive autocorrelations could be confounded by the fact that the trained group executed faster responses than the control group. Therefore, we also examined the correlation as a function of the interpress interval (IPI) using linear regression. The IPI was computed as the time between successive key presses, and the correlation was computed as before. For every subject, we computed the slope of the linear regression line between IPI and correlation (Fig. 3D).
Because the autocorrelation function measures general associations across all sequential lags, it is not sensitive to specific associations between individual elements, and therefore cannot be used to measure binding between specific finger pairs. Therefore, we conducted a secondary analysis on the same data but examined pairwise correlations between each distinct element (1–32) in the sequence across cycles. Average correlations, ordered by sequence element, are shown in Figure 4, A and B. Binding between successive elements is reflected by increases in correlations after training compared with before training.
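A minimal sketch of the elementwise analysis: RTs from a sequence block are arranged into a cycles × 32 matrix, and the 32 × 32 correlation matrix is computed across cycles. The sketch assumes that the RT series has already been aligned so that its first entry corresponds to sequence element 1 (sequence blocks in the study began at a random position, so such an alignment step would be needed first). The function name `element_correlations` is hypothetical.

```python
import numpy as np


def element_correlations(rts, seq_len=32):
    """Correlate RTs between sequence elements across cycles.

    Assumption: rts[0] corresponds to sequence element 1; trailing
    trials that do not fill a complete cycle are discarded.
    """
    rts = np.asarray(rts, dtype=float)
    n_cycles = rts.size // seq_len
    cycles = rts[: n_cycles * seq_len].reshape(n_cycles, seq_len)
    # Columns are sequence elements, rows are cycles.
    return np.corrcoef(cycles, rowvar=False)
```

Entry (a, b) of the result is the correlation, across cycles, between the RTs to elements a and b; clusters of high values near the diagonal would correspond to bound sets.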
To measure how much the correlation between finger responses matches the statistical structure of the trained sequence, we collapsed the elementwise correlation matrices by finger identity (index, middle, ring, pinky), forming 4 × 4 observed correlation matrices. To measure the similarity of the observed binding structure to the expected binding structure, we computed the mean squared error between the finger pairing frequencies of the sequence and observed correlations. This gives a normalized similarity measure for how well the pattern of correlations in the behavioral responses matches the pairwise similarities of the trained sequence.
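One way to collapse the elementwise correlations by finger identity is sketched below. The exact averaging scheme and the scaling of the expected matrix are not stated in the text, so both are assumptions here: entries of the 32 × 32 matrix are averaged over all element pairs that share the same pair of fingers (excluding each element's correlation with itself), and the expected matrix is passed in by the caller. The function names are hypothetical.

```python
import numpy as np

# Trained sequence (1 = index, 2 = middle, 3 = ring, 4 = little finger).
SEQUENCE = [3, 4, 2, 3, 1, 4, 2, 1, 3, 4, 3, 4, 1, 3, 4, 2,
            1, 2, 4, 2, 3, 1, 2, 1, 2, 4, 3, 1, 3, 1, 2, 4]


def collapse_by_finger(corr, sequence=SEQUENCE, n_fingers=4):
    """Average a 32 x 32 element correlation matrix into a 4 x 4 finger matrix."""
    corr = np.asarray(corr, dtype=float)
    fingers = np.asarray(sequence) - 1
    off_diag = ~np.eye(len(fingers), dtype=bool)  # skip self-correlations
    out = np.zeros((n_fingers, n_fingers))
    for f in range(n_fingers):
        for g in range(n_fingers):
            mask = np.outer(fingers == f, fingers == g) & off_diag
            out[f, g] = corr[mask].mean()
    return out


def mse(observed, expected):
    """Mean squared error between observed and expected 4 x 4 matrices."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.mean((observed - expected) ** 2))
```

Lower MSE values indicate that the observed finger-by-finger correlations more closely follow the expected pairing structure.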
Imaging acquisition.
Participants were scanned twice: the day before training started (pretraining) and within 2 d of training completion (posttraining). All participants were scanned at the Scientific and Brain Research Center at Carnegie Mellon University on a Siemens Verio 3 T magnet fitted with a 32-channel head coil. High-resolution T1-weighted anatomical images were collected for visualization and surface reconstruction (MPRAGE, 1 mm isotropic, 176 slices). A field map with dual echo-time images (TR: 746 ms, TE1: 5.00 ms, TE2: 7.46 ms, 66 slices, 2 mm isotropic) was acquired to correct for magnetic field inhomogeneities. For the functional imaging sessions, we acquired 241 T2*-weighted echo-planar imaging volumes per run (2 mm isotropic, TR: 2000 ms, TE: 30.3 ms, MB factor: 3, 66 slices, A ≫ P, FoV: 192 mm, interleaved ascending order, flip angle: 79°, matrix size: 96 × 96 × 66, slice thickness: 2.00 mm). For the finger-mapping task, we collected a total of six runs, resulting in 1446 volumes. Functional images were oriented to maximize coverage of the entire cortex and cerebellum. All imaging data are openly available at OpenNeuro: https://openneuro.org/datasets/ds001233/versions/00003.
Neuroimaging tasks.
We collected a set of finger-mapping runs to estimate the activation patterns evoked by performing each distinct cue–response pair in isolation (i.e., not embedded within a sequence). Before the first scan, subjects learned the mapping of cue to effector. The same stimuli from the behavioral experiments were projected on an MR-compatible LCD screen mounted at the rear of the scanner. Participants could see this screen through a mirror mounted on the head coil. Responses were recorded on a five-key MR-compatible response glove (PST) placed under the right hand. Each effector (i.e., individual cue–response pairing) was presented in isolation on each trial with no structured order between trials. Thus, the paradigm only measured responses to individual cued movements, not the sequence itself. Each trial type was repeated 12 times per run, totaling 72 trials per trial type per session. Subjects were instructed to press the cued key several times after stimulus presentation until the cue disappeared from the screen (1 s). The intertrial interval was sampled according to an exponential distribution ranging from 6 to 18 s. Between runs, subjects were given the option to take several minutes of rest.
Imaging analysis.
Functional imaging data were analyzed using SPM8 (http://www.fil.ion.ucl.ac.uk/spm/) and custom MATLAB and Python functions. Raw functional EPI images were realigned to the first volume. No slice time correction was applied due to the fast TR. These realigned images were then corrected for field distortions using the field maps. All analyses were performed in native functional space. Structural T1 images were used to reconstruct the pial and white surfaces using Freesurfer (Fischl, 2012). All custom code is publicly available (Beukema, 2019).
All analyses of task-related responses were performed using a region of interest (ROI) approach. Anatomical ROIs were defined separately for each subject using the surface-based Brodmann areas extracted from Freesurfer (Fischl et al., 2008) following similar conventions as described previously (Wiestler and Diedrichsen, 2013). The hand voxels of the primary motor cortex (M1) were defined as the surface nodes with the highest probability of belonging to Brodmann area (BA) 4, 1 cm above and below the hand knob (Yousry et al., 1997). Primary somatosensory cortex (S1) was defined as the nodes in BA1, BA2, BA3a, or BA3b 1 cm above and below the hand knob. Premotor cortex was defined as the nodes belonging to BA6 medial (PMv) or lateral (PMd) to the medial frontal gyrus. Supplementary motor area (SMA) was defined as the voxels in BA6 along the medial wall. The Freesurfer atlas was used to define the superior parietal gyrus, as well as the putamen and caudate, as these regions are not defined by Brodmann area. As a control ROI, we extracted the voxels belonging to primary auditory cortex (A1), as this region would not be expected to exhibit any significant decoding of the visually cued finger patterns. Each surface-based ROI was projected back into native functional space.
Analysis for effector representations was performed using representational similarity analysis (RSA) (Kriegeskorte et al., 2008) using the “crossnobis” estimator (Nili et al., 2014; Walther et al., 2016). A GLM with regressors for each effector was fit for each mapping run, along with the six head motion regressors (x, y, z, pitch, yaw, roll). Omissions and incorrect key presses were regressed out of the model. Raw time series were orthogonalized by eigenvector decomposition and projected into the principal component space to minimize model bias in the decoding. To estimate the differences between finger patterns, we used a cross-validated estimate of the Mahalanobis distance between activity patterns for each effector (Diedrichsen et al., 2016). The crossnobis distance has the advantage over other distance measures in that it is unbiased because noise is orthogonalized across runs, resulting in an expected distance of 0 if a voxel or region does not reliably distinguish two finger patterns (Ejaz et al., 2015). The estimated distance (d̂i,j) between the patterns (u) of two fingers (i, j) was averaged across every pair (m, l) of runs (M), resulting in (6 choose 2) = 15 folds using the following equation:
$$\hat{d}_{i,j} = \frac{1}{M(M-1)} \sum_{m=1}^{M} \sum_{l \neq m}^{M} \left(u_i^m - u_j^m\right)^T \left(u_i^l - u_j^l\right)$$
Unlike correlation distances, Mahalanobis distances can exceed the value of 1. Furthermore, the cross-validated nature of the crossnobis estimate also allows d̂ to become negative. The pairwise distances between each of the fingers are summarized in a representational dissimilarity matrix. To test for encoding and plasticity within each voxel or ROI, we extracted the average distance (H) between each pair of finger patterns (K = 4) using the following equation:
$$H = \frac{1}{K(K-1)} \sum_{i=1}^{K} \sum_{j \neq i}^{K} \hat{d}_{i,j}$$
To examine the extent of finger representations across all of cortex, we conducted a surface-based searchlight (Oosterhof et al., 2011), assigning every surface node an H value based on the local patterns of the p = 160 voxels within an ∼10 mm radius. Values for the number of voxels (p) and radius were chosen based on previous studies (Yokoi and Diedrichsen, 2017). This searchlight approach enabled us to examine the entire H distribution across all voxels in each of the ROIs to confirm that each region reliably discriminated individual effectors. Due to the observed positive skew, we extracted the median H for all regions across all subjects and conducted a one-sample t test against 0 to establish whether a region reliably decoded the single finger movement representations.
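The cross-validated distance and its average over finger pairs can be sketched as follows. This is a minimal illustration of the standard crossnobis estimator, not the authors' implementation: it assumes that the run-wise activity patterns have already been prewhitened (so that the inner product of pattern differences corresponds to a Mahalanobis distance), and it does not normalize by the number of voxels. The function names `crossnobis_rdm` and `mean_distance` are hypothetical.

```python
from itertools import combinations

import numpy as np


def crossnobis_rdm(patterns):
    """Cross-validated distances between conditions.

    patterns: array of shape (M runs, K conditions, P voxels), assumed
    to be prewhitened. Distances are averaged over all pairs of runs.
    """
    patterns = np.asarray(patterns, dtype=float)
    n_runs, n_cond, _ = patterns.shape
    run_pairs = list(combinations(range(n_runs), 2))  # 15 pairs for M = 6
    rdm = np.zeros((n_cond, n_cond))
    for i, j in combinations(range(n_cond), 2):
        products = [
            (patterns[m, i] - patterns[m, j]) @ (patterns[l, i] - patterns[l, j])
            for m, l in run_pairs
        ]
        rdm[i, j] = rdm[j, i] = np.mean(products)
    return rdm


def mean_distance(rdm):
    """Average distance H over all distinct condition pairs."""
    k = rdm.shape[0]
    return float(rdm[np.triu_indices(k, 1)].mean())
```

Because the two pattern differences in each product come from different runs, independent noise averages to zero, which is why the estimate can be negative and can be tested directly against 0.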
Changes in representational distances were estimated by calculating the difference in H values, for each ROI, between the posttraining and pretraining imaging sessions (i.e., Hpost − Hpre). For each ROI, we calculated both pretraining and posttraining H values using the responses from all voxels in the region mask. To estimate group-level training effects, the average difference in H from these voxels was calculated for each subject and each ROI. The change in H values was determined by looking for consistent patterns across subjects, within each ROI. Along with the group level effects, we also calculated the significance of changes in H at the single-subject level.
In addition to the standard null hypothesis tests, a repeated-measures ANOVA was used to examine the influence of training on distances in each ROI. Bayesian repeated-measures ANOVA with a JZS prior over all models was used to determine the inclusion Bayes factor (BF) to measure the extent to which the data supported inclusion of the interaction effect (JASP Team, 2017). The guidelines in Kass and Raftery (1995) were used to interpret the weight of the evidence in support of the null hypothesis.
Results
Learning-related changes in behavior
To assess how training affected performance, we compared the evolution of RTs and accuracy across days for the trained and control groups. Figure 2B illustrates all trialwise responses during a single day for a subject in the trained group. Although responses during random trial blocks (black dots) remained relatively constant, the RTs during sequence trial blocks (green dots) got steadily faster with training. The last two trial blocks were used to probe learning across time. On average, both the control (dashed line, Fig. 2C) and trained subjects (dashed line, Fig. 2D) exhibited a general improvement in response speeds during the final random trial block (block 6). This general across-session speeding of responses during a trial block with random sequences likely reflects the improved learning of the cue–response mapping across days. During the final sequence block (block 7), however, sequence-specific responses in the trained group also decreased rapidly across training days. Repeated-measures ANOVA indicated a significant block × time effect: F(23,368) = 15.37, p = 7.93 × 10^−41, with average RTs dropping just below 200 ms at the end of training (solid line, Fig. 2D). As expected, this effect was not observed in the control group, F(23,368) = 0.77, p = 0.76, where the final trial block did not contain an embedded sequence (solid line, Fig. 2C). To capture sequence-specific changes in response speed, we normalized the mean RT for the final trial block (sequence in trained group, random in control group) by the mean and variance of RTs during trial block 6 (random in both groups; see Materials and Methods). This analysis depicts a steady improvement in sequence-specific RTs across the 5 weeks for the trained group, with sequence block responses ∼4 SDs faster than the random trial blocks at the end of training (Fig. 2E). Repeated-measures ANOVA indicated a significant group by time effect, F(23,368) = 12.79, p = 1.67 × 10^−34.
Unlike response speed, average accuracy during the final trial block gradually rose at a steady rate for both groups, saturating at ∼90% for the trained group and 85% for the control group, with no significant between-group differences, F(1,368) = 0.36, p = 0.99 (Fig. 2F).
There are several ways that responses could get faster during the sequence blocks (Beukema and Verstynen, 2018). The binding hypothesis (Fig. 1B), however, makes the specific prediction that serially successive actions that are bound under a shared motor plan should exhibit a correlation in their responses over time, as a consequence of arising from a common, high-level motor plan (Fig. 1C). For an index of binding, we used the autocorrelation of RTs during the last trial block for both groups (Verstynen et al., 2012). Figure 3 shows the autocorrelation functions for early (day 1), middle (day 12), and late (day 24) stages of practice for the control (Fig. 3A) and trained (Fig. 3B) groups separately. Although participants in the control group did not show reliable autocorrelation structure in RTs with training, we did see evidence of an emergent structure in the trained group. Specifically, participants in the trained group showed no evidence of an autocorrelation in their RTs at day 1; however, by the middle of training, a pronounced autocorrelation of temporally adjacent responses emerged. This correlation increased throughout the training period, tapering off at approximately the middle of training (day 12) (Fig. 3B, inset).
To exclude the possibility that the observed increases in RT autocorrelations are simply the result of executing faster responses, we also examined the correlations in consecutive intertrial RTs as a function of the IPI. If the increased correlation in temporally adjacent RTs was simply the result of faster responses, then a negative relationship should exist between the observed autocorrelation and the IPI, with higher correlations for faster responses and little or no correlation for slower responses. A representative example of the relationship between the IPI and the RT correlation is shown in Figure 3C and reveals no clear association. Across all subjects, the slope of the regression line between the two variables was not significantly different from zero (Fig. 3D). This result suggests that the observed increases in correlation are due to executing responses under a shared motor command and are not the result of speed increases alone.
We next set out to examine the structure of the associations across movements by examining the pairwise correlations between items in the sequence. For this analysis, we organized the data into a matrix of 32 responses by cycles. We then looked at the correlations between different sequence elements across cycles of sequence production. Before practice, this 32 × 32 correlation matrix does not show much structure, with all items approximately equally correlated (Fig. 4A). After training, a clear structure in the correlations emerged, with local clusters of correlated responses found along the diagonal of the matrix (Fig. 4B).
If these clusters of correlated responses in the sequence reflected the interfinger transition frequency (Fig. 4C), then the pairing frequency of individual fingers should determine the degree of similarity between finger responses. Thus, we repeated our interitem correlation analysis except that, rather than mapping each response to its item in the sequence, we mapped it to the finger that executed the response. This was done by creating a new matrix of single-trial RTs, with each column representing a finger and each row representing a cycle through the sequence, and then calculating the 4 × 4 correlation matrix of interfinger responses. The similarity between the observed correlations and expected correlations based on the pairwise frequencies (Fig. 4D) was computed using the mean squared error (MSE). The mean observed correlation matrix across all subjects on the final day of training is shown in Figure 4E. There was increased similarity between the observed and expected correlations across days (Fig. 4F) in the trained group (F(23,184) = 0.0026), but the structure in the control group remained unchanged (F(23,184) = 0.41), resulting in a significant group-by-time interaction (F(23,368) = 1.90, p = 0.0079). These results indicate that binding occurs in a principled way that originates at least in part in the statistical structure of the sequence.
Stable motor representations after training
To directly measure multivariate cortical representations of the individual cued movements, we used a rapid-event-related fMRI design consisting of presentations of each cued finger press followed by a period of fixation (Fig. 5A). An ROI analysis was performed on the cortical motor network including the M1, S1, PMd, PMv, SMA, and superior parietal lobule (SPL). These regions were anatomically localized using Brodmann areas extracted from Freesurfer (see Materials and Methods). These regions are shown on the group average surface (Fig. 5C). In each of the cortical motor ROIs, we quantified the activity pattern related to each cued finger movement and then calculated a cross-validated Mahalanobis (crossnobis) distance between the activity patterns for each cued finger pair (Fig. 5B). If two cued fingers generate the same cortical activity patterns, then the corresponding distance between them will be 0. However, if two finger movements consistently generate dissimilar finger patterns, then the corresponding distance will be positive. Cross-validation allows us to test the value of the distance estimates directly against zero (Walther et al., 2016; Diedrichsen et al., 2016; Diedrichsen and Kriegeskorte, 2017). The distances between every possible pair of fingers are summarized in a representational dissimilarity matrix (RDM) for each ROI (Fig. 5D).
Although the magnitude of the representational distances is slightly smaller than distances reported in previous studies (Ejaz et al., 2015), likely due to the use of an event-related design in our study, the relative representational patterns that we observed in primary motor and primary somatosensory cortex qualitatively match previous reports. Specifically, the index finger is furthest from the little finger, whereas the middle and ring fingers are close together. This pattern of representational distances is also similar to what is observed in the other cortical motor regions, although the overall between-effector distances are smaller in these premotor regions (Fig. 5D). To confirm that each region has reliably different representations for the fingers, we computed the average cross-validated pairwise distance between all finger movements (Fig. 5B; Materials and Methods). Average distance (H) >0 indicates above-chance encoding (Diedrichsen and Kriegeskorte, 2017). To estimate the reliability of this encoding across subjects, we extracted the median distance across voxels within each searchlight for each subject and ROI. The median was chosen to account for the fact that the distribution of H values within a region is highly skewed. A one-sample t test on those median values (one median per subject), after adjusting for multiple comparisons using a Bonferroni correction, found significant separation of cued finger representations (i.e., positive average distances) in the cortical sensorimotor areas, but not in the A1 control region or the putamen (Table 1). A follow-up paired-samples t test (within subject) showed that H was greater in M1, S1, PMd, PMv, and SPL, but not in SMA, compared with A1 (Table 1).
Along with the cortical regions, we also examined the distances between finger representations within the caudate and the putamen (Fig. 4E, inset). Overall, finger representations were significantly separable within the caudate but not the putamen. However, the magnitude of the representational distances was very weak in these subcortical regions, with distances several orders of magnitude smaller than in any cortical region.
Overall, the analysis of cortical representations of individual fingers is consistent with previous studies (Ejaz et al., 2015), confirming that the patterns of activity in the motor network can reliably discriminate individual effectors. This effect is substantially weaker in subcortical regions, likely having to do with the lower signal-to-noise ratio of the BOLD signal in the striatum and other regions of the basal ganglia. Therefore, these ROIs were excluded from further analysis.
To determine whether the emergence of binding in the behavioral responses coincides with alterations of these representational distances of individual cued actions, we measured how average distances changed for each cortical motor ROI before and after training. The simple form of the binding hypothesis is that the representations of frequently paired actions will become more similar (Fig. 1D) after training, predicting that the distances between frequently paired movements will decrease after practice only in the trained group. When looking at all pairwise distances (Fig. 6A), we were unable to find a reliable influence of sequence training on the average pattern distances in any cortical motor region. In most areas, the distances decreased only marginally for both trained and control groups together, but the finger patterns remained largely separable, with patterns exhibiting a high degree of stability. Across all regions, we failed to detect a reliable interaction between group and time that would be indicative of a training effect in representational distances (all p > 0.26; full statistics are reported in Table 2). To evaluate the evidence in support of the null hypothesis that the interaction is not present, we conducted a JZS BF ANOVA with uniform prior across all models and found evidence in support of the null model that training does not influence distances. The BFs ranged from 0.099 to 0.658 (Table 2), which can be considered positive anecdotal evidence in support of the null hypothesis (Kass and Raftery, 1995).
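The logic of the group-by-time interaction can be sketched as follows, assuming that group is a between-subject factor and that time has two within-subject levels (before and after training). Under those assumptions, the interaction F statistic of the mixed ANOVA equals the squared t statistic of an independent-samples t test on each subject's change score. The function name and the numbers below are hypothetical and do not reproduce the reported analysis or the Bayes factor computation.

```python
import numpy as np
from scipy import stats


def group_by_time_interaction(pre_trained, post_trained, pre_control, post_control):
    """Return (t, p, F) for the interaction in a 2 (group) x 2 (time) design.

    Each argument holds one average distance per subject for a single ROI.
    """
    # Change score per subject: post-training minus pre-training distance.
    change_trained = np.asarray(post_trained) - np.asarray(pre_trained)
    change_control = np.asarray(post_control) - np.asarray(pre_control)
    # Pooled-variance t test on change scores; F for the interaction is t squared.
    t_stat, p_value = stats.ttest_ind(change_trained, change_control)
    return float(t_stat), float(p_value), float(t_stat) ** 2


# Made-up distances for nine subjects per group; these are not study data.
pre = np.linspace(0.20, 0.40, 9)
shared_change = np.array([-0.02, -0.01, 0.0, 0.01, -0.01, 0.0, -0.02, 0.01, -0.01])
t_stat, p_value, f_stat = group_by_time_interaction(
    pre, pre + shared_change, pre, pre + shared_change
)
```

Here both groups change by the same amount, so the interaction term is zero even though distances decrease slightly over time in both groups, which mirrors the pattern of a shared marginal decrease without a training effect.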
Of course, looking at changes in overall representational distances may not be sensitive enough to pick up changes in the representational distances of only a few finger pairs. The simple plasticity model that we proposed in the introduction predicts that the greatest plasticity should be observed in the finger pairs most often executed together in the sequence. If the distances decreased for the more frequently paired effectors but increased for the less frequently paired effectors, then this may result in a net change for the overall average distance near 0. To explore this possibility, we reanalyzed the distance changes by looking at the frequently and infrequently occurring finger pairs in the sequence structure itself (Fig. 4C). Based on the pairing frequencies, we identified four frequently used finger pairs (index-middle, index-ring, middle-little, ring-little) and two infrequently used pairs (middle-ring and index-little; Fig. 6C). Qualitatively, the pattern of distances for each pair type appeared to match what was observed in the overall distance patterns, with higher distances in M1 and S1 and lower distances in the premotor and parietal regions. Thus, much like the overall distance patterns, we were unable to resolve focal changes in representational distances in either the frequently (Fig. 6D) or the infrequently (Fig. 6E) paired effectors. Across all regions, two-way repeated-measures ANOVA indicated no significant group-by-time interaction for either frequently paired (all p > 0.26; full statistics are provided in Table 3) or infrequently paired fingers (all p > 0.13; full statistics are provided in Table 4). The Bayesian ANOVA revealed anecdotal evidence in favor of the null hypothesis for both the frequently (BFs: 0.108–0.631, Table 3) and infrequently (BFs: 0.108–0.391, Table 4) paired fingers.
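The cancellation concern that motivates this split can be made concrete with a small worked example. The pair labels are those named above; the change values are invented purely to show the arithmetic, and the function name is hypothetical.

```python
import numpy as np

# Finger pairs as grouped in the text.
FREQUENT = ["index-middle", "index-ring", "middle-little", "ring-little"]
INFREQUENT = ["middle-ring", "index-little"]


def summarize_changes(change_by_pair):
    """Average the post-minus-pre distance change by pair type and overall."""
    frequent = np.mean([change_by_pair[p] for p in FREQUENT])
    infrequent = np.mean([change_by_pair[p] for p in INFREQUENT])
    overall = np.mean([change_by_pair[p] for p in FREQUENT + INFREQUENT])
    return {
        "frequent": float(frequent),
        "infrequent": float(infrequent),
        "overall": float(overall),
    }


# Invented changes in arbitrary units: frequent pairs move closer together,
# infrequent pairs move apart by twice as much.
changes = {pair: -0.01 for pair in FREQUENT}
changes.update({pair: 0.02 for pair in INFREQUENT})
summary = summarize_changes(changes)
```

With four pairs decreasing by 0.01 and two pairs increasing by 0.02, the overall mean change is zero although both pair types changed, which is why the pair types need to be tested separately.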
Discussion
Here, we investigated whether the binding of serial actions during long-term sequence learning alters the cortical representations of individual cue–response pairings. We found that during sequence production, temporally adjacent responses develop a high degree of correlation in their response speeds, consistent with participants binding multiple responses together under a unified command to reduce computational complexity (see also Verstynen et al., 2012; Ramkumar et al., 2016; Lynch et al., 2017). Using a multivariate pattern analysis approach based on the cross-validated Mahalanobis estimator, we also replicated previous studies showing that cortical motor areas reliably distinguish between activation patterns of individually cued finger responses (Ejaz et al., 2015). We were, however, unable to find evidence for learning-related changes in this representational structure of cued finger responses in any of the cortical regions tested. Together, these findings show that the process of binding actions into chunked sets during long-term skill learning does not affect the representation of individual cued actions, suggesting that binding relies on changing more complex levels of representation beyond individual movements.
At first glance, the absence of plasticity in population-level representations of individual actions that we observed appears to be incompatible with previous reports of plasticity in sensorimotor cortex. Kolasinski et al. (2016) found that the representational distances of individual fingers shifted in S1 after physically yoking two fingers together for a period of 24 h. In their study, the sensory representations of the two yoked fingers remained spatially and temporally identical; however, the unyoked fingers altered their distances, suggesting a possible compensatory effect in the sensory representations themselves. In contrast to this observation, other studies have shown that finger representations in S1 are still robust and distinct even decades after amputation (Kikkert et al., 2016), suggesting that the sensory representations of digits have a degree of robustness. Unlike these sensory representation studies, our task here relied on training associations between temporally independent movements in a specific context. It is possible that, had we trained on chord-like movements, in which multiple fingers are simultaneously engaged (Verstynen et al., 2005), for a longer period of time, we might have observed similar changes in cortical sensorimotor representations, a hypothesis that is left open to future studies.
Alternatively, there is a strong rationale for why single effector representations would remain stable in cortical sensorimotor networks, particularly motor execution areas such as M1, after long-term sequence learning. First, binding responses at the execution level may be a maladaptive strategy for maintaining a flexible movement repertoire (Lashley, 1951). For example, if index finger movements were consistently bound with middle finger movements because a single daily task required them to work together in sequential fashion, then they might exhibit a prepotent response in inappropriate contexts. To maximize flexibility, it would be beneficial for the movements to be bound at a more abstract motor planning stage upstream from execution processes. Second, practice may involve refining the control of execution-level representations without necessarily affecting the representations themselves. This would suggest that the process of binding during the consolidation of complex movement sequences is dependent on plasticity mechanisms at a hierarchically higher level of processing (Wong et al., 2015).
Of course, it is possible that there is plasticity in the representations of individual sensorimotor effectors during long-term sequence learning, but limitations in our experimental design may preclude identifying those changes. First, whereas the duration of training that we used was longer than many classic sequence learning experiments in humans, 5 weeks may still not be enough time to lead to measurable representational changes in primary motor cortex. This concern is tempered by the fact that we were able to show strong evidence of action binding in the behavioral responses. A second methodological limitation is the lack of power to observe what is likely a relatively modest effect size. Previous studies of sensory representational plasticity provide a reasonable measure of the true effect size, suggesting that we are reasonably powered (Kolasinski et al., 2016). Although it is true that the number of samples was comparatively low for a typical univariate functional imaging study (at nine participants per group), several design choices alleviate this concern. We collected a substantial amount of data per subject. Each subject was scanned for ∼2 h before training and 2 h after training, with 6 identical and independent imaging sessions per run. This relatively large volume of data per subject enabled us to obtain robust estimates of the population patterns of interest. Thus, although the number of subjects was modest, we do not believe that our results are simply the result of insufficient power.
Despite these limitations, our experiment clearly shows that 5 weeks of training on a complex unimanual sequence task does not alter the sensorimotor representations of individual effectors despite clear evidence of binding in the motoric actions. This suggests that execution-level representations remain stable during learning and that proficiency is likely controlled by a higher level within the motor hierarchy.
Footnotes
This work was supported by the Pennsylvania Department of Health (Formula Award SAP4100062201) and the National Science Foundation (CAREER Award 1351748). P.B. received support from the Multimodal Neuroimaging Training Program (National Institutes of Health Grant T90 DA022761). We thank Kyle Dunovan and Kevin Jarbo for helpful comments on this manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Timothy D. Verstynen at timothyv{at}andrew.cmu.edu