Abstract
As one learns to dance or play tennis, the desired somatosensory state is typically unknown. Trial and error is important as motor behavior is shaped by successful and unsuccessful movements. As an experimental model, we designed a task in which human participants make reaching movements to a hidden target and receive positive reinforcement when successful. We identified somatic and reinforcement-based sources of plasticity on the basis of changes in functional connectivity using resting-state fMRI before and after learning. The neuroimaging data revealed reinforcement-related changes in both motor and somatosensory brain areas in which a strengthening of connectivity was related to the amount of positive reinforcement during learning. Areas of prefrontal cortex were similarly altered in relation to reinforcement, with connectivity between sensorimotor areas of putamen and the reward-related ventromedial prefrontal cortex strengthened in relation to the amount of successful feedback received. In other analyses, we assessed connectivity related to changes in movement direction between trials, a type of variability that presumably reflects exploratory strategies during learning. We found that connectivity in a network linking motor and somatosensory cortices increased with trial-to-trial changes in direction. Connectivity varied as well with the change in movement direction following incorrect movements. Here the changes were observed in a somatic memory and decision making network involving ventrolateral prefrontal cortex and second somatosensory cortex. Our results point to the idea that the initial stages of motor learning are not wholly motor but rather involve plasticity in somatic and prefrontal networks related both to reward and exploration.
SIGNIFICANCE STATEMENT In the initial stages of motor learning, the placement of the limbs is learned primarily through trial and error. In an experimental analog, participants make reaching movements to a hidden target and receive positive feedback when successful. We identified sources of plasticity based on changes in functional connectivity using resting-state fMRI. The main finding is that there is a strengthening of connectivity between reward-related prefrontal areas and sensorimotor areas in the basal ganglia and frontal cortex. There is also a strengthening of connectivity related to movement exploration in sensorimotor circuits involved in somatic memory and decision making. The results indicate that initial stages of motor learning depend on plasticity in somatic and prefrontal networks related to reward and exploration.
Introduction
One of the challenges a beginner faces in learning a golf swing or a tennis serve is that the desired somatosensory state is initially unknown. Of necessity, learning proceeds largely by trial and error and involves a process in which the acquisition of motor commands and the development of somatic targets occur in parallel. The functional brain networks that subserve this stage of learning are largely unknown and constitute the focus of the present investigation. Much of the current literature on motor learning focuses on adaptation paradigms, in which some form of perturbation impairs movement to well-learned sensory targets. A prominent feature in adaptation is the introduction of systematic error followed by a progressive reduction of this error through adjustments to motor commands. Accordingly, neuroimaging studies investigating motor adaptation have highlighted the role of the cerebellum as a key node for error correction (Diedrichsen et al., 2005), and of the posterior parietal cortex, which is involved in the sensorimotor transformations needed to replan spatially guided movements during adaptation (Bernier and Grafton, 2010).
The role that these previously identified networks may play in situations in which sensorimotor targets have to be acquired in the first place is unknown. In such a situation, it is possible to hypothesize that a different set of brain areas will show changes in conjunction with motor learning. First, recent behavioral work points to the importance of the somatosensory system for this kind of task. For example, using an experimental manipulation similar to the one used here, Bernardi et al. (2015) showed that somatosensory experience delivered through passive movements generated learning comparable with that seen in participants trained with active movements. Second, the process of skill acquisition can be memory-dependent in the sense that one must be able to repeat the correct or successful actions and avoid previously incorrect movements. Accordingly, one might expect the recruitment of somatic memory and decision-making circuits in this sort of learning (Romo et al., 1999, 2002), and more generally the prefrontal cortex (Miller and Cohen, 2001). Finally, the involvement of reinforcement-related brain networks would be expected, as positive feedback may effectively shape learning and compensate for the lack of detailed error information in the early stages of learning. Previous behavioral studies have shown the contribution of reinforcement to motor learning in tasks, such as those involving arm reaching (Izawa and Shadmehr, 2011; Shmuelof et al., 2012; Manley et al., 2014), saccadic eye movement (Takikawa et al., 2002; Madelain et al., 2011), and precision gripping (Dayan et al., 2014). Brain networks that support reinforcement and reward-based learning in general have been studied and comprise, among others, the ventromedial prefrontal cortex (vmPFC) and striatum (Schultz et al., 2000; Berns et al., 2001; O'Doherty et al., 2004; Haruno and Kawato, 2006; Bischoff-Grethe et al., 2009).
In the present study, we examined changes in functional connectivity (FC) in resting-state brain networks that occurred following movements to a small unseen target. When the movement landed within the target zone, positive feedback was provided to indicate success. This task was designed as an analog to the early stages of learning a novel motor skill, for which reinforcement-based selection of the sensory targets is central, rather than error-based adjustments of the motor commands. We found that training resulted in improvements in movement that were accompanied by changes in FC (ΔFC) in both reinforcement-related networks and those related to memory and decision making. The results point to the idea that reward-related prefrontal regions contribute to the early stages of learning in sensorimotor circuits. Somatic memory and decision making networks support movement variability and presumably exploration.
Materials and Methods
Experimental setup.
A total of 22 right-handed participants were recruited (14 females, mean ± SD age, 22.5 ± 3.19 years) and provided written consent. All procedures were approved by the McGill University Institutional Review Board. The participants were healthy adults with no prior physical or neurological conditions. The experimental session for each participant was completed within the same day.
The behavioral paradigm in this study was based on that used by Bernardi et al. (2015). Briefly, we used a two degree-of-freedom robotic manipulandum (Interactive Motion Technologies), with a vertical handle attached to the end-effector. The handle position was provided by a set of 16-bit optical encoders (Gurley Precision Instruments). Participants were seated in front of the robot with their right shoulder abducted to ∼70° and the elbow supported by an air sled. A semisilvered mirror, which served as a display screen, was placed just below eye level and blocked the vision of the arm and the robot handle. A green circle, 20 mm in diameter, was positioned on the display screen along the subject's body midline and was used as the start position of each movement. To the left, a 1-cm-thick target stripe, tilted at 45° with respect to the horizontal, extended the entire width of the display screen (Fig. 1A). Within this bar, there was an unseen rectangular target zone, the center of which was located 15 cm from the center of the start circle. A thin yellow line served as a visual cue to indicate the distance of the hand from the target stripe. A small 12-mm-diameter yellow circle attached to the yellow line corresponded to hand position. This circle was shown briefly at the beginning of each movement and disappeared as soon as the subject left the start position. No information about the lateral displacement of the hand was provided during movement or at movement end.
Experimental blocks.
Participants were first given a set of instructions about the experiment followed by 15 familiarization trials. They were told to perform outward reaching movements, 45° to the left of the midline, until they reached the target stripe. They were told that the trajectory had to be as straight as possible with no corrective movements throughout. Each trial had to be completed within 800 ms, and participants received feedback about their speed by means of a target color change (red, green, and blue corresponding to too fast, correct speed, and too slow, respectively). However, there was no penalty if the movement did not end on time. Once the movement ended, the robot would bring the arm back to the start position.
The experiment began with a block of 15 baseline trials in which participants performed reaching movements toward the target stripe. No feedback was provided as to whether the movements were accurate. Following this, they proceeded to the Brain Imaging Centre at the MNI for a first scanning session. This neuroimaging session comprised two resting-state scans with eyes closed, followed by a gradient field map acquisition and a T1-weighted scan. A more detailed description of the functional imaging procedures is presented below.
After the initial scanning session, participants returned to the laboratory and completed four training blocks of 50 trials each. They were told that this was the opportunity to learn which movement to the unseen target was successful. A particular movement was defined as successful when the trial ended within the hidden target zone. The success was determined based on the lateral dimension of the movement endpoint, not the movement speed. Following a successful trial, an animated explosion and the words “Nice shot!” appeared on the screen to provide positive feedback. The participant was told to pay attention to the experience of moving to the target correctly and to collect as much positive feedback as possible. The feedback was binary; that is, no information about error magnitude or direction was given for movements that ended outside the target zone.
To facilitate learning progressively, the width of the target zone (W) was changed over the course of training, keeping the center position fixed (Fig. 1B). We adopted this progressive level of difficulty as a form of behavioral shaping (Skinner, 1965; Darshan et al., 2014). In the first training block, the width of the target zone was calculated as the lateral range within which 50% of the baseline movements ended. In the second block, the width was set to half the distance between the first and the last target width. A final target width of 8 mm was used for the remaining two training blocks and was the same for all participants. A short break was given between successive training blocks.
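The shaping schedule and the binary feedback rule can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the block 1 zone is assumed to be centered on the target (so its width is twice the median absolute lateral deviation at baseline), and "half the distance between the first and the last target width" is read as the midpoint of the two widths.

```python
def target_widths(baseline_pd_mm, final_width_mm=8.0):
    """Return the three target-zone widths (mm) used over the four blocks.

    Assumptions (not stated in the text): the zone is centered on the target,
    and the block-2 width is the midpoint of the first and final widths.
    """
    abs_pd = sorted(abs(p) for p in baseline_pd_mm)
    n = len(abs_pd)
    mid = n // 2
    # Median of |PD|: half of the baseline endpoints fall within +/- this value.
    median_abs = abs_pd[mid] if n % 2 else 0.5 * (abs_pd[mid - 1] + abs_pd[mid])
    w1 = 2.0 * median_abs                 # block 1: contains 50% of baseline endpoints
    w2 = 0.5 * (w1 + final_width_mm)      # block 2: midpoint (assumed reading)
    return w1, w2, final_width_mm         # blocks 3 and 4 share the final width


def is_successful(endpoint_pd_mm, width_mm):
    """Binary feedback: True if the lateral endpoint lies inside the hidden zone."""
    return abs(endpoint_pd_mm) <= width_mm / 2.0
```

Only the lateral endpoint enters the success rule, which matches the statement that movement speed did not determine success.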
After the completion of the final training block, the participants were brought to the imaging center for a second series of fMRI scans. The scans consisted of two resting-state scans, a gradient field map acquisition, a T1-weighted structural scan, and a task-based movement localizer that will be described below. Following these scans, the participant again returned to the laboratory to perform 15 movements without any feedback. The last block served to evaluate motor performance following learning.
Data analysis.
Motor performance was quantified at movement end based on the unsigned magnitude of the lateral perpendicular deviation, |PD|, with respect to a straight line connecting the center of the start position and the center of the target zone (Fig. 1C). Movements that ended closer to the center had smaller |PD| scores. For each subject, the average |PD| before (PRE) and after (POST) training was calculated using the 15 trials without feedback, and the difference served as a measure of the participant's improvement in accuracy, Δ|PD| = |PD|_PRE − |PD|_POST, with larger positive values corresponding to greater learning. We also assessed the training-related performance in terms of the number of successful trials on which feedback was presented. To check the linear dependency between the improvement in accuracy and the overall number of successful trials, we computed Pearson's correlation coefficient between the improvement in movement accuracy from PRE to POST and the total number of successful trials in all training blocks.
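As a minimal sketch of these behavioral measures, the improvement score (mean |PD| in PRE minus mean |PD| in POST) and its correlation with the number of successful trials could be computed as below. The function names are illustrative and the inputs are assumed to be lists of signed lateral deviations in millimeters.

```python
import math


def mean_abs_pd(pd_values):
    """Mean unsigned lateral deviation over a block of trials."""
    return sum(abs(p) for p in pd_values) / len(pd_values)


def improvement(pre_pd, post_pd):
    """PRE minus POST mean |PD|; larger positive values mean greater learning."""
    return mean_abs_pd(pre_pd) - mean_abs_pd(post_pd)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Across subjects, `pearson_r` would be called with one improvement score and one success count per participant.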
Trial and error in search for the correct movement trajectory is presumably important for learning. To see how the feedback or its absence influenced the movement on the following trial, we assessed how trial-to-trial movement direction changed after every successful trial (S = 1) and every unsuccessful trial (S = 0). We quantified this with Δm_n = |PD_{n+1} − PD_n|, which signifies the absolute difference in PD between trial n and trial n + 1, contingent upon trial n being successful or unsuccessful. For each subject, we first computed the mean Δm in these two conditions and then used the set of means in each condition to estimate the group mean and variability of the sampling distribution. We tested whether the average Δm was different following successful and unsuccessful trials.
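The per-subject computation can be sketched as follows, assuming a list of signed PD values and a matching list of trial outcomes (1 for successful, 0 for unsuccessful). The function name is hypothetical.

```python
def delta_m_by_outcome(pd, success):
    """Mean |PD[n+1] - PD[n]| split by the outcome of trial n.

    Returns (mean after successful trials, mean after unsuccessful trials).
    """
    after_success, after_failure = [], []
    for n in range(len(pd) - 1):
        dm = abs(pd[n + 1] - pd[n])
        # The change is attributed to the outcome of the earlier trial n.
        (after_success if success[n] else after_failure).append(dm)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(after_success), mean(after_failure)
```

The two per-subject means returned here are the values that would then be compared at the group level.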
MRI acquisition.
MRI data were acquired at the MNI using a 3.0 T MRI scanner (Tim Trio, Siemens). To reduce head motion and scanner noise, foam padding and earplugs were provided to the participants. During resting-state scans, each participant was instructed to lie quietly with eyes closed and avoid any head motion during the scan.
Functional images were obtained using the Simultaneous Multi-Slice BOLD-EPI WIP sequence (Setsompop et al., 2012) as follows: slice acceleration factor = 3×; TR = 1690 ms; TE = 25 ms; slices = 63; thickness = 2 mm (no gap); FOV = 200 mm × 200 mm; and flip angle (FA) = 90°. Each functional scan lasted for ∼7 min and yielded 250 volumes. Two scans were performed before training and two after training. We acquired two 7 min resting-state runs, rather than a single continuous 14 min scan, for the practical reason that it keeps subjects from falling asleep. Structural images were acquired with a T1-weighted 3D MPRAGE sequence as follows: TR = 2300 ms; TE = 2.98 ms; slices = 192; thickness = 1 mm (no gap); FA = 9°; and FOV = 256 mm × 256 mm, iPAT mode = ON (GRAPPA, acceleration 2×). We used a multiband accelerated imaging sequence in the current study because we could acquire more data in a relatively short scan time (Moeller et al., 2010). Simultaneous acquisition was achieved using a 32-channel multiarray head coil.
fMRI data preprocessing and independent component analysis (ICA).
Data preprocessing was performed using FSL version 6.0 software packages (www.fmrib.ox.ac.uk, FMRIB, Oxford, United Kingdom) (Smith et al., 2004). Briefly, image preprocessing consisted of the following: the removal of the first three volumes in each scan, nonbrain removal using BET, motion correction (using a six parameter affine transformation implemented in FLIRT), spatial smoothing with Gaussian kernel of FWHM 5 mm, and temporal high-pass filtering (Gaussian-weighted least-squares straight line fitting, σ = 100.0 s). The boundary-based registration with fieldmap correction aligned the subject's functional image to the subject's structural space (Greve and Fischl, 2009) and the 12 DOF nonlinear registration using FNIRT normalized the structural space to the standard MNI152, 2 mm template.
Noise artifacts in the individual datasets were identified using ICA in FSL-MELODIC (Beckmann and Smith, 2004). There is presently no consensus on the optimal number of components for noise removal. For our present application, the ICA dimension was determined automatically by the software. The total number of independent components typically ranged from 45 to 60 per dataset. From these, components associated with physiological noise, signal dropout, and sudden head motion were identified by visual inspection following the guidelines of Kelly et al. (2010). The number of components classified as noise and then removed was ∼10% of the total. We found that removing additional components did not yield further changes to the group statistical map.
ROI identification.
Using seed-based analysis, we assessed the temporal correlation of specific brain ROI with all other voxels in the brain. ROI locations were identified using a task-based localizer fMRI (Vahdat et al., 2011). Briefly, the task involved movement of the right arm with six alternate blocks of movement and rest, each lasting for 30 s. The movement speed was 1/3 Hz and was paced by visually presented stimuli. During the rest block, the participant remained still.
Subject-level statistical analyses of the localizer task were performed using the FEAT toolbox in FSL (Beckmann and Smith, 2004). Here, the block design was convolved with the hemodynamic response function as the main predictor in the linear model. After this analysis was completed for each participant separately, a group-level mixed-effect model analysis (FLAME) was performed using the same toolbox. The statistical map was subsequently thresholded using Z = 4.0 and p < 0.01, corrected for multiple comparisons. This map identified regions in the brain that were on average activated across subjects during the task. The map was then used to identify seed locations in the MNI coordinates. Each ROI was represented as a spherical mask of 5 mm radius around the local maximum.
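The construction of a 5 mm spherical ROI around a local maximum can be illustrated with the sketch below. It assumes the peak is given as a voxel index on the 2 mm standard-space grid (the conversion from MNI millimeter coordinates is not shown), and the function name is hypothetical. It does not check image bounds.

```python
def sphere_voxels(center_ijk, radius_mm=5.0, voxel_mm=2.0):
    """Return the set of voxel indices whose centers lie within radius_mm
    of the voxel at center_ijk, for isotropic voxels of size voxel_mm."""
    ci, cj, ck = center_ijk
    reach = int(radius_mm // voxel_mm)   # largest offset (in voxels) worth testing
    voxels = set()
    for di in range(-reach, reach + 1):
        for dj in range(-reach, reach + 1):
            for dk in range(-reach, reach + 1):
                dist_sq_mm = (di * di + dj * dj + dk * dk) * voxel_mm ** 2
                if dist_sq_mm <= radius_mm ** 2:
                    voxels.add((ci + di, cj + dj, ck + dk))
    return voxels
```

With a 5 mm radius on a 2 mm grid, this rule selects 81 voxels per seed.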
A list of ROIs used in this study with their corresponding MNI coordinates and the Z value of the local maximum can be found in Table 1. Briefly, seeds were placed in the primary motor and somatosensory cortices (M1 and S1), the dorsal premotor cortex (PMd), the supplementary motor area (SMA), and the second somatosensory cortex (SII) in the parietal operculum (Vahdat et al., 2014). One seed was placed in the cerebellar lobule V and another seed in the motor region of the left putamen, all of which corresponded to the local maxima as identified by the localizer task.
Seed-based analysis with behavioral factors.
Analysis of the resting-state fMRI data was performed using a seed-based approach. We first obtained the time series of the nuisance components using the ICA process described above. Additionally, to account for further potential artifacts, the average signals within the white matter, the ventricles, and the whole-brain mask were regressed out in the present analysis (Desjardins et al., 2001). To do so, white matter and ventricles were first segmented using FSL-FAST before being mapped into the subject's native functional space. To increase tissue precision, both images were thresholded using a tissue probability of 90%. We then used the resulting image as a mask to extract the average time series inside the white matter and ventricles.
To extract the temporal correlation between a seed and other brain regions, a multiple regression analysis was performed using FEAT. Specifically, the ROI time series was the main predictor of interest, whereas the average time series of white matter, ventricles, global signal, the nuisance components obtained from ICA earlier, and six motion parameters were regressed out from the whole-brain time series. The results were brain regions that were temporally correlated with the seed after regressing out unwanted temporal noise. We repeated this step for all seeds on every run of each subject.
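The logic of this regression, in which the seed time series is the predictor of interest and the nuisance signals are included as additional regressors, can be sketched with ordinary least squares for a single voxel. This is a simplified illustration only: FEAT performs additional steps (such as prewhitening) that are not reproduced here, and the function names are hypothetical.

```python
def ols(y, columns):
    """Least-squares coefficients for y ~ intercept + columns,
    solved through the normal equations with Gauss-Jordan elimination."""
    n = len(y)
    X = [[1.0] + [col[t] for col in columns] for t in range(n)]
    p = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(p)]
         + [sum(X[t][i] * y[t] for t in range(n))] for i in range(p)]
    for c in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def seed_beta(voxel_ts, seed_ts, nuisance):
    """Coefficient of the seed regressor after accounting for nuisance regressors
    (e.g., white matter, ventricles, global signal, ICA noise, motion)."""
    return ols(voxel_ts, [seed_ts] + list(nuisance))[1]
```

Repeating `seed_beta` over every voxel would give a whole-brain map for one seed and one run. The solver raises a division error if the regressors are collinear.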
After this stage was completed, a group-level repeated-measures t test was performed for each seed using a mixed-effect model (FLAME) package in FSL. The design matrix consisted of a series of explanatory variables or predictors. The first set explained the subject average or common effect among different runs. The second set comprised a behavioral factor with the aim of finding differences that were associated with our behavioral manipulation (Vahdat et al., 2011). Specifically, we examined ΔFC in relation to the number of trials with positive feedback during training. Only successful trials in the last two training blocks were used for this analysis because the width of the target was the same for all subjects. An analysis of connectivity changes related to improvements in movement accuracy was also conducted. The patterns were similar to those reported below for connectivity changes related to successful movements. These are not presented separately because feedback during training was restricted to binary information on movement success; hence, we considered positive feedback to be the main factor determining the increase in accuracy in the post-training session. Moreover, the two behavioral measures of reinforcement and accuracy were significantly correlated (see below).
In a second set of group-level analyses, we examined ΔFC related to trial-to-trial changes in movement direction. For each subject, we averaged the changes in movement direction (Δm) regardless of the trial outcome and applied this as the behavioral predictor. In a subsequent analysis, we examined ΔFC that were uniquely attributed to the change in movement following either successful or unsuccessful trials. Here, we separately averaged Δm following only successful (S = 1) and only unsuccessful trials (S = 0). We put these two sets of values as the predictors within one general linear model to determine changes that were uniquely explained by one factor independent of the variability shared with the other factor.
For both group analyses, a correction for multiple comparisons was performed using Gaussian random field theory with a cluster-forming threshold of Z = 2.40 and p < 0.05. Two different contrasts were evaluated (i.e., POST > PRE and POST < PRE) to test for increases or decreases in FC following training. The thresholded group statistical maps of each seed revealed clusters whose changes in connectivity with the seed region were reliably associated with the corresponding behavioral predictor. To correct for multiple seeds (i.e., Bonferroni correction for choosing seven seeds), clusters obtained from the group-level analyses were considered significant only if the cluster probability was p < 0.05/7 (≈0.007).
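The seed-level Bonferroni criterion amounts to a single comparison, sketched here with a hypothetical helper. The default values of seven seeds and α = 0.05 are taken from the text.

```python
def bonferroni_significant(cluster_p, n_seeds=7, alpha=0.05):
    """True if a cluster-level p value survives correction for the number of seeds."""
    return cluster_p < alpha / n_seeds
```

For seven seeds the effective threshold is 0.05/7, or about 0.0071, so a cluster with p = 0.005 would be retained and one with p = 0.01 would not.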
The whole-brain global signal in the resting-state data is usually included as one of the unwanted components. However, the removal of the global signal has been controversial as it introduces a negative bias to the resting-state statistical map (Saad et al., 2012). Because we computed the difference between the PRE and POST training scans, this negative bias did not affect the difference maps presented below. To quantify the strength of the FC measure in each scan before and after training trials, we repeated the same analysis but without removing the global signal time series. The results of the group analyses without the global signal removal yielded similar statistical maps.
Results
Behavioral performance
Figure 2A depicts movement accuracy as quantified using the absolute lateral deviation at the end of movement during the baseline test (PRE), the four training blocks, and the motor evaluation block (POST). The reduction in the mean |PD| over 15 movements between the PRE and POST training blocks provides a measure of how accuracy improved as a result of training. On average, the reduction was found to be significant (t(21) = 2.080, p < 0.05) and reliably correlated with the total number of successful trials over the course of training (r = 0.44, p < 0.05) (Fig. 2B). Participants that achieved a greater number of successful trials had a tendency to display a greater improvement in movement accuracy.
We gave the participants the opportunity to improve their movement accuracy with four training blocks during which they received positive feedback if the movement ended within the target zone and no feedback otherwise. Three target zones that gradually decreased in width were incorporated during training trials to progressively shape subjects' behavior. During the first and second training blocks, the percentage of success was in the range of 70%–80%. When the target width was reduced to the smallest, the percentage decreased to 30%–40% but nevertheless increased over the course of training (Fig. 2A, inset). We checked the relationship between subjects' performance in the first two blocks, in which the width of the target varied, and that in the last two blocks, in which target width was fixed (8 mm). We found that subjects who had more successful movements during the first two blocks did so as well in the last two blocks (r = 0.46, p < 0.05). The fact that the target was smaller in the last two blocks likely contributed to the slowing of learning seen in the third and final target zone (Zone III) of Figure 2A.
To assess the effect of feedback on subsequent movements, we calculated Δm as the absolute change in PD between the current movement and the one immediately following it. We used training data from blocks 3 and 4 for this calculation because the target size was uniform across subjects. Figure 2C illustrates the distributions of Δm following successful and unsuccessful trials as half-normal curves. The figure shows that the average Δm after successful trials is significantly less than the average Δm after failed trials (t(21) = 3.988, p < 0.001). In other words, failing to get positive feedback resulted in a greater trial-to-trial change in movement direction, presumably in search of the correct target zone. The average Δm values after successful and unsuccessful trials were linearly related (r = 0.53, p < 0.05). In addition, subjects who had a greater number of successful movements, and hence received more positive feedback, displayed a smaller change in movement direction following both successful (r = −0.72, p < 0.001) and unsuccessful movements (r = −0.58, p < 0.005).
Furthermore, again restricting the analysis to the data from the final two training blocks, we assessed the trial-to-trial change in movement direction (Δm) as a function of the number of consecutive successful trials and of the number of consecutive failed trials. A weighted least-squares regression was calculated to predict these relationships (Fig. 2D). We found, using a simple linear mixed model, that the average Δm increased with the number of failed trials since the last successful movement (F(1,17.13) = 6.97, p < 0.05). On the other hand, there was a reduction in average Δm when preceding movements were successful (F(1,10.64) = 10.81, p < 0.01). Thus, movement variability, and presumably exploration, progressively increased following unsuccessful movements and decreased following successful movements (Sutton and Barto, 1998).
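The grouping step behind this analysis can be sketched as below: each trial-to-trial change is binned by the length of the run of identical outcomes that ends at the earlier trial. The function name and the signed-key convention (positive for consecutive successes, negative for consecutive failures) are assumptions made for illustration. The weighted regression and the linear mixed model themselves are not reproduced.

```python
def delta_m_by_run_length(pd, success):
    """Mean |PD[n+1] - PD[n]| grouped by the signed run length ending at trial n.

    Key +k: trial n is the k-th consecutive successful trial.
    Key -k: trial n is the k-th consecutive unsuccessful trial.
    """
    groups = {}
    run = 0
    for n in range(len(pd) - 1):
        if success[n]:
            run = run + 1 if run > 0 else 1
        else:
            run = run - 1 if run < 0 else -1
        groups.setdefault(run, []).append(abs(pd[n + 1] - pd[n]))
    return {k: sum(v) / len(v) for k, v in groups.items()}
```

A regression of these binned means on run length, computed separately for positive and negative keys, would correspond to the two relationships described above.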
Selection of ROIs (seeds)
We assessed ΔFC associated with the number of successful trials during learning using a seed-based approach. We identified seven seed locations based on the local maxima in the group-level task-based localizer data. The seed regions were located in the left M1 (primary motor cortex, BA4), left S1 (primary somatosensory cortex, BA2), left dorsal premotor cortex (PMd, BA6), the SMA, left second somatosensory cortex SII (parietal operculum, OP1), right cerebellar lobule V (CbV), and the left rostral motor area of putamen (Pu). The seed location of putamen in this study is restricted to the motor region as defined by the Oxford-GSK-Imanova Striatal Atlas (Tziortzi et al., 2014). The MNI coordinates of each seed region along with its corresponding Z score are listed in Table 1.
Changes in FC related to training performance
To identify ΔFC associated with the behavioral manipulation, we included as the predictor the number of successful trials in the last two training blocks which had the same target size for all subjects. Figure 3 shows ΔFC that are significantly correlated with the number of successful trials. The seed regions are given in green and, to the right, are those clusters of voxels for which the correlation with the seed region changed in proportion to the number of successes. The scatterplots depict the relationship between the change in FC and individual differences in the behavioral performance. Additionally, Table 2 provides the list of clusters that show change in connectivity with the individual seed and the coordinate of the maximum Z value in the clusters. The cluster p value is significant when it is <0.05/7 (corrected for multiple seed selection). FC measure (strength) before and after training is given by the average Z score of correlation between the ROI time series and the time series of the corresponding cluster, with a negative value indicating an anticorrelation.
Changes in FC related to the number of successful trials were observed with seeds in the left M1 and PMd (Fig. 3). This measure was associated with increases in FC in a network comprising M1, PMd, S1, and SMA. The positive correlation indicates that subjects that achieved more successful trials had higher connectivity strength following training. A similar trend was observed in the connectivity between the seed in the left SII and S1. An increase in connectivity strength between SII and S1 was positively correlated with the number of successful trials.
The number of successful trials also predicted both increased and decreased connectivity with the putamen (Pu) seed. An increase in FC was found with vmPFC that extends to a portion of the ventral striatum. On the other hand, we found a reduction in the functional strength with somatic areas comprising a region in the parietal operculum (SII) and S1 that extends to the anterior intraparietal sulcus. We further tested this observation and found that the increase in connectivity between Pu and vmPFC was strongly correlated with the decrease in connectivity between Pu and S1/SII (r = −0.58, p < 0.01). This suggests that subjects who were more successful during training, and thus received more positive feedback, had stronger connectivity involving vmPFC but reduced connectivity with the somatosensory areas of the brain.
Changes in FC related to feedback-dependent changes in movement direction
Figure 4 depicts the results of a second set of analyses, focusing on functional networks related to trial-to-trial movement direction changes (Δm). We first analyzed ΔFC associated with Δm, regardless of the trial outcome. We then proceeded to segregate networks involved in the repetition of successful movements and those presumably involved in exploration when the preceding movements were unsuccessful. In this case, both factors were included in a single general linear model, enabling us to identify brain areas that were associated with each predictor separately after removing changes in connectivity that were related to the other variable.
In Figure 4 (top), the seed regions are shown in green and, to the right, we show the voxels whose correlation with the seed region is dependent on change in the movement direction regardless of whether the preceding movement was successful or unsuccessful. Table 3 summarizes the connectivity measure (strength) between the individual seed locations and the corresponding clusters before and after training. It is seen that connectivity between SII and sensorimotor areas is strengthened as a result of training, but the connectivity between the SMA seed and two subcortical clusters is reduced. The clusters were found to be bilateral but with a statistical peak in the left putamen and left thalamus, respectively (Table 3, top). Subjects with smaller trial-to-trial changes in direction had greater SMA-putamen connectivity.
Significant changes in connectivity that depended on whether the preceding trial was successful or unsuccessful were restricted to movements following unsuccessful trials (Fig. 4, bottom). The change in movement direction after unsuccessful trials predicted a decrease in connectivity between SII and two areas in the right hemisphere. The first is BA 9/46 in the lateral prefrontal cortex, just above the inferior frontal sulcus; the second is the supramarginal gyrus. The correlation was negative; that is, subjects who explored the space more widely following unsuccessful trials showed a greater reduction in connectivity strength. We did not observe any reliable correlation with the left prefrontal region.
The connectivity between M1 and posterior intraparietal sulcus (pIPS) was also found to increase in proportion to the change in direction following unsuccessful trials. The cluster with increased connectivity covers the parieto-occipital border and extends to posterior angular gyrus (Area PGp). A seed placed in SMA shared a similar pattern of change in connectivity with pIPS and angular gyrus. The positive correlation observed here implies that stronger functional interaction between the two regions is associated with a greater change in movement direction following unsuccessful movements. It is noteworthy that there is no direct anatomical connection between pIPS and M1 in macaques. However, the observed FC between pIPS and M1 might be supported through the dorsal premotor area, which is directly connected with both M1 and pIPS (Tanné-Gariépy et al., 2002).
No reliable changes were observed in connectivity that was uniquely associated with the change in movement direction following successful trials. This might be because the change in movement direction is substantially smaller following successful movements and differs little between subjects (Fig. 2C). The absence of significant ΔFC under these circumstances thus likely resulted from a lack of variability in the behavioral predictor.
Discussion
The motivation for this study was to identify changes in functional networks of the brain that are associated with learning sensorimotor targets in the initial stages of human motor learning. To focus on somatic target acquisition in the early stages of learning a novel motor skill, we used movements that were already part of the individual's motor repertoire in combination with target locations that were initially unknown. The task was designed to allow trial and error in search for the correct limb position and to provide positive feedback as reinforcement during training. The results suggest that the initial stages of motor learning are to be understood as not entirely motoric. Evidence of plasticity was obtained in somatic networks that are related to exploration, and also in prefrontal areas, related to reinforcement.
In behavioral terms, we found that, on average, performance improved compared with baseline. The extent of the improvement varied in proportion to the number of successful training trials, with subjects who were more successful during training showing the greatest improvements in movement accuracy.
We used resting-state fMRI to elucidate changes in connectivity in relation to success during learning. We found that learning changed FC in both cortical sensory and motor areas of the brain. Participants who had a greater number of successful trials showed larger increases in FC in a network comprising the left M1, S1, SMA, and PMd. The finding is consistent with previous resting-state imaging work involving both sensorimotor adaptation and somatosensory perceptual learning (Albert et al., 2009; Vahdat et al., 2011, 2014). The participation of these same motor regions in reward-related tasks has been observed in prior studies in both humans (Ramnani and Miall, 2003; Kapogiannis et al., 2008) and monkeys (Roesch and Olson, 2003; Sul et al., 2011).
Areas in the prefrontal cortex not typically associated with motor learning were likewise involved and showed a contribution, which varied across subjects in a manner related to their behavioral performance. Specifically, changes in connectivity were observed between the putamen and the vmPFC that were related to the number of successful trials. The vmPFC is a region in which activity is associated with stimulus–reward value, selecting actions that are more rewarding (O'Doherty et al., 2003; Rushworth et al., 2004; Daw et al., 2006) and encoding the value of performed decisions (Knutson et al., 2001; Smith et al., 2010).
We observed that, across participants, the increase in connectivity between Pu and vmPFC was accompanied by a reduction in connectivity with the primary and secondary somatosensory regions. This suggests that there are individual differences in the participation of putamen in motor learning. In particular, individuals who are more reliant on reward for learning, as indicated by a strengthening of connectivity with prefrontal circuits, show a functional dissociation between the putamen and sensorimotor areas. This is consistent with the idea of a competition between the somatic and reward-related neural networks in the basal ganglia during early stages of human motor learning (Mink, 1996; Colder, 2015). More generally, these changes may bear on the relationship between reinforcement-based learning and error-guided behavior that has been the focus of previous research. As the sensorimotor goal takes shape following exploration and reinforcement, motor learning and control processes presumably shift to be more error-based. The finding that, following learning, individuals who show greater increases in connectivity between putamen and medial frontal cortex show reduced connectivity between putamen and sensorimotor cortex may reflect the neuroanatomical substrate of this progressive shift.
Prefrontal regions involved in reward-guided decision making, such as ventromedial and orbitofrontal cortex, have extensive anatomical connections with the ventral striatum (Haber et al., 1995), but not with the putamen. The observed changes in connectivity might be explained by the fact that FC measures are not only modulated by direct anatomical connections but also by indirect pathways (Koch et al., 2002). A potential indirect pathway underlying the observed result entails the projection of vmPFC to the ventral striatum, and in turn to substantia nigra pars compacta and then to sensorimotor striatum (Selemon and Goldman-Rakic, 1985; Haber et al., 2000). As part of the reward system, vmPFC and ventral striatum potentially guide motor learning where one is able to learn the appropriate target position and attempt to repeat successful movements. Such reward-guided action selection is thought to involve putamen (Samejima et al., 2005).
Unlike studies of motor adaptation and sequence learning, the current study did not find a statistically reliable correlation between behavioral predictors and changes in the corticocerebellar functional network. Activation of the arm area in the cerebellar cortex was observed in task-based localizer scans, so the lack of change in connectivity is not due to an inability to observe activity in cerebellum. Moreover, reliable ΔFC between cerebellar cortex and frontal motor areas has been observed previously in the context of force-field adaptation (Vahdat et al., 2011). If cerebellum plays a role in the correction for error (Diedrichsen et al., 2005; Smith and Shadmehr, 2005), the absence of a reliable relationship in the present data may arise because, in the present task, the sensory error signal is weak at this stage of learning.
The current study provides an account of spontaneous exploration dynamics during the early stages of learning a novel motor skill. We observed a trial-to-trial change in movement direction that was influenced by the preceding feedback. Change in movement direction was greater following unsuccessful trials. We also identified a relationship between exploration and feedback, such that exploration increased proportionally with an accumulation of unsuccessful trials and decreased proportionally with an accumulation of successful trials. Moreover, subjects who produced more accurate movements had smaller changes in movement direction, even when the preceding movement was unsuccessful.
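The behavioral measure discussed above can be illustrated with a short sketch that splits trial-to-trial direction changes by the outcome of the preceding trial. It assumes that Δm is the absolute difference in movement direction between consecutive trials, which is an assumption made for illustration; the function name `direction_change_by_outcome` and the example values are hypothetical.

```python
import numpy as np

def direction_change_by_outcome(directions_deg, success):
    """Mean absolute direction change after successful vs. unsuccessful trials.

    directions_deg: movement direction on each trial, in degrees.
    success: boolean per trial, True if the trial was reinforced.
    Returns (mean change after success, mean change after failure).
    """
    directions = np.asarray(directions_deg, dtype=float)
    prev_success = np.asarray(success, dtype=bool)[:-1]  # outcome of trial t
    dm = np.abs(np.diff(directions))                     # change from t to t+1

    def safe_mean(values):
        # Avoid a warning when no trials fall into a category.
        return float(values.mean()) if values.size else float("nan")

    return safe_mean(dm[prev_success]), safe_mean(dm[~prev_success])

# Hypothetical six-trial sequence.
directions = [10, 30, 28, 29, 45, 44]
success = [False, True, True, False, True, True]

after_success, after_failure = direction_change_by_outcome(directions, success)
print(after_success, after_failure)
```

In this toy sequence the changes following unsuccessful trials are much larger than those following successful trials, mirroring the direction of the effect reported in the text.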
We assessed ΔFC using as predictors the change in movement direction following only successful or only unsuccessful trials. Connectivity between SII and a region in the ventrolateral prefrontal cortex varied systematically with changes in movement direction following unsuccessful trials. In monkeys, the analogous area (BA 9/46v) is somatic and has both inputs from and outputs to other somatic regions of the brain, such as ventral premotor cortex, the parietal operculum (SII) and the inferior parietal lobule (Petrides and Pandya, 1984). This area of lateral prefrontal cortex is engaged during somatic memory and discrimination tasks in both nonhuman primate and human studies (Romo et al., 1999; Stoeckel et al., 2003; Kostopoulos et al., 2007). Other neuroimaging studies indicate that the right prefrontal cortex is involved in tasks involving bimanual motor sequences (Sun et al., 2007) and spatial working memory (d'Esposito et al., 1998; Owen et al., 2005), specifically, in relation to visuomotor adaptation (Anguera et al., 2010). The pattern of connectivity changes observed here suggests that working memory may be one of the elements through which reinforcement results in learning, especially during movement exploration.
It is worth noting that the changes in FC observed here were obtained from scans that occurred 1 h following the end of the behavioral training. The persistence of learning observed following brief periods of training with a motor task is consistent with a considerable body of behavioral and neuroimaging data. This has been shown behaviorally in the context of reinforcement-based motor learning (Bernardi et al., 2015), as well as in force-field adaptation and visuomotor rotation (Shadmehr and Brashers-Krug, 1997; Krakauer et al., 2005). Persistence of learning in these studies has been observed at intervals up to 1 week. Similarly, neuroimaging studies have observed that changes in resting-state networks persist for at least 6 h following brief periods of motor learning (Sami et al., 2014). The persistence of these changes is likely supported by cellular mechanisms, such as LTP and LTD. These mechanisms affect neuronal metabolism and oxygen consumption, which in turn are reflected in the resting-state signal following learning (Logothetis, 2002).
Overall, the acquisition of sensorimotor targets in the early stages of motor learning depends on both exploration and positive reinforcement. Reinforcement is associated with an increase in FC in traditional sensorimotor circuits (M1, S1, PMd, SMA). Areas of prefrontal cortex are also important, subserving both reward-guided behavior (medial prefrontal cortex) and exploratory movement (ventrolateral prefrontal cortex). In future studies, it would be desirable to directly test the role of ventrolateral prefrontal cortex in providing somatic working memory during exploratory behavior. It would also be meaningful to test the idea that, as learning progresses, there is a progressive shift from reinforcement-based learning during the formation of sensorimotor targets to error-based control.
Footnotes
This work was supported by the National Institute of Child Health and Human Development R01 HD075740, Les Fonds Québécois de la Recherche sur la Nature et les Technologies, Québec, and the Natural Sciences and Engineering Research Council of Canada. We thank Dimitrios Palidis for assistance in conducting the experiments; and Dr. Floris Van Vugt for valuable feedback.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. David J. Ostry, McGill University, Department of Psychology, 1205 Dr. Penfield Avenue, Stewart Biology Building, Montreal, Quebec H3A 1B1, Canada. david.ostry{at}mcgill.ca