We studied the neural correlates of visuomotor sequence learning using functional magnetic resonance imaging (fMRI). In the test condition, subjects learned, by trial and error, the correct order of pressing two buttons consecutively for 10 pairs of buttons (2 × 10 task); in the control condition, they pressed buttons in any order. Comparison between the test condition and the control condition revealed four brain areas specifically related to learning: the dorsolateral prefrontal cortex (DLPFC), the presupplementary motor area (pre-SMA), the precuneus, and the intraparietal sulcus (IPS). We found that the time course of activation during learning was different between these areas. To normalize the individual differences in the speed of learning, we classified the performance of each subject into three learning stages: early, intermediate, and advanced stages. Both the relative increase of signal intensity and the number of activated pixels within the four areas showed significant changes across the learning stages, with different time courses. The two frontal areas, DLPFC and pre-SMA, were activated in the earlier stages of learning, whereas the two parietal areas, precuneus and IPS, were activated in the later stages. Specifically, DLPFC, pre-SMA, precuneus, and IPS were most highly activated in the early stage, in both the early and intermediate stages, in the intermediate stage, and in both the intermediate and advanced stages, respectively. The results suggest that the acquisition of visuomotor sequences requires frontal activation, whereas the retrieval of visuomotor sequences requires parietal activation, which might reflect the transition from the declarative stage to the procedural stage.
- visuomotor sequence
- functional magnetic resonance imaging
- prefrontal cortex
- presupplementary motor area
- intraparietal sulcus
The pattern of human brain activation in motor learning has been extensively studied using various paradigms, including pursuit rotor task (Grafton et al., 1992, 1994), complex sequential finger movements (Seitz et al., 1990; Schlaug et al., 1994; Karni et al., 1995), sequential eye movements (Petit et al., 1996), and serial reaction time task (Grafton et al., 1995; Rauch et al., 1995), as well as motor learning with trial and error (Jenkins et al., 1994; Jueptner et al., 1997a,b). These studies have shown activation of several areas in the association cortices and subcortical structures. Specifically, the prefrontal and posterior parietal cortices were activated when the subject was explicitly learning a new motor behavior (Jenkins et al., 1994; Schlaug et al., 1994; Grafton et al., 1995; Jueptner et al., 1997a), and the activation decreased when the subjects had learned the behavior (Seitz et al., 1990; Grafton et al., 1994; Jenkins et al., 1994). However, there have been few studies until recently that asked whether the time course of activation was different between the prefrontal and posterior parietal areas (Shadmehr and Holcomb, 1997).
We have been studying the neural mechanism of learning using a behavioral paradigm called 2 × 10 task (2 × 5 task for monkeys) (Hikosaka et al., 1995, 1996). In this task, 10 pairs of targets are presented consecutively in a fixed order, and subjects have to press the corresponding buttons in the correct order that they have to find by trial and error. After repeating the same sequence sufficiently often, the performance of the subject becomes nearly automatic. This task was first applied to experiments using monkeys (Hikosaka et al., 1995). Miyashita et al. (1995) showed that the neurons in the presupplementary motor area (pre-SMA) were activated preferentially for new sequences and that the activity decreased over time as the monkey acquired the sequence. However, it was difficult to compare the activities between different areas simultaneously. Imaging studies should be advantageous in this respect. We thus decided to apply the same task to human subjects using the functional magnetic resonance imaging (fMRI) technique. In the preceding paper, we reported that the anterior portion of the medial premotor cortex, which was presumed to be the pre-SMA, was activated during learning (Hikosaka et al., 1996). In the present study, we found that two frontal areas (including the pre-SMA) and two parietal areas showed learning-related activation. Our main interest was to compare the time course of activation among the four cortical areas.
MATERIALS AND METHODS
Seven normal subjects participated in the study (six males and one female; ages 29–50; all right-handed). Informed consent was obtained from all of the subjects before the study. Four of the seven subjects were studied repeatedly.
Stimuli and apparatus
Four white rectangles arranged in a two × two matrix were shown on a screen in which two circles appeared simultaneously (called “set”). A total of 10 sets (called “hyperset”) was sequentially presented in a fixed order. Presentation of the stimuli and generation of the sequence were controlled by a computer (PC9801FA or PC9821Ap2; NEC). The subjects lay supine in the MRI scanner and saw the visual stimuli through a mirror. They held a plate on which four button switches were arranged in a two × two matrix, corresponding to the four rectangles on the screen. During the fMRI experiments, an auditory cue was presented at a constant rate of 1 Hz through headphones.
True learning: test condition. The test paradigm, “true learning,” required the subjects to learn the correct order of button presses by trial and error (Hikosaka et al., 1996) and was modified from the 2 × 5 task originally devised for monkeys (Hikosaka et al., 1995). The subject was asked to press sequentially the two buttons corresponding to the circles presented on the screen with the index and the middle fingers of both hands. Instruction was given to use only one finger per button. The positions and the order of the circles were fixed in each set, as was the order of the 10 sets within one hyperset (an example is shown in Fig. 1). The subject had to find the correct order of button presses by trial and error for the consecutive 10 sets. If a button was pressed in the wrong order in any set, the two circles disappeared simultaneously, and after a pause of 0.5 sec, the subject had to start over again from the first set of the sequence. If the buttons were pressed in the correct order, the circles disappeared after each button press, and after a pause of 0.1 sec, the next set started. After successful completion of 10 sets, the same hyperset was repeated from the beginning. Because there was no general rule to find out the correct order, the subject had to learn the whole hyperset as a single unique sequence.
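The trial rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `run_trial`, the integer button labels, and the representation of a hyperset as a list of ordered pairs are hypothetical and are not taken from the original control software.

```python
from typing import List, Tuple

# One set: the correct press order as (first button, second button).
ButtonPair = Tuple[int, int]


def run_trial(hyperset: List[ButtonPair], presses: List[ButtonPair]) -> int:
    """Return the number of sets completed before the first wrong-order press.

    A trial ends at the first error (the subject then restarts from the
    first set) or when every set of the hyperset has been completed.
    """
    completed = 0
    for correct, pressed in zip(hyperset, presses):
        if pressed != correct:
            break  # wrong order: the trial ends here
        completed += 1
    return completed


# Hypothetical 3-set sequence for brevity (a real hyperset has 10 sets).
example_hyperset = [(0, 3), (2, 1), (1, 0)]
example_presses = [(0, 3), (1, 2), (1, 0)]  # second set pressed in the wrong order
completed_sets = run_trial(example_hyperset, example_presses)
```

The number of completed sets per trial is the behavioral measure that the later stage classification relies on.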
Pseudo learning: control condition. The aim of the present study was to analyze the learning-related brain activation. However, the test paradigm required the subjects to look at the visual stimuli and to move their fingers. To subtract these sensorimotor activities, we devised a control paradigm that we call “pseudo learning,” in which the subject experienced the same sensorimotor processes but in which no learning was involved. In this task, two circles appeared simultaneously within a two × two matrix, and the subject had to press the corresponding buttons, similarly to the true-learning task. However, the correct order of button presses was randomized for each trial. Therefore, it was no use for the subject to try to learn the correct order, and the subject was instructed to press the buttons in any order. To mimic the true-learning process, we decreased gradually the probability of error for a given set from the chance level (50%) to 0% each time the set was completed successfully. Thus, the number of completed sets in each trial gradually increased up to 10 sets, similarly to the true-learning process.
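The gradual reduction of the error probability in pseudo learning could be sketched as follows. The source states only that the probability fell from 50% to 0% with each successful completion of a set; the step size (`STEP_PCT`), the class name `PseudoSet`, and the use of a seeded random generator are assumptions made for illustration.

```python
import random

CHANCE_ERROR_PCT = 50  # chance level stated in the text
STEP_PCT = 10          # hypothetical decrement; the step size is not given in the text


class PseudoSet:
    """Tracks the error probability (in percent) for one set in pseudo learning."""

    def __init__(self) -> None:
        self.error_pct = CHANCE_ERROR_PCT

    def attempt(self, rng: random.Random) -> bool:
        """Score one attempt; return True if it counts as correct."""
        # randrange(100) is uniform on 0..99, so an error occurs with
        # probability error_pct / 100 regardless of which button came first.
        success = rng.randrange(100) >= self.error_pct
        if success:
            # Each successful completion lowers the error probability, down to 0%.
            self.error_pct = max(0, self.error_pct - STEP_PCT)
        return success
```

Because the outcome does not depend on the order in which the buttons are pressed, nothing can be learned, yet the number of completed sets per trial still grows over time.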
Using pseudo learning as the control task and true learning as the test task, we performed fMRI experiments. Comparison of the magnetic resonance signals between the two conditions would reveal selectively the learning-related activation. Additionally, on a separate session, we measured the reaction time of button pressing to see the change in the speed of information processing during learning.
Procedures for fMRI experiment
Before the fMRI experiments, the subjects were instructed about the task and practiced both the true-learning and pseudo-learning tasks on a training hyperset. After the subjects practiced for ∼0.5 hr, the following fMRI experiments were conducted using a new hyperset.
Subjects were scanned using the head coil of a 1.5 tesla whole-body scanner (Siemens, Erlangen, Germany). Head motion was reduced by using a bite bar, straps around the forehead, and ear fixation blocks. First, we obtained a high resolution series of T1-weighted anatomical images of the whole brain (FLASH; TR/TE/TI/FA/matrix/FOV, 2800/4/300/15/256 × 256/256 × 256; slice thickness, 1 mm) in contiguous sagittal sections, which served as the data set for the determination of anatomical landmarks and coordinates.
Then, a time series of 128 scans was acquired with interscan intervals of 4.0 sec (five subjects) or 6.0 sec (two subjects), while the subjects performed eight alternating epochs of the test and control tasks. In each scan, 10 axial slices of T2-weighted gradient-echo echo-planar images (TR/TE/TI/FA, 122/66/300/90) were collected parallel to the line connecting the anterior and posterior commissure (AC–PC line), and the bottom slice was adjusted to include the AC–PC line. Images of 5 mm thickness were obtained with a 128 × 128 matrix and 1.72 × 1.72 mm in-plane resolution.
The subject performed true learning for the test epochs and pseudo learning for the control epochs. The same hyperset was used throughout the eight test epochs. Hypersets for the test and control conditions were different, so that the subject could not practice (or rehearse) the test sequence. The pseudo-learning task was complex enough to prevent the mental rehearsal of the hyperset that the subject was learning in the test condition. In the control epochs, unlike in the test epochs, the order of sets was randomized so that the subject could not learn a particular sequence spontaneously. The rate of button presses was controlled for both the test and control conditions; the subject was asked to pace two button presses (i.e., one set) with an auditory cue presented at a constant rate of 1 Hz. “Learn” or “pseudo” was presented on the screen before each epoch to indicate the next condition. During that time, one dummy scan was acquired that was not included in the functional analysis, followed by seven scans for the test or control epoch. The times between stimulus presentation and button presses as well as the judgment of the button presses (correct or wrong) were recorded and stored in the computer.
Analysis of behavioral data
fMRI experiments. The number of completed sets in each trial of the subjects was recorded. Because each epoch lasted 28 (4 × 7) sec or 42 (6 × 7) sec and a successful performance of the whole hyperset took 10 sec, the subjects could complete the whole hyperset from two to four times within an epoch (if they made no error). The number of trials within an epoch could be larger if the subject made errors (see Fig. 2).
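The epoch arithmetic above can be checked with a short sketch. The function name is hypothetical, and the calculation ignores any pause between trials, which is an assumption.

```python
SCANS_PER_EPOCH = 7         # functional scans per epoch (the dummy scan is excluded)
HYPERSET_DURATION_S = 10.0  # 10 sets paced at one set per second


def max_error_free_completions(interscan_interval_s: float) -> int:
    """Whole hypersets that fit into one epoch if no error is made."""
    epoch_s = interscan_interval_s * SCANS_PER_EPOCH
    return int(epoch_s // HYPERSET_DURATION_S)


completions_4s = max_error_free_completions(4.0)  # 28-sec epoch
completions_6s = max_error_free_completions(6.0)  # 42-sec epoch
```

With a 4.0 sec interscan interval, two full hypersets fit into an epoch; with 6.0 sec, four fit, which matches the range of two to four given in the text.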
To correlate the brain activation with the grade of learning, we classified the eight test epochs into one of the three learning stages based on the performance of individual subjects (which showed considerable variation) using the following criteria: early stage, epochs in which the subject could not complete the hyperset; intermediate stage, epochs in which the subject could complete the hyperset but also made errors in other trials; and advanced stage, epochs in which the subject made no error.
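The three criteria can be expressed as a small classifier. This is a minimal sketch: the representation of a trial as a `(completed_sets, made_error)` tuple and the function name `classify_epoch` are assumptions introduced for illustration.

```python
from typing import List, Tuple

HYPERSET_LEN = 10  # sets per hyperset

# One trial: (number of completed sets, whether the trial ended with an error).
Trial = Tuple[int, bool]


def classify_epoch(trials: List[Trial]) -> str:
    """Assign one test epoch to a learning stage from its trials."""
    completed_any = any(n_sets == HYPERSET_LEN for n_sets, _ in trials)
    made_any_error = any(made_error for _, made_error in trials)
    if not completed_any:
        return "early"          # the hyperset was never completed
    if made_any_error:
        return "intermediate"   # completed at least once, but errors occurred
    return "advanced"           # no error within the epoch


# Hypothetical epochs, not data from the study.
early_epoch = [(2, True), (4, True), (3, True)]
intermediate_epoch = [(10, False), (6, True), (10, False)]
advanced_epoch = [(10, False), (10, False)]
stages = [classify_epoch(e) for e in (early_epoch, intermediate_epoch, advanced_epoch)]
```

Classifying by performance rather than by epoch number is what allows subjects with different learning speeds to be compared stage by stage.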
The experiments in which the subjects could not reach the advanced stage or reached it only in the last test epoch were not included for further analysis. These subjects were tested again using a new hyperset until they showed sufficient learning.
Reaction time measurements. In the fMRI experiments, the button presses were paced. We thought, however, that it was important to know the speed of information processing during learning. We thus performed an experiment for reaction time measurements as a separate session. The subjects, seated in front of the computer, were asked to learn a new hyperset by pressing buttons as quickly as possible and at the same time to minimize the number of errors while learning a hyperset. In each trial, the number of completed sets as well as the performance time for each trial was recorded. “Set completion time” was calculated for each trial by dividing the performance time by the number of completed sets; it corresponded to the mean time to complete a set (to press two buttons in a set).
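The definition of set completion time amounts to a single division, sketched below. The function name and the handling of trials with no completed set are assumptions; the text does not say how such trials were treated.

```python
def set_completion_time(performance_time_ms: float, completed_sets: int) -> float:
    """Mean time (msec) to complete one set, i.e. to press two buttons."""
    if completed_sets <= 0:
        # Assumption: the measure is undefined when no set was completed.
        raise ValueError("set completion time needs at least one completed set")
    return performance_time_ms / completed_sets


# Hypothetical trial: 10 sets completed in 8000 msec.
example_time = set_completion_time(8000.0, 10)
```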
An additional experiment was performed for four subjects to estimate the grade of learning after a learning session of the fMRI experiment. The subjects were asked to learn a hyperset in the same way they did in the fMRI experiments with their movement rate paced at 1 Hz, after which they were asked to perform the same hyperset as quickly as possible. The set completion time was calculated to see whether the improvement of performance occurred during the fMRI experiment similarly to that in the reaction time measurement session.
Eye movement recording. Eye movements were recorded in one subject by using an infrared eye tracker (Ober 2–12 bit parallel system; Permobil Meditech, Woburn, MA), while the subject performed the same task procedure used in the fMRI experiment. This session was conducted within the MRI scanner, but no scan was performed because it would cause severe artifacts on the eye movement recordings. The saccades with an amplitude exceeding 2° were counted every 30 sec for both the test and control conditions. This amplitude corresponded to the angular distance between two adjacent circles on the screen.
Analysis of fMRI data
Motion correction. First, we applied a motion correction program, AIR 3.0, to the obtained images (Woods et al., 1992, 1993). The first image of the sequence was designated as the reference, and each subsequent image volume was aligned with the reference in a pairwise manner. After a ratio-variance minimization process for each pair, a new time sequence of image sets was created by reslicing each of the original image sets. This program would produce images with residual motion under 0.5 mm and would remove significant motion-induced artifacts while restoring true regions of activation (Jiang et al., 1995).
Overview of the time course analysis. Based on these functional image data sets, we identified learning-related areas and analyzed the time course of activation for these areas. To account for the difference in the learning performance between subjects, we compared the pattern of brain activation between the three learning stages described above using two methods, one method based on the change in the relative signal intensity (SI) increase and the other based on the change in the number of activated pixels.
For the SI method, we first created activation maps from the entire time series of SI data and identified learning-related areas. We then calculated the time course of activation for each learning-related area. The disadvantage of this method is that the brain area that was activated only for a short period during the learning process would not have been detected.
Such errors would be minimized by the activated-pixel method. We divided the entire learning period into the three learning stages based on the performance of the subject and created an activation map for each learning stage. However, the data obtained with this method would not reflect the change in the magnitude of activation.
Therefore, we used these two methods so that each method would compensate for the shortcomings of the other. The details of these methods are described in the following sections.
Time course analysis 1: change in SI. In the first method, we analyzed the time course of SI changes for representative brain areas that showed significant activation based on the data for the entire learning period. After eliminating the SI data during the instruction period (16 data points), the time series of SI data (112 data points, seven scans in each of the eight epochs for the test and the control conditions) was cross-correlated with an idealized reference function derived from task alternation (Bandettini et al., 1992, 1993; Friston et al., 1994). The reference function was represented by a boxcar waveform shifted by one data point to account for the hemodynamic latencies and rise times. If the correlation coefficient (CC) calculated for a pixel was >0.3, the pixel was determined to show learning-related activation. With 112 data points, this threshold corresponds to a statistical significance of p < 0.001 for the Student’s t test. Pixels activated in isolation were excluded from the activation maps. For graphical presentation, we assigned colors from red to dark blue for CCs of 1.0 to −1.0. These activation foci were superimposed onto the corresponding coplanar anatomical echo-planar images. The locations of the activation foci were determined further on the basis of the high resolution T1-weighted FLASH images by using a multiplanar reconstruction method. Although the functional images obtained by the echo-planar sequence were distorted compared with the anatomical images obtained by the FLASH sequence, we could determine the location of each activation focus by identifying the major sulci and gyri (Ono et al., 1990; Duvernoy, 1991; Naidich et al., 1995).
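The cross-correlation step can be sketched as follows. Several details are assumptions rather than statements of the original analysis code: control epochs are taken to precede test epochs (inferred from the ΔSI definition, in which the nth test epoch is flanked by the nth and (n + 1)th control epochs), the one-point shift is applied as a delay after the instruction scans are removed, and all function names are hypothetical.

```python
import math
from typing import List

SCANS_PER_EPOCH = 7
N_EPOCH_PAIRS = 8
CC_THRESHOLD = 0.3


def boxcar_reference(shift: int = 1) -> List[float]:
    """0 for control scans and 1 for test scans, delayed by `shift` data points."""
    one_cycle = [0.0] * SCANS_PER_EPOCH + [1.0] * SCANS_PER_EPOCH
    box = one_cycle * N_EPOCH_PAIRS  # 112 data points
    # Delay the waveform and pad the start with the initial (control) level.
    return [box[0]] * shift + box[: len(box) - shift]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def is_activated(pixel_series: List[float], reference: List[float]) -> bool:
    """True if the pixel's CC with the reference exceeds the threshold."""
    return pearson(pixel_series, reference) > CC_THRESHOLD


reference = boxcar_reference()
# Synthetic pixel that follows the task exactly (not data from the study).
task_locked_pixel = [100.0 + 2.0 * r for r in reference]
# Synthetic pixel that alternates scan by scan, unrelated to the task.
unrelated_pixel = [100.0 + (i % 2) for i in range(len(reference))]
```

Removal of pixels activated in isolation would be a separate spatial step and is not shown here.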
In addition, the high resolution anatomical images for each subject were individually registered onto the Talairach atlas (Talairach and Tournoux, 1988) with a polynomial transformation using MEDx 2.0 (Sensor Systems), and the coordinates of the learning-related areas were obtained. This procedure was taken only to confirm the consistency of the location of learning-related areas across the subjects, and we used the untransformed data for the subsequent time course analysis. Also no attempt was made to average the activation maps for different subjects, which would likely have resulted in artifactually diminished magnitudes of activation. Thus, the learning-related areas were identified individually for each subject.
To analyze the time course of brain activation, we calculated ΔSI, the SI change (test − control), for each of the eight test epochs individually for each activation focus. Before averaging the SIs within each activation focus, the relative SI change between the test and control conditions was analyzed individually for each pixel. Pixels with abnormally high SI increases (>10%) would be suspect for underlying vasculatures. Close inspection of the T2-weighted echo-planar anatomical images usually revealed underlying vasculatures in the corresponding areas that showed up as dark spots. These data containing vasculatures were excluded from the analysis. Then the average of the SIs of all the pixels within each activation focus was calculated for each of the 112 functional image data sets. Then, the average of the SIs for the before and after control epochs (seven data points for each) was subtracted from the average SI at the given test epoch (seven data points) and was expressed as the percent of the averaged SI for the control condition: ΔSI(n) = 100 × ([mean SI_T(n)] − [mean SI_C(n,n + 1)]) / [mean SI_C(n, n + 1)], where ΔSI(n) is the ΔSI for the nth test epoch, mean SI_T(n) is the mean SI of the nth test epoch, and mean SI_C(n, n + 1) is the mean SI of the nth and (n + 1)th control epochs. An exception was that ΔSI for the last test epoch, ΔSI(8), was calculated by subtracting only the mean SI for the eighth control epoch.
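The ΔSI definition, including the exception for the last test epoch, can be written out as below. The data layout (one list of seven focus-averaged SI values per epoch) and the function names are assumptions for illustration.

```python
from typing import List


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def delta_si(test_epochs: List[List[float]],
             control_epochs: List[List[float]],
             n: int) -> float:
    """Percent SI change for the nth test epoch (n is 1-based).

    The baseline pools the nth and (n + 1)th control epochs; for the last
    test epoch only the nth control epoch is available.
    """
    test_mean = mean(test_epochs[n - 1])
    baseline = list(control_epochs[n - 1])
    if n < len(control_epochs):
        baseline += control_epochs[n]
    control_mean = mean(baseline)
    return 100.0 * (test_mean - control_mean) / control_mean


# Synthetic values (not data from the study): control at 100, test at 102.
control = [[100.0] * 7 for _ in range(8)]
test = [[102.0] * 7 for _ in range(8)]
first_epoch_change = delta_si(test, control, 1)
last_epoch_change = delta_si(test, control, 8)
```

Using the flanking control epochs as the baseline reduces the influence of slow signal drift on the estimate for each test epoch.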
ΔSI thus calculated should reflect the degree of learning-related activation for the given test epoch, and the change in ΔSI across the eight learning epochs would reflect the time course of activation at the selected brain area. We then calculated the mean of ΔSIs during each learning stage to correlate the individual learning performance to the time course of brain activation. Thus, the ΔSIs across the learning stages obtained for individual subjects were compared using Wilcoxon’s signed rank test.
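For a small sample such as seven subjects, Wilcoxon’s signed rank test can be computed exactly by enumerating all sign assignments. The sketch below is a generic pure-Python version, not the authors’ procedure: the two-sided form, the dropping of zero differences, and the function names are assumptions.

```python
from itertools import product
from typing import List, Tuple


def signed_ranks(diffs: List[float]) -> Tuple[float, float, List[float]]:
    """Return (W_plus, W_minus, ranks) with average ranks for tied |differences|."""
    ordered = sorted((d for d in diffs if d != 0), key=abs)  # zeros are dropped
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return w_plus, w_minus, ranks


def exact_two_sided_p(diffs: List[float]) -> float:
    """Exact two-sided p value from all 2**n equally likely sign patterns."""
    w_plus, w_minus, ranks = signed_ranks(diffs)
    observed = min(w_plus, w_minus)
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical paired differences (early minus advanced) for seven subjects.
example_diffs = [3, 5, 1, 7, 2, 6, 4]
example_p = exact_two_sided_p(example_diffs)
```

With seven pairs, the smallest attainable two-sided p value is 2/128, reached only when all seven differences have the same sign.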
Time course analysis 2: change in the number of activated pixels. In the second method, we created activation maps individually for each of the three learning stages. The time series of SI data for each learning stage was separately cross-correlated with the boxcar reference function, and significant activation was determined as pixels above the threshold CC. Pixels activated in isolation were eliminated from the analysis, and the activation foci observed in more than four subjects were analyzed. For each learning stage, the threshold CC was determined so as to correspond to the p value of 0.01. The corresponding CC was calculated by the following equation (B. Pütz et al., unpublished observations): $\mathrm{CC} = [t^{2} / (t^{2} + n - 2)]^{1/2}$, where t represents the Student’s t test value for n − 2 degrees of freedom (n denotes the number of data points included in the CC calculation). For example, when the subjects completed the hyperset in the fourth test epoch, the SI data from the first to the third test and control epochs were included in the data set for the early stage. The number of data points was 42 (2 conditions × 7 scans × 3 epochs), resulting in 40 degrees of freedom. Because the t value corresponding to a p value of 0.01 is 2.423, the corresponding CC was calculated to be 0.3578. The activation map for the early stage was created by using this value of CC.
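The conversion between the t value and the threshold CC can be reproduced directly from the equation. In this sketch the t value is supplied by hand, as in the text, rather than looked up from a distribution; the function names are hypothetical.

```python
import math


def cc_from_t(t: float, n: int) -> float:
    """Threshold CC for a given t value with n data points (n - 2 df)."""
    return math.sqrt(t * t / (t * t + n - 2))


def t_from_cc(cc: float, n: int) -> float:
    """Inverse relation: t value implied by a correlation coefficient."""
    return cc * math.sqrt((n - 2) / (1.0 - cc * cc))


# Worked example from the text: 42 data points, t = 2.423.
early_stage_cc = cc_from_t(2.423, 42)

# The fixed threshold of the first method (CC = 0.3 with 112 data points)
# can be converted to a t value in the same way.
whole_series_t = t_from_cc(0.3, 112)
```

Because the number of data points differs between learning stages, the same p value corresponds to a different CC threshold for each stage-specific map.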
In this way, three activation maps were created for each subject, each map corresponding to one learning stage. We counted the number of activated pixels for each activation focus and then compared them between the three learning stages using Wilcoxon’s signed rank test.
The subjects, lying supine in the MRI scanner, learned a single sequence (hyperset) during a session of the fMRI experiment. The learning was divided into eight epochs (test condition), which were alternated with eight epochs of pseudo learning (control condition).
Four of the seven subjects did not show sufficient learning within the period of the fMRI experiment and were tested again using a new sequence. Only the data with sufficient learning were analyzed. As the subjects learned a hyperset in the test condition, the number of completed sets for each trial gradually increased (Fig. 2 a, left). The whole hyperset (i.e., 10 sets) was usually completed by the third to the sixth test epoch (e.g., fourth test epoch for the subject shown in Fig. 2 a). However, the learning was incomplete, and the subjects still made errors before completing the hyperset (e.g., fourth to the sixth epochs in Fig. 2 a). By the sixth to the eighth epoch, the subjects were able to complete the hyperset without an error (e.g., seventh test epoch in Fig. 2 a).
In the control condition (pseudo learning), the subjects were allowed to press the two buttons in any order, but the correct order was randomized for each trial, with the probability of error gradually decreasing from the chance level. In consequence, the number of completed sets increased similarly to the process in true learning, and the subject “completed” the whole hyperset usually by the fourth to the sixth control epoch (Fig. 2 a, right).
Reaction time measurements
We performed an experiment for reaction time measurements as a separate session. The subjects were asked to press buttons as quickly as possible. The main parameter used here was set completion time, which was the mean time required to complete a single set (i.e., time for pressing two buttons in a set). The learning process, measured as the number of completed sets, was similar to that observed in the fMRI experiment (compare Fig. 2 b, left, with Fig. 2 a, left).
The set completion time was relatively short in the early stage of learning (average of the first five trials for all subjects, 657 ± 83 msec) and showed a gradual increase usually until the subjects completed the hyperset (i.e., 10 sets) for the first time (839 ± 132 msec for the average of five trials after the first completion of the hyperset). In the intermediate stage, the set completion time decreased gradually. In the advanced stage (when the subjects made no errors), the set completion time remained short (662 ± 75 msec for the first five consecutive completions of the hyperset). The results suggested that different modes of information processing occurred for the three learning stages. We therefore classified the eight epochs of fMRI experiments into the three learning stages as described in the Materials and Methods.
In pseudo learning, the subjects were allowed to press buttons in any order (Fig. 2 b, right). Although the number of completed sets increased, the set completion time remained short (571 ± 68 msec for the average of 20 trials), suggesting that cognitive processes were unchanged during pseudo learning.
We considered the possibility that the characteristics of learning may be different between the fMRI and the reaction time experiments, depending on whether or not movements were paced. We thus asked the subjects to perform a hyperset as quickly as possible after they had learned the hyperset in the same way as in the fMRI experiment. We found that they could perform the task nearly completely and quickly (Fig. 2 c). The set completion time was 626 ± 67 msec for the average of five trials after the fMRI task procedure, which was shorter than that for the advanced stage in the reaction time experiment (Fig. 2 b). The finding suggests that the subjects acquired a sequential procedure during the fMRI experiment even though they had to keep the movement rate constant.
Eye movement recording
The number of saccades exceeding 2° for every 30 sec period was 33 ± 4.9 for the true-learning and 35 ± 6.0 for the pseudo-learning conditions. For true learning, the number was relatively constant throughout the learning process (35 for the early stage, 31 for the intermediate stage, and 34 for the advanced stage). Therefore, the eye movement itself would have little effect on the brain activation observed in the present study.
Identification of learning-related areas
We first identified learning-related areas by cross-correlating all of the SI data with the reference function (Fig. 3). Among them, four areas were consistently active across the seven subjects, as indicated by pink circles in Figure 3. These areas were located in the medial premotor and lateral prefrontal cortex and in the medial and lateral parietal cortex.
The activation focus in the lateral prefrontal cortex was located in the middle frontal gyrus, which corresponded to Brodmann’s area (BA) 46 and was designated as the DLPFC (Petrides and Pandya, 1994) (Fig. 4). The left side was activated in all of the seven subjects, whereas the right side was activated in five (Table 1).
The medial premotor cortex activation was located above the cingulate sulcus, slightly anterior to the AC with respect to the AC–PC line (Fig. 4) (see Table 1). In the preceding paper, we have determined this area to be the pre-SMA (Hikosaka et al., 1996), which was originally described in monkeys (Tanji, 1994). Consistent with our previous observation, activation of the pre-SMA was observed unilaterally, six subjects in the left hemisphere and one in the right (Table 1).
The activation focus within the medial parietal cortex was located in the precuneus, the medial aspect of BA 7 (Fig. 4). Bilateral activation was observed in all subjects. Although unilateral activation of precuneus is shown in Figure 4, activation on the other side was found in different slices (see Fig. 3). The number of activated pixels was significantly larger for the right hemisphere than for the left (average for all subjects, 17 pixels for the right and 10 for the left; p < 0.01, Student’s paired t test).
The lateral parietal activation was found within the posterior portion of the IPS (Fig. 4). Bilateral activation was observed in all subjects. The number of activated pixels was not significantly different between hemispheres (average for all subjects, 16 pixels for the right and 15 for the left).
The coordinates of these four areas are shown in Table 1. They were similar across the seven subjects and were consistent across repeated experiments for individual subjects.
Other activation foci were found within the cingulate cortex, well anterior to the AC (observed in four subjects; slice 3 in Fig. 3), which corresponds to the rostral cingulate area in BA 32 (Picard and Strick, 1996), and within the anterior wall of the middle part of the precentral sulcus (observed in three subjects; slice 1 in Fig. 3), which corresponds to the frontal eye field (Paus, 1996). Although an activation focus was observed within the anterior part of the insular cortex, most of the activated pixels included the underlying vasculatures. Therefore, the insular activation focus was not analyzed.
Learning-related changes in signal intensity
For all of the foci described above, the mean SI was generally higher in the test epochs than in the control epochs (Fig. 5). However, the difference in SI between the test and the control epochs, ΔSI, changed across the eight alternations differently for these areas. Namely, the ΔSI for the left DLPFC (Lt DLPFC) and the pre-SMA was large at the beginning and decreased toward the end, whereas that for the precuneus and the IPS was initially small and then increased in the middle part of the experiment.
We found that the time courses of ΔSI were different for different subjects depending on their learning performance (Fig. 6). For the subject with a good performance (Fig. 6, subject 5), the pre-SMA and precuneus activation decreased earlier than did that in subject 2-1. For the subject with a poor performance (Fig. 6, subject 4-2), the left DLPFC and pre-SMA activation stayed high, whereas the precuneus and the IPS remained less activated.
Learning-related changes in number of active pixels
We have so far identified learning-related areas by examining the signal intensity data across the entire learning period. With this method, however, areas that showed signal intensity changes only during a particular learning stage might not have been detected. We therefore classified the eight test epochs into three learning stages (Fig. 7, bottom) and created activation maps for each of the three learning stages (Fig. 7).
The results were similar to those obtained by the preceding analysis (compare Fig. 7 with Fig. 3). Activation foci (p < 0.01) were found in the left DLPFC, pre-SMA, precuneus, and IPS in all the subjects, the right DLPFC (Rt DLPFC) in six subjects, and the rostral cingulate area in four subjects. Of interest is the fact that the number of activated pixels changed across the three learning stages. The number of active pixels appears larger in the earlier stages of learning for the left DLPFC (Fig. 7, third row) and for the pre-SMA (first row), whereas it appears larger in the later stages of learning for the precuneus (third row) and for the IPS (second and third rows).
Time course of activation across learning stages
We then examined whether learning-related changes in activation were commonly observed across subjects in terms of the SI change and in terms of the number of activated pixels (Fig. 8). To account for the difference in the learning performance between the subjects, we classified the eight test epochs into three learning stages for individual subjects (see Fig. 7; Materials and Methods).
First, the pattern of the mean ΔSI change was similar across subjects but different across the brain areas (Fig. 8 a). The mean ΔSI for the left DLPFC was highest in the early stage of learning. A significant difference (Wilcoxon’s signed rank test; p < 0.05) was observed between the early and intermediate stages and also between the early and advanced stages. The right DLPFC was activated in five subjects, and the mean ΔSI was relatively higher in the early and intermediate stages compared with that for the advanced stage. The mean ΔSI for the pre-SMA was higher in the early or intermediate stage than in the advanced stage. The mean ΔSI for the precuneus was higher in the intermediate stage than in the early or advanced stage. The mean ΔSI for the IPS was higher in the intermediate or advanced stage than in the early stage. The mean ΔSI for the rostral cingulate area was higher in the early and intermediate stages than in the advanced stage.
Second, very similar time courses of activation were found in terms of the number of activated pixels (Fig. 8b). The statistical comparison using Wilcoxon’s signed rank test showed significant differences that were identical to those observed for the signal intensity data.
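The stage-by-stage comparisons above rest on Wilcoxon’s signed rank test applied to paired per-subject values (for example, mean ΔSI in the early versus the intermediate stage). The following is a minimal sketch of an exact two-sided version of that test, not the analysis code used in this study. The function name, the number of subjects, and the ΔSI values are all hypothetical and serve only to illustrate the computation.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences (1 = smallest), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    total = sum(ranks)

    # Under the null hypothesis every sign pattern is equally likely, so the
    # p value is the fraction of patterns at least as extreme as the observed one.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= observed:
            extreme += 1
    return w_plus, w_minus, extreme / 2 ** n


# Hypothetical mean signal changes for eight subjects, in units of 0.01%
# (illustrative numbers only, not data from this study).
early_stage = [32, 28, 41, 25, 30, 36, 27, 33]
intermediate_stage = [21, 24, 29, 26, 20, 22, 20, 25]

w_plus, w_minus, p_value = wilcoxon_signed_rank_exact(early_stage, intermediate_stage)
print(w_plus, w_minus, p_value)
```

With these illustrative values, seven of the eight paired differences are positive and the single negative difference has the smallest magnitude, so the test yields p = 4/256 ≈ 0.016. Enumerating all sign patterns is practical only for small samples; for larger samples a normal approximation is usually substituted.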
Frontal-to-parietal activation during visuomotor sequence learning
We identified four cortical areas that were related to learning of visuomotor sequences: DLPFC, pre-SMA, precuneus, and IPS. The learning-related activation of the prefrontal and parietal areas has been shown in previous studies (Seitz et al., 1990; Grafton et al., 1992; Rauch et al., 1995; Blaxton et al., 1996). However, there have been few studies that asked whether these areas were differentially activated (see, however, Passingham et al., 1997).
We now demonstrate that the time course of activation was different between these areas. The two frontal areas, DLPFC and pre-SMA, became active in the early stage of learning, followed by activation of the two parietal areas, precuneus and IPS, in the intermediate or advanced stages. Recently, Shadmehr and Holcomb (1997) reported a similar prefrontal-to-parietal transition of activation in motor memory consolidation. These data might suggest a similarity between the neural mechanisms for the short-term learning in our paradigm (<10 min) and those for the long-term memory consolidation in the study of Shadmehr and Holcomb (1997) (6 hr).
In the present study, we also found differences within the pair of frontal areas and within the pair of parietal areas. The left DLPFC was highly active in the early stage of learning, whereas the right DLPFC and the pre-SMA were active in the early and the intermediate stages. The precuneus and the IPS became active in the intermediate stage, but only the IPS remained active in the advanced stage. Thus, a specific set of brain areas was activated in each learning stage.
Possible information processes underlying the transition of learning stages
Learning a new procedure requires attention initially, but the procedure becomes automatic eventually. This can be viewed as a transition from the declarative stage to the procedural stage (Fitts, 1964; Anderson, 1982). It is thought that attentive processes are performed one at a time, whereas automatic processes are performed in parallel (Schneider and Shiffrin, 1977; Kahneman and Treisman, 1984). This is shown experimentally as the difference in the processing time (typically measured as a reaction time). As the number of processed items increases, the processing time increases for the attentive process (Sternberg, 1966; Sternberg et al., 1978) but stays at a lower level for the automatic process (Treisman and Gelade, 1980; Schneider and Fisk, 1982; Fendrich et al., 1991).
In our behavioral experiment in which the subjects were instructed to learn a hyperset as quickly as possible (reaction time measurements), the processing time was represented by the set completion time. In the early stage, it increased monotonically with the number of processed items (sets), suggesting that the underlying processes were declarative (attentive). The set completion time then decreased in the intermediate stage during which the subjects tried to process all 10 items (i.e., sets), although sometimes without success. This suggests that the procedural process had started to work, whereas the declarative process was still required. The set completion time reached its lowest level in the advanced stage during which the subjects were able to process the 10 items perfectly, suggesting that the declarative process gradually gave way to the procedural process. Thus the process of learning involves various cognitive components that change over time. Based on the above consideration, we suggest functional roles for the four cortical areas.
The DLPFC activation was marked in the early stage, when the declarative (attentive) process is thought to dominate. This finding is consistent with the idea that the DLPFC is the central neural substrate for the working-memory process (Goldman-Rakic, 1987; Funahashi et al., 1989; Jonides et al., 1993; Petrides et al., 1993a; McCarthy et al., 1994), in which attentive manipulation of stored information is necessary (Baddeley, 1992). A similar decrease in the DLPFC activation with learning has been shown in previous studies (Jenkins et al., 1994; Shadmehr and Holcomb, 1997).
The prolonged activation of the right DLPFC compared with the left is consistent with the idea that the former subserves retrieval processes, whereas the latter subserves encoding processes (Petrides et al., 1993b; Shallice et al., 1994; Tulving et al., 1994; Fletcher et al., 1995a, 1996).
Learning-related activation of the pre-SMA confirmed our previous report (Hikosaka et al., 1996) and is consistent with the results of single unit recordings in monkeys performing the same paradigm (Miyashita et al., 1995); in the initial phase of learning, pre-SMA neurons showed vigorous activity, which decreased dramatically as learning proceeded. Furthermore, the reversible blockade of the pre-SMA by muscimol injections impaired the new learning but not the performance of well-learned sequences (Miyashita et al., 1996b).
A functional distinction between pre-SMA and SMA proper was proposed in experiments using monkeys (Tanji, 1994). Recent positron emission tomography (PET) and fMRI studies suggest that the pre-SMA is also present in humans and is located slightly anterior to the anterior commissure (Picard and Strick, 1996). Unlike the SMA proper, the pre-SMA has been shown to be active in tasks that required selection of actions (Deiber et al., 1991, 1996; Rao et al., 1997), working memory (Paulesu et al., 1993; Buckner et al., 1996), and sequential saccades (Petit et al., 1996).
Our study showed that the pre-SMA, unlike the left DLPFC, remained active in the intermediate stage during which the declarative process was being replaced by the procedural process. This suggests that the pre-SMA receives declarative information from the DLPFC, via corticocortical connections (Dum and Strick, 1991; Matsuzaka et al., 1992; Luppino et al., 1993), and uses it to generate a procedure for sequential button presses.
The precuneus was activated preferentially in the intermediate stage, in which the information on the whole sequence had been acquired in working memory. This suggests that the precuneus may play a role in retrieving the sequence from memory, as suggested by Sadato et al. (1996). Consistent with this idea, the precuneus has been shown to be active in the execution of prelearned saccade sequences (Petit et al., 1996) and in the navigation of familiar routes (Ghaem et al., 1997). Crucial in navigation is the information on visuomotor sequences, as in our 2 × 10 task.
The reduced precuneus activation in the advanced stage suggests that the activation reflects declarative access to the sequence memory. This is consistent with previous reports showing that the precuneus is active in verbal memory tasks (Grasby et al., 1993; Shallice et al., 1994; Buckner et al., 1996) and manipulation of visual imagery (Roland and Gulyás, 1994; Fletcher et al., 1995b, 1996).
Unlike the precuneus, the IPS remained active in the advanced stage beyond the intermediate stage. This suggests that the IPS is more involved in the procedural process.
The IPS is thought to be part of the dorsal visual system that carries spatial information (Ungerleider and Mishkin, 1982). Another aspect of the dorsal visual system is that it is related to “intention” or “action” rather than “perception” (Goodale and Milner, 1992; Mazzoni et al., 1996; Snyder et al., 1997). Neurons in the monkey parietal cortex including the IPS become active during manipulation of particular objects (Taira et al., 1990). Lesions in the parietal cortex are known to induce apraxia, an inability to manipulate common objects (Heilman and Rothi, 1985). However, it is still unknown whether the memories for visuomotor sequences, as implemented in our 2 × 10 task, are also stored in the IPS.
The posterior portion of the IPS is related to saccadic eye movements (Andersen and Gnadt, 1989; Kawashima et al., 1996; Petit et al., 1996). Miyashita et al. (1996a) showed that anticipatory eye–hand movements, which developed later in long-term learning, were critical for skilled performance in the 2 × 5 task. The posterior IPS activation may therefore reflect processes underlying eye–hand coordination.
Integrated learning system
Studies from our laboratory using monkeys have shown that subcortical structures also participate in learning and/or memory of visuomotor sequences. Miyachi et al. (1997) showed that the anterior part of the caudate/putamen is related to the learning of new sequences, whereas the midposterior putamen is related to the performance of well-learned sequences. These results were supported by a recent PET study (Jueptner et al., 1997b). In addition, Lu et al. (1996) showed that the dorsal part of the dentate nucleus was important for the storage or retrieval of long-term memories for well-learned sequences. These studies, together with recent imaging studies (Passingham et al., 1997; Shadmehr and Holcomb, 1997), suggest that cerebral cortical (frontal and parietal) areas, basal ganglia, and cerebellum all contribute to visuomotor sequence learning but with different time courses until procedural skills are eventually stored as long-term memories.
This study was supported by the Japan Society for the Promotion of Science Research for the Future program and by the Basic Research System Core. We are grateful to H. Imamizu and M. Kawato of the Japan Science and Technology Corporation for their cooperation in the Talairach registration of the functional images.
Correspondence should be addressed to Dr. Okihide Hikosaka, Department of Physiology, Juntendo University School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113, Japan.