Timing is essential to the execution of skilled movements, yet our knowledge of the neural systems underlying timekeeping operations is limited. Using whole-brain functional magnetic resonance imaging, subjects were imaged while tapping with their right index finger in synchrony with tones that were separated by constant intervals [Synchronization (S)], followed by tapping without the benefit of an auditory cue [Continuation (C)]. Two control conditions followed in which subjects listened to tones and then made pitch discriminations (D). Both the S and the C conditions produced equivalent activation within the left sensorimotor cortex, the right cerebellum (dorsal dentate nucleus), and the right superior temporal gyrus (STG). Only the C condition produced activation of a medial premotor system, including the caudal supplementary motor area (SMA), the left putamen, and the left ventrolateral thalamus. The C condition also activated a region within the right inferior frontal gyrus (IFG), which is functionally interconnected with auditory cortex. Both control conditions produced bilateral activation of the STG, and the D condition also activated the rostral SMA. These results suggest that the internal generation of precisely timed movements is dependent on three interrelated neural systems, one that is involved in explicit timing (putamen, ventrolateral thalamus, SMA), one that mediates auditory sensory memory (IFG, STG), and another that is involved in sensorimotor processing (dorsal dentate nucleus, sensorimotor cortex).
- functional magnetic resonance imaging
- basal ganglia
- supplementary motor area
The capacity to precisely time events is important for skilled actions, such as playing a musical instrument. Several decades of research have advanced our knowledge of temporal mechanisms, so that now there is broad support for the view that some aspects of time are explicitly represented in the CNS. Studies of paced-finger tapping (PFT) support the existence of a cognitively based, internal timekeeping system that is independent of motor implementation or feedback mechanisms (Wing and Kristofferson, 1973; Ivry and Keele, 1989; Sergent et al., 1993). In PFT, subjects tap their index finger in synchrony with a series of tones separated by a constant interval [Synchronization (S)]. The tones are then discontinued, and the subject continues to tap at the same pace [Continuation (C)]. Timing competency is assessed when the tone is absent, because performance depends entirely on an internal representation of the interval duration.
Some research in patients (Ivry et al., 1988) suggests that timing is controlled by the lateral cerebellum and its primary output nucleus, the dentate. This research, however, did not distinguish between damage to the dorsal and the ventral portions of the dentate, which have different output pathways. The dorsal dentate projects principally to the primary motor and ventral premotor cortices, which are associated with sensorimotor functions, whereas the ventral dentate projects to dorsolateral prefrontal areas, which are associated with higher-level cognitive processing (Strick et al., 1993; Middleton and Strick, 1994; Leiner et al., 1995). Hence, motor timing deficits after cerebellar damage (Ivry et al., 1988; Ivry and Keele, 1989) could be caused by a disruption in sensorimotor mechanisms or cognitive processes, such as timing.
Patients with Parkinson’s disease also demonstrate abnormal timing on the PFT task (Pastor et al., 1992; O’Boyle et al., 1996) (D. L. Harrington, K. Y. Haaland, N. Hermanowicz, unpublished observations). Pathological changes in Parkinson’s disease include a loss of nigral dopaminergic neurons projecting to the dorsal putamen (Brooks et al., 1990), the major output of which is to the supplementary motor area (SMA) (Alexander et al., 1986). In fact, patients with SMA lesions are impaired in the reproduction of rhythms in the absence of an auditory cue (Halsband et al., 1993).
Patient studies, therefore, suggest that timing may be mediated by the lateral cerebellum, the putamen, and/or the SMA. To examine this issue directly, we conducted whole-brain functional magnetic resonance imaging (FMRI) on healthy volunteers while they performed the S and C conditions of the PFT task. We predicted that the neural systems specific to controlling timing should show greater activation in the C than in the S condition, because the C condition makes greater demands on an internal timekeeping system. In contrast, performance in the S condition is based largely on the perception of the synchronization error and afferent delays from stimulus events (Kolers and Brewster, 1985; Mates, 1994), although some temporal processing presumably occurs when predictable stimuli are tracked.
A listening task (L), in which subjects passively attended to tones, and a pitch discrimination (D) task controlled for the auditory sensory processing in the S condition. In addition, we predicted that the D task would elicit activation patterns specific to the processing of frequency information and therefore distinct from the activation patterns of the PFT tasks.
MATERIALS AND METHODS
Subjects. Thirteen healthy volunteers (ten females and three males; mean age 23.2 years, range 18–31 years) were studied. All were strongly right-handed on the Edinburgh Handedness Inventory (mean laterality quotient = 88.8; range, 64–100) (Oldfield, 1971). Potential subjects were excluded if they had a history of neurological disease, a major psychiatric disturbance, or substance abuse, or if they were taking psychoactive prescription medications. Informed consent was obtained from subjects according to institutional guidelines established by the Medical College of Wisconsin Human Subjects Review Committee.
Activation conditions. Subjects performed a series of four consecutive activation conditions consisting of two experimental (S, C) and two control tasks (L, D), which were preceded and followed by a rest period. In the S condition, subjects made right index finger key presses in time with a series of tones separated by a constant interval of either 300 or 600 msec, target intervals close to those that have been studied in patients. Two pacing intervals were studied to assess the reliability of the FMRI findings across different temporal intervals. The auditory stimulus consisted of trains of 50 msec, 380 Hz pure tones presented binaurally at precise intervals using a computer playback system. Sounds were amplified near the scanner using a magnetically shielded transducer system and were delivered to the subject via air conduction through 180 cm paired plastic tubes. The tubes were threaded through tightly occlusive ear inserts that attenuated background scanner noise to ∼75 dB sound pressure level (SPL). Background scanner noise consisted of pulses occurring every 205 msec; this pulse was constant throughout the imaging run. Intensity of the tone stimuli averaged 100 dB SPL.
The C condition immediately followed the S condition. Subjects were instructed to maintain the same tapping rate as in the S condition (300 or 600 msec pacing intervals) but without benefit of the pacing tone.
The C condition was immediately followed by the L condition, wherein subjects passively attended to the same pacing tone as presented in the S condition (tone intervals separated by 300 or 600 msec) but were instructed not to tap their finger. The L condition controlled for the auditory sensory perception processes in the S condition.
After the L condition, subjects performed the D condition, in which they listened to a series of tone pairs separated by 300 or 600 msec and pressed a key with their right index finger whenever a transition in pitch occurred. Auditory stimuli consisted of 12 pairs of 40 msec tones (220 or 380 Hz pure tones). A tone pair was presented every 1.5 sec. Half of the tone pairs were of the same pitch (220–220 or 380–380 Hz), and the remaining pairs were of a different pitch (220–380 or 380–220 Hz), with the order of the pairs randomized. This task controlled for higher level auditory processing of time-independent information.
FMRI. Whole-brain FMRI, a technique for detecting regional changes in blood oxygenation associated with increased neural activity (Ogawa et al., 1990; Ogawa and Lee, 1991), was conducted on a commercial 1.5 Tesla scanner (Signa, General Electric Medical Systems, Milwaukee, WI) equipped with a prototype 30.5 cm inner diameter, three-axis local gradient head coil and an elliptical endcapped quadrature radiofrequency coil (Wong et al., 1992a,b). Echo-planar images were collected using a single-shot, blipped, gradient-echo echo-planar pulse sequence [echo time (TE) = 40 msec; field of view (FOV) = 24 cm; matrix size = 64 × 64] (Bandettini et al., 1992). Twenty-two contiguous sagittal 6-mm-thick slices were selected to provide coverage of the entire brain (voxel size: 3.75 × 3.75 × 6 mm). Before FMRI, high resolution, three-dimensional (3-D) spoiled gradient-recalled at steady-state (SPGR) anatomic images were collected [TE = 5 msec; repetition time = 24 msec; 40° flip angle; number of excitations = 1; slice thickness = 1.2 or 1.3 mm; FOV = 24 cm; resolution = 256 × 192]. Foam padding was used to limit head motion within the coil. A nonferrous key press device made from force-sensing resistors was used to record response times and accuracy.
Subjects underwent six functional imaging series, three each at the 300 and 600 msec pacing intervals, in an alternating sequence, the order of which was counterbalanced across subjects. During each imaging series, 104 sequential echo-planar images were collected with an interscan interval (TR) of 4.5 sec (total scanning duration = 7 min, 48 sec). A series consisted of five cycles of rest and activation, with each cycle beginning and ending with an 18 sec rest period. The activation period consisted of four consecutive 18 sec epochs, during which subjects performed the S, C, L, and D conditions in a fixed order. Subjects were presented with visual word cues to inform them of the current condition (“TAP,” “CONTINUE,” “LISTEN,” “PITCH,” and “REST” for the S, C, L, D, and rest epochs, respectively). Words were computer-generated and rear-projected onto the center of an opaque screen located at the subject’s feet (viewing distance = 200 cm). Subjects viewed the screen in a darkened room through prism glasses and corrective lenses, if necessary. Subjects briefly practiced the four conditions before scanning.
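As a consistency check on the timing parameters above, the following sketch recomputes the length of one imaging series from the epoch structure. It assumes that adjacent cycles share a single rest epoch, so that five cycles contain six rest epochs; this layout is an interpretation of the design rather than something the text states explicitly, and the variable names are illustrative.

```python
# Timing parameters as stated in the text.
TR_SEC = 4.5        # interscan interval
EPOCH_SEC = 18      # duration of each rest or activation epoch
N_CYCLES = 5        # rest/activation cycles per imaging series
N_CONDITIONS = 4    # S, C, L, and D epochs per activation period

# Assumption: adjacent cycles share one rest epoch, giving N_CYCLES + 1 rests.
n_rest_epochs = N_CYCLES + 1
n_task_epochs = N_CYCLES * N_CONDITIONS

total_sec = (n_rest_epochs + n_task_epochs) * EPOCH_SEC
n_images = int(total_sec / TR_SEC)
images_per_epoch = int(EPOCH_SEC / TR_SEC)

print(f"{total_sec} sec, {n_images} images, {images_per_epoch} images per epoch")
```

Under this assumption the epoch structure accounts for exactly the 104 images collected per series, with four images in each 18 sec epoch.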
Image processing and statistical analysis. Minor anatomic distortions in the EP images attributable to local field inhomogeneities were corrected using a field map generated by increasing the TE by 1 msec on the last two images of the time series (Jezzard and Balaban, 1995). Each image time series was spatially registered in-plane to reduce the effects of head motion, using an iterative linear least squares method (Keren et al., 1988). Linear drift in each 104-image time series was removed using a regression analysis (Bandettini et al., 1993). Specifically, a line was fit through each voxel time series, and the fitted slope and intercept terms were subtracted from the raw voxel data at each corresponding point in time.
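The drift-removal step can be illustrated for a single voxel with an ordinary least-squares line fit. This is a minimal sketch, not the original analysis code: the function name and the synthetic voxel values are hypothetical, and image index is used as the time variable.

```python
def remove_linear_drift(series):
    """Subtract the least-squares line (slope and intercept) from one voxel time series."""
    n = len(series)
    times = range(n)
    mean_t = sum(times) / n
    mean_y = sum(series) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return [y - (intercept + slope * t) for t, y in zip(times, series)]

# Hypothetical voxel: baseline of 100, drift of 0.5 per image, small alternating signal.
voxel = [100 + 0.5 * t + (1 if t % 2 else -1) for t in range(104)]
detrended = remove_linear_drift(voxel)
```

Because the intercept is removed along with the slope, the detrended series has zero mean; this does not affect the later activation-minus-rest differences, which depend only on relative signal changes.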
Functional images were created by generating statistical parametric maps (SPMs) of t deviates reflecting differences between the condition and the rest states at each voxel location for each subject. Specifically, t tests were conducted at each voxel to measure changes in signal intensity between each of the four activation conditions and a local baseline (rest). The first two images (9 sec) in each of the four activation conditions and the two rest periods were discarded from analysis because of the rise and fall time of the hemodynamic response (Bandettini et al., 1992). The first stage of the analysis involved averaging the final two images in each of the four activation condition epochs. Next, the final two images of the rest periods preceding and following each condition epoch (four images in all) were averaged. A difference image was created for each of the four conditions by subtracting the average rest image from the corresponding average activation condition image. Each activation condition was compared with the neutral, rest image so that all areas involved in each task could be localized, guarding against errors associated with incorrect assumptions about the nature of processes underlying performance in each task (Sanders, 1980; Parsons et al., 1995; Shulman, 1996). In all, 15 difference images (five cycles/image series × three image series/session) were generated per subject for each of the eight experimental conditions (four activation conditions × two pacing intervals). Finally, these mean difference values were compared on a voxel-by-voxel basis against a hypothetical mean of zero using pooled-variance Student’s t tests.
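The difference-image procedure can be sketched for one voxel in one imaging series. Several details here are assumptions made for illustration: the image layout (a rest epoch, then S, C, L, and D, four images each, with a final rest epoch closing the series), the choice of the rest epochs bounding each cycle as the local baseline, and a plain one-sample t statistic standing in for the pooled-variance test described above. The function names and the synthetic time series are hypothetical.

```python
import math
import statistics

IMAGES_PER_EPOCH = 4    # 18 sec epoch / 4.5 sec TR
EPOCHS_PER_CYCLE = 5    # one rest epoch plus S, C, L, D
CONDITIONS = ("S", "C", "L", "D")

def difference_values(series, condition, n_cycles=5):
    """Return one activation-minus-rest value per cycle for a single voxel."""
    k = CONDITIONS.index(condition)
    cycle_len = IMAGES_PER_EPOCH * EPOCHS_PER_CYCLE
    diffs = []
    for c in range(n_cycles):
        start = c * cycle_len
        task_start = start + IMAGES_PER_EPOCH * (k + 1)
        # Keep only the final two images of each epoch; the first two are discarded.
        task = series[task_start + 2 : task_start + 4]
        rest_before = series[start + 2 : start + 4]
        rest_after = series[start + cycle_len + 2 : start + cycle_len + 4]
        rest = rest_before + rest_after
        diffs.append(statistics.mean(task) - statistics.mean(rest))
    return diffs

def one_sample_t(values):
    """t statistic for the mean of `values` against a hypothetical mean of zero."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))

def in_c_epoch(i):
    """True if image i falls in a C epoch under the assumed layout."""
    return i < 100 and 8 <= i % 20 < 12

# Synthetic voxel: baseline 100, +2 during C epochs, small deterministic jitter.
series = [100 + (2 if in_c_epoch(i) else 0) + 0.05 * ((7 * i) % 11 - 5)
          for i in range(104)]
c_diffs = difference_values(series, "C")
c_t = one_sample_t(c_diffs)
```

In the study, the values from three imaging series were combined, giving 15 difference values per voxel for each condition and pacing interval; the sketch uses a single series of five cycles for brevity.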
Individual SPGR anatomical scans and SPMs were linearly interpolated to volumes with 1 mm³ voxels, co-registered, and transformed into standard stereotaxic space (Talairach and Tournoux, 1988) using the “MCW-AFNI” software package (Cox, 1996). To compensate for normal variation in anatomy across subjects (Thompson et al., 1996), the stereotaxically resampled 3-D SPMs were spatially averaged at each point over a sphere of radius 4 mm. The SPMs for each condition were averaged across the 13 subjects on a voxel-by-voxel basis. Thus each voxel in the resulting averaged SPM contains an averaged t statistic. The procedure of averaging statistics was chosen to guard against nonequal MR signal variances between subjects. A threshold was then applied to the averaged t statistics to identify voxels in which the mean change in MR signal between rest and activation conditions was unlikely to be zero. The average of a set of t deviates is not a tabulated distribution. Therefore, the Cornish–Fisher expansion of the inverse distribution of a sum of random deviates (Fisher and Cornish, 1960) was used to select a threshold (t = 1.96; p < 10⁻⁸) for rejection of the null hypothesis. This threshold effectively eliminates false-positive voxels from the functional maps.
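A rough sense of why an averaged t value of 1.96 is so stringent can be obtained from a normal approximation. The sketch below is not the Cornish–Fisher calculation used in the study: it assumes 14 degrees of freedom per subject (15 difference values per voxel), independence across subjects, and no effect of spatial averaging, and it ignores the heavier tails of the t distribution that the Cornish–Fisher expansion is designed to handle.

```python
import math

N_SUBJECTS = 13
DF = 14            # assumption: 15 difference values per voxel, so 14 degrees of freedom
THRESHOLD = 1.96   # averaged-t threshold stated in the text

# Variance of a single Student's t deviate with DF > 2 degrees of freedom.
var_single_t = DF / (DF - 2)

# Averaging N independent deviates divides the variance by N.
sd_mean_t = math.sqrt(var_single_t / N_SUBJECTS)

# Threshold in units of the null standard deviation, and the one-sided
# tail probability under a normal approximation.
z = THRESHOLD / sd_mean_t
p_normal = 0.5 * math.erfc(z / math.sqrt(2))

print(f"SD of averaged t = {sd_mean_t:.3f}, z = {z:.2f}")
```

Under these assumptions the threshold lies more than six null standard deviations from zero, and the approximate tail probability falls well below 10⁻⁸, which is in line with the significance level quoted above.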
Individual 3-D SPGR data from the 13 subjects were merged to produce an “average brain” for anatomical reference. To examine the consistency between the individual and group-averaged functional maps, we counted, for each significant activation focus identified in the group functional maps, the number of subjects whose individual functional maps showed significant (t ≥ 1.96) signal increases at that focus.
RESULTS
Figure 1 displays the reaction time findings from the S and C conditions of the PFT task. Inter-response intervals (IRIs) that exceeded 50% of the target interval duration were excluded from the reaction time data. This occurred on 5% of the trials and often was caused by the failure of subjects to fully depress the response key. The results demonstrated that the subjects were able to reproduce the timing intervals with a high degree of accuracy in both the S and C conditions (Fig. 1 A). There was a small but significant increase in the duration of the mean IRI between the S and C conditions [F(1,12) = 6.56; p < 0.05]. As expected, total variability (Fig. 1 B), which is the SD of the IRI, also was significantly greater in the C than in the S condition [F(1,12) = 28.59; p < 0.001]. Consistent with previous findings (Wing, 1980), variability was greater [F(1,12) = 38.5; p < 0.001] for the longer (600 msec) pacing interval. The two-way interactions (condition × pacing interval) were not significant (p > 0.10) for either mean IRI or total variability.
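The two behavioral measures, mean IRI and total variability (SD of the IRI), can be computed from key press times as in the following sketch. The exclusion rule is an assumption: the code drops any IRI that deviates from the target interval by more than 50% of the target, which is one reading of the criterion stated above. The function name and the tap times are hypothetical.

```python
import statistics

def iri_summary(tap_times_ms, target_ms, tolerance=0.5):
    """Return (mean IRI, SD of IRI, number excluded) for one tapping run."""
    iris = [b - a for a, b in zip(tap_times_ms, tap_times_ms[1:])]
    # Assumed rule: exclude IRIs more than `tolerance` * target away from the target.
    kept = [x for x in iris if abs(x - target_ms) <= tolerance * target_ms]
    return statistics.mean(kept), statistics.stdev(kept), len(iris) - len(kept)

# Hypothetical run at the 600 msec pacing interval with one missed key press.
taps = [0, 590, 1200, 1810, 3000, 3600]
mean_iri, sd_iri, n_excluded = iri_summary(taps, target_ms=600)
print(mean_iri, round(sd_iri, 2), n_excluded)
```

In this example the 1190 msec interval produced by the missed key press is excluded, and the summary statistics are computed from the remaining four intervals.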
Subjects were highly accurate in discriminating changes in pitch in the D condition. The mean rate of accuracy was 94.5 and 96.5% for the 300 and 600 msec intervals, respectively (p > 0.10).
Functional imaging findings
Table 1 shows the center of mass, volume, and peak intensity (maximum t) of the activation foci, as well as the number of subjects demonstrating significantly activated tissue within each focus. For all four activation conditions, the anatomical location, magnitude, size, and consistency of the foci were nearly identical for the two pacing intervals. Two conclusions may be drawn from this observation. First, the pacing interval has a negligible effect on patterns of functional brain activity, at least for the intervals used in this study. Second, the functional images were highly reproducible, because the two sets of images generated for each pacing interval were derived from separate imaging series.
Both the S and C conditions produced two large areas of activation within the left sensorimotor cortex and the right cerebellum, consistent with finger movements involving the right hand (Fig. 2, Table 1). Activation in these two regions was observed in 85–100% of subjects. Importantly, the center of mass, volume, and intensity of activation within the right cerebellum were nearly identical for the S and C conditions (Table 1). The solitary cerebellar site that was activated in both of these conditions was located in the vicinity of the dorsal dentate nucleus. Neither the S nor the C condition produced activation in the ventral dentate nucleus or the dorsolateral prefrontal areas.
There was no activation within the sensorimotor cortex and the cerebellum in the D condition. This was not surprising, because finger tapping rates below 1 Hz result in MR signal intensity changes that are difficult to distinguish from background noise (Rao et al., 1996). The D condition had an average tapping rate of 0.33 Hz (6 taps in 18 sec); in contrast, tapping rates for the 300 and 600 msec pacing intervals in the S and C conditions were 3.33 and 1.67 Hz, respectively.
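The tapping rates quoted above follow directly from the task parameters, as the short calculation below shows. The only assumption is that subjects produced one key press per pacing interval in the S and C conditions and one per pitch transition in the D condition; the function name is illustrative.

```python
def rate_hz(n_events, duration_sec):
    """Average event rate in Hz."""
    return n_events / duration_sec

# S and C conditions: one tap per pacing interval.
rate_300 = rate_hz(1, 0.300)
rate_600 = rate_hz(1, 0.600)

# D condition: 12 tone pairs per 18 sec epoch, half with a pitch transition.
rate_d = rate_hz(12 // 2, 18)

print(round(rate_300, 2), round(rate_600, 2), round(rate_d, 2))
```

Only the D condition falls below the 1 Hz rate cited above as the level at which movement-related signal changes become hard to distinguish from noise.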
The C condition, but not the S condition, resulted in additional activation of the medial “premotor” loop (Alexander et al., 1986), consisting of the SMA, the left caudal putamen, and the left ventrolateral thalamus (Fig. 2, Table 1). The frequency of activation in the SMA was 85–92% of subjects; subcortical activation (putamen, ventrolateral thalamus) occurred in 54–77% of subjects.
Increased MR signal intensity was found near the primary auditory cortex in all four conditions, with frequency of activation ranging from 62 to 92% of subjects. For the S and C conditions, activation occurred within the right superior temporal gyrus (STG) (Fig. 3 A, Table 1). In addition, the right inferior frontal gyrus (IFG) was activated solely by the C condition in 62–69% of subjects (Figs. 2, 3 A, Table 1). STG activation was predominantly bilateral in the L and D conditions, without activation of the IFG (Table 1). No activation was observed in the left STG for the L condition at the slower stimulus rate (600 msec interval). We have demonstrated previously that the magnitude of activation within the STG is a function of stimulus rate (Binder et al., 1994) and task demands, with passive listening producing less activation than conditions requiring a sensory discrimination (Binder et al., 1996).
SMA activation was also observed for the D condition. Although there is some overlap in spatial extent, the activation focus for the D condition was located rostral to the focus for the C condition (Fig. 3 B). For the 300 and 600 msec intervals, the differences in the center of mass between the C and D conditions were 1.1 and 1.6 cm, respectively, along the y axis (Table 1).
DISCUSSION
Internal timing of movements
The principal findings from this study involved the comparison between the functional images derived from the S and C conditions. The C condition, but not the S condition, resulted in activation of the SMA, the left caudal putamen, and the left ventrolateral thalamus. These findings were specific to a condition in which performance depended entirely on an internal representation of time, suggesting that the medial premotor pathway plays a critical role in the explicit timing of movements. This conclusion is consistent with the motor timing deficits that have been reported in Parkinson’s disease (Pastor et al., 1992; O’Boyle et al., 1996) and in patients with SMA lesions (Halsband et al., 1993). In addition, recordings of cortical DC potentials in humans also suggest that the SMA is crucial for precise timing (Lang et al., 1990). Although our results are compatible with the view that the SMA is involved in the internal rather than the external guidance of movements, this dichotomy is imprecise and controversial (Tanji, 1994). The present findings argue not only for a more specific functional role for the SMA, but demonstrate further that the SMA is just one component of a system, the medial premotor loop (Alexander et al., 1986), which appears essential for the timing of internally generated movements.
Consistent with our predictions, judgments of pitch (D task) were correlated with activation of a neural system that seems to be functionally and neuroanatomically distinct from the system underlying internal timing operations. SMA activation was observed for the D task, but its center of mass was rostral to that of the SMA focus in the C condition (Fig. 3 B). The SMA has been subdivided into two distinct regions (Picard and Strick, 1996): the pre-SMA, located anterior to the vertical line through the anterior commissure, and the SMA proper, located caudal to this line. In monkeys, the SMA proper projects directly to the primary motor cortex and the spinal cord, and the pre-SMA projects to the prefrontal cortex and other nonprimary motor cortical areas (Picard and Strick, 1996). This neuroanatomical differentiation is consistent with the finding that intracortical stimulation of the SMA proper evokes specific movements that follow a somatotopic organization, whereas pre-SMA stimulation typically does not evoke movements. Moreover, human functional imaging studies have suggested that the pre-SMA region is more frequently activated during “complex” tasks requiring response selection, such as in the go–no-go contingency of our D condition (Picard and Strick, 1996), whereas the SMA proper is purportedly more involved in “elementary” aspects of motor control. More investigations examining this issue are clearly needed, because current hypotheses regarding the functional differences between the SMA proper and pre-SMA are speculative.
Rehearsal of internal auditory representations
Increased MR signal intensity was observed within the right STG during the S and C conditions. Although the C condition did not involve an auditory stimulus, internal rehearsal of the tone interval duration, or auditory imagery, is likely to be used during performance in this condition. Importantly, auditory imagery and perception seem to share similar neural systems within the auditory cortex (Zatorre et al., 1996).
In addition, the right IFG was activated solely by the C condition. It has been suggested (Zatorre et al., 1996) that the right STG and the right IFG form a network specifically associated with the retrieval and rehearsal of auditory information, particularly in the absence of external stimulation (i.e., C condition). In contrast, STG activation was predominantly bilateral in the L and D tasks, and there was no activation of the IFG. The absence of significant IFG activation in these tasks may be explained by the relatively minimal demands on retrieval or rehearsal mechanisms, because auditory processing either was passive (L task) or performed relatively soon after the presentation of a tone pair (D task).
Activation of this auditory network during the performance of internally timed movements parallels findings from a study in which the silent rehearsal of letter strings produced bilateral activation of the STG and IFG (Paulesu et al., 1993). This suggested to the authors that the articulatory loop of working memory includes a subvocal rehearsal system. This interpretation suggests the possibility that in our study, an internal, nonlinguistic auditory representation of the target interval duration was sustained to guide the timing of sequential movements, just as a tone does in the S condition.
Sensorimotor control of paced finger tapping
Both PFT tasks produced two large areas of activation in the left sensorimotor cortex and the right cerebellum, within the vicinity of the dorsal dentate nucleus. These areas form a circuit (Strick et al., 1993; Middleton and Strick, 1994), which likely supports sensorimotor functions involved in the performance of both the S and C conditions. This proposal suggests that motor timing impairments in patients with cerebellar damage (Ivry et al., 1988; Ivry and Keele, 1989) may be secondary to deficits in sensorimotor processing that interact with internal timekeeping operations.
Interestingly, no activation was found in the ventral portion of the dentate nucleus, which projects primarily to dorsolateral prefrontal areas (Middleton and Strick, 1994), which also were not activated in either of the PFT tasks. The dorsolateral prefrontal areas have been associated with “higher-level” cognitive functions, including working memory. This indicates that PFT does not significantly draw on these processes, regardless of whether an auditory pacing cue is available.
In summary, our findings indicated that the performance of precisely timed movements is dependent on three interrelated neural systems, each of which supports a unique function. The medial premotor system seems to be responsible for the explicit timing of movements. This system was activated for both time intervals of the C condition, which suggests that a single neural system regulates the explicit motor timing of the intervals sampled in this study. Our findings do not rule out the possibility that timekeeping operations may be distributed across other neural systems, for intervals outside of the narrow range studied here. This is especially true for intervals spanning durations of >1–2 sec, wherein attentional biases and contextual variables increasingly contribute to marking the passage of time.
Internal timing also is performed in association with the retrieval and rehearsal of internal nonlinguistic auditory representations of time intervals. The right STG and the right IFG form a system, which seems to support this process. This finding suggests an alternative interpretation for interference effects during PFT (C condition) when subjects simultaneously performed an anagram solution task, which involves linguistic and nonlinguistic processing (Sergent et al., 1993). The authors attributed the increased IRI variability in PFT during dual-task performance to a disruption in the timing mechanism. Our findings raise the possibility that the interference could be attributable instead to a disruption in subvocal, nonlinguistic rehearsal processes.
Finally, the dorsal dentate nucleus and the sensorimotor cortex form a circuit that seems to be principally responsible for processing the sensorimotor aspects of PFT (Leiner et al., 1995). One possibility is that the cerebellum is involved in coordinating external (S condition) and internal (C condition) stimulus events with output from the motor system. This is consistent with the view that the cerebellum serves as an integrator of multisensory information from the cerebral cortex into a motor frame of reference (Bloedel, 1992), essential for the coordination of movement.
This research was funded by grants from the National Institute of Mental Health (P01-MH-51358), National Institute of Neurological Disorders and Stroke (R01-NS-33576), National Institute on Drug Abuse (R01-DA-09465), National Multiple Sclerosis Society (RG2605-A-4), and Department of Veterans Affairs. We thank J. Frost, S. Fuller, J. Kummer, T. Prieto, and L. Stapp for technical assistance, and J. Cunningham, E. DeYoe, T. Hammeke, J. Hyde, A. Rosen, E. Stein, P. Strick, and S. Woodley for helpful comments.
Correspondence should be addressed to Dr. Stephen M. Rao, Section of Neuropsychology, Medical College of Wisconsin, 9200 W. Wisconsin Avenue, Milwaukee, WI 53226.