Abstract
The neural underpinnings of rhythmic behavior, including music and dance, have been studied using the synchronization-continuation task (SCT), where subjects initially tap in synchrony with an isochronous metronome and then keep tapping at a similar rate via an internal beat mechanism. Here, we provide behavioral and neural evidence that supports a resetting drift-diffusion model (DDM) during SCT. Behaviorally, we show the model replicates the linear relation between the mean and standard-deviation of the intervals produced by monkeys in SCT. We then show that neural populations in the medial premotor cortex (MPC) contain an accurate trial-by-trial representation of elapsed-time between taps. Interestingly, the autocorrelation structure of the elapsed-time representation is consistent with a DDM. These results indicate that MPC has an orderly representation of time with features characteristic of concatenated DDMs and that this population signal can be used to orchestrate the rhythmic structure of the internally timed elements of SCT.
SIGNIFICANCE STATEMENT The present study used behavioral data, ensemble recordings from medial premotor cortex (MPC) in macaque monkeys, and computational modeling, to establish evidence in favor of a class of drift-diffusion models of rhythmic timing during a synchronization-continuation tapping task (SCT). The linear relation between the mean and standard-deviation of the intervals produced by monkeys in SCT is replicated by the model. Populations of MPC cells faithfully represent the elapsed time between taps, and there is significant trial-by-trial relation between decoded times and the timing behavior of the monkeys. Notably, the neural decoding properties, including its autocorrelation structure are consistent with a set of drift-diffusion models that are arranged sequentially and that are resetting in each SCT tap.
Introduction
The ability to extract the regular pulse or beat in music and to respond in synchrony to the pulse is called beat synchronization and is a natural human behavior exhibited during musical ensemble coordination and dancing (Large and Palmer, 2002; Merchant et al., 2015a). Human subjects can also keep an internal beat without external cues, which is a skill commonly used in musical soloists (Wing, 2002; Zarco et al., 2009). These abilities are a form of interval timing on time scales of hundreds of milliseconds to a few seconds. Arguably, human beat synchronization has evolved gradually across the primate order (Merchant and Honing, 2013; Mendoza and Merchant, 2014; Patel, 2014).
We have been studying the neural correlates of these processes using the synchronization-continuation task (SCT). In the SCT, subjects first respond in synchrony with a visual or auditory metronome and then continue, in the absence of the metronome, to produce the same interval. Functional imaging studies during this task have shown consistent activation in a set of structures, including the supplementary motor and presupplementary motor areas [that constitute the medial premotor cortex (MPC)], the putamen, the globus pallidus, and the motor thalamus (Grahn and Brett, 2007; Kung et al., 2013; Merchant et al., 2015a). The MPC, however, is crucially involved in maintaining an internal representation of beat intervals, because this area shows a larger activation during the continuation than the synchronization phase (Rao et al., 1997; Lewis et al., 2004), and patients with SMA lesions show a selective deficit in performing the continuation but not the synchronization phase (Halsband et al., 1993). Neurophysiological studies in primates have shown diverse encoding of elapsed time, including ramping activity during the SCT (Merchant et al., 2011). In addition, MPC cells are tuned to specific durations and serial order elements of the task (Merchant et al., 2013a), and small populations of tuned cells are activated in rapid succession within (Crowe et al., 2014) and across (Merchant et al., 2015b) each produced interval in the sequence of sensory-cued and internally driven rhythmic movements.
Although these results suggest an organized representation of temporal intervals in the MPC, the computational mechanism that gives rise to this representation has not been examined. There is, however, a large and well developed literature on computational models of interval timing (Schöner, 2002; Hass and Durstewitz, 2016). Several of these models are biologically plausible and can be used to make predictions about the neural representation of elapsed time. Here, we implement a previously developed drift-diffusion model (DDM; Simen et al., 2011, 2016), and examine detailed behavioral and neural correlates of this model. We find substantial evidence in the rhythmic tapping behavior of the monkeys and in the MPC population code that are consistent with the predictions of the DDM.
Materials and Methods
Subjects.
All the animal care, housing, and experimental procedures were approved by the National University of Mexico Institutional Animal Care and Use Committee and strictly conformed to the principles outlined in the Guide for Care and Use of Laboratory Animals (NIH, publication number 85-23, revised 1985). The two monkeys (Macaca mulatta; both males, 5–7 kg BW) were monitored daily by the researchers and the animal care staff, and every second day from the veterinarian, to check the conditions of health and welfare. To ameliorate their condition of life we routinely introduced in the home cage (1.3 m3) environment toys (often containing items of food that they liked) to promote their exploratory behavior. The researcher that tested the animals spent half an hour per day interacting with the monkeys directly, giving for example new objects to manipulate. All surgery was performed under Sevoflurane (1–2%) gas anesthesia, and every effort was made to minimize suffering.
Synchronization-continuation task.
The SCT has been described in detail previously (Zarco et al., 2009; Merchant et al., 2011). On each trial, the monkey tapped a button seven times in succession (producing 6 intervals) with the goal of maintaining a constant inter-tap interval duration across all taps. In this paper, we refer to the time period between taps as “intervals” and to the amount of time between taps as “duration”. The first four taps were made synchronously with a repetitive cue stimulus (either a visual stimulus presented on a computer monitor or an auditory tone on speakers in front of the animal). The monkey, then, had to tap the button three more times with the same inter-tap duration as instructed by the cues without the assistance of the sensory metronome (Fig. 1A). Five different instructed durations were used: 450, 550, 650, 850, and 1000 ms. During the recording of each group of cells (one “set”), the monkey performed five repetitions of each duration, for a total of 25 trials, with durations randomly ordered within each repetition. Trials were separated by an intertrial time between 1.2 and 4 s.
A drift-diffusion model explains the rhythmic timing behavior of the monkeys. A, SCT. Monkeys were required to push a button (Responses, black line) each time stimuli with a constant interstimulus interval (stimuli, gray line) were presented, which resulted in a stimulus–movement cycle. After four consecutive synchronized movements, the stimuli stopped, and the monkeys continued tapping with a similar interval for three additional intervals. The instructed intervals, defined by brief auditory or visual stimuli, were 450, 550, 650, 850, and 1000 ms, and were chosen pseudorandomly within a repetition. B, Constant error (produced-instructed interval) of the animals during synchronization (blue) and continuation (red) phases of SCT as a function of the instructed duration. Monkeys slightly underestimated the interval durations during the continuation. C, The temporal variability of the monkeys increased as a function of instructed interval during both phases of SCT, with a larger slope in the continuation condition. D, Example diffusion trajectories for two different durations (450 and 850 ms) and on top the resulting distributions of produced intervals. E, Relationship between mean and SD of the produced interval of a DDM (red line) that is compared with the monkey's behavior during the continuation phase of the SCT (black line).
Serial reaction time task.
This task was used as a control to determine the neural responses associated with sensorimotor and sequential behaviors (Fig. 2). Monkeys were required to push a button each time a stimulus was presented, but in this case the interstimulus interval was random (600–1400 ms), precluding the explicit temporalization of motor responses. Monkeys received a reward if the response time to each of five stimuli was within a 200–1000 ms window. The intertrial interval was also variable (1.2–4 s). Brief auditory (33 ms, 500 Hz, 65 dB) or visual (4 cm side red square, 33 ms) stimuli were used, and 10 repetitions were collected.
SRTT. Monkeys required to push a button each time a stimulus was presented, but in this case the interstimulus interval was random (600–1400 ms), precluding the explicit temporalization of motor responses. Five stimulus–response cycles were collected in a sequence. Monkeys received a reward if the response time to each of five stimuli was within a 200–1000 ms window.
Neural recordings.
The extracellular activity of single neurons in the medial premotor areas was recorded using a system with seven independently movable microelectrodes (1–3 MΩ, Uwe Thomas Recording). All the isolated neurons were recorded regardless of their activity during the task, and the recording sites changed from session to session. At each site, raw extracellular membrane potentials were sampled at 40 kHz. Single-unit activity was extracted from these records using the Plexon off-line sorter. Structural magnetic resonance imaging (MRI) was used to localize the recording sites (2). The recording positions based on MRI reconstructions were reported by Merchant et al. (2015b, their Fig. 3A). A total of 253 recording sessions (174 monkey1 and 79 monkey2) were performed during task performance. Of these, 189 had five or more cells simultaneously recorded and were analyzed further. It is important to clarify that a partial set of these neural recordings were analyzed in previous studies using different analytical tools (Merchant et al., 2011, 2013b).
Decoding elapsed time within an inter-tap interval.
We used a pattern classification analysis (“classify” function in the MATLAB Statistical Toolbox; Mathworks) to decode the elapsed time from the previous to the next tap for each produced interval (Crowe et al., 2014). First, we divided each produced interval in 50 ms bins (Fig. 3). This meant that if the monkey produced a perfect instructed 450 interval, the total number of bins for this interval would be 9 (450/50 = 9), the 550 ms instructed interval would be 11 bins, and so on. Because monkeys actually produced a range of intervals, the number of time bins in any particular produced interval was usually more or less than if a perfect interval had been produced (Fig. 3A). We then calculated the discharge rate for the cells recorded simultaneously in a session within each 50 ms bin of a produced interval, keeping this information independently for each of the six serially produced intervals (3 for synchronization and 3 for the continuation phase), and each of the five instructed intervals. We used the pattern of discharge rates of the cells during a specific produced interval to decode the bin-by-bin sequence of durations, as follows. First, we organized the neural data for each bin and trial into observations. Thus, the total number of observations corresponded to the total number of bins defined by the duration of the produced intervals by the monkey during the five trial repetitions. For example, for a perfect execution across trials for the interval of 550 ms, the total number of observations correspond to 55 (11 bins × 5 repetitions; Fig. 3B). Using these observations, we determined the degree to which neural activity represented the passage of time by classifying each observation as one of the n time bins produced by the monkey, based on the pattern of firing rates across the population of cells. In this analysis, 4/5 of the observations were used as training data to obtain an average pattern of activity across the neural population for each time bin. The activity pattern recorded on each of the remaining 1/5 of observations were used as the testing observations, and were compared with the average patterns. The result of this comparison was the classification of each testing observation as the time bin with the closest matching average pattern. This process was repeated four more times, namely, all the bins across repetitions were used as testing observations (Fig. 3B). Consequently, all the classified bins were kept as decoded durations associated to a true bin in a produced interval.
Illustration of the decoding analysis. A, Each trial consisted of six produced inter-tap intervals (S1–C3). Monkeys performed five trials in each of five instructed durations. To decode elapsed time during the inter-tap interval, we divided each interval into 50 ms bins. This meant that if the monkey produced a perfect instructed 550 interval, the total number of bins for this interval would be 11 (550/50 = 11). We then calculated the discharge rate for the cells recorded simultaneously in a session within each 50 ms bin of a produced interval, keeping this information independently for each of the six serially produced intervals, and each of the five instructed intervals. Thus, we organized the neural data for each bin and trial into observations. The total number of observations corresponded to the total number of bins defined by the duration of the produced intervals by the monkey during the five trial repetitions. B, For a perfect execution across trials for the interval of 550 ms, the total number of observations correspond to 55 (11 bins × 5 repetitions). We determined the degree to which neural activity represented the passage of time by classifying each observation as one of the n time bins produced by the monkey, based on the pattern of firing rates across the population of cells. In this analysis, 4/5 of the observations were used as training data to obtain an average pattern of activity across the neural population for each time bin. The activity pattern recorded on each of the remaining 1/5 of observations were used as the testing observations, and were compared with the average patterns. The result of this comparison was the classification of each testing observation as the time bin with the closest matching average pattern. This process was repeated four more times, namely, the bins of all the repetitions were used as testing observations. Consequently, all the classified bins were kept as decoded durations associated to a true bin in a produced interval.
Bimodal model of decoded times.
As shown in the results, we observed that the decoded times in the shortest and longest bin durations showed a bimodal behavior. Initially, for each bin we performed the Hartigan's dip statistic (HDS) to distinguish whether the decoded times were uni or bimodal (Hartigan and Hartigan, 1985). For the bins with a significant HDS (values <0.05), the decoding was considered bimodal and the next step was to apply the following mixture bimodal model to the decoded times:
where μ1 and μ2, and σ1 and σ2 are the means and SDs of the two modes, respectively, and λ is the mixing coefficient. When the parameter λ = 0 or λ = 1 the model is reduced to a single Gaussian model, whereas λ values close to 1 indicate that the first mode (μ1 and σ1) has a larger weight in the distribution, and λ values close to 0 indicate that the second mode (μ2 and σ2) has a large weight in the distribution. The parameters of the model were obtained using the genetic algorithm in the MATLAB function ga.m. In contrast, when the HDS was not significant a unimodal model was fitted to the decoded times.
Diffusion model.
We used a stochastic accumulator or DDM to account for the mean-variance relationship in the produces intervals. We assumed that the time estimation was consistent with a stochastic differential equation given by the following:
We further assumed that when the integrated particle position, X, hits a threshold, a, a response is executed. Longer intervals can be produced by smaller drift rates, v, and shorter intervals by larger drift rates. Various forms of this model have been developed in detail by several groups, so we will only give the relevant results here (Simen et al., 2011, 2016). If the DDM is assumed to arise through the integration of an excitatory and inhibitory Poisson process, referred to as an opponent Poisson process, the drift rate is given by the following:
where, λe is the excitatory rate and λi is the inhibitory rate. We can further introduce a parameter γ that controls the ratio of excitatory to inhibitory input, λi = γ λe. The diffusion rate is then:
If we also define m =
, the first three central moments of the reaction times (first passage times) are as follows:
This gives a skew to coefficient of variation ratio of 3. The distribution of reaction times for this process is an inverse Gaussian:
For this DDM, the autocorrelation function for the particle X is given by (Papoulis, 1991):
When this was fit to the autocorrelation function of the neural representation of the elapsed time we performed a linear regression given by the following:
where Rn is the neural autocorrelation function and Rx is the theoretical autocorrelation function.
The DMM model was fit to the behavioral data by floating the threshold, a and the ratio of excitatory to inhibitory input, γ. The drift rate was then defined by the relationship between the mean (given by the instructed interval) and the threshold and drift rate. The model, therefore, had two parameters. The free parameters were fit by minimizing the sum-of-squared error between the measured SD of the produced intervals by the monkeys and the model's predicted SD.
Results
We trained two monkeys in a version of the SCT where they produced, by tapping on a push-button, three intervals in the synchronization phase, during which they were responding to a sensory stimulus, followed by three internally generated intervals in the continuation phase (Fig. 1A). The monkeys were able to accurately produce the instructed intervals (from 450 to 1000 ms), showing a constant error close to zero during synchronization, and showing a small overestimation for the shortest interval and an underestimation of <50 ms for the larger instructed intervals during the continuation phase (Fig. 1B). Nevertheless, we performed an ANOVA with the constant error as dependent variable and the epoch and instructed interval as factors, and the results showed no significant effects. In addition, the temporal variability (inter-tap SD) increased approximately linearly as a function of duration, with a larger slope in the continuation condition (Fig. 1C). An ANOVA using the temporal variability as dependent variable showed significant main effects for epoch(F(20, 1) = 20.8, p < 0.0001) and instructed interval (F(20, 4) = 4.6, p = 0.009), and a marginal effect for the epoch × interval interaction (F(20, 4) = 2.4, p = 0.08). The proportional relationship between the mean and the SD of produced intervals is known as the scalar property.
A diffusion model of the elapsed time computation
DDMs can be used to account for a large class of behavioral phenomena where there is a relationship between response time and response variability, including interval-timing behavior (Schöner, 2002; Simen et al., 2011, 2016; Ratcliff, 1978; Roitman and Shadlen, 2002; Wagenmakers et al., 2005). In a DDM, a series of samples drawn from a (usually) stationary distribution are integrated across time leading to a random walk trajectory (Fig. 1D). The mean of the distribution from which samples are drawn is the drift rate, and the variance of the distribution is the diffusion rate. The current value of the integrator is often thought of as the location of a particle in space. When the particle crosses a fixed boundary a behavior is executed. The time to reach the boundary is known as the first passage time. When drift rates are high the particle reaches the boundary quickly, and therefore an action is produced quickly, whereas when drift rates are low the particle reaches the boundary more slowly. Changes in the drift rate can, therefore, be used to account for changes in the duration of produced intervals (Fig. 1D).
Appropriately parameterized DDMs, that accumulate a difference of Poisson counts (see Materials and Methods), reproduce the scalar property of interval timing (Simen et al., 2011). We found that the mean-SD relationship seen in the tapping behavior of monkeys could be well approximated by the DDM (Simen et al., 2011, 2016; Fig. 1E). In addition, the DDM was able to approximately replicate the distribution of intervals produced by the monkeys across the five instructed durations (Fig. 4). Further, the ratio of the skew (i.e., third central moment) to the coefficient of variation of the produced interval distributions were between 2 and 3 for the longer intervals (Fig. 4C). This value is close to the ratio of 3 predicted by the first passage time distribution of the model Simen et al. (2011, 2016). Therefore, the DDM reproduced the mean–SD relationship seen in many timing studies (Gibbon et al., 1997; Merchant et al., 2008), as well as additional higher order statistics of the response time distributions. However, there is limited evidence that such a model describes the computational processes, at a neural level, that drives rhythmic behavior. We therefore examined several facets of the neural representation of time during rhythmic tapping in the MPC to see if they were consistent with this DDM.
Comparison between the produced intervals by the monkey and the DDM model. A, Distribution of the produced intervals by the two monkeys for the five instructed intervals (red line) and the corresponding distribution of time estimates by the DDM (blue line). B, Coefficient of variation (CV) as a function of the instructed intervals of the monkeys' produced intervals. C, Skew of the distributions in A of the produced intervals by the monkeys as a function of the instructed intervals. D, Skew to CV ratio of the distributions in A produced intervals by the monkeys as a function of the instructed intervals.
Neural representation of elapsed time
We began by building a linear decoding model that used neural activity to estimate elapsed time between taps, on a moment by moment basis. Presumably, such a representation would be used by the animal to estimate when it should execute the next tap. The model is based on the assumption that there will be a different mean rate associated with each elapsed time. A ramping cell would satisfy this assumption, as the rate increases monotonically with elapsed time. However, cells that show decreasing activity over time would also satisfy this assumption. And, at a population level, cells with more complex response profiles including increasing and then decreasing responses with elapsed time would also contribute information relevant to encoding elapsed time (Merchant et al., 2011; Perez et al., 2013; Latimer et al., 2015). As neural representations are almost always nonlinear functions of the underlying variable, we would not expect to find a simple linear relationship between the variable of interest and its neural representation (Meister et al., 2013; Rigotti et al., 2013; Jazayeri and Shadlen, 2015). In fact, we have found a family of time-modulated ramping cells, with complex response profiles as a function of elapsed time (Fig. 5). However, ramping activity is not a requirement for the model and it would not be expected. The goal is to determine whether we can identify a representation of the underlying computation, and the decoding model provides the relevant neural information extracting tool for addressing this question.
Average spike density functions (spikes/s, σ = 30 ms) of single cells responding after the button press (time 0 ms in the abscissas) across instructed intervals in the SCT. A, Neuronal activity of a cell whose activity increases linearly with elapsed time for all instructed intervals. B, Neuronal activity of a cell that shows a duration increase in its up–down activation profile as a function of instructed interval. C, Activity of a cell that shows an increase in discharge rate across target intervals; however, there is also an asymptotic saturation of the response linked to the instructed interval. The color code for target intervals is described in B (inset). Note that different nonlinear patterns of ramping activity modulated by the instructed intervals are present in our database. From 475 neurons used for the decoding of Figure 7, 240 neurons showed ramping activity whose duration, peak activity, and/or slope showed significant differences for instructed interval (ANOVA, p < 0.05).
We found that 53 of 189 simultaneously recorded neural ensembles showed a significant linear relation between the decoded values and the elapsed time between taps on at least one of the six serial order elements of the SCT (linear regression, p < 0.05). The cell populations showed decoding preferences for individual instructed intervals and serial order elements of the task (Fig. 6A), consistent with previous analyses of temporal encoding in single cells (Merchant et al., 2013b). Thus, individual ensembles did not tend to code all the elements in the sequence. Rather they were specific to combinations of serial-order elements and instructed durations. The entire population, however, would contain a representation of elapsed time, if it could be recorded simultaneously.
Time decoding based on individual serial order elements is more robust than using all the sequence of the SCT. A, Number of recording sessions that showed significant linear relations between the decoded and actual time bins across the five instructed intervals and the six serial-order elements of the SCT. Each of the 53 recording sessions could add a maximum of 30 counts in the plot (5 instructed intervals by 6 serial-order elements) if the decoded values for each interval/serial-order combination showed a significant relation between the decoded and the elapsed time. Note that all the combinations of instructed intervals and serial-order elements are represented in the population decoding. B, The entropy for individual serial-order elements is smaller than the entropy of the decoding posterior probabilities using all serial-order, indicating that the different cell populations carry duration information for specific serial-order elements.
Decoded times and actual elapsed times were highly consistent (Fig. 7A). Nevertheless, in the earlier time bins the decoded times sometimes represented the end of the previous interval (high values in early bins) and in the later bins the decoded times sometimes represented the beginning of the next interval (low values). To characterize this phenomenon, we used unsupervised learning to fit bimodal distributions to the decoded time distributions in each elapsed time bin (Fig. 7B,C). To assess whether the distribution was bimodal, we used Hartigan's dip test (Hartigan and Hartigan, 1985). For the bins with significant bimodal distributions, we separately characterized the mean and variance of each mode. For the shortest elapsed time-bin (Fig. 7A, left), Hartigan's dip test was significant, and the unsupervised clustering algorithm found two means (Fig. 7B) with one mode (99.3 ms) close to the actual elapsed time (50 ms) and one mode with a mean close to the length of the full interval of 1000 ms (814.2 ms). (Note the means will to some extent be pulled away from the edge bins due to noise, but this does not account for the presence of bimodality). In contrast, after 500 ms had elapsed, in the middle of the interval (Fig. 7C) there was a unimodal distribution of decoded times with a mean (487.9 ms) close to the target duration. Similar results could be seen for the elapsed time representation during the other instructed intervals (Fig. 7D–H; also similar decoding performance was found on the synchronization phase, data not shown). There was a linear increase of the decoded time as a function of actual elapsed time in all the instructed intervals, with a representation of previous and subsequent intervals early and late between taps. This indicates that populations of MPC cells represent the passage of time within intervals, and sometimes represent previous or subsequent intervals early and late in the interval, respectively. This can be contrasted with a representation that we did not find, in which only the onset of the movement, and not elapsed time, would be represented.
Time decoding during the SCT. A, Median (black dot) and interquartile (blue bar) values of the decoded times as a function of instructed time for the 1000 ms interval. The blue dots correspond to outlier data. B, Bimodal distribution of the decoded times of the first bin (50 ms) in B, where the bimodal mixture model converged with a low (red) and a high value mode (blue). C, Unimodal distribution the decoded times of the bin 10 (500 ms) in B. The mean is close to the target bin time. D–H, Mean ± STD (across recording sessions and 3 serial-order elements) for the unimodal (only red dots in the intermediate bins) and the bimodal mixture (red and blue circles in the extreme bins of the interval) models for the five instructed intervals of the continuation phase. The number of blue dots in each instructed interval depended on the significance of the bimodal model in each time bin. I, Mean (±SEM) of the second μ of bimodal mixture for the earlier bins (μ-previous) and the last bins (μ-next) as a function of the instructed time. Note how the μ-previous scales with the duration of the instructed interval. J, The animals' interval error (produced-instructed interval) as a function of the six serial-order elements of the task, three in the synchronization (S1–S3) and three in the continuation (C1–C3) phase, for the first four bins of the 1000 ms instructed interval. The behavior was divided on the trials associated with the larger (blue) and smaller (red) mode of the decoding times in H. K, Same as J but for the last four bins of the 1000 ms instructed interval. In this case, the behavior was divided on the trials associated with the smaller (blue) and larger (red) mode of the decoding times in H.
The bimodal behavior in the early time bins for each interval suggests that the populations sometimes represented the elapsed times of the previous interval in the earlier bins. Indeed, there was a linear increase in the mean decoded duration of the long mode (blue points for early bins) as a function of the instructed time (Fig. 7I; r2 = 0.97, p < 0.002), as expected. On the other hand, the decoded durations for the short modes (blue points for late bins) at the end of each interval suggest that the cell populations sometimes began to represent the elapsed time of the next interval early. Consistent with this, the mean decoded interval for the blue modes in the late bins show a low value that does not change across instructed times (Fig. 7I; r2 = 0.048, p = 0.722).
The bounded DDM predicts differences in behavior in the trials where an inaccurate time is being represented. If one assumes that the DDM resets too late in the trials in which the neural population is representing the elapsed time of the previous interval (i.e., the tap was executed because the DDM reached the threshold, but the integrator did not reset so it was still accumulating at the beginning of the next interval), then the accumulator would also start late in these trials and correspondingly would tend to end late. We did in fact find for the 1000 ms condition (but not for the other conditions; t test, p > 0.05; using the Bonferroni correction for multiple comparisons), that taps occurred later in the continuation phase when the elapsed time of the previous trial was still being represented at the beginning of the next tap (Fig. 7J). Correspondingly, if the neural representation of elapsed time began to represent the next trial early (corresponding to small elapsed time estimates at the end of the interval because the accumulator had reset before the tap was executed), it would suggest that the accumulator was starting early and would correspondingly end early. Results for these trials were less consistent. However, we did find that, for the first response in the continuation phase the responses occurred earlier in the next tap, when the neural population began to represent the subsequent interval early (Fig. 7K). This suggests that the reset of the accumulator did not always correspond exactly to the time at which the tap was executed and that when the accumulator started early, the next response in the sequence sometimes occurred early, and when the accumulator started late the response sometimes occurred late. Furthermore, these findings support the notion that the precision and the ability to adapt to changes in the beat during rhythmic timing may depend on the resetting mechanism of the DDM.
We next hypothesized that if the neural estimate of time within an interval was in advance of the actual elapsed time, the animals would respond earlier. Correspondingly, if the neural representation was behind the actual elapsed time the animals should respond later. For example, for a 1000 ms instructed interval, if the decoded value is 1000 ms for an actual elapsed time of 800 ms, the neural readout mechanism (i.e., the threshold) that produces the tapping would generate the movement earlier (i.e., at 800 ms) than the instructed time, since the readout would reach the threshold value before the actual time. To examine this, we computed the correlation between the trial-by-trial error in the decoded time estimates (i.e., whether it was in advance of or behind actual elapsed time) and the error in the produced intervals, for each time bin during the interval. We found, as predicted, a negative correlation between the neural residual (i.e., whether the decoded time estimate was above or below the actual elapsed time) and the produced interval in both the synchronization (r = −0.293, p < 0.0001) and continuation phases (r = −0.492, p < 0.0001; Fig. 8A,B). Furthermore, the correlation was larger in the continuation phase, when behavior was entirely driven by the neural estimate of elapsed time, than it was in the synchronization phase, across the last six bins before the tapping time (Z = 4.79, p < 0.00001; Fig. 8A,B). In addition, the negative correlation between the decoded and behavioral errors was homogenous for time bins ranging from 350 ms before the next tap to the actual tapping time, with a significant bias toward the continuation (χ2 =, p < 0.05; Fig. 8C–F). Therefore, this association tended to be stable across time, leading up to the end of the interval. This is also consistent with the animal executing a tap when the representation of elapsed time reaches the threshold consistent with the instructed interval.
Trial-by-trial relation between decoding and behavior. A, B, Negative correlation between the error in decoded time and the animals' time error on a trial-by-trial basis. There is a larger significant negative correlation between these two measures in the continuation (B) than the synchronization phase (A). C, D, Percentage of significant negative correlations (from 15 conditions that come from 5 instructed intervals and 3 serial order elements) between the decoded and the behavioral error as a function of the time to the next tap, for the synchronization (C) and continuation (D) conditions. E, F, Pearson correlation coefficient (r) for significant (p < 0.05) negative correlations between the decoded and the behavioral error as a function of the time to the next tap, for the synchronization (E) continuation (F) conditions.
The preceding analyses suggest a close relationship between the neural representation of elapsed time and the actual interval produced, on a trial-by-trial level. This representation by itself would be consistent with several models of interval timing. Therefore, we next examined the second order statistics of the neural representation of elapsed time to see if they were consistent with a DDM. The DDM we have fit has a characteristic covariance matrix (Fig. 9C). Specifically, the temporal autocorrelation function of this DDM is R(t1, t2) = σ2 min(t1, t2) (see Methods). Intuitively this follows from the fact that once the particle position drifts above its mean value for a given interval, it will on average tend to stay above its mean, and correspondingly if it drifts below its mean it will tend to stay below its mean. If the underlying computation implemented in the brain to track elapsed time is approximately a DDM, and we are reading out the representation with the decoding analysis, then we would expect the decoded representation to be statistically similar to the model. It can be see that the autocorrelation function of the decoded time was similar to the autocorrelation function for the DDM (Fig. 9A for the synchronization and B for the continuation phase for the 850 ms instructed interval, averaged across sessions) up to ∼75% of the interval, after which the relationship was less clear. The breakdown of the relationship may be due to the early resets of the accumulator in the late bins, in some trials. To quantify the consistency between the autocorrelations function of the model and the data we used linear regression to assess the consistency between the measured autocorrelation function (Fig. 9B), and the prediction based on min (t1, t2) (Fig. 9C). We found for the continuation phase that a linear regression provided a reasonable, statistically significant fit (F(2,493) = 60.6, p < 0.001; Fig. 9D). There was also a significant fit for the synchronization phase (F(2,493) = 47.3, p < 0.001). Although there was a stronger association in the continuation phase, a direct test of the correlation coefficients did not show a difference between the synchronization and continuation phases (p > 0.05). The structured autocorrelation function of the decoded values suggests a stable representation of the passage of time in MPC, which is in agreement with the time series structure of the DDM.
Similar autocorrelation structure between the decoding data and the DDM. A, B, Autocorrelation matrix of the decoded times for the 850 ms instructed interval across time bins within produced intervals. During the continuation phase (B), the autocorrelation is broader than during the synchronization phase (A). C, Autocorrelation function for a diffusion process. D, Relationship between model estimate of autocorrelation and measured autocorrelation of the process in the decoded data (mean ± SEM).
Neural representation of elapsed time in a sequence task without timing
The decoding of elapsed time was also performed on the same populations of cells during the SRTT (Fig. 2). In this paradigm, the animals pressed the button in response to five brief stimuli presented in a sequence, but separated by a random interstimulus interval, precluding the time prediction of the next stimulus–response loop. The time decoding using the population neural activity recoded during the SRTT showed a limited relation with the actual elapsed time (Fig. 10A, yellow dots), which contrasts with the linear increase in decoded time as a function of elapsed time in the synchronization phase of the SCT for an equivalent instructed duration (Fig. 10A, red dots). In addition, across all the decoded bins the distribution of decoded times during the SRTT were unimodal according to the Hartigan's dip tests. On the contrary, during the synchronization phase the decoding was significantly bimodal for the initial and final bins (Fig. 10A, blue dots; as shown above). Furthermore, the decoding entropy, and the overall temporal variability in the decoding was larger in the SRTT than in the synchronization phase (Fig. 10B,C).
Comparing the decoding during the synchronization phase of the SCT and the SRTT. A, The mean ± STD for the unimodal distribution of decoded times during the SRTT control task shows a weak relation with the actual instructed time. In contrast, the decoded times linked to the actual time representation show a linear relation (slope = 0.943) with the instructed time during the 1000 ms interval of the synchronization phase of the SCT. In addition, the second mode of the bimodal mixture model (blue circles) is associated with the time bins of the previous and next serial-order elements of the SCT task during the extreme bins of the interval. B, Entropy of the decoding posterior probabilities for the synchronization phase of the SCT and the SRTT. C, Mean ± STD of the temporal variability (SD) of the decoded times across bins for the synchronization phase of the SCT and the SRTT.
Discussion
The present study supports five conclusions. First, the linear relation between the mean and the SD of the monkeys' produced intervals during the SCT is consistent with the behavior that would be produced by a DDM that accumulates a difference of Poisson counts. Second, populations of MPC cells faithfully represented the elapsed time between taps, although in some trials there was a lingering representation of the previous interval, whereas in other trials the representation of the subsequent interval started just before the tap was produced. Third, there was a significant trial-by-trial relation between decoded times and the timing behavior of the monkeys during the continuation phase of the SCT, such that when the neural estimate of time reached the target interval early, the response was executed early and when the neural estimate of time reached the target interval late, the response was executed late. Fourth, decoded time values showed an autocorrelation structure consistent with the DDM. Finally, the coordinated time decoding that governed the execution in the SCT was disorganized during the control task, which included similar stimuli, motor tapping, and sequential organization, but where rhythmic timing was precluded.
Several models have been developed to explain the perception and production of temporal intervals. These include: (1) DDMs where the slope of the accumulator process changes as a function of the timed interval (Schöner, 2002; Simen et al., 2011, 2016) as we have used here; (2) diffusion models where the response bound changes to produce different intervals (Churchland et al., 2011); (3) biophysical state-dependent networks, where time is encoded in the current state of recurrent networks (Buonomano, 2000; Karkarkar and Buonomano, 2007); (4) the traveling of neuronal activity along a chain of neural groups, where time is encoded in both the speed at which the neural signal travels through the network and the neural elements that are active at the end of an interval (Kitano et al., 2003; Hass et al., 2008); and (5) Neural oscillation models that can be subdivided in: (a) single oscillators that encode time based on counting the number of revolutions of the oscillator (pacemaker-accumulator models; Treisman, 1963); (b) coupled oscillators with frequencies that adapt to rhythms or temporal cues so that the adapted frequency of the oscillator itself is used to encode the temporal information (dynamic attending theory; Jones and Boltz, 1989), (c) oscillatory interactions between sensory and motor areas of the brain define timing and rhythm perception (neural resonance theory; Large, 2008; Large et al., 2015); and (d) striatal beat frequency model that suggests that striatal medium spiny neurons act as coincidence detectors that can temporalize events by continuously comparing the current pattern of heterogeneous oscillatory cortical activity with the pattern detected at the time of the reward (Matell and Meck, 2004). Theoretically, these mechanisms can be understood within a dynamical systems framework, where different dynamical variables (such as the ramping activity; Merchant et al., 2011) and the rapid succession of active tuned cells (Crowe et al., 2014; Merchant et al., 2015b) are represented as a state vector. Thus, the trajectory in a high-dimensional state space spanned by these variables encodes the passage of time (Merchant et al., 2014; Gouvêa et al., 2015).
It has previously been shown that a two-parameter DDM model (Class 1 above) based on the integration of the difference of two Poisson processes (Simen et al., 2011, 2016) reproduced the scalar property of interval production, a form of Weber law, which establishes the mean-SD relationship that has been documented in many species and temporal tasks (Gibbon et al., 1977). This follows from the fact that the SD of the response time distribution produced by these models increases with time (Simen et al., 2011). In the present study, we have found substantial additional behavioral and neural support for this model. Indeed, not only did the monkeys' behavior follow the scalar property of interval timing, but also the autocorrelation structure of the decoded values within the produced interval was consistent with the autocorrelation function of DDM models.
In contrast to Model 1, Model 2 leads to a decrease in the SD of produced intervals with time, although this can be overcome by introducing additional sources of noise (Gibbon et al., 1997). The behavior data are consistent with Model 3, but at least in some implementations this model produces an auto-correlation function (Fig. 11) that is not consistent with the autocorrelation of the neural decoding during the synchronization and continuation phases (Fig. 9C,D). Models 4 and 5 cannot easily reproduce the scalar property of interval timing (but see Matell and Meck, 2004; van Rijn et al., 2014).
Autocorrelation matrix during 800 ms of activity of units in a recurrent artificial neural network that could reproduce sequential spatiotemporal patterns and the scalar property of interval timing (data kindly shared by Dean Buonomano from Laje et al., 2011, their Fig. 6B).
The orchestrated representation of actual elapsed time between taps by populations of MPC neurons was observed cyclically along the tapping sequence of the SCT. Ramping MPC cells can increase or decrease activity with the time remaining to the next action or correspondingly the elapsed time from the last movement (Merchant et al., 2011). Either increasing or decreasing activity can encode elapsed time. We also found that in some trials the previous interval was represented after the tap was executed, and in some trials, the representation of the subsequent interval began before the tap was executed. This suggests that the process that resets the accumulators is not identical with the process that executes the tap, as they were not perfectly coordinated. The dissociation between the reset mechanism and the tap execution therefore contributes to the variability and rhythmic structure of the motor behavior. Recent studies have shown in single interval timing tasks that cortical and striatal neural populations have an accurate representation of the passage of time, with linear relations between the decoded and the elapsed time (Matell et al., 2003; Gouvêa et al., 2015; Bakhurin et al., 2017). However, in these studies no signs of a representation of the previous or next interval were found, supporting the notion that a concatenated mechanism that represents the past, present, and future intervals is exclusive of rhythmic behaviors.
The tight relation between the rhythmic motor behavior of the animal and the decoding by cell populations in MPC was particularly evident during the continuation phase of the task. This observation goes along with the functional imaging studies suggesting that MPC is more active during the continuation than the synchronization of the SCT (Rao et al., 1997; Lewis et al., 2004). However, the temporal resolution of the neural signals recorded in the current experiments not only allowed for the characterization of the strong trial-by trial relation between decoding and behavior, but also permitted the observation of a stable neural representation of elapsed time throughout hundreds of milliseconds before the tapping in the internally driven epoch of the SCT. Thus, the MPC is a critical processing node of the temporal (Merchant et al., 2013b, Mita et al., 2009, sequential (Tanji, 2001), and rhythmic structure of the internally controlled tapping movements. Neural population decoding of single intervals.
When monkeys were reacting to a series of unpredictable stimuli, there was a strong disruption of the dynamic signals that integrated in a cyclic fashion the temporal information about the previous, the actual, and the following produced intervals during the SCT. Although the control task has similar stimuli, motor tapping, and sequential organization; the fact the SRTT avoids rhythmic behavior and time prediction is accompanied by a clear disorganization of the neural cyclic process that orchestrates time signals at contiguous serial order elements of the movement sequence.
Conclusion
Here we have found that sequentially arranged DDMs, or an accumulator that resets after each tap, can account for behavior in the SCT. Furthermore, the results provide neural support for this type of model during rhythmic tapping performance. Specifically, we found that the neural representation of elapsed time in some trials was more consistent with the previous or subsequent interval, which suggests that the mechanism that resets the accumulator or leads to initiation of the next accumulator was not perfectly coordinated with the production of the tap. In addition, when the neural representation of elapsed time was in advance of actual elapsed time, taps were executed early and when the neural representation of elapsed time was behind the actual elapsed time, taps were executed late. Finally, the trial by trial significant correlation between the decoded values over different time bins persisted from 350 ms before the next tap to the actual tapping time, imposing an autocorrelation function of the neural representation of elapsed time that was consistent with the autocorrelation function generated by DDM. Therefore, the computational mechanism underlying the generation of rhythmic behavior is consistent with several features of DDMs.
Footnotes
This work was supported by CONACYT: 236836, CONACYT: 196, and PAPIIT: IN202317 Grants to H. Merchant. We thank Patrick Simen and Victor de Lafuente for their fruitful comments on the paper; Dean Buonomano for helpful comments and his generosity in providing some modeling results from his previous work41; and Luis Prado and Raul Paulín for their technical assistance.
The authors declare no competing financial interests.
- Correspondence should be addressed to Dr. Hugo Merchant, Boulevard Juriquilla No. 3001 Querétaro, Qro, 76230 México. hugomerchant{at}unam.mx